Spring Boot Data MongoDB paginated query 10000: MongoDB paginated queries are too slow

Reposted

mob6454cc6d5f87 2024-08-13 16:57:11

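MongoDB query optimization

QA recently filed a ticket saying that one of our interfaces was too slow: the front end needed about 4 s to get its data, which hurt the user experience. I went through a long round of investigation and redesign with the colleague who owns that interface, and this post records the process.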

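1. Design flaw in the current interface

Reading the source, we found that conditional queries and record counting share the same interface, distinguished only by whether extra query conditions are supplied. This causes a problem: a conditional query carries paging parameters (page count and current page) and runs through query.skip() and query.limit(). When counting records, these two values default to a single page, so counting also goes through the pagination path.

Below is the material I found online about the speed of paginated queries in MongoDB.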

The official documentation describes skip() as follows:

The skip() method requires the server to scan from the beginning of the input results set before beginning to return results. As the offset increases, skip() will become slower.

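In this situation, ordinary paginated queries do get slower on later pages, but the impact is small. When the same path is used for counting records, however, the query is repeated with a steadily increasing skip() offset, which makes counting extremely inefficient.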
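To make the cost difference concrete, here is a minimal sketch using a toy cost model. The model is an assumption for illustration only: each skip(offset).limit(size) page is charged the skipped entries plus the returned ones. The function names are hypothetical, and 34212 is the record count that appears in the explain output later in this post.

```javascript
// Toy cost model (an assumption, not a MongoDB API): a page fetched with
// skip(offset).limit(size) is charged `offset` skipped entries plus the
// entries it actually returns.
function pageCost(total, offset, size) {
  const returned = Math.max(0, Math.min(size, total - offset));
  return { returned, scanned: Math.min(offset, total) + returned };
}

// Count records by walking every page with a growing offset,
// which is what the shared interface effectively did.
function countByPaging(total, size) {
  let counted = 0;
  let scanned = 0;
  for (let offset = 0; ; offset += size) {
    const page = pageCost(total, offset, size);
    if (page.returned === 0) break; // past the last page
    counted += page.returned;
    scanned += page.scanned;
  }
  return { counted, scanned };
}

const result = countByPaging(34212, 20);
console.log(result.counted, result.scanned);
```

Under this model the work grows roughly with the square of the record count, while a single pass over the matching entries would touch each of them once.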

For this case the official site also offers a partial optimization:

Using Range Queries:

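// Prints one page of student names and returns the last _id seen,
// which the caller passes in as startValue to fetch the next page.
function printStudents(startValue, nPerPage) {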
  let endValue = null;
  db.students.find( { _id: { $lt: startValue } } )
             .sort( { _id: -1 } )
             .limit( nPerPage )
             .forEach( student => {
               print( student.name );
               endValue = student._id;
             } );

  return endValue;
}

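This approach first positions the query by a unique key, sorts in descending order on that key, and records the last value seen in this query; the next query continues from that value.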
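As a rough illustration of how the returned value chains one page to the next, the sketch below replays the same idea on an in-memory array instead of a real collection. The sample data and the fetchPage and walkPages helpers are hypothetical; Infinity stands in for the starting value of the first page.

```javascript
// In-memory stand-in for db.students (hypothetical sample data).
const students = Array.from({ length: 7 }, (_, i) => ({
  _id: i + 1,
  name: `student-${i + 1}`,
}));

// Mirrors find({ _id: { $lt: startValue } }).sort({ _id: -1 }).limit(nPerPage).
function fetchPage(startValue, nPerPage) {
  return students
    .filter(s => s._id < startValue)
    .sort((a, b) => b._id - a._id)
    .slice(0, nPerPage);
}

// Walk all pages by feeding the last _id of each page into the next query.
function walkPages(nPerPage) {
  const pages = [];
  let cursor = Infinity; // no upper bound for the first page
  while (true) {
    const page = fetchPage(cursor, nPerPage);
    if (page.length === 0) break;
    pages.push(page.map(s => s._id));
    cursor = page[page.length - 1]._id;
  }
  return pages;
}

console.log(walkPages(3)); // [ [ 7, 6, 5 ], [ 4, 3, 2 ], [ 1 ] ]
```

Each query starts from the previous page's last key, so no page has to skip over the entries before it.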

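This approach has drawbacks too. The biggest is that it cannot jump to an arbitrary page. Real implementations do not actually jump either. For example, if the main page offers 10 pages of 20 records each, the back end fetches 200 records and the front end slices them into pages. It looks like page jumping, but no jump query is ever issued.

So for ordinary paginated data queries the performance impact of skip() is small. If optimization is needed, the Java caller can query MongoDB without pagination and paginate the results afterwards (this assumes the result set is small enough to hold in memory). When counting all records, be careful with paginated queries.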

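2. Return only what is queried; avoid redundancy

Since this is a statistics query, only the count needs to be returned. No other data structure is needed, and the contents of the documents do not matter.
The interface currently in use returns many members that the caller never uses. This has little effect on the query itself, but after the back end receives the data it needs an extra function to process the return value, and with a lot of data that noticeably slows the interface's response.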
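The difference between the two shapes can be sketched on in-memory data. Everything here is hypothetical (the documents, the field names, and both helper functions); it only contrasts building full result objects with returning a bare count.

```javascript
// Hypothetical documents; the field names are made up for illustration.
const docs = [
  { tenantId: "t1", name: "a", payload: { size: 1 } },
  { tenantId: "t1", name: "b", payload: { size: 2 } },
  { tenantId: "t2", name: "c", payload: { size: 3 } },
];

// Before: build a list of full documents, then post-process it into a count.
function countViaDocuments(all, tenantId) {
  const matched = all
    .filter(d => d.tenantId === tenantId)
    .map(d => ({ ...d })); // copies every member, none of which is used
  return matched.length;
}

// After: the statistics path returns only the number it was asked for.
function countOnly(all, tenantId) {
  let total = 0;
  for (const d of all) {
    if (d.tenantId === tenantId) total += 1;
  }
  return total;
}

console.log(countViaDocuments(docs, "t1"), countOnly(docs, "t1"));
```

Against a real MongoDB collection, the second shape would normally map to a count operation with the same filter rather than a find, so no documents are shipped back at all.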

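3. How to choose a suitable index

MongoDB automatically creates an index on _id, but that index does not help here because the query does not filter on _id.
I found a function online, find().explain(), that shows how a query is executed. Take the following return value as an example (data manually redacted; the executionStats section appears only when explain() is run with the "executionStats" or "allPlansExecution" verbosity):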

// 1
{
    "queryPlanner": {
        "mongosPlannerVersion": NumberInt("1"),
        "winningPlan": {
            "stage": "SINGLE_SHARD",
            "shards": [
                {
                    "shardName": "shard_2",
                    "connectionString": "███",
                    "serverInfo": {
                        "host": "███",
                        "port": NumberInt("8637"),
                        "version": "4.0.3",
                        "gitVersion": "███"
                    },
                    "plannerVersion": NumberInt("1"),
                    "namespace": "███",
                    "indexFilterSet": false,
                    "parsedQuery": {
                        "id": {
                            "$eq": "███"
                        }
                    },
                    "winningPlan": {
                        "stage": "FETCH",
                        "inputStage": {
                            "stage": "IXSCAN",
                            "keyPattern": {
                                "id": NumberInt("1")
                            },
                            "indexName": "███",
                            "isMultiKey": false,
                            "multiKeyPaths": {
                                "id": [ ]
                            },
                            "isUnique": false,
                            "isSparse": false,
                            "isPartial": false,
                            "indexVersion": NumberInt("2"),
                            "direction": "forward",
                            "indexBounds": {
                                "tenantId": [
                                    "[\"███\", \"███\"]"
                                ]
                            }
                        }
                    },
                    "rejectedPlans": [ ]
                }
            ]
        }
    },
    "executionStats": {
        "nReturned": NumberInt("34212"),
        "executionTimeMillis": NumberInt("63"),
        "totalKeysExamined": NumberInt("34212"),
        "totalDocsExamined": NumberInt("34212"),
        "executionStages": {
            "stage": "SINGLE_SHARD",
            "nReturned": NumberInt("34212"),
            "executionTimeMillis": NumberInt("63"),
            "totalKeysExamined": NumberInt("34212"),
            "totalDocsExamined": NumberInt("34212"),
            "totalChildMillis": NumberLong("62"),
            "shards": [
                {
                    "shardName": "shard_2",
                    "executionSuccess": true,
                    "executionStages": {
                        "stage": "FETCH",
                        "nReturned": NumberInt("34212"),
                        "executionTimeMillisEstimate": NumberInt("60"),
                        "works": NumberInt("34213"),
                        "advanced": NumberInt("34212"),
                        "needTime": NumberInt("0"),
                        "needYield": NumberInt("0"),
                        "saveState": NumberInt("267"),
                        "restoreState": NumberInt("267"),
                        "isEOF": NumberInt("1"),
                        "invalidates": NumberInt("0"),
                        "docsExamined": NumberInt("34212"),
                        "alreadyHasObj": NumberInt("0"),
                        "inputStage": {
                            "stage": "IXSCAN",
                            "nReturned": NumberInt("34212"),
                            "executionTimeMillisEstimate": NumberInt("40"),
                            "works": NumberInt("34213"),
                            "advanced": NumberInt("34212"),
                            "needTime": NumberInt("0"),
                            "needYield": NumberInt("0"),
                            "saveState": NumberInt("267"),
                            "restoreState": NumberInt("267"),
                            "isEOF": NumberInt("1"),
                            "invalidates": NumberInt("0"),
                            "keyPattern": {
                                "tenantId": NumberInt("1")
                            },
                            "indexName": "███",
                            "isMultiKey": false,
                            "multiKeyPaths": {
                                "tenantId": [ ]
                            },
                            "isUnique": false,
                            "isSparse": false,
                            "isPartial": false,
                            "indexVersion": NumberInt("2"),
                            "direction": "forward",
                            "indexBounds": {
                                "tenantId": [
                                    "[\"███\", \"███\"]"
                                ]
                            },
                            "keysExamined": NumberInt("34212"),
                            "seeks": NumberInt("1"),
                            "dupsTested": NumberInt("0"),
                            "dupsDropped": NumberInt("0"),
                            "seenInvalidated": NumberInt("0")
                        }
                    }
                }
            ]
        }
    },
    "ok": 1,
    "operationTime": Timestamp(1694431036, 1),
    "$clusterTime": {
        "clusterTime": Timestamp(1694431040, 1),
        "signature": {
            "hash": BinData(0, "qKKYO9SHWsYX9X7RIU+fbptoQK4="),
            "keyId": NumberLong("7229581633173585921")
        }
    }
}

As shown, the return value mainly describes how the query is executed at each stage.
The values to watch are:

"nReturned": NumberInt("34212"),
"totalKeysExamined": NumberInt("34212"),
"totalDocsExamined": NumberInt("34212")

The official documentation explains them as follows:

explain.executionStats.nReturned: Number of documents that match the query condition.

explain.executionStats.totalKeysExamined: Number of index entries scanned.

explain.executionStats.totalDocsExamined: Number of documents examined during query execution.

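So the ideal state is nReturned = totalKeysExamined = totalDocsExamined, in which case neither the index nor the documents are scanned more than necessary.
After a sort(), totalKeysExamined may be greater than nReturned and totalDocsExamined, but since sorting large data sets is expensive, that case is not considered here.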
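The comparison of these three counters can be wrapped in a small helper. This is a sketch only: scanEfficiency is a hypothetical name, it assumes the counters have already been converted to plain numbers, and the usedIndex flag is a rough heuristic rather than something explain() reports. The first call uses the counters from the output above; the second uses made-up numbers for a wasteful query.

```javascript
// Summarize the executionStats counters as "entries examined per document
// returned". A ratio of 1 for both means no redundant scanning.
function scanEfficiency(executionStats) {
  const { nReturned, totalKeysExamined, totalDocsExamined } = executionStats;
  return {
    keysPerReturned: nReturned === 0 ? null : totalKeysExamined / nReturned,
    docsPerReturned: nReturned === 0 ? null : totalDocsExamined / nReturned,
    usedIndex: totalKeysExamined > 0, // heuristic: some index entries were read
  };
}

// Counters taken from the explain output above.
const indexed = scanEfficiency({
  nReturned: 34212,
  totalKeysExamined: 34212,
  totalDocsExamined: 34212,
});

// Made-up counters for a query that examines far more than it returns.
const wasteful = scanEfficiency({
  nReturned: 10,
  totalKeysExamined: 1000,
  totalDocsExamined: 1000,
});

console.log(indexed, wasteful);
```

A ratio well above 1 suggests that the index does not match the query closely enough.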

Besides these, the main parameter reflecting query time is:

explain.executionStats.executionTimeMillis: Total time in milliseconds required for query plan selection and query execution.

Its importance is obvious, so the comparison below focuses on these parameters.

Without an index:

"executionTimeMillis": NumberInt("80"),
"nReturned": NumberInt("34212"),
"totalKeysExamined": NumberInt("0"),
"totalDocsExamined": NumberInt("34212")

With a single-field index:

stage: SINGLE_SHARD -> FETCH -> IXSCAN
single shard -> document fetch -> index scan

"executionTimeMillis": NumberInt("63"),
"totalKeysExamined": NumberInt("34212"),
"totalDocsExamined": NumberInt("34212"),
        "inputStage": {
          "executionTimeMillisEstimate": NumberInt("40"),
        }

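To cover the parameters that may appear in queries, we also tried a compound index (MongoDB follows the leftmost-prefix rule, so any query that includes the leading id field can at least use the id part of the index):
stage: SINGLE_SHARD -> FETCH -> IXSCAN
single shard -> document fetch -> index scan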

"executionTimeMillis": NumberInt("65"),
"nReturned": NumberInt("34212"),
"totalKeysExamined": NumberInt("34212"),
"totalDocsExamined": NumberInt("34212"),
        "inputStage": {
          "executionTimeMillisEstimate": NumberInt("10"),
        }

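The compound index and the single-field index perform about the same overall.
Within the IXSCAN stage, the compound index scan is much faster than the single-field one (the reason is unknown; comments are welcome), but the overall difference is small.
Also, creating the single-field index took about 100 ms and creating the compound index about 200 ms, so in the end we went with the single-field index.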
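The leftmost-prefix rule mentioned for the compound index can be sketched roughly as follows. This is a simplified model, not MongoDB's actual planner: it looks only at which fields a query filters on, and the status field name is hypothetical.

```javascript
// Returns how many leading fields of a compound index are covered by the
// query's filter fields. 0 means the query cannot use the index prefix.
function usablePrefixLength(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let n = 0;
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // the prefix ends at the first missing field
    n += 1;
  }
  return n;
}

const compound = ["tenantId", "status"]; // hypothetical compound index

console.log(usablePrefixLength(compound, ["tenantId"]));           // 1
console.log(usablePrefixLength(compound, ["tenantId", "status"])); // 2
console.log(usablePrefixLength(compound, ["status"]));             // 0
```

In this model, a query that filters only on the leading field gets the same prefix from the compound index as from a single-field index on that field, which fits the similar overall timings above.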

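After optimization the interface responds within 800 ms. Given the data volume and the original 4 s, that is a big improvement, although there is still room for more.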