With the continuing explosion of big data, distributed, horizontally scalable architectures have become one of the main ways to scale a system. MongoDB, the leading document-oriented NoSQL database, is exactly such a distributed system: a sharded cluster splits the data into ranges (chunks) and writes each range to a different node. This article briefly describes MongoDB's sharding feature and walks through a demo so you can quickly try sharding for yourself.

1. Why sharding is needed

Storage capacity requirements exceed the disk capacity of a single machine
    The active data set exceeds the RAM of a single machine, so many requests have to read from disk and performance suffers
    Write IOPS exceed the write capacity of a single mongod node
    MongoDB supports both automatic and manual sharding; the basic unit of sharding is the collection (a manual pre-splitting sketch follows this list)
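
Manual sharding here means pre-splitting and moving chunks by hand instead of waiting for the balancer. A minimal sketch against mongos, assuming a collection test.users already sharded on username as in the demo below (the split value user500000 is only illustrative):

mongos> sh.splitAt("test.users", {username: "user500000"})                   //split the chunk that contains this key value
mongos> sh.moveChunk("test.users", {username: "user500000"}, "shard0001")    //move the resulting chunk to a specific shard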

2. Sharded cluster architecture

Mongos
            The routing node that clients connect to; all reads and writes go through mongos (a quick way to verify a mongos connection is sketched after this list)
    Config Server
            Stores the cluster metadata and configuration
    Shard Server
            Each shard holds a portion of the sharded collections' data; a shard can be deployed as a replica set
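
A quick way to confirm that a shell session is connected to a mongos rather than directly to a shard or config server is the isMaster command, which reports "msg" : "isdbgrid" when run through mongos. Output abridged and may vary slightly by version:

mongos> db.runCommand({isMaster: 1})
{ "ismaster" : true, "msg" : "isdbgrid", ..., "ok" : 1 }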

3. What is the primary shard

The primary shard stores the data of every collection in the database that has not been sharded
    Every database has a primary shard
    The primary shard can be changed with the movePrimary command (a sketch follows this list)
    When a sharded cluster is enabled on top of an existing replica set deployment, databases that already exist stay on their original shard
    A shard key can be created that routes all data to a single shard
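
A sketch of the movePrimary command mentioned above (run against mongos; the shard name shard0001 follows the naming used in the demo below):

mongos> db.adminCommand({movePrimary: "test", to: "shard0001"})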

4. MongoDB sharding demo

//Demo environment: everything runs on a single machine, using different ports
//Two shard nodes (ports 28000 and 28001) and one config server (port 29000)
//Replica sets are not used in this demo
# more /etc/redhat-release 
CentOS release 6.7 (Final)

# mongod --version
db version v3.0.12
git version: 33934938e0e95d534cebbaff656cde916b9c3573

//Create the data directories
# mkdir -pv /data/{s1,s2,con1}

//Start the two mongod shard instances
# mongod --shardsvr --dbpath  /data/s1 --port 28000 --logpath /data/s1/s1.log --smallfiles --oplogSize 128 --fork
# mongod --shardsvr --dbpath  /data/s2 --port 28001 --logpath /data/s2/s2.log --smallfiles --oplogSize 128 --fork

//Start the config server and mongos
# mongod --configsvr --dbpath  /data/con1 --port 29000 --logpath  /data/con1/config.log --fork 
# mongos --configdb localhost:29000 --logpath  /var/log/mongos.log --fork 

//Connect with the mongo shell
# mongo
MongoDB shell version: 3.0.12
connecting to: test
mongos> show dbs;
admin   (empty)
config  0.016GB

//Add the shards
mongos> sh.addShard("localhost:28000") 
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> sh.addShard("localhost:28001")
{ "shardAdded" : "shard0001", "ok" : 1 }

//Insert test documents
mongos> db
test
mongos> for (i=0;i<1000000;i++){
    db.users.insert(
        {
                i:i,
                username:"user"+i,
                age:Math.floor(Math.random()*120),
                created:new Date()
        }
    );
}
WriteResult({ "nInserted" : 1 })

//Check the sharding status
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("57c64b0e0ea3f71b79b3fb8e")
}
  shards:
        {  "_id" : "shard0000",  "host" : "localhost:28000" }
        {  "_id" : "shard0001",  "host" : "localhost:28001" }
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                No recent migrations
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }

//sh.status() shows that the cluster has two shards, shard0000 and shard0001, on ports 28000 and 28001 of this host
//There are currently two databases, admin and test
//The users collection we just created lives in the test database on shard0000, i.e. shard0000 is the primary shard
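
//The same information can also be read directly from the config database; the documents below simply mirror the sh.status() listing above
mongos> db.getSiblingDB("config").databases.find()
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" }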

//To shard a collection, sharding must first be enabled at the database level, as follows
mongos> sh.enableSharding("test")
{ "ok" : 1 }

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
       .........
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "test",  "partitioned" : true,  "primary" : "shard0000" }

//In the output above, the partitioned value of the test database has changed to true

//Create an index on the users collection; sharding distributes documents across shards according to the shard key, so this index is required
//Conceptually this is similar to a partitioned table in an RDBMS
//An RDBMS partitioned table spreads data across multiple disks of a single machine
//MongoDB spreads data across the disks of different machines
mongos> db.users.ensureIndex({username:1})
{
        "raw" : {
                "localhost:28000" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 1,
                        "numIndexesAfter" : 2,
                        "ok" : 1
                }
        },
        "ok" : 1
}

//List the indexes that now exist
mongos> db.users.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "test.users"
        },
        {
                "v" : 1,
                "key" : {
                        "username" : 1
                },
                "name" : "username_1",
                "ns" : "test.users"
        }
]

//Shard the collection
mongos> sh.shardCollection("test.users",{"username":1})
{ "collectionsharded" : "test.users", "ok" : 1 }

//Check the sharding status again
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("57c689d2425d38d84702dbf4")
}
  shards:
        {  "_id" : "shard0000",  "host" : "localhost:28000" }
        {  "_id" : "shard0001",  "host" : "localhost:28001" }
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                10 : Success
                1 : Failed with error 'chunk too big to move', from shard0000 to shard0001
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "test",  "partitioned" : true,  "primary" : "shard0000" }
                test.users
                        shard key: { "username" : 1 }
                        chunks:
                                shard0000       4
                                shard0001       3
                        { "username" : { "$minKey" : 1 } } -->> { "username" : "user167405" } on : shard0001 Timestamp(2, 0) 
                        { "username" : "user167405" } -->> { "username" : "user234814" } on : shard0001 Timestamp(3, 0) 
                        { "username" : "user234814" } -->> { "username" : "user302222" } on : shard0001 Timestamp(4, 0) 
                        { "username" : "user302222" } -->> { "username" : "user369631" } on : shard0000 Timestamp(4, 1) 
                        { "username" : "user369631" } -->> { "username" : "user639266" } on : shard0000 Timestamp(1, 1) 
                        { "username" : "user639266" } -->> { "username" : "user908900" } on : shard0000 Timestamp(1, 2) 
                        { "username" : "user908900" } -->> { "username" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 3) 
//The output above shows that the collection is sharded on username into 7 chunks: 4 on shard0000 and 3 on shard0001
//$minKey and $maxKey stand for the lowest and highest possible key values, i.e. negative and positive infinity
//The chunk from $minKey to user167405 is on shard0001, user167405 to user234814 is on shard0001, and so on
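
//The chunk ranges printed above come from the config.chunks collection and can also be queried directly,
//e.g. counting chunks per shard with a small aggregation (a sketch; the counts mirror the status output above)
mongos> db.getSiblingDB("config").chunks.aggregate([
    {$match: {ns: "test.users"}},
    {$group: {_id: "$shard", nChunks: {$sum: 1}}}
])
{ "_id" : "shard0001", "nChunks" : 3 }
{ "_id" : "shard0000", "nChunks" : 4 }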

//Check the collection statistics
mongos> db.users.stats()
{
        "sharded" : true,
        "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0.
         It remains hard coded to 1.0 for compatibility only.",
        "userFlags" : 1,
        "capped" : false,        //Author : Leshami
        "ns" : "test.users",     //Blog   : 
        "count" : 1000000,
        "numExtents" : 20,
        "size" : 112000000,
        "storageSize" : 212533248,
        "totalIndexSize" : 66863328,
        "indexSizes" : {
                "_id_" : 32916576,
                "username_1" : 33946752
        },
        "avgObjSize" : 112,
        "nindexes" : 2,
        "nchunks" : 7,
        "shards" : {
                "shard0000" : {
                        "ns" : "test.users",
                        "count" : 775304,
                        "size" : 86834048,
                        "avgObjSize" : 112,
                        "numExtents" : 12,
                        "storageSize" : 174735360,
                        "lastExtentSize" : 50798592,
                        "paddingFactor" : 1,
                        "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. 
                        It remains hard coded to 1.0 for compatibility only.",
                        "userFlags" : 1,
                        "capped" : false,
                        "nindexes" : 2,
                        "totalIndexSize" : 47020176,
                        "indexSizes" : {
                                "_id_" : 25214784,
                                "username_1" : 21805392
                        },
                        "ok" : 1
                },
                "shard0001" : {
                        "ns" : "test.users",
                        "count" : 224696,
                        "size" : 25165952,
                        "avgObjSize" : 112,
                        "numExtents" : 8,
                        "storageSize" : 37797888,
                        "lastExtentSize" : 15290368,
                        "paddingFactor" : 1,
                        "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. 
                        It remains hard coded to 1.0 for compatibility only.",
                        "userFlags" : 1,
                        "capped" : false,
                        "nindexes" : 2,
                        "totalIndexSize" : 19843152,
                        "indexSizes" : {
                                "_id_" : 7701792,
                                "username_1" : 12141360
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}
//The output above shows the document count on each shard: 775304 on shard0000 and 224696 on shard0001
//It also shows plenty of other collection-level information, such as data size, average object size and index sizes
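
//A more compact per-shard summary is available via the getShardDistribution() shell helper,
//which prints document counts, chunk counts and estimated data size for each shard (output omitted here)
mongos> db.users.getShardDistribution()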

//Test document lookups
mongos> db.users.find({username:"user1000"},{_id:0})
{ "i" : 1000, "username" : "user1000", "age" : 87, "created" : ISODate("2016-08-31T08:31:28.233Z") }
mongos> db.users.find({username:"user639266"},{_id:0})
{ "i" : 639266, "username" : "user639266", "age" : 6, "created" : ISODate("2016-08-31T08:38:30.122Z") }
mongos> db.users.count()
1000000

//Look at the execution plan
//The plan shows that the query was routed only to shard0001, i.e. the query was isolated to a single shard
mongos> db.users.find({username:"user1000"}).explain()
{
        "queryPlanner" : {
                "mongosPlannerVersion" : 1,
                "winningPlan" : {
                        "stage" : "SINGLE_SHARD",
                        "shards" : [
                                {
                                        "shardName" : "shard0001",
                                        "connectionString" : "localhost:28001",
                                        "serverInfo" : {
                                                "host" : "node1.edq.com",
                                                "port" : 28001,
                                                "version" : "3.0.12",
                                                "gitVersion" : "33934938e0e95d534cebbaff656cde916b9c3573"
                                        },
                                        "plannerVersion" : 1,
                                        "namespace" : "test.users",
                                        "indexFilterSet" : false,
                                        "parsedQuery" : {
                                                "username" : {
                                                        "$eq" : "user1000"
                                                }
                                        },
                                        "winningPlan" : {
                                                "stage" : "FETCH",
                                                "inputStage" : {
                                                        "stage" : "SHARDING_FILTER",
                                                        "inputStage" : {
                                                                "stage" : "IXSCAN",
                                                                "keyPattern" : {
                                                                        "username" : 1
                                                                },
                                                                "indexName" : "username_1",
                                                                "isMultiKey" : false,
                                                                "direction" : "forward",
                                                                "indexBounds" : {
                                                                        "username" : [
                                                                                "[\"user1000\", \"user1000\"]"
                                                                        ]
                                                                }
                                                        }
                                                }
                                        },
                                        "rejectedPlans" : [ ]
                                }
                        ]
                }
        },
        "ok" : 1
}
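
//For comparison, a query that does not contain the shard key has to be broadcast to every shard and merged by mongos;
//its winning plan uses a SHARD_MERGE stage instead of SINGLE_SHARD (plan output omitted)
mongos> db.users.find({age: 50}).explain()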

5. Summary

1. The essence of MongoDB sharding is to spread data across different physical machines, distributing I/O and improving concurrency and throughput
2. Sharding relies on a shard key, so every collection to be sharded must have an index on that key
3. Before a collection can be sharded, sharding must be enabled at the database level
4. A MongoDB sharded cluster consists of shard servers, config servers and routing servers (mongos)
5. Sharding can be combined with replica sets to achieve high availability (an example follows)
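
As an illustration of point 5: when each shard is itself a replica set, sh.addShard() takes the replica set name followed by member addresses. A minimal sketch (the replica set name rs0 and the host names are placeholders, not part of the demo above):

mongos> sh.addShard("rs0/host1:27018,host2:27018,host3:27018")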