MongoDB shell commands for statistics and aggregation

1 count

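db.tbPosition.find().count();  // total number of documents in the collection

db.tbPosition.find({Acnt_Id:437}).count()  // number of documents that match the filter

db.tbPosition.count({Acnt_Id:407});  // same thing, passing the filter directly to count()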

2 distinct

db.tbPosition.distinct('Acnt_Id');  // returns an array of the distinct Acnt_Id values (not a count; use .length to count them)

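3 group

Note: db.collection.group() and the group command were deprecated in MongoDB 3.4 and removed in 4.2; on newer servers use aggregate() with $group (section 5).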

db.tbPosition.group({
    'key': {'Acnt_Id': 1},
    '$reduce': function (doc, prev) {
        if (doc._id > prev._id)
            prev._id = doc._id;
    },
    'initial': {'_id': 1}
});

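// Roughly: SELECT Acnt_Id, MAX(_id) FROM tbPosition GROUP BY Acnt_Id
// (this reduce keeps the largest _id per Acnt_Id, starting from the initial value 1; it does not count documents)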
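The roles of key, initial, $reduce, cond, and finalize can be illustrated without a server. The following is a minimal plain-JavaScript sketch (runs in Node); groupSim and the sample documents are hypothetical, and it only mimics the documented behavior of the legacy group command, not its implementation. It also contrasts the max-_id reduce used above with an actual per-key count.

```javascript
// Hypothetical in-memory model of the legacy group command (not MongoDB's code).
// For each distinct key value a fresh copy of `initial` is created and
// reduce(doc, prev) mutates it; `cond` filters the input, `finalize` runs once per group.
function groupSim(docs, { keyField, initial, reduce, cond = () => true, finalize }) {
  const groups = new Map();
  for (const doc of docs) {
    if (!cond(doc)) continue;
    const k = doc[keyField];
    if (!groups.has(k)) {
      // copy `initial` so that groups never share state
      groups.set(k, { [keyField]: k, ...JSON.parse(JSON.stringify(initial)) });
    }
    reduce(doc, groups.get(k));
  }
  const out = [...groups.values()];
  if (finalize) out.forEach(finalize);
  return out;
}

// Hypothetical sample documents
const docs = [
  { _id: 3, Acnt_Id: 600 },
  { _id: 9, Acnt_Id: 600 },
  { _id: 5, Acnt_Id: 700 },
  { _id: 4, Acnt_Id: 50 },
];

// The reduce from the text: keeps the largest _id per Acnt_Id (a MAX, not a COUNT)
const maxSpec = {
  keyField: "Acnt_Id",
  initial: { _id: 1 },
  reduce: (doc, prev) => { if (doc._id > prev._id) prev._id = doc._id; },
  cond: (doc) => doc.Acnt_Id > 500 && doc.Acnt_Id < 100000,
};
const maxIds = groupSim(docs, maxSpec);

// Same spec plus the finalize from the text, which turns _id into a string
const maxIdStrings = groupSim(docs, {
  ...maxSpec,
  finalize: (prev) => { prev._id = "" + parseInt(prev._id); },
});

// A real COUNT per Acnt_Id needs a counter field in `initial`
const counts = groupSim(docs, {
  keyField: "Acnt_Id",
  initial: { count: 0 },
  reduce: (doc, prev) => { prev.count += 1; },
});

console.log(maxIds, maxIdStrings, counts);
```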

// The same grouping, issued directly as the group database command:
db.runCommand({
    'group': {
        'ns': 'tbPosition',
        'key': {'Acnt_Id': 1},
        '$reduce': function (doc, prev) {
            if (doc._id > prev._id)
                prev._id = doc._id;
        },
        'initial': {'_id': 1}
    }
});



db.tbPosition.group({
    'key': {'Acnt_Id': 1},
    '$reduce': function (doc, prev) {
        if (doc._id > prev._id)
            prev._id = doc._id;
    },
    'initial': {'_id': 1},
    'cond': {'Acnt_Id': {$lt: 100000, $gt: 500}}   // filter applied before grouping
});


// Roughly: SELECT Acnt_Id, MAX(_id) FROM tbPosition WHERE Acnt_Id < 100000 AND Acnt_Id > 500 GROUP BY Acnt_Id


db.tbPosition.group({
    'key': {'Acnt_Id': 1},
    '$reduce': function (doc, prev) {
        if (doc._id > prev._id)
            prev._id = doc._id;
    },
    'initial': {'_id': 1},
    'cond': {'Acnt_Id': {$lt: 100000, $gt: 100}},
    'finalize': function (prev) {
        prev._id = '' + parseInt(prev._id);   // e.g. 42 becomes "42"
    }
});

// finalize: post-processes each group's result before it is returned (here it turns _id into an integer string)

4 mapReduce

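db.tbPosition.mapReduce(
    function () {                          // map: emit {count: 1} for each document
        var key = this.Acnt_Id;
        emit(key, {count: 1});
    },
    function (key, emits) {                // reduce: add up the partial counts
        var total = 0;
        for (var i = 0; i < emits.length; i++) {
            total += emits[i].count;
        }
        return {count: total};
    },
    {
        out: 'mr',                         // write the results to the collection "mr"
        query: {'Acnt_Id': {$lt: 100000, $gt: 100}}   // must sit in the same options object as out
    }
);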


// The results are written to the mr collection; view them with:


db.mr.find();
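Below is a similar plain-JavaScript sketch of what mapReduce does with the map and reduce functions above. mapReduceSim and its data are hypothetical, and passing emit as an argument is a simplification (in the shell, emit is a global). It also checks the re-reduce property: reduce must accept its own output as input.

```javascript
// Hypothetical in-memory model of mapReduce (not MongoDB's implementation).
// Unlike the shell, where emit() is a global, this sketch passes emit to map().
// reduce(key, values) must return the same shape that map emits, because
// MongoDB may call reduce again on its own partial results ("re-reduce").
function mapReduceSim(docs, map, reduce, { query = () => true } = {}) {
  const buckets = new Map();
  const emit = (key, value) => {
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(value);
  };
  for (const doc of docs.filter(query)) map.call(doc, emit);
  const out = [];
  for (const [key, values] of buckets) {
    // MongoDB does not call reduce for a key that has only one value
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value }); // same {_id, value} shape as the documents in "mr"
  }
  return out;
}

const map = function (emit) { emit(this.Acnt_Id, { count: 1 }); };
const reduce = function (key, emits) {
  let total = 0;
  for (let i = 0; i < emits.length; i++) total += emits[i].count;
  return { count: total };
};

// Hypothetical sample documents
const docs = [{ Acnt_Id: 200 }, { Acnt_Id: 200 }, { Acnt_Id: 300 }, { Acnt_Id: 50 }];
const result = mapReduceSim(docs, map, reduce, {
  query: (d) => d.Acnt_Id > 100 && d.Acnt_Id < 100000,
});
console.log(result);
```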



5 aggregate

db.tbJobPosition.aggregate([
    {$match: {'Acnt_Id': {$lt: 100000, $gt: 100}}},
    {$group: {_id: "$Acnt_Id", sum: {'$sum': "$_id"}}}
]);

$project:

Selects specific fields from the input documents. Projection is mainly used to rename, add, and remove fields, for example:



db.article.aggregate(
    { $project : {
        title : 1 ,
        author : 1
    }}
);


The output then contains only three fields: _id, title, and author. _id is included by default; to exclude it, write:


db.article.aggregate(
    { $project : {
        _id : 0 ,
        title : 1 ,
        author : 1
    }});


You can also use arithmetic expression operators inside $project, for example:


db.article.aggregate(
    { $project : {
        title : 1,
        doctoredPageViews : { $add:["$pageViews", 10] }
    }});


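This uses $add to add 10 to the value of pageViews and assigns the result to a new field, doctoredPageViews.

Note: the operands of $add must be enclosed in square brackets, i.e. passed as an array.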


$project can also rename fields, including fields of embedded documents:


db.article.aggregate(
    { $project : {
        title : 1 ,
        page_views : "$pageViews" ,
        bar : "$other.foo"
    }});



It can also create embedded documents:


db.article.aggregate(
    { $project : {
        title : 1 ,
        stats : {
            pv : "$pageViews",
            foo : "$other.foo",
            dpv : { $add:["$pageViews", 10] }
        }
    }});



This produces an embedded document, stats, containing three fields: pv, foo, and dpv.
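To make the projection rules above concrete, here is a simplified, hypothetical model of $project in plain JavaScript. It supports only what the examples use (1 for inclusion, _id: 0, "$field.path" references, $add, and embedded documents) and is not MongoDB's implementation.

```javascript
// Hypothetical, simplified model of $project (not MongoDB's implementation).
// Spec values: 1 includes a field, _id: 0 excludes _id, "$a.b" copies a field
// path, { $add: [...] } computes a sum, any other object builds an embedded document.
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

function evalExpr(doc, expr) {
  if (typeof expr === "string" && expr.startsWith("$")) return getPath(doc, expr.slice(1));
  if (expr !== null && typeof expr === "object") {
    if (Array.isArray(expr.$add)) {
      return expr.$add.reduce((sum, e) => sum + evalExpr(doc, e), 0);
    }
    const sub = {}; // embedded document: evaluate each field
    for (const [k, e] of Object.entries(expr)) sub[k] = evalExpr(doc, e);
    return sub;
  }
  return expr; // numeric or string literal
}

function project(doc, spec) {
  const out = {};
  if (spec._id !== 0) out._id = doc._id; // _id is included unless explicitly excluded
  for (const [field, e] of Object.entries(spec)) {
    if (field === "_id") continue;
    if (e === 1) {
      if (field in doc) out[field] = doc[field];
    } else {
      out[field] = evalExpr(doc, e);
    }
  }
  return out;
}

// Hypothetical article document
const article = { _id: 1, title: "A book", author: "Jone", pageViews: 5, other: { foo: "x" } };
console.log(project(article, {
  title: 1,
  stats: { pv: "$pageViews", foo: "$other.foo", dpv: { $add: ["$pageViews", 10] } },
}));
```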


$match:

        Filters documents, reducing the number of documents passed as input to the next stage. Equivalent to SQL WHERE.


$match uses the same query syntax as db.collection.find():



db.articles.aggregate( [
    { $match : { score : { $gt : 70, $lte : 90 } } },
    { $group: { _id: null, count: { $sum: 1 } } }
] );



$match selects the documents whose score is greater than 70 and less than or equal to 90, then passes them to the next stage, $group, for processing.



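Notes:
1. You cannot use the $where operator inside $match.
2. Put $match as early in the pipeline as possible, so documents are filtered early and the aggregation runs faster.
3. When $match is the first stage, it can use an index to speed up the query.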

 


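$group:

        In the aggregate example in section 5, _id is required: "$Acnt_Id" is the field to group by, sum is an arbitrary output field name, $sum computes a total, and "$_id" is the field being summed.
$group must always specify an _id field and may also include accumulator expressions, for example: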
db.article.aggregate(
{ $group : {
_id : "$author",
docsPerAuthor : { $sum : 1 },
viewsPerAuthor : { $sum : "$pageViews" }
}});
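The $sum accumulators in this example can be modeled in a few lines of plain JavaScript. groupSum is a hypothetical helper, not a MongoDB API, and the article documents are invented; it returns groups in first-seen order only because it uses a Map, whereas real $group output is unordered.

```javascript
// Hypothetical in-memory model of $group with $sum (not MongoDB's code).
// _id is "$field" or null; { $sum: 1 } counts documents and
// { $sum: "$field" } adds up a field (non-numeric values are skipped, as $sum does).
function groupSum(docs, idExpr, accumulators) {
  const resolve = (doc, e) =>
    typeof e === "string" && e.startsWith("$") ? doc[e.slice(1)] : e;
  const groups = new Map();
  for (const doc of docs) {
    const id = idExpr === null ? null : (resolve(doc, idExpr) ?? null); // missing -> null group
    const key = JSON.stringify(id);
    if (!groups.has(key)) {
      const acc = { _id: id };
      for (const name of Object.keys(accumulators)) acc[name] = 0;
      groups.set(key, acc);
    }
    const acc = groups.get(key);
    for (const [name, spec] of Object.entries(accumulators)) {
      const v = resolve(doc, spec.$sum);
      if (typeof v === "number") acc[name] += v;
    }
  }
  return [...groups.values()];
}

// Hypothetical article documents
const articles = [
  { author: "Jone", pageViews: 10 },
  { author: "Jone", pageViews: 5 },
  { author: "Ann", pageViews: 7 },
];
// { $group: { _id: "$author", docsPerAuthor: { $sum: 1 }, viewsPerAuthor: { $sum: "$pageViews" } } }
const perAuthor = groupSum(articles, "$author", {
  docsPerAuthor: { $sum: 1 },
  viewsPerAuthor: { $sum: "$pageViews" },
});
console.log(perAuthor);
```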

Notes:
1. The output of $group is not ordered.
2. $group is performed in memory, so it cannot be used to group a very large number of documents.

$sort: 


         db.users.aggregate( { $sort : { age : -1, posts: 1 } });


Sorts by age in descending order, then by posts in ascending order.
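A compound sort spec is applied key by key: the second key only breaks ties in the first. This hypothetical comparatorFor helper sketches that rule in plain JavaScript; it relies on JavaScript preserving the insertion order of string keys, much as MongoDB relies on the field order of the sort document. The user documents are invented.

```javascript
// Hypothetical helper (not a MongoDB API): turns a $sort spec such as
// { age: -1, posts: 1 } into a comparator. Keys are compared in the order they
// appear in the spec; a later key only breaks ties on the earlier ones.
function comparatorFor(sortSpec) {
  const keys = Object.entries(sortSpec);
  return (a, b) => {
    for (const [field, dir] of keys) {
      if (a[field] < b[field]) return -dir;
      if (a[field] > b[field]) return dir;
    }
    return 0;
  };
}

// Hypothetical user documents
const users = [
  { name: "a", age: 30, posts: 5 },
  { name: "b", age: 40, posts: 2 },
  { name: "c", age: 30, posts: 1 },
];
const sorted = [...users].sort(comparatorFor({ age: -1, posts: 1 }));
console.log(sorted.map((u) => u.name)); // [ 'b', 'c', 'a' ]
```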


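Notes:
1. Placing $sort at the start of the pipeline lets it use an index, which improves efficiency.
2. MongoDB 2.4 optimizes memory use here: when $sort comes right before $limit in a pipeline, $sort only has to keep the top n documents (n being the $limit value) in memory, which can save a great deal of memory.
3. $sort runs in memory; in MongoDB 2.4, if it uses more than 10% of physical memory, the operation fails with an error.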
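Note 2 can be pictured as a top-k selection: when $limit directly follows $sort, only the best n documents ever need to be held. The sketch below is a hypothetical plain-JavaScript illustration of that idea (a sorted buffer with binary-search insertion), not MongoDB's sorting code; the user documents are invented.

```javascript
// Hypothetical top-k sketch (not MongoDB's code) of why $sort followed by
// $limit needs little memory: the buffer never holds more than `limit` documents.
function topK(docs, compare, limit) {
  const buf = [];
  for (const doc of docs) {
    // binary search for the insertion point; equal items stay in input order
    let lo = 0;
    let hi = buf.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (compare(buf[mid], doc) <= 0) lo = mid + 1;
      else hi = mid;
    }
    if (lo < limit) {
      buf.splice(lo, 0, doc);
      if (buf.length > limit) buf.pop(); // evict the current worst document
    }
  }
  return buf;
}

// { $sort: { age: -1 } }, { $limit: 3 } on hypothetical user documents
const byAgeDesc = (a, b) => b.age - a.age;
const users = [5, 1, 9, 3, 7, 9, 2].map((age, i) => ({ id: i, age }));
const top3 = topK(users, byAgeDesc, 3);
console.log(top3);
```

The result matches sorting everything and slicing the first three, while the buffer stays at three documents.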

 


$skip:

        Skips the given number of documents and passes the remaining ones to the next stage.
The argument to $skip must be a positive integer.

db.article.aggregate(
{ $skip : 5 });

After the $skip stage, the first five documents have been dropped.

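$limit:

        Limits the number of documents passed to the next stage to the given number, counting from the current position.
The argument to $limit must be a positive integer.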
db.article.aggregate(
{ $limit : 5 });

After the $limit stage, only the first five documents remain in the pipeline.

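$unwind:

         Deconstructs an array field: each input document becomes one output document per element of the array. An array holds data that is, in effect, pre-joined; $unwind undoes that join, so this stage increases the number of documents passed to the next stage.

For example, an article document has an array field named tags: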



> db.article.find()
{ "_id" : ObjectId("528751b0e7f3eea3d1412ce2"),
  "author" : "Jone", "title" : "A book",
  "tags" : [ "good", "fun", "good" ] }



After applying $unwind:


> db.article.aggregate({$project:{author:1,title:1,tags:1}},{$unwind:"$tags"})
{
    "result" : [
        {
            "_id" : ObjectId("528751b0e7f3eea3d1412ce2"),
            "author" : "Jone",
            "title" : "A book",
            "tags" : "good"
        },
        {
            "_id" : ObjectId("528751b0e7f3eea3d1412ce2"),
            "author" : "Jone",
            "title" : "A book",
            "tags" : "fun"
        },
        {
            "_id" : ObjectId("528751b0e7f3eea3d1412ce2"),
            "author" : "Jone",
            "title" : "A book",
            "tags" : "good"
        }
    ],
    "ok" : 1
}



Notes:
a. In {$unwind:"$tags"}, don't forget the $ prefix on the field name.

b. If the field targeted by $unwind does not exist, the document is dropped, for example:



> db.article.aggregate({$project:{author:1,title:1,tags:1}},{$unwind:"$tag"}) 


{ "result" : [ ], "ok" : 1 } 


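Because $tags was changed to $tag, which does not exist, the document is ignored and the result is empty.

c. If the field targeted by $unwind is not an array, an error is raised (MongoDB 2.4 behavior; since 3.2 a non-array value is treated as a one-element array), for example: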


> db.article.aggregate({$project:{author:1,title:1,tags:1}},{$unwind:"$title"})
Error: Printing Stack Trace
    at printStackTrace (src/mongo/shell/utils.js:37:15)
    at DBCollection.aggregate (src/mongo/shell/collection.js:897:9)
    at (shell):1:12
Sat Nov 16 19:16:54.488 JavaScript execution failed: aggregate failed: {
    "errmsg" : "exception: $unwind:  value at end of field path must be an array",
    "code" : 15978,
    "ok" : 0
} at src/mongo/shell/collection.js:L898



d. If the field targeted by $unwind is an empty array, the document is also dropped.
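The four rules above can be summarized in a small, hypothetical plain-JavaScript model of $unwind (not MongoDB's code). How null is handled is not covered by the text, so treating it as a non-array here is an assumption.

```javascript
// Hypothetical model of $unwind as described above for MongoDB 2.4 (not MongoDB's code):
// one output document per array element; a missing field or an empty array drops
// the document; any other non-array value (null included, an assumption) is an error.
// Since MongoDB 3.2 a non-array value is instead treated as a one-element array.
function unwind(docs, fieldPath) {
  if (!fieldPath.startsWith("$")) {
    throw new Error("$unwind field path must start with '$'"); // note a
  }
  const field = fieldPath.slice(1);
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // note b: missing field, document dropped
    if (!Array.isArray(value)) {
      // note c
      throw new Error("$unwind: value at end of field path must be an array");
    }
    for (const element of value) {
      out.push({ ...doc, [field]: element }); // note d: [] produces nothing
    }
  }
  return out;
}

const article = { _id: 1, author: "Jone", title: "A book", tags: ["good", "fun", "good"] };
console.log(unwind([article], "$tags"));
```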

$geoNear

    $geoNear returns documents in order of their distance from a specified point, nearest first.

Its parameters are listed below:

Field (type): description

near (GeoJSON point or legacy coordinate pairs): The point for which to find the closest documents.

distanceField (string): The output field that contains the calculated distance. To specify a field within an embedded document, use dot notation.

limit (number): Optional. The maximum number of documents to return. The default value is 100. See also the num option. (Removed in MongoDB 4.2; use a $limit stage instead.)

num (number): Optional. Provides the same function as the limit option: both define the maximum number of documents to return. If both options are included, num overrides limit. (Also removed in MongoDB 4.2.)

maxDistance (number): Optional. A distance from the center point; MongoDB limits the results to documents that fall within this distance. Specify the distance in radians (later versions use meters when near is a GeoJSON point).

query (document): Optional. Limits the results to documents that match the query, using the usual MongoDB read-operation query syntax.

spherical (Boolean): Optional. If true, MongoDB calculates distances on a spherical surface. The default value is false.

distanceMultiplier (number): Optional. A factor by which all distances returned by the query are multiplied. For example, use distanceMultiplier to convert radians, as returned by a spherical query, to kilometers by multiplying by the radius of the Earth.

includeLocs (string): Optional. The output field that identifies the location used to calculate the distance. Useful when a location field contains multiple locations. To specify a field within an embedded document, use dot notation.

uniqueDocs (Boolean): Optional. If true, a matching document is returned once, even if more than one of its location fields match the query. If false, a document is returned once per matching location field. See $uniqueDocs for more information. (Deprecated since MongoDB 2.6.)

For example:

db.places.aggregate([
    {$geoNear: {
        near: [40.724, -73.997],
        distanceField: "dist.calculated",
        maxDistance: 0.008,
        query: { type: "public" },
        includeLocs: "dist.location",
        uniqueDocs: true,
        num: 5
    }}
])


The result is:


{
    "result" : [
        { "_id" : 7,
          "name" : "Washington Square",
          "type" : "public",
          "location" : [
              [ 40.731, -73.999 ],
              [ 40.732, -73.998 ],
              [ 40.730, -73.995 ],
              [ 40.729, -73.996 ]
          ],
          "dist" : {
              "calculated" : 0.0050990195135962296,
              "location" : [ 40.729, -73.996 ]
          }
        },
        { "_id" : 8,
          "name" : "Sara D. Roosevelt Park",
          "type" : "public",
          "location" : [
              [ 40.723, -73.991 ],
              [ 40.723, -73.990 ],
              [ 40.715, -73.994 ],
              [ 40.715, -73.994 ]
          ],
          "dist" : {
              "calculated" : 0.006082762530298062,
              "location" : [ 40.723, -73.991 ]
          }
        }
    ],
    "ok" : 1
}


Here dist.calculated holds the computed distance, and dist.location holds the coordinates that were actually used to compute it.
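Because this example uses legacy coordinate pairs with spherical left at false, dist.calculated is a flat Euclidean distance, and with includeLocs the reported location is the document's closest point. The following hypothetical plain-JavaScript sketch (geoNearSim is not a MongoDB API) recomputes the two results above from the documents' own coordinates.

```javascript
// Hypothetical sketch (not MongoDB's implementation) of what the example above
// computes for legacy coordinate pairs with spherical: false: the flat Euclidean
// distance from `near` to the closest of each document's locations, filtered by
// maxDistance and query, sorted nearest first, and cut to `num` results.
function geoNearSim(docs, { near, maxDistance = Infinity, query = () => true, num = 100 }) {
  const dist = (p) => Math.hypot(p[0] - near[0], p[1] - near[1]);
  return docs
    .filter(query)
    .map((doc) => {
      // with includeLocs, the closest location is the one reported
      let best = null;
      for (const loc of doc.location) {
        const d = dist(loc);
        if (best === null || d < best.calculated) best = { calculated: d, location: loc };
      }
      return { ...doc, dist: best };
    })
    .filter((doc) => doc.dist !== null && doc.dist.calculated <= maxDistance)
    .sort((a, b) => a.dist.calculated - b.dist.calculated)
    .slice(0, num);
}

// The two documents from the result above
const places = [
  { _id: 7, name: "Washington Square", type: "public",
    location: [[40.731, -73.999], [40.732, -73.998], [40.730, -73.995], [40.729, -73.996]] },
  { _id: 8, name: "Sara D. Roosevelt Park", type: "public",
    location: [[40.723, -73.991], [40.723, -73.990], [40.715, -73.994], [40.715, -73.994]] },
];
const res = geoNearSim(places, {
  near: [40.724, -73.997],
  maxDistance: 0.008,
  query: (d) => d.type === "public",
  num: 5,
});
console.log(res.map((d) => [d._id, d.dist]));
```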



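Notes:
1. $geoNear can only be used as the first stage of a pipeline.
2. distanceField must be specified; it names the output field that holds the calculated distance.
3. $geoNear is similar to the geoNear command, with some differences: distanceField is required in $geoNear but optional in geoNear, and includeLocs is a string in $geoNear but a Boolean in geoNear.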


============


SQL term      MongoDB aggregation operator
WHERE         $match
GROUP BY      $group
HAVING        $match
SELECT        $project
ORDER BY      $sort
LIMIT         $limit
SUM()         $sum
COUNT()       $sum



Examples:



Each SQL query below is followed by its MongoDB equivalent:

SELECT COUNT(*) AS count FROM mycol
db.mycol.aggregate( [{ $group: { _id: null, count: { $sum: 1 } } }] )

SELECT SUM(price) AS total FROM mycol
db.mycol.aggregate( [{ $group: { _id: null, total: { $sum: "$price" } } }] )

SELECT cust_id, SUM(price) AS total FROM mycol GROUP BY cust_id
db.mycol.aggregate( [{ $group: { _id: "$cust_id", total: { $sum: "$price" } } }] )  // _id is required, but it may be null

SELECT cust_id, SUM(price) AS total FROM mycol GROUP BY cust_id ORDER BY total
db.mycol.aggregate( [{ $group: { _id: "$cust_id", total: { $sum: "$price" } } }, { $sort: { total: 1 } }] )

SELECT cust_id, ord_date, SUM(price) AS total FROM mycol GROUP BY cust_id, ord_date
db.mycol.aggregate( [{ $group: { _id: { cust_id: "$cust_id", ord_date: "$ord_date" }, total: { $sum: "$price" } } }] )

SELECT cust_id, COUNT(*) FROM mycol GROUP BY cust_id HAVING COUNT(*) > 1
db.mycol.aggregate( [{ $group: { _id: "$cust_id", count: { $sum: 1 } } }, { $match: { count: { $gt: 1 } } }] )

SELECT cust_id, ord_date, SUM(price) AS total FROM mycol GROUP BY cust_id, ord_date HAVING total > 250
db.mycol.aggregate( [{ $group: { _id: { cust_id: "$cust_id", ord_date: "$ord_date" }, total: { $sum: "$price" } } }, { $match: { total: { $gt: 250 } } }] )

SELECT cust_id, SUM(price) AS total FROM mycol WHERE status = 'A' GROUP BY cust_id
db.mycol.aggregate( [{ $match: { status: 'A' } }, { $group: { _id: "$cust_id", total: { $sum: "$price" } } }] )

SELECT cust_id, SUM(price) AS total FROM mycol WHERE status = 'A' GROUP BY cust_id HAVING total > 250
db.mycol.aggregate( [{ $match: { status: 'A' } }, { $group: { _id: "$cust_id", total: { $sum: "$price" } } }, { $match: { total: { $gt: 250 } } }] )

SELECT cust_id, SUM(li.qty) AS qty FROM mycol o, order_lineitem li WHERE li.order_id = o.id GROUP BY cust_id
db.mycol.aggregate( [{ $unwind: "$items" }, { $group: { _id: "$cust_id", qty: { $sum: "$items.qty" } } }] )

SELECT COUNT(*) FROM (SELECT cust_id, ord_date FROM mycol GROUP BY cust_id, ord_date) AS DerivedTable
db.mycol.aggregate( [{ $group: { _id: { cust_id: "$cust_id", ord_date: "$ord_date" } } }, { $group: { _id: null, count: { $sum: 1 } } }] )
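Two of these patterns (HAVING as a $match after $group, and counting the groups of a derived table with a second $group) can be traced in plain JavaScript. groupBy and the orders data are hypothetical and only illustrate the semantics; this is not how MongoDB executes the pipeline.

```javascript
// Hypothetical in-memory illustration (not MongoDB code).
// groupBy collects documents per compound _id, like { $group: { _id: {...} } }.
function groupBy(docs, idOf) {
  const groups = new Map();
  for (const doc of docs) {
    const id = idOf(doc);
    const key = JSON.stringify(id);
    if (!groups.has(key)) groups.set(key, { _id: id, docs: [] });
    groups.get(key).docs.push(doc);
  }
  return [...groups.values()];
}

// Hypothetical orders
const orders = [
  { cust_id: "A", ord_date: "2013-01-01", price: 100 },
  { cust_id: "A", ord_date: "2013-01-01", price: 200 },
  { cust_id: "A", ord_date: "2013-01-02", price: 50 },
  { cust_id: "B", ord_date: "2013-01-01", price: 300 },
];
const byCustDate = groupBy(orders, (o) => ({ cust_id: o.cust_id, ord_date: o.ord_date }));

// COUNT(*) over the derived table: a second $group with _id: null and { $sum: 1 }
const distinctPairs = [{ _id: null, count: byCustDate.length }];

// HAVING total > 250: { $sum: "$price" } per group, then { $match: { total: { $gt: 250 } } }
const having = byCustDate
  .map((g) => ({ _id: g._id, total: g.docs.reduce((s, o) => s + o.price, 0) }))
  .filter((g) => g.total > 250);

console.log(distinctPairs, having);
```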


$sum: Sums the given value over all documents in each group. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])

$avg: Computes the average of the given value over all documents in each group. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])

$min: Gets the minimum of the corresponding values over all documents in each group. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])

$max: Gets the maximum of the corresponding values over all documents in each group. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])

$push: Inserts the values into an array in the resulting document. db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])

$addToSet: Inserts the values into an array in the resulting document, without creating duplicates. db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])

$first: Gets the value from the first document of each group. This is usually only meaningful after a preceding $sort stage. db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])

$last: Gets the value from the last document of each group. Likewise, this is usually only meaningful after a preceding $sort stage. db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
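The difference between $push, $addToSet, $first, and $last is easiest to see on one group's values. This is a hypothetical plain-JavaScript sketch, not MongoDB code; the post documents are invented for illustration.

```javascript
// Hypothetical model (not MongoDB code) of four accumulators applied to one
// group's values, in the order the documents reach $group.
const acc = {
  push: (values) => values.slice(),           // keeps duplicates and order
  addToSet: (values) => [...new Set(values)], // drops duplicates; MongoDB does not guarantee order
  first: (values) => values[0],
  last: (values) => values[values.length - 1],
};

// Hypothetical documents belonging to one by_user group
const posts = [
  { by_user: "tom", url: "u2", likes: 5 },
  { by_user: "tom", url: "u1", likes: 9 },
  { by_user: "tom", url: "u2", likes: 1 },
];
const urls = posts.map((p) => p.url);

// $first/$last depend on arrival order, hence a preceding { $sort: { likes: 1 } }
const urlsByLikes = [...posts].sort((a, b) => a.likes - b.likes).map((p) => p.url);

console.log(acc.push(urls), acc.addToSet(urls), acc.first(urlsByLikes), acc.last(urlsByLikes));
```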

Aggregation expression operators
Comparison operators

$cmp: Compares two values and returns the result of the comparison as an integer.
$eq: Takes two values and returns true if the values are equivalent.
$gt: Takes two values and returns true if the first is larger than the second.
$gte: Takes two values and returns true if the first is larger than or equal to the second.
$lt: Takes two values and returns true if the second value is larger than the first.
$lte: Takes two values and returns true if the second value is larger than or equal to the first.
$ne: Takes two values and returns true if the values are not equivalent.

Arithmetic operators

$add: Computes the sum of an array of numbers.
$divide: Takes two numbers and divides the first number by the second.
$mod: Takes two numbers and calculates the modulo of the first number divided by the second.
$multiply: Computes the product of an array of numbers.
$subtract: Takes two numbers and subtracts the second number from the first.

String operators

$concat: Concatenates two or more strings.
$strcasecmp: Compares two strings case-insensitively and returns an integer that reflects the comparison.
$substr: Takes a string and returns a portion of that string.
$toLower: Converts a string to lowercase.
$toUpper: Converts a string to uppercase.

Date operators

$dayOfYear: Converts a date to a number between 1 and 366.
$dayOfMonth: Converts a date to a number between 1 and 31.
$dayOfWeek: Converts a date to a number between 1 (Sunday) and 7 (Saturday).
$year: Converts a date to the full year.
$month: Converts a date into a number between 1 and 12.
$week: Converts a date into a number between 0 and 53.
$hour: Converts a date into a number between 0 and 23.
$minute: Converts a date into a number between 0 and 59.
$second: Converts a date into a number between 0 and 59. May be 60 to account for leap seconds.
$millisecond: Returns the millisecond portion of a date as an integer between 0 and 999.

Conditional operators

$cond: A ternary operator that evaluates one expression and, depending on the result, returns the value of one of the two following expressions.
$ifNull: Returns the value of the first expression if it is not null; otherwise returns the value of the replacement expression.

Note: all of the operators above must be used inside expressions within pipeline stages. For how to use each expression operator, see:

http://docs.mongodb.org/manual/reference/operator/aggregation-group/

Aggregation pipeline optimization

1. $sort + $skip + $limit sequence optimization
When $sort, $skip, and $limit appear one after another in a pipeline, for example:

{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }

the stages are actually executed in this order:

{ $sort: { age : -1 } },
{ $limit: 15 },
{ $skip: 10 }

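$limit is moved ahead of $skip.

The new $limit value is the original $skip plus the original $limit.

This has two benefits: 1. After the $limit stage, the number of documents in the pipeline shrinks earlier, which saves memory and uses it more efficiently. 2. With $limit moved forward, $sort is immediately followed by $limit, so $sort can stop once it has the first "$limit" documents.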
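The rewrite is safe because skipping s documents and then keeping n is the same as keeping the first s + n and then skipping s. The plain-JavaScript check below (hypothetical helpers, with arrays standing in for the already sorted document stream) verifies this exhaustively for small values.

```javascript
// Hypothetical brute-force check (not MongoDB code): on a sorted stream,
// { $skip: s }, { $limit: n } returns the same documents as { $limit: s + n }, { $skip: s }.
const skip = (docs, s) => docs.slice(s);
const limit = (docs, n) => docs.slice(0, n);

const sorted = Array.from({ length: 30 }, (_, i) => ({ age: 100 - i }));

let allEqual = true;
for (let s = 0; s <= 35; s++) {
  for (let n = 1; n <= 35; n++) {
    const before = limit(skip(sorted, s), n);
    const after = skip(limit(sorted, s + n), s);
    if (JSON.stringify(before) !== JSON.stringify(after)) allEqual = false;
  }
}

// The example from the text: $skip: 10, $limit: 5 becomes $limit: 15, $skip: 10
const example = skip(limit(sorted, 15), 10);
console.log(allEqual, example.map((d) => d.age));
```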

2. $limit + $skip + $limit + $skip sequence optimization

Suppose a pipeline contains the following alternating sequence:

{ $limit: 100 },
{ $skip: 5 },
{ $limit: 10},
{ $skip: 2 }

First, a local optimization moves the second $limit ahead of the first $skip, as described above (its value becomes 5 + 10 = 15):

{ $limit: 100 },
{ $limit: 15},
{ $skip: 5 },
{ $skip: 2 }

Then the two adjacent $limit stages can be merged by taking the smaller value, and the two adjacent $skip stages by adding their values:

{ $limit: 15 },
{ $skip: 7 }
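These rules can be applied mechanically. The sketch below is a hypothetical plain-JavaScript rewriter, not MongoDB's optimizer; it reproduces the $limit: 15, $skip: 7 result and checks that the rewritten stages return the same documents.

```javascript
// Hypothetical rewriter (not MongoDB's optimizer) for a run of $skip/$limit
// stages, applying the rules above until nothing changes:
//   $skip: s, $limit: n   ->  $limit: s + n, $skip: s
//   $limit: a, $limit: b  ->  $limit: min(a, b)
//   $skip: a,  $skip: b   ->  $skip: a + b
function optimize(stages) {
  const s = stages.map((st) => ({ ...st }));
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i + 1 < s.length && !changed; i++) {
      const a = s[i];
      const b = s[i + 1];
      if ("$skip" in a && "$limit" in b) {
        s.splice(i, 2, { $limit: a.$skip + b.$limit }, a);
        changed = true;
      } else if ("$limit" in a && "$limit" in b) {
        s.splice(i, 2, { $limit: Math.min(a.$limit, b.$limit) });
        changed = true;
      } else if ("$skip" in a && "$skip" in b) {
        s.splice(i, 2, { $skip: a.$skip + b.$skip });
        changed = true;
      }
    }
  }
  return s;
}

// Runs the stages over an array, to check that the rewrite preserves the result
const run = (docs, stages) =>
  stages.reduce((d, st) => ("$skip" in st ? d.slice(st.$skip) : d.slice(0, st.$limit)), docs);

const pipeline = [{ $limit: 100 }, { $skip: 5 }, { $limit: 10 }, { $skip: 2 }];
const optimized = optimize(pipeline);
const docs = Array.from({ length: 200 }, (_, i) => i + 1);
console.log(optimized, run(docs, pipeline), run(docs, optimized));
```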

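3. Projection optimization

Use $project early to keep only the fields you need and drop the rest; this can greatly reduce memory use. Likewise, use $match, $limit, and $skip as early as possible: they reduce the number of documents in the pipeline early, lowering memory use and improving aggregation performance. In particular, place $match in the first stage whenever possible; it then behaves like a conditional query and can use an index to speed things up.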

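Aggregation pipeline limitations

1. Type restrictions
The pipeline cannot operate on values of type Symbol, MinKey, MaxKey, DBRef, Code, or CodeWScope (version 2.4 lifted the restriction on binary data).

2. Result size limit
The output of the pipeline cannot exceed the maximum BSON document size (16 MB); larger results cause an error. (Starting in MongoDB 2.6, aggregate() can return a cursor, which avoids this limit for the overall result.)

3. Memory limits
If a pipeline stage uses more than 10% of system memory, an error is raised. When $sort and $group run, their entire input is loaded into memory: if this takes more than 5% of system memory, a warning is written to the log file, and if it takes more than 10%, an error is raised. (These are the MongoDB 2.4 limits; starting in 2.6, each stage may use at most 100 MB of RAM unless the allowDiskUse option is enabled.)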

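Using the aggregation pipeline on sharded collections
The aggregation pipeline supports sharded collections. When aggregating a sharded collection, the pipeline is split into two parts: one runs on the mongod instances (the shards) and the other on mongos.

Aggregation pipeline examples
First download the test data from http://media.mongodb.org/zips.json and import it into the database.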

1. Total population of each state

using System;
using System.Configuration;
using MongoDB.Bson;
using MongoDB.Driver;

// Legacy (1.x) C# driver API: MongoServer / MongoDatabase / MongoCollection<T>
var connectionString = ConfigurationManager.AppSettings["MongodbConnection"];
var client = new MongoClient(connectionString);
var DatabaseName = ConfigurationManager.AppSettings["DatabaseName"];
string collName = ConfigurationManager.AppSettings["collName"];
MongoServer mongoDBConn = client.GetServer();
MongoDatabase db = mongoDBConn.GetDatabase(DatabaseName);
MongoCollection<BsonDocument> table = db[collName];

// { $group: { _id: "$state", totalPop: { $sum: "$pop" } } }
var group = new BsonDocument
{
    { "$group", new BsonDocument
        {
            { "_id", "$state" },
            { "totalPop", new BsonDocument { { "$sum", "$pop" } } }
        }
    }
};

// { $sort: { _id: 1 } }
var sort = new BsonDocument
{
    { "$sort", new BsonDocument { { "_id", 1 } } }
};

var pipeline = new[] { group, sort };
var result = table.Aggregate(pipeline);
foreach (var doc in result.ResultDocuments)
{
    Console.WriteLine("{0}- {1}", doc["_id"], doc["totalPop"]);
}

2. Average population per city in each state

> db.zipcode.aggregate({$group:{_id:{state:"$state",city:"$city"},pop:{$sum:"$pop"}}},
{$group:{_id:"$_id.state",avCityPop:{$avg:"$pop"}}},
{$sort:{_id:1}})

// { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } }
var group1 = new BsonDocument
{
    { "$group", new BsonDocument
        {
            { "_id", new BsonDocument { { "state", "$state" }, { "city", "$city" } } },
            { "pop", new BsonDocument { { "$sum", "$pop" } } }
        }
    }
};

// { $group: { _id: "$_id.state", avCityPop: { $avg: "$pop" } } }
var group2 = new BsonDocument
{
    { "$group", new BsonDocument
        {
            { "_id", "$_id.state" },
            { "avCityPop", new BsonDocument { { "$avg", "$pop" } } }
        }
    }
};

// Reuses `table` and the { $sort: { _id: 1 } } stage from example 1
var pipeline1 = new[] { group1, group2, sort };
var result1 = table.Aggregate(pipeline1);
foreach (var doc in result1.ResultDocuments)
{
    Console.WriteLine("{0}- {1}", doc["_id"], doc["avCityPop"]);
}
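The first $group in this example is needed because zips.json holds one document per zip code, so a city can appear several times. The plain-JavaScript sketch below walks through the three stages on a few hypothetical zip-code documents (the city names and populations are invented); it illustrates the semantics only and is not how MongoDB executes the pipeline.

```javascript
// Hypothetical zip-code documents shaped like zips.json (city, state, pop)
const zips = [
  { city: "X", state: "NY", pop: 100 },
  { city: "X", state: "NY", pop: 300 }, // a second zip code in city X
  { city: "Y", state: "NY", pop: 200 },
  { city: "Z", state: "MA", pop: 50 },
];

// Stage 1: { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } }
const cityPop = new Map();
for (const z of zips) {
  const key = JSON.stringify([z.state, z.city]);
  const entry = cityPop.get(key) ?? { state: z.state, city: z.city, pop: 0 };
  entry.pop += z.pop;
  cityPop.set(key, entry);
}

// Stage 2: { $group: { _id: "$_id.state", avCityPop: { $avg: "$pop" } } }
const perState = new Map();
for (const c of cityPop.values()) {
  const s = perState.get(c.state) ?? { sum: 0, n: 0 };
  s.sum += c.pop;
  s.n += 1;
  perState.set(c.state, s);
}

// Stage 3: { $sort: { _id: 1 } }
const avgByState = [...perState.entries()]
  .map(([state, s]) => ({ _id: state, avCityPop: s.sum / s.n }))
  .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));

console.log(avgByState);
```

Averaging the zip-code documents directly would give (100 + 300 + 200) / 3 = 200 for NY instead of the per-city average of 300, which is why the per-city $group comes first.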

3. Names of the largest and smallest city (by population) in each state

>db.zipcode.aggregate({$group:{_id:{state:"$state",city:"$city"},pop:{$sum:"$pop"}}},
{$sort:{pop:1}},
{$group:{_id:"$_id.state",biggestCity:{$last:"$_id.city"},biggestPop:{$last:"$pop"},smallestCity:{$first:"$_id.city"},smallestPop:{$first:"$pop"}}},
{$project:{_id:0,state:"$_id",biggestCity:{name:"$biggestCity",pop:"$biggestPop"},smallestCity:{name:"$smallestCity",pop:"$smallestPop"}}})

// { $sort: { pop: 1 } }
var sort1 = new BsonDocument
{
    { "$sort", new BsonDocument { { "pop", 1 } } }
};

// After sorting by pop ascending, $last picks the biggest city and $first the smallest
var group3 = new BsonDocument
{
    { "$group", new BsonDocument
        {
            { "_id", "$_id.state" },
            { "biggestCity", new BsonDocument { { "$last", "$_id.city" } } },
            { "biggestPop", new BsonDocument { { "$last", "$pop" } } },
            { "smallestCity", new BsonDocument { { "$first", "$_id.city" } } },
            { "smallestPop", new BsonDocument { { "$first", "$pop" } } }
        }
    }
};

var project = new BsonDocument
{
    { "$project", new BsonDocument
        {
            { "_id", 0 },
            { "state", "$_id" },
            { "biggestCity", new BsonDocument { { "name", "$biggestCity" }, { "pop", "$biggestPop" } } },
            { "smallestCity", new BsonDocument { { "name", "$smallestCity" }, { "pop", "$smallestPop" } } }
        }
    }
};

// Reuses group1 from example 2
var pipeline2 = new[] { group1, sort1, group3, project };
var result2 = table.Aggregate(pipeline2);
foreach (var doc in result2.ResultDocuments)
{
    Console.WriteLine(doc.ToString());
}

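Summary

For most aggregation tasks, the aggregation pipeline offers good performance and a consistent interface that is fairly easy to use. Like MapReduce, it works on sharded collections, but its output is returned in a single document and is therefore subject to the BSON document size limit (currently 16 MB).

The pipeline places some restrictions on data types and result size. Use it for simple, fixed aggregations; for complex aggregation tasks over large data sets, MapReduce is still the better choice.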
=========