MongoDB Study Notes 12 - MongoDB Aggregation

Created: 2017-07-17 22:21

The count function

Count the documents in a collection:

> db.person.find().count()
6

distinct

https://docs.mongodb.com/manual/reference/command/distinct/index.html

Returns the distinct (deduplicated) values of a field. Usage: db.runCommand({distinct: "<collection>", key: "<field>", query: { <filter> }})

> db.runCommand({distinct:"person", key:"age", query: { name: "A"}})

Specifying a collation (new in version 3.4)

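To compare Chinese strings on base characters only, ignoring other differences such as diacritics and case, add collation: { locale: "zh", strength: 1 } to the command. Note that backwards: true does not reverse the sort order: it only makes strings with diacritics compare from the back of the string (as in some French dictionary orderings), so it has no effect at strength 1. Field reference: https://docs.mongodb.com/manual/reference/collation/#collation-document-fields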

Alternatively, the shell helper db.collection.distinct(field, query) wraps the distinct command, e.g. db.person.distinct("age", { name: "A" }).
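To see what distinct computes, here is a minimal plain JavaScript sketch (Node.js, no MongoDB server) over an in-memory array. The person documents and the helper name distinctValues are hypothetical, the filter only supports simple equality, and the sketch ignores collation as well as the real command's handling of array fields (where each element counts as a separate value). MongoDB does not guarantee the order of the returned values; the sketch keeps first-seen order.

```javascript
// Hypothetical documents standing in for the person collection.
const people = [
  { name: "A", age: 20 },
  { name: "A", age: 25 },
  { name: "A", age: 20 },
  { name: "B", age: 30 },
];

// Sketch of { distinct: "person", key, query }: keep the documents that match
// every equality condition in `query`, then collect the unique values of `key`.
function distinctValues(docs, key, query) {
  const matching = docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
  return [...new Set(matching.map((doc) => doc[key]))];
}

console.log(distinctValues(people, "age", { name: "A" })); // [ 20, 25 ]
```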

group

Documentation: https://docs.mongodb.com/manual/reference/operator/aggregation/group/

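The $group stage document contains an _id field that specifies the group key: documents with the same _id value end up in the same group. It corresponds to the columns after GROUP BY in MySQL, except that _id can also be null, which puts all input documents into a single group.

The stage can also contain computed fields, each holding the result of an accumulator expression evaluated over the documents in each _id group.

$group does not sort its output documents.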

The sales collection contains the following documents:

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-03-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-03-01T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-03-15T09:00:00Z") }
{ "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20, "date" : ISODate("2014-04-04T11:21:39.736Z") }
{ "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-04-04T21:23:13.331Z") }

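Group by month, day, and year

Group the documents by date and compute each group's total price (the sum of price * quantity), average quantity, and document count: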

db.sales.aggregate(
   [
      {
        $group : {
           _id : { month: { $month: "$date" }, day: { $dayOfMonth: "$date" }, year: { $year: "$date" } },
           totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
           averageQuantity: { $avg: "$quantity" },
           count: { $sum: 1 }
        }
      }
   ]
)

Result:

{ "_id" : { "month" : 3, "day" : 15, "year" : 2014 }, "totalPrice" : 50, "averageQuantity" : 10, "count" : 1 }
{ "_id" : { "month" : 4, "day" : 4, "year" : 2014 }, "totalPrice" : 200, "averageQuantity" : 15, "count" : 2 }
{ "_id" : { "month" : 3, "day" : 1, "year" : 2014 }, "totalPrice" : 40, "averageQuantity" : 1.5, "count" : 2 }
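A plain JavaScript sketch (Node.js, no server) can reproduce these numbers from the five sales documents, which is handy when checking a pipeline by hand. It only mimics this particular $group: $month, $dayOfMonth and $year evaluate dates in UTC by default, so the UTC getters are used, and the helper name groupByDay is made up for illustration. Unlike MongoDB, whose $group output order is unspecified, the Map here keeps first-seen order.

```javascript
// The five sales documents from above (only the fields used here).
const sales = [
  { _id: 1, price: 10, quantity: 2, date: new Date("2014-03-01T08:00:00Z") },
  { _id: 2, price: 20, quantity: 1, date: new Date("2014-03-01T09:00:00Z") },
  { _id: 3, price: 5, quantity: 10, date: new Date("2014-03-15T09:00:00Z") },
  { _id: 4, price: 5, quantity: 20, date: new Date("2014-04-04T11:21:39.736Z") },
  { _id: 5, price: 10, quantity: 10, date: new Date("2014-04-04T21:23:13.331Z") },
];

// Sketch of the $group stage above: the group key is { month, day, year } in UTC.
function groupByDay(docs) {
  const groups = new Map();
  for (const d of docs) {
    const id = {
      month: d.date.getUTCMonth() + 1, // $month (getUTCMonth is 0-based)
      day: d.date.getUTCDate(),        // $dayOfMonth
      year: d.date.getUTCFullYear(),   // $year
    };
    const key = JSON.stringify(id);
    if (!groups.has(key)) groups.set(key, { _id: id, totalPrice: 0, qtySum: 0, count: 0 });
    const g = groups.get(key);
    g.totalPrice += d.price * d.quantity; // $sum of $multiply
    g.qtySum += d.quantity;               // running total for $avg
    g.count += 1;                         // $sum: 1
  }
  return [...groups.values()].map(({ _id, totalPrice, qtySum, count }) => ({
    _id,
    totalPrice,
    averageQuantity: qtySum / count,
    count,
  }));
}

console.log(groupByDay(sales));
```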

Group by null

With _id set to null, compute the total price, the average quantity, and the number of documents over the whole collection:

db.sales.aggregate(
   [
      {
        $group : {
           _id : null,
           totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
           averageQuantity: { $avg: "$quantity" },
           count: { $sum: 1 }
        }
      }
   ]
)

Result:

{ "_id" : null, "totalPrice" : 290, "averageQuantity" : 8.6, "count" : 5 }
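The same figures can be checked with a short JavaScript sketch: with _id: null there is exactly one group, so every accumulator simply runs over the whole collection. Only price and quantity are copied from the sample data; the variable names are illustrative.

```javascript
// price and quantity of the five sales documents above.
const salesRows = [
  { price: 10, quantity: 2 },
  { price: 20, quantity: 1 },
  { price: 5, quantity: 10 },
  { price: 5, quantity: 20 },
  { price: 10, quantity: 10 },
];

// _id: null means a single group, so each accumulator sees every document.
let totalPrice = 0;
let quantitySum = 0;
let count = 0;
for (const { price, quantity } of salesRows) {
  totalPrice += price * quantity; // $sum: { $multiply: [ "$price", "$quantity" ] }
  quantitySum += quantity;        // running total for $avg: "$quantity"
  count += 1;                     // $sum: 1
}
const overall = { _id: null, totalPrice, averageQuantity: quantitySum / count, count };
console.log(overall); // { _id: null, totalPrice: 290, averageQuantity: 8.6, count: 5 }
```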

Retrieve distinct values

db.sales.aggregate( [ { $group : { _id : "$item" } } ] )

Result:

{ "_id" : "xyz" }
{ "_id" : "jkl" }
{ "_id" : "abc" }

Group the books collection by author

Original collection (books):

{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }

Operation:

db.books.aggregate(
   [
     { $group : { _id : "$author", books: { $push: "$title" } } }
   ]
)

Result:

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }

Group documents by author with $$ROOT ($$ROOT refers to the entire current document):

db.books.aggregate(
   [
     { $group : { _id : "$author", books: { $push: "$$ROOT" } } }
   ]
)

Result:

{
  "_id" : "Homer",
  "books" :
     [
       { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
       { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
     ]
}

{
  "_id" : "Dante",
  "books" :
     [
       { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
       { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
       { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }
     ]
}
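Both variants differ only in the expression handed to $push. The JavaScript sketch below (Node.js, hypothetical helper groupPush) models the pushed expression as a function: one returning doc.title stands in for "$title", one returning the whole document stands in for "$$ROOT". Groups here appear in first-seen order, whereas MongoDB does not define the order of $group output (Homer came first in the results above).

```javascript
const books = [
  { _id: 8751, title: "The Banquet", author: "Dante", copies: 2 },
  { _id: 8752, title: "Divine Comedy", author: "Dante", copies: 1 },
  { _id: 8645, title: "Eclogues", author: "Dante", copies: 2 },
  { _id: 7000, title: "The Odyssey", author: "Homer", copies: 10 },
  { _id: 7020, title: "Iliad", author: "Homer", copies: 10 },
];

// Sketch of { $group: { _id: "$author", books: { $push: <expr> } } }.
// `pick` plays the role of <expr>: it maps a document to the pushed value.
function groupPush(docs, pick) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.author)) groups.set(doc.author, { _id: doc.author, books: [] });
    groups.get(doc.author).books.push(pick(doc)); // $push keeps input order
  }
  return [...groups.values()];
}

const titlesByAuthor = groupPush(books, (doc) => doc.title); // like "$title"
const docsByAuthor = groupPush(books, (doc) => doc);         // like "$$ROOT"
console.log(JSON.stringify(titlesByAuthor));
```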

aggregate

https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
https://docs.mongodb.com/manual/reference/operator/aggregation/

The mycol collection contains the following documents:

{
   _id: ObjectId(7df78ad8902c),
   title: 'MongoDB Overview',
   description: 'MongoDB is no sql database',
   by_user: 'w3cschool.cc',
   url: 'http://www.w3cschool.cc',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d),
   title: 'NoSQL Overview',
   description: 'No sql database is very fast',
   by_user: 'w3cschool.cc',
   url: 'http://www.w3cschool.cc',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e),
   title: 'Neo4j Overview',
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
},

Using the collection above, count how many articles each author (by_user) wrote with aggregate():

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
   "result" : [
      {
         "_id" : "w3cschool.cc",
         "num_tutorial" : 2
      },
      {
         "_id" : "Neo4j",
         "num_tutorial" : 1
      }
   ],
   "ok" : 1
}
>

The example above is similar to this SQL statement:

select by_user as _id, count(*) as num_tutorial from mycol group by by_user

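In the example above, the documents are grouped by the by_user field, and for each distinct by_user value the documents are counted: { $sum: 1 } adds 1 for every document in the group.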
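To make the SQL analogy concrete, here is a minimal JavaScript sketch of what { $sum: 1 } does per group: each document adds 1 to its group's counter, like COUNT(*) with GROUP BY. The documents are the three mycol records above, trimmed to the fields used; the variable names are illustrative.

```javascript
// The three mycol documents above, trimmed to the fields used here.
const mycol = [
  { title: "MongoDB Overview", by_user: "w3cschool.cc" },
  { title: "NoSQL Overview", by_user: "w3cschool.cc" },
  { title: "Neo4j Overview", by_user: "Neo4j" },
];

// { $group: { _id: "$by_user", num_tutorial: { $sum: 1 } } }
// Each document adds 1 to its group's counter, like COUNT(*) ... GROUP BY by_user.
const counts = new Map();
for (const doc of mycol) {
  counts.set(doc.by_user, (counts.get(doc.by_user) || 0) + 1);
}
const numTutorial = [...counts].map(([user, n]) => ({ _id: user, num_tutorial: n }));
console.log(numTutorial);
```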

$sum

Computes the sum of the values (here, the total likes per by_user).

db.mycol.aggregate([{
    $group: {
        _id: "$by_user",
        num_tutorial: {
            $sum: "$likes"
        }
    }
}])

$avg

Computes the average of the values.

db.mycol.aggregate([{
    $group: {
        _id: "$by_user",
        num_tutorial: {
            $avg: "$likes"
        }
    }
}])

$min

Returns the minimum of the field's values among the documents in each group.

db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])

$max

Returns the maximum of the field's values among the documents in each group.

db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
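The four numeric accumulators above can be compared side by side in a small JavaScript sketch that collects the likes of each by_user group and reduces them. It assumes every likes value is a number: real $sum and $avg ignore non-numeric values and $min/$max compare using BSON ordering, none of which this sketch models. The helper name accumulateLikes is hypothetical.

```javascript
const mycol = [
  { title: "MongoDB Overview", by_user: "w3cschool.cc", likes: 100 },
  { title: "NoSQL Overview", by_user: "w3cschool.cc", likes: 10 },
  { title: "Neo4j Overview", by_user: "Neo4j", likes: 750 },
];

// Hypothetical helper: gather each group's likes, then apply every accumulator.
function accumulateLikes(docs) {
  const likesByUser = new Map();
  for (const doc of docs) {
    if (!likesByUser.has(doc.by_user)) likesByUser.set(doc.by_user, []);
    likesByUser.get(doc.by_user).push(doc.likes);
  }
  return [...likesByUser].map(([user, likes]) => {
    const sum = likes.reduce((a, b) => a + b, 0);
    return {
      _id: user,
      sum,                     // { $sum: "$likes" }
      avg: sum / likes.length, // { $avg: "$likes" }
      min: Math.min(...likes), // { $min: "$likes" }
      max: Math.max(...likes), // { $max: "$likes" }
    };
  });
}

console.log(accumulateLikes(mycol));
```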

$push

Appends each value to an array in the result document; duplicates are kept.

db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])

$addToSet

Adds values to an array in the result document, but without duplicates (the order of the array elements is not specified).

db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
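The difference between $push and $addToSet shows up as soon as a group contains repeated values, as with the two w3cschool.cc articles that share a url. A minimal JavaScript sketch with a hypothetical helper groupUrls: it checks membership with includes, which works for strings; MongoDB compares documents and arrays by value, which this sketch does not handle.

```javascript
const posts = [
  { by_user: "w3cschool.cc", url: "http://www.w3cschool.cc" },
  { by_user: "w3cschool.cc", url: "http://www.w3cschool.cc" },
  { by_user: "Neo4j", url: "http://www.neo4j.com" },
];

// Hypothetical helper: unique = false behaves like $push, unique = true like $addToSet.
function groupUrls(docs, unique) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.by_user)) groups.set(doc.by_user, []);
    const urls = groups.get(doc.by_user);
    // $push always appends; $addToSet skips values already in the array.
    if (!unique || !urls.includes(doc.url)) urls.push(doc.url);
  }
  return Object.fromEntries(groups);
}

console.log(groupUrls(posts, false)); // $push: w3cschool.cc gets the url twice
console.log(groupUrls(posts, true));  // $addToSet: w3cschool.cc gets it once
```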

$first

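Returns the value from the first document in each group, following the order of the input documents; the result is only meaningful when the documents are sorted first (e.g. with a $sort stage).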

db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])

$last

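Returns the value from the last document in each group, following the order of the input documents (again, sort first).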

db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
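Because $first and $last depend on input order, the same data can give different answers with and without a preceding $sort. This JavaScript sketch uses made-up documents with an assumed date field purely for ordering; sorting a copy of the array first plays the role of { $sort: { date: 1 } } before $group. The helper name firstLastUrl is hypothetical.

```javascript
// Made-up documents; `date` is an assumed field used only for ordering.
const posts = [
  { by_user: "w3cschool.cc", url: "https://example.com/b", date: 2 },
  { by_user: "w3cschool.cc", url: "https://example.com/a", date: 1 },
  { by_user: "Neo4j", url: "https://example.com/c", date: 3 },
];

// $first keeps the value from a group's first input document, $last from its last.
function firstLastUrl(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const group = groups.get(doc.by_user);
    if (group) {
      group.last_url = doc.url;
    } else {
      groups.set(doc.by_user, { _id: doc.by_user, first_url: doc.url, last_url: doc.url });
    }
  }
  return [...groups.values()];
}

const unsorted = firstLastUrl(posts);
// Sorting a copy first plays the role of { $sort: { date: 1 } } before $group.
const sorted = firstLastUrl([...posts].sort((a, b) => a.date - b.date));
console.log(unsorted, sorted);
```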

$substr: extracting substrings

Deprecated since version 3.4: $substr is now an alias for $substrBytes.

{ $substrBytes: [ <string expression>, <byte index>, <byte count> ] }

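Note that the index and the count are measured in bytes of the UTF-8 encoding, not in characters. With Chinese or other multi-byte text this can raise an error, because the range must not start (or end) in the middle of a UTF-8 character. $substrCP (also new in 3.4) works with code point indexes instead.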

{ $substrBytes: [ "abcde", 1, 2 ] }

"bc"

{ $substrBytes: [ "Hello World!", 6, 5 ] }

"World"

{ $substrBytes: [ "cafétéria", 7, 3 ] }

"Error: Invalid range, starting index is a UTF-8 continuation byte."

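The byte rule can be illustrated in Node.js with Buffer, which exposes the UTF-8 bytes of a string. This is a sketch of the semantics, not MongoDB's implementation: a UTF-8 continuation byte has the bit pattern 10xxxxxx, so a range that starts or ends on such a byte would cut a character in half. The start-error text copies the message quoted above; the end-error text is this sketch's own wording, and negative arguments are not handled.

```javascript
// A UTF-8 continuation byte has its top two bits set to 10.
function isContinuation(byte) {
  return (byte & 0xc0) === 0x80;
}

// Sketch of { $substrBytes: [ str, start, count ] } for non-negative start/count.
function substrBytes(str, start, count) {
  const bytes = Buffer.from(str, "utf8");
  const end = Math.min(start + count, bytes.length);
  if (start < bytes.length && isContinuation(bytes[start])) {
    throw new Error("Invalid range, starting index is a UTF-8 continuation byte.");
  }
  if (end < bytes.length && isContinuation(bytes[end])) {
    throw new Error("Invalid range, ending index splits a UTF-8 character (sketch wording).");
  }
  return bytes.subarray(start, end).toString("utf8");
}

console.log(substrBytes("Hello World!", 6, 5)); // World
console.log(substrBytes("中文", 0, 3));          // 中 (one Chinese character is 3 bytes)
```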
Count the documents that have a cardnumber field whose 4 characters starting at index 6 (0-based) are greater than "1960" (a string comparison):

db.mycol.aggregate(
    [
        {
            "$project": {
                "_id": 0,
                "cardnumber": 1,
                "yearSubstring": {
                    "$substr": ["$cardnumber", 6, 4]
                }
            }
        },
        {
            "$match": {
                "cardnumber": {
                    "$exists": true
                },
                "yearSubstring": {
                    "$gt": "1960"
                }
            }
        },
        {
            $count: "count"
        }
    ]
)
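The same counting logic, sketched in plain JavaScript over a few hypothetical card numbers (the values and the year-at-index-6 layout are made up for illustration). It also shows why the string comparison with "1960" works: both sides are 4-character digit strings, so lexicographic order matches numeric order. For variable-width numbers you would convert first, e.g. with $toInt (MongoDB 4.0+).

```javascript
// Hypothetical documents: characters 6-9 (0-based) hold a 4-digit year.
const cards = [
  { cardnumber: "110101198001011234" },
  { cardnumber: "110101195512121234" },
  { cardnumber: "110101196001011234" },
  { name: "no card number" },
];

const matchingCount = cards
  .filter((doc) => "cardnumber" in doc)              // "$exists": true
  .map((doc) => doc.cardnumber.substring(6, 10))     // $substr: ["$cardnumber", 6, 4] (ASCII only)
  .filter((yearSubstring) => yearSubstring > "1960") // "$gt": "1960", a string comparison
  .length;                                           // $count

console.log(matchingCount); // 1
```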

count

https://docs.mongodb.com/manual/reference/operator/aggregation/count/

Two ways to count the documents that match a filter:

// Without $count: put all matching documents into one group and sum 1 per document
db.mycol.aggregate( [
    {"$project": {"_id": 0, "cardnumber": 1, "yearSubstring": {"$substr": ["$cardnumber", 6, 4]}}},
    {"$match": {"cardnumber": {"$exists": true}, "yearSubstring": {"$gt": "1960"}}},
    {$group : {_id: null, myCount: {$sum : 1}}},
    {"$project": {"_id": 0, "myCount": 1}}
])

// With the $count stage (new in version 3.4)
db.mycol.aggregate( [
    {"$project": {"_id": 0, "cardnumber": 1, "yearSubstring": {"$substr": ["$cardnumber", 6, 4]}}},
    {"$match": {"cardnumber": {"$exists": true}, "yearSubstring": {"$gt": "1960"}}},
    {$count: "count"}
])

The pipeline concept

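In Unix and Linux, a pipe passes the output of one command to the next command as its input.

MongoDB's aggregation pipeline works the same way: documents flow through a sequence of stages, and each stage passes its output documents on to the next stage. A stage can appear more than once in a pipeline.

Expressions: expressions compute output values from input documents. They are stateless and are evaluated only against the current document in the pipeline, not against other documents; the exception is accumulator expressions (as used in $group), which keep state across the documents of a group.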

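Commonly used stages in the aggregation framework:

$project: reshapes each document; it can rename, add, or remove fields, and create computed values and embedded documents.
$match: filters the documents and passes on only those that match the condition; $match uses standard MongoDB query operators.
$limit: limits the number of documents passed to the next stage.
$skip: skips the specified number of documents and passes on the rest.
$unwind: deconstructs an array field, outputting one document per array element.
$group: groups documents by a key; used to compute statistics.
$sort: sorts the input documents and outputs them in order.
$geoNear: outputs documents ordered from nearest to farthest from a geographic point.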
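The pipe analogy can be sketched in a few lines of JavaScript: each stage is a function from an array of documents to an array of documents, and the pipeline is a reduce that feeds one stage's output into the next. The stage helpers and the articles data are hypothetical simplifications: match takes a JavaScript predicate instead of a query document, and project only keeps the listed top-level fields.

```javascript
// Each stage maps an array of documents to a new array of documents.
const match = (predicate) => (docs) => docs.filter(predicate);
const sort = (field, direction) => (docs) =>
  [...docs].sort((a, b) => (a[field] < b[field] ? -direction : a[field] > b[field] ? direction : 0));
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);
const project = (fields) => (docs) =>
  docs.map((doc) => Object.fromEntries(fields.filter((f) => f in doc).map((f) => [f, doc[f]])));

// Like a Unix pipe: the output of each stage is the input of the next.
const runPipeline = (docs, stages) => stages.reduce((input, stage) => stage(input), docs);

const articles = [
  { _id: 1, title: "a", score: 95 },
  { _id: 2, title: "b", score: 80 },
  { _id: 3, title: "c", score: 72 },
  { _id: 4, title: "d", score: 60 },
];

const result = runPipeline(articles, [
  match((doc) => doc.score > 70 && doc.score <= 90), // $match: { score: { $gt: 70, $lte: 90 } }
  sort("score", -1),                                 // $sort: { score: -1 }
  skip(1),                                           // $skip: 1
  limit(1),                                          // $limit: 1
  project(["_id", "title"]),                         // $project: { title: 1 } (_id kept)
]);
console.log(result); // [ { _id: 3, title: 'c' } ]
```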

Pipeline stage examples

$project example

db.article.aggregate([
    { $project : {
        title : 1 ,
        author : 1
    }}
]);

The results then contain only three fields: _id, title, and author. _id is included by default; to exclude it:

db.article.aggregate([
    { $project : {
        _id : 0 ,
        title : 1 ,
        author : 1
    }}
]);

$match example

db.articles.aggregate( [
    { $match : { score : { $gt : 70, $lte : 90 } } },
    { $group: { _id: null, count: { $sum: 1 } } }
] );

$match selects the documents whose score is greater than 70 and less than or equal to 90, then passes them to the next stage, $group, which counts them.

$skip example

db.article.aggregate([ { $skip : 5 } ]);

After the $skip stage, the first five documents are dropped and the remaining documents are passed on.

Applications

Method 1

db.article.aggregate([
    { "$match" : { "identifytime" : { "$gte" : 1495555200 , "$lte" : 1503372850}}} ,
    { "$project" : { "aptype" : 1}} ,
    { "$group" : { "_id" :  "$aptype"  , "totalNum" : { "$sum" : 1}}} ,
    { "$sort" : { "totalNum" : -1}} ,
    { "$skip" : 0} ,
    { "$limit" : 15}
])

{"_id" : 1, "totalNum" : 5.0}
{"_id" : 2, "totalNum" : 2.0}
{"_id" : 3, "totalNum" : 1.0}
{"_id" : 0, "totalNum" : 1.0}

Spring Data MongoDB

TypedAggregation<T> agg = Aggregation.newAggregation(
        entityClass,
        Aggregation.match(criteria),
        Aggregation.project("aptype"),
        Aggregation.group("aptype").count().as("totalNum"),
        Aggregation.sort(Sort.Direction.DESC, "totalNum"),
        Aggregation.skip((long) ((thispage - 1) * 15)), // pagination: skip the previous pages (15 per page)
        Aggregation.limit(15)
    );
AggregationResults<T> result = mongoTemplate.aggregate(agg,collectionName, entityClass);
List<T> resultList = result.getMappedResults();
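Spring Data's skip((thispage - 1) * 15) is the usual page-to-offset conversion for 1-based page numbers. A small JavaScript sketch (hypothetical helper pageStages, page size 15 as above) that builds the corresponding $skip and $limit stage documents for the shell and rejects page numbers below 1:

```javascript
// Hypothetical helper: build $skip/$limit stages for a 1-based page number,
// mirroring Aggregation.skip((thispage - 1) * 15) and Aggregation.limit(15).
function pageStages(page, pageSize = 15) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  return [{ $skip: (page - 1) * pageSize }, { $limit: pageSize }];
}

console.log(JSON.stringify(pageStages(1))); // [{"$skip":0},{"$limit":15}]
console.log(JSON.stringify(pageStages(3))); // [{"$skip":30},{"$limit":15}]
```

The stages are meant to be appended after $sort, since paging without a stable sort order can return overlapping pages.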

Method 2: alias the group key `_id` as a named field

db.article.aggregate([
    { "$match" : { "identifytime" : { "$gte" : 1495555200 , "$lte" : 1503372850}}} ,
    { "$group" : { "_id" :  "$aptype" , "totalNum" : { "$sum" : 1}}} ,
    { "$project" : {"totalNum" : 1 , "_id" : 0, "aptype" : "$_id" }} ,
    { "$sort" : { "totalNum" : -1}} ,
    { "$skip" : 0} ,
    { "$limit" : 15}
])

{"totalNum" : 5.0, "aptype" : 1}
{"totalNum" : 2.0, "aptype" : 2}
{"totalNum" : 1.0, "aptype" : 3}
{"totalNum" : 1.0, "aptype" : 0}
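What the $project in Method 2 does can be sketched in JavaScript on the $group output from Method 1: _id is dropped and its value is copied to aptype, then the documents are sorted by totalNum descending and paged. The grouped values are taken from the results shown above (in a shuffled order); MongoDB does not guarantee the relative order of documents with equal totalNum, so the sketch does not rely on it either.

```javascript
// $group output from Method 1: the aptype value is stored in _id.
const grouped = [
  { _id: 0, totalNum: 1 },
  { _id: 2, totalNum: 2 },
  { _id: 1, totalNum: 5 },
  { _id: 3, totalNum: 1 },
];

const page = grouped
  .map(({ _id, totalNum }) => ({ totalNum, aptype: _id })) // { $project: { totalNum: 1, _id: 0, aptype: "$_id" } }
  .sort((a, b) => b.totalNum - a.totalNum)                 // { $sort: { totalNum: -1 } }
  .slice(0)                                                // { $skip: 0 }
  .slice(0, 15);                                           // { $limit: 15 }
console.log(page);
```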