Aggregate
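In MongoDB, aggregation (aggregate) is mainly used to process data (for example, computing averages or sums) and return the computed result. It is similar to aggregate functions such as count(*) in SQL.
The syntax is as follows: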
db.collection.aggregate()
db.collection.aggregate(pipeline,options)
db.runCommand({
aggregate: "<collection>",
pipeline: [ <stage>, <...> ],
explain: <boolean>,
allowDiskUse: <boolean>,
cursor: <document>
})
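Before using aggregate to perform aggregation, let's look at several commonly used aggregation operators (pipeline stages).
$project: renames keys in the result set, controls which keys are shown, and computes new values from existing fields.
$match: filters the result set and outputs only the documents that match the condition.
$skip: skips the first N documents and returns the rest.
$sort: sorts the result set.
$limit: limits the number of documents in the result set.
$unwind: splits an array field into multiple documents, each containing one element of the array.
$geoNear: outputs documents ordered by their proximity to a geographic location.
$group: groups documents and computes aggregates such as sum, average, maximum, minimum, first, and last.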
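To make the stage-by-stage flow concrete, here is a minimal plain-JavaScript sketch in which every stage is a function from an array of documents to a new array. The helpers match, sort, skip, limit, and runPipeline, as well as the small people sample, are made up for this illustration; they only imitate the behaviour described above and are not MongoDB APIs.

```javascript
// In-memory model: each stage maps an array of documents to a new array.
// These helpers are illustrative stand-ins, not MongoDB functions.
const match = (pred) => (docs) => docs.filter(pred);
const sort = (field, dir) => (docs) =>
  [...docs].sort((a, b) => dir * (a[field] - b[field]));
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next one, in order.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical sample documents.
const people = [
  { ename: "tom", age: 25 },
  { ename: "robin", age: 30 },
  { ename: "jack", age: 28 },
  { ename: "Mark", age: 22 },
  { ename: "hellen", age: 32 },
];

// Comparable to [{$sort: {age: -1}}, {$skip: 1}, {$limit: 2}]
const page = runPipeline(people, [sort("age", -1), skip(1), limit(2)]);
console.log(page.map((d) => d.ename)); // [ 'robin', 'jack' ]

// Comparable to [{$match: {age: {$lt: 26}}}, {$sort: {age: 1}}]
const young = runPipeline(people, [match((d) => d.age < 26), sort("age", 1)]);
console.log(young.map((d) => d.ename)); // [ 'Mark', 'tom' ]
```

The point of the sketch is that the order of the stages matters: swapping skip and limit, for example, would return a different slice of the sorted documents.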
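The accumulator expressions that can be used inside $group are listed below.
Expression | Description | Example
$sum | Computes the sum. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg | Computes the average. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min | Returns the minimum of the corresponding values across the documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max | Returns the maximum of the corresponding values across the documents in each group. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push | Appends the value to an array in the result document. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSet | Appends the value to an array in the result document without creating duplicates. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first | Returns the value from the first document in each group, according to the order of the source documents. | db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last | Returns the value from the last document in each group, according to the order of the source documents. | db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])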
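As a rough illustration of what these accumulators compute, the following plain-JavaScript sketch groups a few documents by by_user and derives one value per accumulator. The mycol sample data and the output field names are assumptions made for this example; the code mimics the semantics in memory and does not call MongoDB.

```javascript
// Hypothetical documents shaped like those in the mycol examples.
const mycol = [
  { by_user: "alice", likes: 100, url: "http://example.com/a" },
  { by_user: "alice", likes: 10, url: "http://example.com/b" },
  { by_user: "alice", likes: 10, url: "http://example.com/b" },
  { by_user: "bob", likes: 750, url: "http://example.com/c" },
];

// Bucket the documents by key, like {$group: {_id: "$by_user", ...}}.
const buckets = new Map();
for (const doc of mycol) {
  if (!buckets.has(doc.by_user)) buckets.set(doc.by_user, []);
  buckets.get(doc.by_user).push(doc);
}

// Compute one output document per bucket.
const grouped = [...buckets].map(([user, docs]) => {
  const likes = docs.map((d) => d.likes);
  const urls = docs.map((d) => d.url);
  const total = likes.reduce((a, b) => a + b, 0);
  return {
    _id: user,
    sum: total,                  // $sum
    avg: total / likes.length,   // $avg
    min: Math.min(...likes),     // $min
    max: Math.max(...likes),     // $max
    pushed: urls,                // $push keeps duplicates
    set: [...new Set(urls)],     // $addToSet drops duplicates
    first: urls[0],              // $first
    last: urls[urls.length - 1], // $last
  };
});
console.log(grouped);
```

One difference worth keeping in mind: MongoDB does not guarantee the order of elements produced by $addToSet, whereas the Set used here happens to preserve insertion order.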
Example:
db.createCollection("emp")
db.emp.insert({_id:1,"ename":"tom","age":25,"department":"Sales","salary":6000})
db.emp.insert({_id:2,"ename":"eric","age":24,"department":"HR","salary":4500})
db.emp.insert({_id:3,"ename":"robin","age":30,"department":"Sales","salary":8000})
db.emp.insert({_id:4,"ename":"jack","age":28,"department":"Development","salary":8000})
db.emp.insert({_id:5,"ename":"Mark","age":22,"department":"Development","salary":6500})
db.emp.insert({_id:6,"ename":"marry","age":23,"department":"Planning","salary":5000})
db.emp.insert({_id:7,"ename":"hellen","age":32,"department":"HR","salary":6000})
db.emp.insert({_id:8,"ename":"sarah","age":24,"department":"Development","salary":7000})
> use company
switched to db company
> db.emp.aggregate(
... {$group:{_id:"$department",dpct:{$sum:1}}}
... )
{ "_id" : "Development", "dpct" : 3 }
{ "_id" : "HR", "dpct" : 2 }
{ "_id" : "Planning", "dpct" : 1 }
{ "_id" : "Sales", "dpct" : 2 }
> db.emp.aggregate(
... {$group:{_id:"$department",salct:{$sum:"$salary"},salavg:{$avg:"$salary"}}}
... )
{ "_id" : "Development", "salct" : 21500, "salavg" : 7166.666666666667 }
{ "_id" : "HR", "salct" : 10500, "salavg" : 5250 }
{ "_id" : "Planning", "salct" : 5000, "salavg" : 5000 }
{ "_id" : "Sales", "salct" : 14000, "salavg" : 7000 }
> db.emp.aggregate(
... {$match:{age:{$lt:25}}}
... )
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 8, "ename" : "sarah", "age" : 24, "department" : "Development", "salary" : 7000 }
> db.emp.aggregate(
... {$match:{age:{$gt:25}}},
... {$group:{_id:"$department",salct:{$sum:"$salary"},salavg:{$avg:"$salary"}}}
... )
{ "_id" : "HR", "salct" : 6000, "salavg" : 6000 }
{ "_id" : "Development", "salct" : 8000, "salavg" : 8000 }
{ "_id" : "Sales", "salct" : 8000, "salavg" : 8000 }
> db.emp.aggregate(
... {$group:{_id:"$department",salct:{$sum:"$salary"},salavg:{$avg:"$salary"}}},
... {$match:{salavg:{$gt:6000}}}
... )
{ "_id" : "Development", "salct" : 21500, "salavg" : 7166.666666666667 }
{ "_id" : "Sales", "salct" : 14000, "salavg" : 7000 }
> db.emp.aggregate(
... {$sort:{age:1}},{$limit:3}
... )
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
> db.emp.aggregate( {$sort:{age:-1}},{$limit:3} )
{ "_id" : 7, "ename" : "hellen", "age" : 32, "department" : "HR", "salary" : 6000 }
{ "_id" : 3, "ename" : "robin", "age" : 30, "department" : "Sales", "salary" : 8000 }
{ "_id" : 4, "ename" : "jack", "age" : 28, "department" : "Development", "salary" : 8000 }
> db.emp.aggregate( {$sort:{age:-1}},{$skip:4} )
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
{ "_id" : 8, "ename" : "sarah", "age" : 24, "department" : "Development", "salary" : 7000 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
In the next two commands, $project renames the fields to the Chinese labels 姓名 (name), 年齡 (age), 部門 (department), and 工資 (salary), and hides _id.
> db.emp.aggregate( {$project:{"姓名":"$ename","年齡":"$age","部門":"$department","工資":"$salary",_id:0}})
{ "姓名" : "tom", "年齡" : 25, "部門" : "Sales", "工資" : 6000 }
{ "姓名" : "eric", "年齡" : 24, "部門" : "HR", "工資" : 4500 }
{ "姓名" : "robin", "年齡" : 30, "部門" : "Sales", "工資" : 8000 }
{ "姓名" : "jack", "年齡" : 28, "部門" : "Development", "工資" : 8000 }
{ "姓名" : "Mark", "年齡" : 22, "部門" : "Development", "工資" : 6500 }
{ "姓名" : "marry", "年齡" : 23, "部門" : "Planning", "工資" : 5000 }
{ "姓名" : "hellen", "年齡" : 32, "部門" : "HR", "工資" : 6000 }
{ "姓名" : "sarah", "年齡" : 24, "部門" : "Development", "工資" : 7000 }
> db.emp.aggregate( {$project:{"姓名":"$ename","年齡":"$age","部門":"$department","工資":"$salary",_id:0}},{$match:{"工資":{$gt:6000}}})
{ "姓名" : "robin", "年齡" : 30, "部門" : "Sales", "工資" : 8000 }
{ "姓名" : "jack", "年齡" : 28, "部門" : "Development", "工資" : 8000 }
{ "姓名" : "Mark", "年齡" : 22, "部門" : "Development", "工資" : 6500 }
{ "姓名" : "sarah", "年齡" : 24, "部門" : "Development", "工資" : 7000 }
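Two of the commands above differ only in where $match sits relative to $group, and they return different results. The sketch below reproduces both in plain JavaScript on the same emp data: filtering before grouping works on employees (like SQL WHERE), while filtering after grouping works on the computed groups (like SQL HAVING). The helper groupByDepartment is a hypothetical in-memory stand-in for the $group stage, not a MongoDB function.

```javascript
const emp = [
  { age: 25, department: "Sales", salary: 6000 },
  { age: 24, department: "HR", salary: 4500 },
  { age: 30, department: "Sales", salary: 8000 },
  { age: 28, department: "Development", salary: 8000 },
  { age: 22, department: "Development", salary: 6500 },
  { age: 23, department: "Planning", salary: 5000 },
  { age: 32, department: "HR", salary: 6000 },
  { age: 24, department: "Development", salary: 7000 },
];

// Stand-in for
// {$group: {_id: "$department", salct: {$sum: "$salary"}, salavg: {$avg: "$salary"}}}
function groupByDepartment(docs) {
  const totals = new Map();
  for (const { department, salary } of docs) {
    const g = totals.get(department) || { _id: department, salct: 0, n: 0 };
    g.salct += salary;
    g.n += 1;
    totals.set(department, g);
  }
  return [...totals.values()].map(({ _id, salct, n }) => ({
    _id,
    salct,
    salavg: salct / n,
  }));
}

// $match first: only employees older than 25 are grouped.
const matchThenGroup = groupByDepartment(emp.filter((d) => d.age > 25));
console.log(matchThenGroup);

// $match last: all employees are grouped, then groups are filtered.
const groupThenMatch = groupByDepartment(emp).filter((g) => g.salavg > 6000);
console.log(groupThenMatch);
```

Putting $match as early as possible is usually preferable when the condition applies to the source documents, because fewer documents then reach the later stages.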
Map Reduce
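Map-Reduce is a computation model. Put simply, it splits a large job (data) into pieces that are processed separately (MAP) and then merges the partial results into a final result (REDUCE).
The Map-Reduce facility provided by MongoDB is very flexible and is also quite practical for large-scale data analysis.
The basic syntax of MapReduce is: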
>db.collection.mapReduce(
function() {emit(key,value);}, //map function
function(key,values) {return reduceFunction}, //reduce function
{
out: collection,
query: document,
sort: document,
limit: number
}
)
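Using MapReduce requires implementing two functions: a Map function and a Reduce function. The Map function is run for every record in the collection and calls emit(key, value); the emitted keys and values are then passed to the Reduce function for processing.
The Map function must call emit(key, value) to return key-value pairs.
Parameters:
map: the mapping function (produces a sequence of key-value pairs that become the arguments of the reduce function).
reduce: the aggregation function. Its job is to turn key-values into key-value, that is, to collapse the values array into a single value.
out: the collection in which the result is stored (if it is not specified, a temporary collection is used and is deleted automatically after the client disconnects).
query: a filter condition; the map function is called only for documents that satisfy it. (query, limit, and sort can be combined freely.)
sort: a sort specification used together with limit (documents are sorted before they are sent to the map function); it can optimize the grouping.
limit: the maximum number of documents sent to the map function (without limit, sort on its own is of little use).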
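The interplay of map, emit, and reduce can be sketched with a toy in-memory model. The mapReduce helper below is hypothetical and deliberately simplified: unlike MongoDB, it passes emit to the map function as an argument instead of providing it as a global, and it ignores out, query, sort, and limit. It does reproduce one real behaviour that is easy to overlook: MongoDB does not call reduce for a key that has only a single emitted value.

```javascript
// Toy in-memory model of mapReduce (illustration only, not MongoDB code).
function mapReduce(docs, map, reduce) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // The map function sees the current document as `this`.
  for (const doc of docs) map.call(doc, emit);
  return [...emitted].map(([key, values]) => ({
    _id: key,
    // A key with a single value is passed through without calling reduce.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const emp = [
  { department: "Sales", salary: 6000 },
  { department: "HR", salary: 4500 },
  { department: "Sales", salary: 8000 },
  { department: "Development", salary: 8000 },
  { department: "Development", salary: 6500 },
  { department: "Planning", salary: 5000 },
  { department: "HR", salary: 6000 },
  { department: "Development", salary: 7000 },
];

// Count employees per department: emit 1 per document, then add the ones up.
const headcount = mapReduce(
  emp,
  function (emit) { emit(this.department, 1); },
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(headcount);
```

Because the emitted value (a number) and the reduced value (a number) have the same type here, the single-value pass-through for Planning is harmless.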
> db.emp.mapReduce(
    function() { emit(this.department,1); },
    function(key,values) { return Array.sum(values) },
    { out:"depart_summary" }
  ).find()
{ "_id" : "Development", "value" : 3 }
{ "_id" : "HR", "value" : 2 }
{ "_id" : "Planning", "value" : 1 }
{ "_id" : "Sales", "value" : 2 }
Uses the built-in sum function to return the number of employees in each department.

> db.emp.mapReduce(
    function() { emit(this.department,this.salary); },
    function(key,values) { return Array.avg(values) },
    { out:"depart_summary" }
  ).find()
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Planning", "value" : 5000 }
{ "_id" : "Sales", "value" : 7000 }
Uses the built-in avg function to return the average salary of each department.

> db.emp.mapReduce(
    function() { emit(this.department,this.salary); },
    function(key,values) { return Array.avg(values).toFixed(2) },
    { out:"depart_summary" }
  ).find()
{ "_id" : "Development", "value" : "7166.67" }
{ "_id" : "HR", "value" : "5250.00" }
{ "_id" : "Planning", "value" : 5000 }
{ "_id" : "Sales", "value" : "7000.00" }
Keeps two decimal places. Note that Planning is still the number 5000 rather than "5000.00": MongoDB does not call the reduce function for a key that has only one emitted value, so that value is stored unchanged.

> db.emp.mapReduce(
    function() { emit(this.department,this.salary); },
    function(key,values) { return Array.sum(values) },
    { out:"depart_summary" }
  ).find()
{ "_id" : "Development", "value" : 21500 }
{ "_id" : "HR", "value" : 10500 }
{ "_id" : "Planning", "value" : 5000 }
{ "_id" : "Sales", "value" : 14000 }
Uses the built-in sum function to return the total salary of each department.

> db.emp.mapReduce(
    function() { emit(this.department,{count:1}); },
    function(key,values) { var sum=0; values.forEach(function(val){sum+=val.count}); return sum; },
    { out:"depart_summary" }
  ).find()
{ "_id" : "Development", "value" : 3 }
{ "_id" : "HR", "value" : 2 }
{ "_id" : "Planning", "value" : { "count" : 1 } }
{ "_id" : "Sales", "value" : 2 }
Counts the employees in each department by hand. Planning shows the raw emitted value { "count" : 1 } for the same reason as above; the reduce function should return a value of the same shape as the one emitted by map.

> db.emp.mapReduce(
    function() { emit(this.department,{salct:this.salary,count:1}); },
    function(key,values) { var res={salct:0,sum:0}; values.forEach(function(val){res.sum+=val.count;res.salct+=val.salct}); return res; },
    { out:"depart_summary" }
  ).find()
{ "_id" : "Development", "value" : { "salct" : 21500, "sum" : 3 } }
{ "_id" : "HR", "value" : { "salct" : 10500, "sum" : 2 } }
{ "_id" : "Planning", "value" : { "salct" : 5000, "count" : 1 } }
{ "_id" : "Sales", "value" : { "salct" : 14000, "sum" : 2 } }
Computes the employee count and the total salary of each department by hand. Planning again keeps the emitted field name count instead of sum because reduce was never called for it.

> db.emp.mapReduce(
    function() { emit(this.department,{salct:this.salary,count:1}); },
    function(key,values) { var res={salct:0,sum:0}; values.forEach(function(val){res.sum+=val.count;res.salct+=val.salct}); return res.salct/res.sum; },
    { out:"depart_summary" }
  ).find()
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Planning", "value" : { "salct" : 5000, "count" : 1 } }
{ "_id" : "Sales", "value" : 7000 }
Computes the average salary of each department by hand. Planning is not averaged, for the same reason.

> db.emp.mapReduce(
    function() { emit(this.department,this.salary); },
    function(key,values) { return Array.avg(values) },
    { out:"depart_summary" }
  ).find({value:{$gt:5000}})
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Sales", "value" : 7000 }
Filters the grouped values, showing only departments whose average salary is greater than 5000.

> db.emp.mapReduce(
    function() { emit(this.department,this.salary); },
    function(key,values) { return Array.avg(values) },
    { out:"depart_summary" }
  ).find({value:{$gt:5000}}).sort({value:1})
{ "_id" : "HR", "value" : 5250 }
{ "_id" : "Sales", "value" : 7000 }
{ "_id" : "Development", "value" : 7166.666666666667 }
Sorts the grouped values in ascending order.

> db.emp.mapReduce(
    function() { emit(this.department,this.salary); },
    function(key,values) { return Array.avg(values) },
    { out:"depart_summary" }
  ).find({value:{$gt:5000}}).sort({value:-1})
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "Sales", "value" : 7000 }
{ "_id" : "HR", "value" : 5250 }
Sorts the grouped values in explicitly specified descending order.

> db.emp.mapReduce(
    function() { emit(this.department,this.salary); },
    function(key,values) { return Array.avg(values) },
    { out:"depart_summary" }
  ).find({value:{$gt:5000}}).sort({value:-1}).limit(2)
{ "_id" : "Development", "value" : 7166.666666666667 }
{ "_id" : "Sales", "value" : 7000 }
Sorts the grouped values in descending order and takes the first two.

> db.emp.mapReduce(
    function() { emit(this.department,{count:1}); },
    function(key,values) { var sum=0; values.forEach(function(val){sum+=val.count}); return sum; },
    { out:"depart_summary",query:{age:{$gt:25}} }
  ).find()
{ "_id" : "Development", "value" : { "count" : 1 } }
{ "_id" : "HR", "value" : { "count" : 1 } }
{ "_id" : "Sales", "value" : { "count" : 1 } }
Filters the data before grouping and then computes per group. Only one employee per department is older than 25, so every key has a single value and reduce is never called.

> db.emp.mapReduce(
    function() { emit(this.department,{count:1}); },
    function(key,values) { var sum=0; values.forEach(function(val){sum+=val.count}); return sum; },
    { out:"depart_summary",query:{age:{$gt:22}},sort:{age:1} }
  ).find()
{ "_id" : "Development", "value" : 2 }
{ "_id" : "HR", "value" : 2 }
{ "_id" : "Planning", "value" : { "count" : 1 } }
{ "_id" : "Sales", "value" : 2 }
Filters and sorts the data before grouping, then computes per group (this example has no practical meaning beyond showing the syntax).
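Several results above show Planning with a raw emitted value instead of a reduced one. One way to avoid this, sketched below, is to make the reduce function return exactly the same shape that map emits and to move any final derivation (such as the average) into a finalize step, which MongoDB's mapReduce also supports as an option. The mapReduce helper here is a hypothetical in-memory model written for this example (emit is passed as an argument rather than being global); only the map, reduce, and finalize function bodies are meant to carry over.

```javascript
// Toy in-memory model of mapReduce with a finalize step (illustration only).
function mapReduce(docs, map, reduce, finalize = (key, value) => value) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  return [...emitted].map(([key, values]) => {
    // reduce is skipped for single-value keys, as in MongoDB.
    const reduced = values.length === 1 ? values[0] : reduce(key, values);
    return { _id: key, value: finalize(key, reduced) };
  });
}

const emp = [
  { department: "Sales", salary: 6000 },
  { department: "HR", salary: 4500 },
  { department: "Sales", salary: 8000 },
  { department: "Development", salary: 8000 },
  { department: "Development", salary: 6500 },
  { department: "Planning", salary: 5000 },
  { department: "HR", salary: 6000 },
  { department: "Development", salary: 7000 },
];

const map = function (emit) {
  emit(this.department, { salct: this.salary, count: 1 });
};

// Returns the same shape that map emits, so single-value keys and
// partially reduced results stay consistent.
const reduce = (key, values) =>
  values.reduce(
    (acc, v) => ({ salct: acc.salct + v.salct, count: acc.count + v.count }),
    { salct: 0, count: 0 }
  );

// Derive the average once, after all reducing is finished.
const finalize = (key, v) => ({ ...v, avg: v.salct / v.count });

const summary = mapReduce(emp, map, reduce, finalize);
console.log(summary);
```

With this arrangement Planning ends up as { salct: 5000, count: 1, avg: 5000 }, in the same format as every other department.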
Group
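The basic syntax is as follows:
db.runCommand({group:{
ns: <collection name>,
key: <key object to group by>,
initial: <initial value of the accumulator>,
$reduce: <reduce function applied to each document of a group>,
condition: <filter condition>,
finalize: <function run on each group once grouping is complete>}})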
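Documents are first grouped by key, and the $reduce function is then executed for every document in each group. It receives two arguments: the current document in the group and the accumulator for that group.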
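The roles of key, initial, $reduce, condition, and finalize can be sketched with a small in-memory model. The group helper below is hypothetical and simplified (a single string key, a predicate function for the condition, and a finalize that mutates the accumulator); it is meant to show the order of operations, not how MongoDB implements the command.

```javascript
// Toy in-memory model of the group command (illustration only).
function group(docs, { key, initial, reduce, cond = () => true, finalize }) {
  const groups = new Map();
  // The condition filters documents before they are grouped.
  for (const doc of docs.filter(cond)) {
    const k = doc[key];
    // Every new key starts from its own copy of `initial`.
    if (!groups.has(k)) groups.set(k, { [key]: k, ...initial });
    // reduce(current document, accumulator) updates the accumulator in place.
    reduce(doc, groups.get(k));
  }
  const result = [...groups.values()];
  // finalize runs once per group, after all documents have been reduced.
  if (finalize) result.forEach((acc) => finalize(acc));
  return result;
}

const emp = [
  { department: "Sales", salary: 6000 },
  { department: "HR", salary: 4500 },
  { department: "Sales", salary: 8000 },
  { department: "Development", salary: 8000 },
  { department: "Development", salary: 6500 },
  { department: "Planning", salary: 5000 },
  { department: "HR", salary: 6000 },
  { department: "Development", salary: 7000 },
];

// Highest salary per department, considering only salaries above 5000.
const highest = group(emp, {
  key: "department",
  initial: { salct: 0 },
  cond: (doc) => doc.salary > 5000,
  reduce: (doc, acc) => {
    if (doc.salary > acc.salct) acc.salct = doc.salary;
  },
});
console.log(highest);
```

Because the condition is applied before grouping, Planning (whose only salary is 5000) produces no group at all in this run.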
Example:
Group by department and compute the total salary of each department, as shown below:
> db.runCommand(
... {group:{ns:"emp",key:{"department":true},initial:{salct:0},
... $reduce:function(oriDoc,prev){ prev.salct+=oriDoc.salary}
... }}
... )
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : 14000 },
    { "department" : "HR", "salct" : 10500 },
    { "department" : "Development", "salct" : 21500 },
    { "department" : "Planning", "salct" : 5000 }
  ],
  "count" : NumberLong(8),
  "keys" : NumberLong(4),
  "ok" : 1
}

Count the employees and total the salaries of each department, as shown below:
> db.runCommand(
    {group:{ns:"emp",key:{"department":true},initial:{salct:0,count:0},
    $reduce:function(oriDoc,prev){ prev.salct+=oriDoc.salary;prev.count+=1}
    }}
  )
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : 14000, "count" : 2 },
    { "department" : "HR", "salct" : 10500, "count" : 2 },
    { "department" : "Development", "salct" : 21500, "count" : 3 },
    { "department" : "Planning", "salct" : 5000, "count" : 1 }
  ],
  "count" : NumberLong(8),
  "keys" : NumberLong(4),
  "ok" : 1
}

Compute the employee count, total salary, and average salary of each department, as shown below:
> db.runCommand(
    {group:{ns:"emp",key:{"department":true},initial:{salct:0,count:0,avg:0},
    $reduce:function(oriDoc,prev){ prev.salct+=oriDoc.salary;prev.count+=1; prev.avg=(prev.salct/prev.count).toFixed(2) }
    }}
  )
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : 14000, "count" : 2, "avg" : "7000.00" },
    { "department" : "HR", "salct" : 10500, "count" : 2, "avg" : "5250.00" },
    { "department" : "Development", "salct" : 21500, "count" : 3, "avg" : "7166.67" },
    { "department" : "Planning", "salct" : 5000, "count" : 1, "avg" : "5000.00" }
  ],
  "count" : NumberLong(8),
  "keys" : NumberLong(4),
  "ok" : 1
}

Find the highest salary in each department, as shown below:
> db.runCommand(
    {group:{ns:"emp",key:{"department":true},initial:{salct:0},
    $reduce:function(oriDoc,prev){ if(oriDoc.salary>prev.salct){prev.salct=oriDoc.salary}}
    }}
  )
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : 8000 },
    { "department" : "HR", "salct" : 6000 },
    { "department" : "Development", "salct" : 8000 },
    { "department" : "Planning", "salct" : 5000 }
  ],
  "count" : NumberLong(8),
  "keys" : NumberLong(4),
  "ok" : 1
}

Find the highest salary in each department while considering only documents whose salary is greater than 5000. The condition is applied to the documents before grouping, which is why count drops to 6 and Planning no longer appears:
> db.runCommand(
    {group:{ns:"emp",key:{"department":true},initial:{salct:0},
    $reduce:function(oriDoc,prev){ if(oriDoc.salary>prev.salct){prev.salct=oriDoc.salary}},condition:{salary:{$gt:5000}}
    }}
  )
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : 8000 },
    { "department" : "Development", "salct" : 8000 },
    { "department" : "HR", "salct" : 6000 }
  ],
  "count" : NumberLong(6),
  "keys" : NumberLong(3),
  "ok" : 1
}

Add a description to the grouped result with finalize, as shown below:
> db.runCommand( {group:{ns:"emp",key:{"department":true},initial:{salct:0},
... $reduce:function(oriDoc,prev){ if(oriDoc.salary>prev.salct){prev.salct=oriDoc.salary}},
... condition:{salary:{$gt:5000}},
... finalize:function(prev){prev.salct="Department of the highest salary is "+prev.salct}
... }})
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : "Department of the highest salary is 8000" },
    { "department" : "Development", "salct" : "Department of the highest salary is 8000" },
    { "department" : "HR", "salct" : "Department of the highest salary is 6000" }
  ],
  "count" : NumberLong(6),
  "keys" : NumberLong(3),
  "ok" : 1
}

Format the grouping key with a function: if the collection contains both a Department key and a department key, grouping becomes awkward. It can be solved with $keyf as follows:
> db.emp.insert({
... "_id":9,"ename":"sophie","age":28,"Department":"HR","salary":18000
... })
WriteResult({ "nInserted" : 1 })
> db.emp.find()
{ "_id" : 1, "ename" : "tom", "age" : 25, "department" : "Sales", "salary" : 6000 }
{ "_id" : 2, "ename" : "eric", "age" : 24, "department" : "HR", "salary" : 4500 }
{ "_id" : 3, "ename" : "robin", "age" : 30, "department" : "Sales", "salary" : 8000 }
{ "_id" : 4, "ename" : "jack", "age" : 28, "department" : "Development", "salary" : 8000 }
{ "_id" : 5, "ename" : "Mark", "age" : 22, "department" : "Development", "salary" : 6500 }
{ "_id" : 6, "ename" : "marry", "age" : 23, "department" : "Planning", "salary" : 5000 }
{ "_id" : 7, "ename" : "hellen", "age" : 32, "department" : "HR", "salary" : 6000 }
{ "_id" : 8, "ename" : "sarah", "age" : 24, "department" : "Development", "salary" : 7000 }
{ "_id" : 9, "ename" : "sophie", "age" : 28, "Department" : "HR", "salary" : 18000 }
> db.runCommand( {group:{ns:"emp",
... $keyf:function(oriDoc){if(oriDoc.Department){return{department:oriDoc.Department}}else{return{department:oriDoc.department}}},
... initial:{salct:0},
... $reduce:function(oriDoc,prev){ if(oriDoc.salary>prev.salct){prev.salct=oriDoc.salary}},
... condition:{salary:{$gt:5000}},
... finalize:function(prev){prev.salct="Department of the highest salary is "+prev.salct}
... }} )
{
  "waitedMS" : NumberLong(0),
  "retval" : [
    { "department" : "Sales", "salct" : "Department of the highest salary is 8000" },
    { "department" : "Development", "salct" : "Department of the highest salary is 8000" },
    { "department" : "HR", "salct" : "Department of the highest salary is 18000" }
  ],
  "count" : NumberLong(7),
  "keys" : NumberLong(3),
  "ok" : 1
}
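The group command was deprecated in MongoDB 3.4 and removed in 4.2, so on newer servers the same kind of result has to come from the aggregation pipeline. The following is a rough sketch of a pipeline that should be comparable to the $keyf example above; it is written as a plain JavaScript value so that it can be inspected without a server. The output field name maxSalary is chosen here for illustration, and the pipeline has not been run against the session shown in this article.

```javascript
// A possible aggregation-pipeline counterpart to the $keyf example.
const pipeline = [
  // Same role as `condition`: keep only documents with salary > 5000.
  { $match: { salary: { $gt: 5000 } } },
  {
    $group: {
      // Same role as $keyf: use Department when present, else department.
      _id: { $ifNull: ["$Department", "$department"] },
      // Same role as the $reduce function that tracks the highest salary.
      maxSalary: { $max: "$salary" },
    },
  },
];

console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell this would be passed as db.emp.aggregate(pipeline). $ifNull falls back to its second expression when the first field is null or missing, which is what lets a single stage merge the two spellings of the key.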