[MongoDB] 03. MongoDB Index and Sharding Basics

Published: 2020-06-30 21:04:50  Source: Internet  Views: 12262  Author: xiexiaojun  Category: Databases

I. MongoDB Configuration

The settings in the MongoDB configuration file /etc/mongodb.conf are simply mongod startup options written as key=value pairs (the same approach memcached takes).

[root@Node7 ~]# mongod --help
Allowed options:

General options:
  -h [ --help ]               show this usage information
  --version                   show version information
  -f [ --config ] arg         configuration file specifying additional options
  -v [ --verbose ]            be more verbose (include multiple times for more 
                              verbosity e.g. -vvvvv)
  --quiet                     quieter output
  --port arg                  specify port number - 27017 by default
  --bind_ip arg               comma separated list of ip addresses to listen on
                              - all local ips by default
  --maxConns arg              max number of simultaneous connections - 20000 by
                              default
  --logpath arg               log file to send write to instead of stdout - has
                              to be a file, not directory
  --logappend                 append to logpath instead of over-writing
  --pidfilepath arg           full path to pidfile (if not set, no pidfile is 
                              created)
  --keyFile arg               private key for cluster authentication
  --setParameter arg          Set a configurable parameter
  --nounixsocket              disable listening on unix sockets
  --unixSocketPrefix arg      alternative directory for UNIX domain sockets 
                              (defaults to /tmp)
  --fork                      fork server process
  --syslog                    log to system's syslog facility instead of file 
                              or stdout
  --auth                      run with security
  --cpu                       periodically show cpu and iowait utilization
  --dbpath arg                directory for datafiles - defaults to /data/db/
  --diaglog arg               0=off 1=W 2=R 3=both 7=W+some reads
  --directoryperdb            each database will be stored in a separate 
                              directory
  --ipv6                      enable IPv6 support (disabled by default)
  --journal                   enable journaling
  --journalCommitInterval arg how often to group/batch commit (ms)
  --journalOptions arg        journal diagnostic options
  --jsonp                     allow JSONP access via http (has security 
                              implications)
  --noauth                    run without security
  --nohttpinterface           disable http interface
  --nojournal                 disable journaling (journaling is on by default 
                              for 64 bit)
  --noprealloc                disable data file preallocation - will often hurt
                              performance
  --noscripting               disable scripting engine
  --notablescan               do not allow table scans
  --nssize arg (=16)          .ns file size (in MB) for new databases
  --profile arg               0=off 1=slow, 2=all
  --quota                     limits each database to a certain number of files
                              (8 default)
  --quotaFiles arg            number of files allowed per db, requires --quota
  --repair                    run repair on all dbs
  --repairpath arg            root directory for repair files - defaults to 
                              dbpath
  --rest                      turn on simple rest api
  --shutdown                  kill a running server (for init scripts)
  --slowms arg (=100)         value of slow for profile and console log
  --smallfiles                use a smaller default file size
  --syncdelay arg (=60)       seconds between disk syncs (0=never, but not 
                              recommended)
  --sysinfo                   print some diagnostic system information
  --upgrade                   upgrade db if needed

Replication options:
  --oplogSize arg       size to use (in MB) for replication op log. default is 
                        5% of disk space (i.e. large is good)

Master/slave options (old; use replica sets instead):
  --master              master mode
  --slave               slave mode
  --source arg          when slave: specify master as <server:port>
  --only arg            when slave: specify a single database to replicate
  --slavedelay arg      specify delay (in seconds) to be used when applying 
                        master ops to slave
  --autoresync          automatically resync if slave data is stale

Replica set options:
  --replSet arg           arg is <setname>[/<optionalseedhostlist>]
  --replIndexPrefetch arg specify index prefetching behavior (if secondary) 
                          [none|_id_only|all]

Sharding options:
  --configsvr           declare this is a config db of a cluster; default port 
                        27019; default dir /data/configdb
  --shardsvr            declare this is a shard db of a cluster; default port 
                        27018

SSL options:
  --sslOnNormalPorts              use ssl on configured ports
  --sslPEMKeyFile arg             PEM file for ssl
  --sslPEMKeyPassword arg         PEM file password
  --sslCAFile arg                 Certificate Authority file for SSL
  --sslCRLFile arg                Certificate Revocation List file for SSL
  --sslWeakCertificateValidation  allow client to connect without presenting a 
                                  certificate
  --sslFIPSMode                   activate FIPS 140-2 mode at startup

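Commonly used configuration parameters:

fork={true|false}  whether mongod runs in the background

bind_ip=IP  the address to listen on

port=PORT  the port to listen on; the default is 27017

maxConns=N  the maximum number of concurrent connections

logpath=/PATH/TO/SOME_FILE  the log file (syslog=true sends the log to the system's syslog facility instead)

httpinterface=true  whether to start the web monitoring interface, which listens on the mongod port + 1000

journal  whether to enable journaling; it is on by default (on 64-bit builds, per the help output above)

slowms arg (=100)  the slow-query threshold in ms; any query that runs longer counts as slow. The default is 100 ms

repair  repairs the data files; use it when starting mongod after an unclean shutdown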
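Since each configuration entry maps onto a startup option, the relationship can be sketched in a few lines of JavaScript. This is only an illustration: configToArgs is a hypothetical helper, the sample values are made up, and the real parsing done by mongod may differ in detail.

```javascript
// Hypothetical helper: turn key=value config lines into mongod-style flags.
function configToArgs(text) {
  const args = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    const key = (eq === -1 ? line : line.slice(0, eq)).trim();
    const value = eq === -1 ? "true" : line.slice(eq + 1).trim();
    if (value === "true") {
      args.push("--" + key);        // boolean switch, e.g. fork=true -> --fork
    } else if (value !== "false") {
      args.push("--" + key, value); // option that takes an argument
    }
    // entries set to false are simply left out in this sketch
  }
  return args;
}

// Sample values chosen for illustration only.
const sampleConf = [
  "# sample mongodb.conf",
  "fork=true",
  "port=27017",
  "logpath=/var/log/mongodb/mongod.log",
  "journal=false",
].join("\n");

console.log(configToArgs(sampleConf).join(" "));
```

Boolean entries become bare switches and everything else becomes a flag followed by its value, which mirrors how the help output above lists options with and without `arg`.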

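II. Indexes

An index can usually improve query efficiency dramatically. Without one, MongoDB has to scan every document in the collection and pick out those that match the query. Such a full collection scan is very inefficient, and on large data sets a query can take tens of seconds or even minutes, which is fatal for a website's performance.

An index is a special data structure that stores a small part of the collection's data in a form that is easy to traverse: the values of one or more fields, kept in sorted order.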
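The difference between a full scan and an index lookup can be sketched with plain arrays. The code below is a toy model, not MongoDB's implementation: a sorted array with binary search stands in for the B-tree, and the names scanFind, indexFind, and ageIndex are invented for the illustration.

```javascript
// A toy collection: 1000 documents shaped like the students used later on.
const docs = [];
for (let i = 1; i <= 1000; i++) docs.push({ name: "student" + i, age: i });

// Full collection scan: every document has to be examined.
function scanFind(collection, age) {
  let examined = 0;
  let found = null;
  for (const doc of collection) {
    examined++;
    if (doc.age === age) found = doc;
  }
  return { found, examined };
}

// Toy "index": [key, position] pairs kept sorted by key.
const ageIndex = docs.map((d, pos) => [d.age, pos]).sort((a, b) => a[0] - b[0]);

// Index lookup: binary search over the sorted keys.
function indexFind(collection, index, age) {
  let lo = 0;
  let hi = index.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    if (index[mid][0] === age) return { found: collection[index[mid][1]], examined };
    if (index[mid][0] < age) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: null, examined };
}

console.log("scan examined:", scanFind(docs, 900).examined);
console.log("index examined:", indexFind(docs, ageIndex, 900).examined);
```

The scan touches all 1000 documents, while the binary search needs at most about ten comparisons for the same collection, and the gap widens as the collection grows.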

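1. Index types

B+ tree, hash, spatial indexes, full-text indexes

Indexes supported by MongoDB:

Single-field indexes and compound (multi-field) indexes

Multikey indexes: built on a field whose value is an array, with one index entry per array element

Geospatial indexes: location-based lookups

Text indexes: the equivalent of full-text indexes

Hashed indexes: exact-match lookups only; not suitable for range queries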
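Of these types, the multikey index is perhaps the least intuitive, so here is a minimal sketch of the idea: a field holding an array contributes one index entry per element. The posts collection, the tags field, and buildMultikeyIndex are assumptions made up for this example, and a Map stands in for the real index structure.

```javascript
// Build a toy multikey index: every array element gets its own entry.
function buildMultikeyIndex(collection, field) {
  const index = new Map(); // element value -> array of document _ids
  for (const doc of collection) {
    const value = doc[field];
    const elements = Array.isArray(value) ? value : [value];
    for (const el of new Set(elements)) { // ignore duplicates within one document
      if (!index.has(el)) index.set(el, []);
      index.get(el).push(doc._id);
    }
  }
  return index;
}

// Made-up sample documents.
const posts = [
  { _id: 1, tags: ["mongodb", "index"] },
  { _id: 2, tags: ["mongodb", "sharding"] },
  { _id: 3, tags: "index" },
];

const tagIndex = buildMultikeyIndex(posts, "tags");
console.log(tagIndex.get("mongodb")); // ids of documents tagged "mongodb"
console.log(tagIndex.get("index"));   // ids of documents tagged "index"
```

A lookup for a single tag then finds every document whose array contains that tag, without scanning the arrays at query time.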

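2. Managing indexes

Create (ensureIndex() is the method name in this shell version; later MongoDB releases call it createIndex()):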

db.mycoll.ensureIndex(keypattern[,options])

Help text shown by the shell:

db.mycoll.ensureIndex(keypattern[,options]) - options is an object with these possible fields: name, unique, dropDups

db.COLLECTION_NAME.ensureIndex({KEY:1})

In this syntax, KEY is the field to index; 1 creates the index in ascending order, and -1 creates it in descending order. ensureIndex() can also build an index over several fields (called a composite index in relational databases), for example: db.col.ensureIndex({"title":1,"description":-1})

ensureIndex() accepts optional parameters, listed below:

Parameter	Type	Description
background	Boolean	Building an index blocks other database operations. Setting "background" to true builds the index in the background instead. The default is false.
unique	Boolean	Whether the index is unique. Set to true to create a unique index. The default is false.
name	string	The name of the index. If omitted, MongoDB generates a name by concatenating the indexed field names and their sort orders.
dropDups	Boolean	Whether to delete duplicate documents when building a unique index. Set to true to drop the duplicates. The default is false.
sparse	Boolean	Skips documents that lack the indexed field. Use with care: when true, queries that use the index will not return documents missing that field. The default is false.
expireAfterSeconds	integer	A value in seconds that sets a TTL, controlling how long documents in the collection live.
v	index version	The index version number. The default depends on the version of mongod running when the index is created.
weights	document	Index weights, from 1 to 99,999, giving each indexed field's weight relative to the other indexed fields when scoring.
default_language	string	For text indexes, determines the stop-word list and the stemming and tokenizing rules. The default is English.
language_override	string	For text indexes, the name of the document field that holds a language overriding the default. The default value is language.
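The naming rule described for the name option (field names joined with their sort orders) can be sketched as follows. defaultIndexName is a hypothetical helper written for this illustration, not a shell function; it reproduces names such as name_1, which appears in the example session further down.

```javascript
// Derive a default index name from a key pattern:
// "<field>_<direction>" pairs joined by underscores.
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, direction]) => field + "_" + direction)
    .join("_");
}

console.log(defaultIndexName({ name: 1 }));
console.log(defaultIndexName({ title: 1, description: -1 }));
```

Passing an explicit name in the options document is mostly useful when the generated name would become long or hard to read.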

List:

db.mycoll.getIndexes()

Drop:

db.mycoll.dropIndexes()  drops all indexes on the collection (the default _id index is kept)

db.mycoll.dropIndex("index")  drops the named index

db.mycoll.reIndex()  rebuilds the indexes

Example:

> db.students.find()
> for (i=1;i<=100;i++) db.students.insert({name:"student"+i, age:(i%100)}) 
                                                                  #  insert 100 documents with a for loop
> db.students.find().count()
100

> db.students.find()
{ "_id" : ObjectId("58d613021e8383d30814f846"), "name" : "student1", "age" : 1 }
{ "_id" : ObjectId("58d613021e8383d30814f847"), "name" : "student2", "age" : 2 }
{ "_id" : ObjectId("58d613021e8383d30814f848"), "name" : "student3", "age" : 3 }
{ "_id" : ObjectId("58d613021e8383d30814f849"), "name" : "student4", "age" : 4 }
{ "_id" : ObjectId("58d613021e8383d30814f84a"), "name" : "student5", "age" : 5 }
{ "_id" : ObjectId("58d613021e8383d30814f84b"), "name" : "student6", "age" : 6 }
{ "_id" : ObjectId("58d613021e8383d30814f84c"), "name" : "student7", "age" : 7 }
{ "_id" : ObjectId("58d613021e8383d30814f84d"), "name" : "student8", "age" : 8 }
{ "_id" : ObjectId("58d613021e8383d30814f84e"), "name" : "student9", "age" : 9 }
{ "_id" : ObjectId("58d613021e8383d30814f84f"), "name" : "student10", "age" : 10 }
{ "_id" : ObjectId("58d613021e8383d30814f850"), "name" : "student11", "age" : 11 }
{ "_id" : ObjectId("58d613021e8383d30814f851"), "name" : "student12", "age" : 12 }
{ "_id" : ObjectId("58d613021e8383d30814f852"), "name" : "student13", "age" : 13 }
{ "_id" : ObjectId("58d613021e8383d30814f853"), "name" : "student14", "age" : 14 }
{ "_id" : ObjectId("58d613021e8383d30814f854"), "name" : "student15", "age" : 15 }
{ "_id" : ObjectId("58d613021e8383d30814f855"), "name" : "student16", "age" : 16 }
{ "_id" : ObjectId("58d613021e8383d30814f856"), "name" : "student17", "age" : 17 }
{ "_id" : ObjectId("58d613021e8383d30814f857"), "name" : "student18", "age" : 18 }
{ "_id" : ObjectId("58d613021e8383d30814f858"), "name" : "student19", "age" : 19 }
{ "_id" : ObjectId("58d613021e8383d30814f859"), "name" : "student20", "age" : 20 }
Type "it" for more      # only the first 20 documents are shown; type it to see more

> db.students.ensureIndex({name:1})   # build an index on the name key; 1 means ascending, -1 descending
> show collections
students
system.indexes
t1

> db.students.getIndexes()
[
	{                               # the default index
		"v" : 1,              
		"name" : "_id_",
		"key" : {
			"_id" : 1
		},
		"ns" : "students.students"    # database.collection
	},
	{
		"v" : 1,
		"name" : "name_1",      # automatically generated index name
		"key" : {
			"name" : 1      # the index created on the name key
		},
		"ns" : "students.students"  
	}
]

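> db.students.dropIndexes("name_1")      # intended to drop one index; the msg below suggests every non-_id index was dropped, so dropIndex("name_1") is the call for a single index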
{
	"nIndexesWas" : 2,
	"msg" : "non-_id indexes dropped for collection",
	"ok" : 1
}
> db.students.getIndexes()
[
	{
		"v" : 1,
		"name" : "_id_",
		"key" : {
			"_id" : 1
		},
		"ns" : "students.students"
	}
]
> db.students.dropIndexes()              # the default _id index cannot be dropped
{
	"nIndexesWas" : 1,
	"msg" : "non-_id indexes dropped for collection",
	"ok" : 1
}
> db.students.getIndexes()
[
	{
		"v" : 1,
		"name" : "_id_",
		"key" : {
			"_id" : 1
		},
		"ns" : "students.students"
	}
]

> db.students.find({age:"90"}).explain()       # show how the query is executed; age was inserted as a number, so the string "90" matches nothing (n is 0)
{
	"cursor" : "BtreeCursor t1",
	"isMultiKey" : false,
	"n" : 0,
	"nscannedObjects" : 0,
	"nscanned" : 0,
	"nscannedObjectsAllPlans" : 0,
	"nscannedAllPlans" : 0,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 17,
	"indexBounds" : {               # the index bounds that were scanned
		"age" : [
			[
				"90",
				"90"
			]
		]
	},
	"server" : "Node7:27017"
}

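III. MongoDB Sharding

1. Introduction to sharding

As the business grows and the data set keeps getting larger, CPU, memory, and I/O become bottlenecks, and MongoDB has to be scaled out.

Adding more mongod replicas only spreads the read load, not the write load, so the data set itself has to be sharded.

MongoDB supports sharding natively.

Sharding solutions (frameworks) for MySQL, which call for a senior DBA (5+ years of experience):

Gizzard, HiveDB, MySQL Proxy + HSCALE, Hibernate Shards, Pyshards

2. Roles in a sharded architecture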

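mongos: the router

It acts as a proxy that routes client requests to the appropriate shard; it neither stores data nor executes the queries against the data itself.

config server: the metadata server. Several of these are needed as well. In the MongoDB version used here they do not form a replica set (MongoDB 3.2 and later deploy config servers as a replica set); other distributed systems rely on a tool such as zookeeper for this kind of coordination.

It stores an index of the data sets held on the shard servers, that is, which chunk lives on which shard.

shard: a data node, that is, a mongod instance (or a replica set of them)

zookeeper:

Commonly used to coordinate the central nodes of a distributed system; it provides an election mechanism for choosing a master node, and zookeeper itself can also run as a distributed service.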

3. Sharding methods

Sharding is done per collection.

To keep the data balanced across the shard nodes, each collection is cut into chunks of a fixed size, which are then assigned to the shard nodes one by one.
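The chunking idea can be sketched roughly as below. This is a simplified model under stated assumptions: chunks are sized by document count here, whereas real chunks are sized in bytes, and splitIntoChunks, assignChunks, and the shard names are invented for the illustration.

```javascript
// Cut a key-sorted list of documents into chunks of at most chunkSize documents.
function splitIntoChunks(sortedDocs, chunkSize) {
  const chunks = [];
  for (let i = 0; i < sortedDocs.length; i += chunkSize) {
    const slice = sortedDocs.slice(i, i + chunkSize);
    chunks.push({
      min: slice[0].age,                // smallest key in the chunk
      max: slice[slice.length - 1].age, // largest key in the chunk
      count: slice.length,
    });
  }
  return chunks;
}

// Hand the chunks to the shards one by one (round robin).
function assignChunks(chunks, shardNames) {
  return chunks.map((chunk, i) => ({ ...chunk, shard: shardNames[i % shardNames.length] }));
}

// Sample data: 100 students already sorted by age.
const students = [];
for (let i = 1; i <= 100; i++) students.push({ name: "student" + i, age: i });

const placed = assignChunks(splitIntoChunks(students, 25), ["shard0", "shard1"]);
for (const c of placed) {
  console.log(c.shard, "ages", c.min, "to", c.max, "docs", c.count);
}
```

With four chunks and two shards, each shard ends up holding two chunks, which is the balance the chunk mechanism is meant to achieve.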

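Range-based sharding:

range: the index used must be an ordered index that supports sorting, such as a B-tree index

Chunks are distributed evenly according to the index

List-based sharding:

list: a discrete scheme in which the values are enumerated in a list

Hash-based sharding:

hash: the key is hashed and taken modulo the number of shard servers, which scatters the data and spreads out hot spots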
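To contrast the range and hash schemes, here is a small routing sketch. It follows the generic descriptions above rather than MongoDB's internals (MongoDB's own hashed sharding assigns ranges of hash values to chunks instead of taking a modulus); the range boundaries, shard names, and toyHash function are all assumptions made up for the example.

```javascript
const shards = ["shard0", "shard1", "shard2"];

// Range-based: each shard owns a contiguous key range [min, max).
const ranges = [
  { min: 0, max: 34, shard: "shard0" },
  { min: 34, max: 67, shard: "shard1" },
  { min: 67, max: Infinity, shard: "shard2" },
];
function routeByRange(age) {
  return ranges.find((r) => age >= r.min && age < r.max).shard;
}

// Hash-based: hash the key, then take it modulo the number of shards.
function toyHash(value) {
  let h = 0;
  for (const ch of String(value)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}
function routeByHash(age) {
  return shards[toyHash(age) % shards.length];
}

// Neighbouring keys stay together under range routing
// and are scattered under hash routing.
for (const age of [1, 2, 3, 4, 5, 6]) {
  console.log(age, routeByRange(age), routeByHash(age));
}
```

Range routing keeps adjacent keys on one shard, which suits range queries, while hash routing spreads them out, which suits write-heavy workloads with monotonically increasing keys.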

Which sharding method to use depends on your own business model.

The guiding principle for sharding:

Scatter writes, concentrate reads

sh.enableSharding("testdb")

IV. Hands-on Example

1. Architecture

[Figure: architecture diagram of the sharded cluster]

2. Configuration steps

1) Configure the config server node first

Start it with configsvr=true; it does not need to join a replica set, and it listens on tcp port 27019

2) mongos

When starting mongos, simply pass --configdb=172.16.100.16:27019 to point it at the config server; mongos listens on tcp port 27017 and acts as the proxy

Options for starting mongos:

mongos --configdb=172.16.100.16:27019 --fork --logpath=/var/log/mongodb/mongos.log

3) Add the shard nodes from the mongos node

Help for the shard-related commands:

testSet:PRIMARY> sh.help()
	sh.addShard( host )                       server:port OR setname/server:port
	                      # add a shard node; the host may be given with a replica set name
	sh.enableSharding(dbname)                 enables sharding on the database dbname
	                      # choose the database on which to enable sharding
	sh.shardCollection(fullName,key,unique)   shards the collection
	                      # choose the collection to shard
	sh.splitFind(fullName,find)               splits the chunk that find is in at the median
	sh.splitAt(fullName,middle)               splits the chunk that middle is in at middle
	sh.moveChunk(fullName,find,to)            move the chunk where 'find' is to 'to' (name of shard)
	sh.setBalancerState( <bool on or not> )   turns the balancer on or off true=on, false=off
	sh.getBalancerState()                     return true if enabled
	sh.isBalancerRunning()                    return true if the balancer has work in progress on any mongos
	sh.addShardTag(shard,tag)                 adds the tag to the shard
	sh.removeShardTag(shard,tag)              removes the tag from the shard
	sh.addTagRange(fullName,min,max,tag)      tags the specified range of the given collection
	sh.status()                               prints a general overview of the cluster                # show the sharding status; "primary" names the shard that stores a database's unsharded collections, such as collections too small to be worth sharding

When sharding a collection, state explicitly which key to shard on

sh.shardCollection(fullName,key,unique)

fullName: the full name, including both database and collection: dbname.collname

Example: sh.shardCollection("testdb.students",{"age":1})

This shards the students collection in the testdb database, using an ascending key on the "age" field; from then on the data in testdb.students is distributed to the shard nodes automatically.
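The fullName format can be made concrete with a small parsing sketch. parseNamespace is a hypothetical helper, not part of the shell; it assumes the database name is everything before the first dot, which also handles collection names that contain dots themselves, such as system.indexes.

```javascript
// Split a namespace such as "testdb.students" at the first dot.
function parseNamespace(fullName) {
  const dot = fullName.indexOf(".");
  if (dot <= 0 || dot === fullName.length - 1) {
    throw new Error("expected <database>.<collection>, got: " + fullName);
  }
  return { db: fullName.slice(0, dot), collection: fullName.slice(dot + 1) };
}

console.log(parseNamespace("testdb.students"));
console.log(parseNamespace("testdb.system.indexes"));
```

Passing only a collection name, without the database prefix, is rejected by this sketch, which reflects why the shell help asks for the full name.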

use admin

db.runCommand("listShards") # list the shard nodes

db.printShardingStatus() gives the same output as sh.status()

sh.isBalancerRunning() # check whether the balancer is running; it only runs, automatically, when balancing is needed

sh.getBalancerState() # check whether balancing is enabled

sh.moveChunk(fullName,find,to) # move a chunk manually; not recommended
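Why the balancer only runs when needed can be illustrated with a toy model. This is a sketch under stated assumptions, not the real balancer: the threshold value, the chunk counts, and the balance function are invented, and the real balancer decides when and what to migrate by its own rules.

```javascript
// Toy balancer: while the fullest and emptiest shards differ by at least
// `threshold` chunks, move one chunk from the fullest to the emptiest.
function balance(chunkCounts, threshold) {
  if (threshold < 2) throw new Error("threshold must be at least 2");
  const counts = { ...chunkCounts }; // work on a copy
  const moves = [];
  while (true) {
    const names = Object.keys(counts).sort((a, b) => counts[a] - counts[b]);
    const least = names[0];
    const most = names[names.length - 1];
    if (counts[most] - counts[least] < threshold) break; // balanced enough
    counts[most]--;
    counts[least]++;
    moves.push({ from: most, to: least });
  }
  return { counts, moves };
}

// Made-up starting distribution of chunks per shard.
const result = balance({ shard0: 10, shard1: 2, shard2: 0 }, 2);
console.log(result.counts);
console.log("chunk moves:", result.moves.length);
```

When the counts are already within the threshold the loop exits immediately with no moves, which corresponds to sh.isBalancerRunning() reporting false on a balanced cluster.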
