Indexing Documents in Elasticsearch

Published: 2020-07-14 13:20:16 | Source: web | Views: 781 | Author: JUN_LJ | Category: Databases

This article is mostly a translation of the official documentation, version 7.10.

1. Indexing a document (using curl)

curl -X PUT "localhost:9200/twitter/_doc/1" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'


-X option: sets the HTTP request method used by curl. The default is GET; PUT, POST and DELETE are also possible.

-H option: passes a request header.

-d option: data, the request body.


If the target index does not exist, it is created automatically. This behavior can be configured through the action.auto_create_index setting.

PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "twitter,index10,-index1*,+ind*"
    }
}

Note: indices named twitter or index10 are created automatically. Any other name matching index1* is not created, while any other name matching ind* is created.
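To make the ordering of the pattern list concrete, here is a minimal Python sketch of how such a setting could be evaluated. It assumes that patterns are checked from left to right and that the first matching pattern decides the outcome. The helper name auto_create_allowed is hypothetical, and fnmatch only approximates Elasticsearch's wildcard matching.

```python
from fnmatch import fnmatchcase


def auto_create_allowed(setting: str, index_name: str) -> bool:
    """Toy model of action.auto_create_index (an assumption, not Elasticsearch code)."""
    value = setting.strip()
    if value.lower() == "true":
        return True
    if value.lower() == "false":
        return False
    # Assumed rule: walk the comma-separated patterns in order, first match wins.
    for raw in value.split(","):
        pattern = raw.strip()
        allow = True
        if pattern.startswith("-"):
            allow, pattern = False, pattern[1:]
        elif pattern.startswith("+"):
            pattern = pattern[1:]
        if fnmatchcase(index_name, pattern):
            return allow
    # No pattern matched: treat the index as not auto-creatable.
    return False


if __name__ == "__main__":
    rules = "twitter,index10,-index1*,+ind*"
    for name in ("twitter", "index10", "index12", "indigo", "blog"):
        print(name, auto_create_allowed(rules, name))
```

Under these assumptions index10 is allowed because its exact name appears before the -index1* exclusion, whereas index12 is rejected by that exclusion.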


PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "false"
    }
}

Note: with this setting no index is created automatically, and indexing into a missing index returns an error. For example (response truncated): {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [mytwitter]"


PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "true"
    }
}

Note: all indices are created automatically. This is the default.


Under the default mapping rules, an index may contain only one type.

For example, an attempt to create a second type named mydoc:

curl -X PUT "localhost:9200/twitter/mydoc/1" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'

This produces an error (response truncated): {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Rejecting mapping update to [twitter] as the final mapping would have more than 1 type: [_doc, mydoc]"}],"type":"illegal_argument_exception","reason":"Rejecting mapping update to [twitter] as the final mapping would have more than 1 type


2. The op_type option (create only, never update an existing document)

curl -X PUT "localhost:9200/twitter/_doc/1?op_type=create" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'

If the document twitter/_doc/1 already exists, the create operation fails.


An equivalent form:

curl -X PUT "localhost:9200/twitter/_create/1" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'


Automatic document ID generation:

If no document ID is given, Elasticsearch generates a unique ID. Such a request therefore always creates a new document and never updates an existing one.

Example:

curl -X POST "localhost:9200/twitter/_doc/" -H 'Content-Type: application/json' -d'
{
    "user" : "mjj",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "test Elasticsearch"
}
'

Response (partial): {"_index":"twitter","_type":"_doc","_id":"olLK42oBqV8-hMggVV3X"
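The three request shapes shown so far (PUT with an explicit ID, PUT to _create, and POST without an ID) can be summarized in a small Python sketch. It only builds the request object with the standard library and sends nothing. The function name build_index_request and the base_url default are assumptions made for illustration.

```python
import json
import urllib.request
from typing import Optional


def build_index_request(index: str, doc: dict, doc_id: Optional[str] = None,
                        create_only: bool = False,
                        base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) an index request like the curl examples above."""
    if doc_id is None:
        # No ID supplied: POST to /<index>/_doc/ so that an ID is generated.
        url, method = f"{base_url}/{index}/_doc/", "POST"
    elif create_only:
        # Same effect as ?op_type=create: fail if the document already exists.
        url, method = f"{base_url}/{index}/_create/{doc_id}", "PUT"
    else:
        url, method = f"{base_url}/{index}/_doc/{doc_id}", "PUT"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


if __name__ == "__main__":
    req = build_index_request("twitter", {"user": "mjj"})
    print(req.get_method(), req.full_url)
```

Against a running cluster, the returned object could be passed to urllib.request.urlopen to actually perform the call.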


3. Optimistic concurrency control


Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409. See Optimistic concurrency control for more details.

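In short: the response to every index operation includes a sequence number, _seq_no (together with _primary_term). A client that has read a document can send those two values back as if_seq_no and if_primary_term on its next write. If they no longer match the document's last modification, another writer has changed the document in the meantime and the operation fails with a 409 error.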


4. Routing (which physical shard a document is stored on)

By default, shard placement (or routing) is controlled by using a hash of the document’s id value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter. For example:


POST twitter/_doc?routing=kimchy
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}


In the example above, the "_doc" document is routed to a shard based on the routing parameter provided: "kimchy".


When setting up explicit mapping, the _routing field can be optionally used to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.

By default, Elasticsearch hashes the document ID to decide which shard stores the document. If a routing parameter is supplied, the hash function is applied to that value instead.

The _routing field can also be configured in the mapping. If it is marked as required and an index request provides no routing value, the request fails with an error.


5. Wait For Active Shards

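By default, an index operation only requires the primary shard to be active before it proceeds.

This can be changed with index.write.wait_for_active_shards so that more shard copies must be active before the write goes ahead. The default value is 1 (the primary shard counts as one copy).

If it is set to 2, the primary shard and one replica shard must both be active. Until a replica is available, the operation waits.

If index.write.wait_for_active_shards is set to all, the requirement is number_of_replicas + 1 copies. When the cluster has too few nodes to host every copy, the operation cannot complete until a new node joins.

number_of_replicas counts replica shards only, whereas the number of active shards includes the primary shard.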

Example:

For example, suppose we have a cluster of three nodes, A, B, and C and we create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If we attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down, and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all 3 nodes are up), then the indexing operation will require 3 active shard copies before proceeding, a requirement which should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if we set wait_for_active_shards to all (or to 4, which is the same), the indexing operation will not proceed as we do not have all 4 copies of each shard active in the index. The operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.
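The arithmetic of the three-node example can be checked with a short sketch. The helper names required_copies and can_proceed are hypothetical, and the model only compares counts. It does not reproduce how Elasticsearch tracks shard state.

```python
from typing import Union


def required_copies(wait_for_active_shards: Union[int, str],
                    number_of_replicas: int) -> int:
    """Number of active shard copies needed before a write may proceed."""
    total = number_of_replicas + 1  # the primary counts as one copy
    if wait_for_active_shards == "all":
        return total
    n = int(wait_for_active_shards)
    if not 1 <= n <= total:
        raise ValueError(f"expected 'all' or a value between 1 and {total}")
    return n


def can_proceed(wait_for_active_shards: Union[int, str],
                number_of_replicas: int, active_copies: int) -> bool:
    return active_copies >= required_copies(wait_for_active_shards, number_of_replicas)


if __name__ == "__main__":
    # Three nodes, number_of_replicas = 3: at most 3 of the 4 copies can be active.
    for setting in (1, 3, "all", 4):
        print(setting, can_proceed(setting, number_of_replicas=3, active_copies=3))
```

With three active copies, the settings 1 and 3 are satisfied, while all and 4 are not, which matches the behavior described in the quoted example.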

6. Noop updates


When updating a document using the index API a new version of the document is always created even if the document hasn’t changed. If this isn’t acceptable use the _update API with detect_noop set to true. This option isn’t available on the index API because the index API doesn’t fetch the old source and isn’t able to compare it against the new source.


There isn’t a hard and fast rule about when noop updates aren’t acceptable. It’s a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.

In other words, updating a document through the index API always creates a new version, whether or not the content actually changed. If that is not acceptable, use the _update API with detect_noop set to true. The option does not exist on the index API because the index API never fetches the old source to compare it with the new one.


7. Timeout

The primary shard assigned to perform the index operation might not be available when the index operation is executed. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the index operation will wait on the primary shard to become available for up to 1 minute before failing and responding with an error. The timeout parameter can be used to explicitly specify how long it waits. Here is an example of setting it to 5 minutes:

curl -X PUT "localhost:9200/twitter/_doc/1?timeout=5m" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'

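If the primary shard is unavailable and cannot perform the index operation, the request waits, by default for up to one minute, and then fails with a timeout error if the shard is still unavailable. The example above sets the timeout to 5 minutes.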


8. Versioning

Each indexed document is given a version number. By default, internal versioning is used that starts at 1 and increments with each update, deletes included. Optionally, the version number can be set to an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.


When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document’s version number, a version conflict will occur and the index operation will fail. For example:

curl -X PUT "localhost:9200/twitter/_doc/1?version=2&version_type=external" -H 'Content-Type: application/json' -d'
{
    "message" : "elasticsearch now has versioning support, double cool!"
}
'


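Document version control.

Every indexed document has a version number. By default it is managed internally by Elasticsearch: it starts at 1 and is incremented by each update and delete.

The version can also be controlled by an external system by setting version_type to external and passing a version value. If the supplied version is less than or equal to the stored version, the request fails with a version conflict. The example above shows how to set the version explicitly.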

