How to Run Match Queries in Elasticsearch

Published: 2021-11-16 16:55:00  Source: 億速云  Author: 柒染  Category: Big Data


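If we index pairs of words instead of each word on its own, we preserve as much of the context around each word as possible. This is what shingles are for.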

Example sentence: Sue ate the alligator
Unigrams: ["sue", "ate", "the", "alligator"]
Bigrams: ["sue ate", "ate the", "the alligator"]
Trigrams: ["sue ate the", "ate the alligator"]
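To make the idea concrete, here is a minimal Python sketch that produces word n-grams (shingles) from the example sentence. It is only an illustration: it assumes a simplified tokenizer (lowercase the text, then keep runs of ASCII letters, digits and apostrophes) in place of Elasticsearch's real standard tokenizer, and the helper names `tokenize` and `word_ngrams` are hypothetical.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> list[str]:
    # Simplified stand-in for the standard tokenizer + lowercase filter:
    # lowercase the text, then keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_ngrams(text: str, n: int) -> list[str]:
    # Every run of n consecutive tokens, joined with a single space.
    tokens = tokenize(text)
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "Sue ate the alligator"
for n, label in [(1, "unigrams"), (2, "bigrams"), (3, "trigrams")]:
    print(label, word_ngrams(sentence, n))
```

Running it prints the same three lists as above, one per line.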

Note:
Trigrams give higher precision, but they greatly increase the number of unique terms in the index. In most cases bigrams are enough.

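Fortunately, users tend to phrase their searches with the same kinds of constructions that appear in the data they are searching.
One point matters, though: indexing only bigrams is not enough. We still need unigrams, and we can treat bigram matches as a signal that raises the relevance score.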

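Shingles have to be created at index time, as part of the analysis process.
We could index both unigrams and bigrams into a single field, but it is cleaner to keep them in separate fields that can be queried independently.
The unigram field forms the core of the search, while the bigram field is used to boost relevance.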

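Note:
Term matching
Shingles are only useful when the words in the user's query appear in the same order as in the original document.
Summary:
For phrase-style searches, Elasticsearch's default standard tokenizer (which splits text into fine-grained tokens) works best, because it maximizes the overlap between the terms produced from the query and the terms produced at index time.
This is especially useful where neighboring words need to be matched together (for example, personal names and place names).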

Data preparation

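Create the index with these settings:
PUT /my_index
{
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "my_shingle_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams":  false
                }
            },
            "analyzer": {
                "my_shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "my_shingle_filter"
                    ]
                }
            }
        }
    }
}

The queries below search a title field plus a title.shingles sub-field that uses my_shingle_analyzer, over three documents: 1 "Sue ate the alligator", 2 "The alligator ate Sue" and 3 "Sue never goes anywhere without her alligator skin purse".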

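As a rough illustration of what my_shingle_analyzer does with these settings (min_shingle_size 2, max_shingle_size 2, output_unigrams false), here is a Python emulation. It is a sketch, not Lucene's shingle filter: it assumes a simplified ASCII-only tokenizer, ignores token positions and offsets, and the function names are hypothetical.

```python
from __future__ import annotations

import re


def standard_lowercase(text: str) -> list[str]:
    # Rough stand-in for the "standard" tokenizer followed by "lowercase"
    # (ASCII letters, digits and apostrophes only).
    return re.findall(r"[a-z0-9']+", text.lower())


def shingle_filter(tokens: list[str], min_size: int = 2, max_size: int = 2,
                   output_unigrams: bool = False) -> list[str]:
    # Emulates the meaning of min_shingle_size, max_shingle_size and
    # output_unigrams; it does not model token positions or offsets.
    if min_size < 2 or max_size < min_size:
        raise ValueError("shingles must span at least 2 tokens and min <= max")
    out: list[str] = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


def my_shingle_analyzer(text: str) -> list[str]:
    # Same values as my_shingle_filter in the index settings:
    # min 2, max 2, output_unigrams false, so bigrams only.
    return shingle_filter(standard_lowercase(text), 2, 2, output_unigrams=False)


print(my_shingle_analyzer("The alligator ate Sue"))  # bigrams only
print(my_shingle_analyzer("alligator"))              # one-word query: no tokens
```

The empty result for the one-word query "alligator" shows why the plain title field is still needed: with output_unigrams set to false, title.shingles on its own cannot match single-word queries.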

Testing

1. match query

GET /my_index/_doc/_search
{
   "query": {
      "match": {
         "title": "the hungry alligator ate sue"
      }
   }
}

Result:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.3721708,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.3721708, # Both documents contain the, alligator, ate and sue, so they get the same score.
        "_source" : {
          "title" : "Sue ate the alligator"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.3721708, # Both documents contain the, alligator, ate and sue, so they get the same score.
        "_source" : {
          "title" : "The alligator ate Sue"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.21526179, # Document 3 could be excluded by setting the minimum_should_match parameter; see "Controlling Precision".
        "_source" : {
          "title" : "Sue never goes anywhere without her alligator skin purse"
        }
      }
    ]
  }
}

Analysis:
Note that documents 1 and 2 have the same relevance score because they contain the same words.
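The comment on document 3 mentions minimum_should_match. This Python sketch counts how many of the five query terms each title contains and applies a percentage threshold, assuming Elasticsearch's behaviour of rounding a percentage down to a whole number of terms. The 75% value, the simplified tokenizer and the helper names are illustrative assumptions, and the sketch decides only which documents match, not their scores.

```python
from __future__ import annotations

import re


def terms(text: str) -> set[str]:
    # Simplified standard tokenizer + lowercase; duplicates collapse.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def required_matches(num_terms: int, percent: int) -> int:
    # A percentage value is rounded down to a whole number of terms.
    return num_terms * percent // 100


def kept_docs(query: str, docs: dict[int, str], percent: int) -> list[int]:
    # IDs of documents containing at least the required number of query terms.
    q = terms(query)
    needed = required_matches(len(q), percent)
    return [doc_id for doc_id, title in docs.items()
            if len(q & terms(title)) >= needed]


DOCS = {
    1: "Sue ate the alligator",
    2: "The alligator ate Sue",
    3: "Sue never goes anywhere without her alligator skin purse",
}
QUERY = "the hungry alligator ate sue"

print(required_matches(5, 75))     # 5 terms * 75% = 3.75, rounded down to 3
print(kept_docs(QUERY, DOCS, 75))  # document 3 matches only sue and alligator
```

In an actual request the parameter goes on the match query itself, for example "title": {"query": "the hungry alligator ate sue", "minimum_should_match": "75%"}.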

2. match query with shingles

GET /my_index/_doc/_search
{
   "query": {
      "bool": {
         "must": {
            "match": {
               "title": "the hungry alligator ate sue"
            }
         },
         "should": {
            "match": {
               "title.shingles": "the hungry alligator ate sue"
            }
         }
      }
   }
}

Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 3.6694741,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 3.6694741,
        "_source" : {
          "title" : "The alligator ate Sue"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.3721708,
        "_source" : {
          "title" : "Sue ate the alligator"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.21526179,
        "_source" : {
          "title" : "Sue never goes anywhere without her alligator skin purse"
        }
      }
    ]
  }
}

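Analysis:
All 3 documents still match, but document 2 now ranks first because it also matches the shingled terms alligator ate and ate sue. Document 1 keeps its earlier score of 1.3721708: none of its bigrams appear in the query, so the should clause adds nothing for it.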
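To see where document 2's extra score comes from, this Python sketch intersects the bigrams of the query with the bigrams of each title. It assumes a simplified ASCII tokenizer in place of the standard tokenizer, uses hypothetical helper names, and only shows which shingles overlap; it does not reproduce Elasticsearch's BM25 scoring.

```python
from __future__ import annotations

import re


def bigrams(text: str) -> set[str]:
    # Approximates title.shingles: simplified standard tokens, lowercased,
    # adjacent pairs only (output_unigrams false).
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)}


def matching_shingles(query: str, title: str) -> set[str]:
    # Shingles shared by the query and a title; only these can add score
    # through the should clause.
    return bigrams(query) & bigrams(title)


QUERY = "the hungry alligator ate sue"
DOCS = {
    1: "Sue ate the alligator",
    2: "The alligator ate Sue",
    3: "Sue never goes anywhere without her alligator skin purse",
}

for doc_id, title in DOCS.items():
    print(doc_id, sorted(matching_shingles(QUERY, title)))
```

Document 1 contains the same words as document 2 in a different order, so it shares no bigram with the query. This is the word-order sensitivity described in the note on term matching.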
