How to Explore Parent-Child Documents in Elasticsearch

Published: 2021-12-16 17:17:34  Source: 億速云  Views: 498  Author: 柒染  Category: Big Data

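Many people without prior experience struggle with parent-child documents in Elasticsearch. This article summarizes how the feature works and how to use it, in the hope that it helps you solve the problem.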

Official documentation:

2.x (Chinese edition)
7.9

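Introduction

Parent-child documents are similar in nature to the nested model: both allow one entity to be associated with another. The main difference is that with nested objects everything lives in the same document, whereas with a parent-child relationship the parent and its children are completely independent documents.

The parent-child feature associates documents of one type with documents of another type in a one-to-many relationship: one parent can have many children. Compared with nested objects, it has these main advantages:

The parent document can be updated without reindexing its children.
Child documents can be added, changed, or deleted without affecting the parent or other children. This is especially useful when child documents are numerous and are added or changed frequently.
Child documents can be returned as search results on their own.

Elasticsearch maintains a map of which parents are associated with which children, and this map is what makes query-time joins fast. It also imposes a restriction: a parent document and all of its children must be stored on the same shard.

The parent-child ID map is stored in doc values. Doc values allow fast processing when the map is fully held in memory, and they scale by spilling to disk when the map is very large.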
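
The same-shard restriction is why child documents are indexed with a routing value equal to the parent ID. The sketch below illustrates the idea only. It assumes the shard is chosen as hash(routing) modulo the number of primary shards, and it substitutes String.hashCode() for the Murmur3 hash that Elasticsearch uses, so the shard numbers it produces are not the ones a real cluster would pick. The class and method names are hypothetical.

```java
class JoinRoutingSketch {

    // Stand-in for shard selection. Assumption: Elasticsearch hashes the routing
    // value with Murmur3; String.hashCode() is used here only to keep the sketch
    // free of dependencies.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        String parentId = "1";
        String childId = "2";

        // A parent indexed without ?routing is placed by its own ID.
        int parentShard = shardFor(parentId, shards);
        // A child indexed with ?routing=<parent ID> hashes the same value.
        int routedChildShard = shardFor(parentId, shards);
        // A child indexed without routing would be placed by its own ID instead.
        int unroutedChildShard = shardFor(childId, shards);

        System.out.println("parent shard:         " + parentShard);
        System.out.println("routed child shard:   " + routedChildShard);
        System.out.println("unrouted child shard: " + unroutedChildShard);
    }
}
```

Because the routed child hashes the same value as its parent, the two always land on the same shard, whatever hash function is used.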

Has child query

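Returns parent documents whose child documents match a provided query.

Because it performs a join, the has_child query is slow compared to other queries. Its performance degrades as the number of matching child documents pointing to unique parent documents increases. Each has_child query in a search can increase query time significantly.
If you care about query performance, do not use this query. If you need to use the has_child query, use it as rarely as possible.

To use the has_child query, your index must include a join field mapping. For example: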
PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "my-join-field": {
        "type": "join",
        "relations": {
          "parent": "child"
        }
      }
    }
  }
}

GET /_search
{
  "query": {
    "has_child": {
      "type": "child",
      "query": {
        "match_all": {}
      },
      "max_children": 10,
      "min_children": 2,
      "score_mode": "min"
    }
  }
}

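type: (Required, string) Name of the child relationship mapped for the join field.
query: (Required, query object) Query to run on child documents of the type field. If a child document matches the search, the query returns its parent document.
ignore_unmapped: (Optional, Boolean) Indicates whether to ignore an unmapped type and return no documents instead of an error. Defaults to false.

If false, Elasticsearch returns an error if the type is unmapped. You can use this parameter to query multiple indices that may not contain the type.

max_children: (Optional, integer) Maximum number of child documents matching the query that a returned parent document may have. A parent document that exceeds this limit is excluded from the search results.
min_children: (Optional, integer) Minimum number of child documents matching the query that a parent document must have in order to be returned. A parent document that falls short of this limit is excluded from the search results.
score_mode: (Optional, string) Indicates how the scores of matching child documents affect the relevance score of the root parent document. Valid values are:

none (default): Do not use the relevance scores of matching child documents. The query assigns parent documents a score of 0.
avg: Use the mean relevance score of all matching child documents.
max: Use the highest relevance score of all matching child documents.
min: Use the lowest relevance score of all matching child documents.
sum: Add together the relevance scores of all matching child documents.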
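
To make the interaction between min_children, max_children and score_mode concrete, here is a small sketch that works on an in-memory array of child scores. It models the documented behavior only and is not how Elasticsearch implements the query. The class and method names are hypothetical, and it assumes a parent needs at least one matching child to be returned.

```java
import java.util.Arrays;
import java.util.OptionalDouble;

class HasChildScoringSketch {

    // Returns the parent's score, or empty when the parent is excluded because
    // its number of matching children is outside [minChildren, maxChildren].
    static OptionalDouble parentScore(double[] childScores, int minChildren,
                                      int maxChildren, String scoreMode) {
        int matches = childScores.length;
        // Assumption: a parent without any matching child is never returned.
        if (matches == 0 || matches < minChildren || matches > maxChildren) {
            return OptionalDouble.empty();
        }
        switch (scoreMode) {
            case "none":
                return OptionalDouble.of(0.0);
            case "avg":
                return Arrays.stream(childScores).average();
            case "max":
                return Arrays.stream(childScores).max();
            case "min":
                return Arrays.stream(childScores).min();
            case "sum":
                return OptionalDouble.of(Arrays.stream(childScores).sum());
            default:
                throw new IllegalArgumentException("Unknown score_mode: " + scoreMode);
        }
    }

    public static void main(String[] args) {
        double[] threeChildren = {1.0, 2.0, 3.0};
        double[] oneChild = {1.5};

        // Same limits as the request above: min_children 2, max_children 10.
        System.out.println(parentScore(threeChildren, 2, 10, "min"));
        System.out.println(parentScore(threeChildren, 2, 10, "sum"));
        System.out.println(parentScore(oneChild, 2, 10, "min"));
    }
}
```

With the limits used in the request above, a parent with a single matching child is dropped before any score is computed.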

1. Sorting

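You cannot sort the results of a has_child query using standard sort options. If you need to sort returned documents by a field in their child documents, use a function_score query and sort by _score. For example, the following query sorts returned documents by the click_count field of their child documents.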

GET /_search
{
  "query": {
    "has_child": {
      "type": "child",
      "query": {
        "function_score": {
          "script_score": {
            "script": "_score * doc['click_count'].value"
          }
        }
      },
      "score_mode": "max"
    }
  }
}

Has parent query

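Returns child documents whose parent document matches a provided query. You can create parent-child relationships between documents in the same index using a join field mapping.

Because it performs a join, the has_parent query is slow compared to other queries. Its performance degrades as the number of matching parent documents increases. Each has_parent query in a search can increase query time significantly.

To use the has_parent query, your index must include a join field mapping. For example: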
PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "my-join-field": {
        "type": "join",
        "relations": {
          "parent": "child"
        }
      },
      "tag": {
        "type": "keyword"
      }
    }
  }
}

GET /my-index-000001/_search
{
  "query": {
    "has_parent": {
      "parent_type": "parent",
      "query": {
        "term": {
          "tag": {
            "value": "Elasticsearch"
          }
        }
      }
    }
  }
}

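parent_type: (Required, string) Name of the parent relationship mapped for the join field.
query: (Required, query object) Query to run on parent documents of the parent_type field. If a parent document matches the search, the query returns its child documents.
score: (Optional, Boolean) Indicates whether the relevance score of a matching parent document is aggregated into its child documents. Defaults to false.

If false, Elasticsearch ignores the relevance score of the parent document and assigns each child document a relevance score equal to the query's boost, which defaults to 1.
If true, the relevance score of the matching parent document is aggregated into the relevance scores of its child documents.

ignore_unmapped: (Optional, Boolean) Indicates whether to ignore an unmapped parent_type and return no documents instead of an error. Defaults to false.

If false, Elasticsearch returns an error if the parent_type is unmapped.
You can use this parameter to query multiple indices that may not contain the parent_type.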

1. Sorting

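You cannot sort the results of a has_parent query using standard sort options.

If you need to sort returned documents by a field in their parent documents, use a function_score query and sort by _score. For example, the following query sorts returned documents by the view_count field of their parent documents.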

GET /_search
{
  "query": {
    "has_parent": {
      "parent_type": "parent",
      "score": true,
      "query": {
        "function_score": {
          "script_score": {
            "script": "_score * doc['view_count'].value"
          }
        }
      }
    }
  }
}
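
The reasoning behind this workaround can be sketched in plain Java: with "score": true every child takes over its parent's score, and the script makes that score proportional to view_count, so ordering by score orders the children by their parent's view_count. The classes below are hypothetical in-memory stand-ins, not Elasticsearch types, and the sketch assumes the default descending order on _score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortChildrenByParentSketch {

    // Hypothetical stand-in for a parent document.
    static final class Parent {
        final String id;
        final double queryScore; // score the inner query gave this parent
        final long viewCount;

        Parent(String id, double queryScore, long viewCount) {
            this.id = id;
            this.queryScore = queryScore;
            this.viewCount = viewCount;
        }
    }

    // Hypothetical stand-in for a child document.
    static final class Child {
        final String id;
        final String parentId;

        Child(String id, String parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    // Mirrors the script "_score * doc['view_count'].value" for one parent.
    static double parentScore(Parent p) {
        return p.queryScore * p.viewCount;
    }

    // Each child inherits its parent's score; sort by that score, highest first.
    static List<String> sortedChildIds(List<Parent> parents, List<Child> children) {
        Map<String, Double> scoreByParent = new HashMap<>();
        for (Parent p : parents) {
            scoreByParent.put(p.id, parentScore(p));
        }
        List<Child> sorted = new ArrayList<>(children);
        // List.sort is stable, so children of the same parent keep their order.
        sorted.sort(Comparator.comparingDouble(
                (Child c) -> scoreByParent.get(c.parentId)).reversed());
        List<String> ids = new ArrayList<>();
        for (Child c : sorted) {
            ids.add(c.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Parent> parents = Arrays.asList(
                new Parent("p1", 1.0, 10),
                new Parent("p2", 1.0, 50));
        List<Child> children = Arrays.asList(
                new Child("c1", "p1"),
                new Child("c2", "p2"),
                new Child("c3", "p1"));
        System.out.println(sortedChildIds(parents, children));
    }
}
```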

Parent ID query

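Returns child documents joined to a specific parent document. You can create parent-child relationships between documents in the same index using a join field mapping.

To use the parent_id query, your index must include a join field mapping. To see how to set up an index for the parent_id query, try the following example.

Create an index with a join field mapping.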
PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "my-join-field": {
        "type": "join",
        "relations": {
          "my-parent": "my-child"
        }
      }
    }
  }
}


Index the parent document with an ID of 1.
PUT /my-index-000001/_doc/1?refresh
{
  "text": "This is a parent document.",
  "my-join-field": "my-parent"
}


Index a child document of the parent document.
PUT /my-index-000001/_doc/2?routing=1&refresh
{
  "text": "This is a child document.",
  "my-join-field": {
    "name": "my-child",
    "parent": "1"
  }
}

The following search returns the child documents of the parent document with an ID of 1.
GET /my-index-000001/_search
{
  "query": {
      "parent_id": {
          "type": "my-child",
          "id": "1"
      }
  }
}

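type: (Required, string) Name of the child relationship mapped for the join field.
id: (Required, string) ID of the parent document. The query returns the child documents of this parent document.
ignore_unmapped: (Optional, Boolean) Indicates whether to ignore an unmapped type and return no documents instead of an error. Defaults to false.

If false, Elasticsearch returns an error if the type is unmapped.
You can use this parameter to query multiple indices that may not contain the type.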

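Worked example

The syntax here differs from the "_parent" approach used in older versions: later Elasticsearch releases changed it.

Parent-child documents can be thought of as a join, somewhat like a JOIN query in MySQL, where documents are linked through a field. The main difference from nested documents is that the parent and child objects are separate documents, whereas nested objects are all stored inside one document.

1. Build the parent-child index

Create the settings: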

PUT /test_doctor
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "index_ansj_analyzer": {
          "type": "custom",
          "tokenizer": "index_ansj",
          "filter": [
            "my_synonym",
            "asciifolding"
          ]
        },
        "comma": {
          "type": "pattern",
          "pattern": ","
        },
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms_path": "analysis/synonym.txt"
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      }
    }
  }
} 

Create the mapping:

PUT /test_doctor/_mapping/_doc
{
  "_doc": {
    "properties": {
      "date": {
        "type": "date"
      },
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "comment": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "age": {
        "type": "long"
      },
      "body": {
        "type": "text",
        "analyzer":"index_ansj_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "title": {
        "type": "text",
        "analyzer":"index_ansj_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "relation": {  # "relation" is just an ordinary field name
        "type": "join",
        "relations": { # defines the set of possible relations in a document, each as a parent name and a child name
          "question": "answer"
        }
      }
    }
  }
}

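This creates an index named test_doctor in which relation is the field used for the join: its type is join, and its relations declare question as the parent and answer as the child.
To give one parent several child types, use an array instead: "question": ["answer", "comment"]
Note: question and answer are user-defined relation names.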
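
The one-parent, several-children form can be pictured as a map from each parent name to its list of child names. The sketch below builds that structure with plain Java collections and looks up the parent of a given child name. It only illustrates the shape of the relations object and uses no Elasticsearch classes; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class JoinRelationsSketch {

    // Same shape as "relations": { "question": ["answer", "comment"] }.
    static Map<String, List<String>> relations() {
        Map<String, List<String>> relations = new LinkedHashMap<>();
        relations.put("question", Arrays.asList("answer", "comment"));
        return relations;
    }

    // Finds which parent name a child name is declared under, if any.
    static Optional<String> parentOf(Map<String, List<String>> relations, String childName) {
        for (Map.Entry<String, List<String>> entry : relations.entrySet()) {
            if (entry.getValue().contains(childName)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, List<String>> relations = relations();
        System.out.println(parentOf(relations, "answer"));
        System.out.println(parentOf(relations, "comment"));
        // "question" is a parent name, so it has no parent of its own here.
        System.out.println(parentOf(relations, "question"));
    }
}
```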

2. Insert data

To insert a parent document, set the relation field from the mapping above to question:
PUT test_doctor/_doc/1
{
    "title":"這是一篇文章",
    "body":"這是一篇文章，從哪里說起呢？ ... ...",
    "relation":"question"  # an ordinary field; the value question marks this as a parent document
}
PUT test_doctor/_doc/2
{
    "title":"這是一篇小說",
    "body":"這是一篇小說，從哪里說起呢？ ... ...",
    "relation":"question"  # an ordinary field; the value question marks this as a parent document
}

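Note that this can also be written as "relation":{"name":"question"}


To insert a child document, use the routing parameter in the request URL to say which parent it belongs to, and set the relation field from the mapping: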
PUT test_doctor/_doc/3?routing=1
{
    "name":"張三",
    "comment":"寫的不錯",
    "age":28,
    "date":"2020-05-04",
    "relation":{  # an ordinary field; the name answer marks this as a child document
        "name":"answer",
        "parent":1
    }
}
PUT test_doctor/_doc/4?routing=1
{
    "name":"李四",
    "comment":"寫的很好",
    "age":20,
    "date":"2020-05-04",
    "relation":{  # an ordinary field; the name answer marks this as a child document
        "name":"answer",
        "parent":1
    }
}
PUT test_doctor/_doc/5?routing=2
{
    "name":"王五",
    "comment":"這是一篇非常棒的小說",
    "age":31,
    "date":"2020-05-01",
    "relation":{  # an ordinary field; the name answer marks this as a child document
        "name":"answer",
        "parent":2
    }
}
PUT test_doctor/_doc/6?routing=2
{
    "name":"小六",
    "comment":"這是一篇非常棒的小說",
    "age":31,
    "date":"2020-05-01",
    "relation":{  # an ordinary field; the name answer marks this as a child document
        "name":"answer",
        "parent":2
    }
}

Parent document:
// Maps is presumably Guava's com.google.common.collect.Maps
Map<String, Object> drugMap = Maps.newHashMap();
drugMap.put("id", "2");
drugMap.put("title", "這是一篇小說");
drugMap.put("body", "這是一篇小說，從哪里說起呢？ ... ...");
drugMap.put("relation", "question"); // fixed form: marks the document as a parent

Child document:
Map<String, Object> maps = Maps.newHashMap();
maps.put("name", "answer");  // fixed form: the child relation name
maps.put("parent", "2");     // ID of the parent document this child belongs to

Map<String, Object> doctorTeamMap = Maps.newHashMap();
doctorTeamMap.put("id", "6");
doctorTeamMap.put("name", "小六");
doctorTeamMap.put("comment", "這是一篇非常棒的小說");
doctorTeamMap.put("age", "31");
doctorTeamMap.put("date", "2020-05-01");
doctorTeamMap.put("relation", maps);    // fixed form: the join field value

Java implementation:
/**
 * Queues a document on the BulkProcessor for bulk indexing.
 * @param indexName  index name
 * @param jsonString the document source to index
 * @param id         document ID
 */
public boolean addIndexBulk(String indexName, Map<String, Object> jsonString, String id) {

    IndexRequest request = new IndexRequest(indexName, "_doc", id);
    request.source(jsonString, XContentType.JSON);

    dataBulkProcessor.add(request);

    return true;
}

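/**
 * Same as above, with a routing value (needed for child documents).
 * @param indexName  index name
 * @param jsonString the document source to index
 * @param id         document ID
 * @param routing    routing value, i.e. the parent document ID
 * @return always true once the request has been queued
 */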
public boolean addIndexBulk(String indexName, Map<String, Object> jsonString, String id, String routing) {

    IndexRequest request = new IndexRequest(indexName, "_doc", id);
    request.source(jsonString, XContentType.JSON);
    request.routing(routing);

    dataBulkProcessor.add(request);

    return true;
}

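3. Query data

Querying the relation field

Elasticsearch automatically generates an extra field that represents the relation, named field#question; with the mapping above it is relation#question.
It can be retrieved like this: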
POST test_doctor/_search
{
 "script_fields": {
    "parent": {
      "script": {
         "source": "doc['relation#question']" 
      }
    }
  }
}
Response:
{
  "took" : 124,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 7,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_doctor",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "parent" : [
            "1"
          ]
        }
      },
      {
        "_index" : "test_doctor",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_routing" : "1",
        "fields" : {
          "parent" : [
            "1"
          ]
        }
      },
      {
        "_index" : "test_doctor",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_routing" : "1",
        "fields" : {
          "parent" : [
            "1"
          ]
        }
      },
      {
        "_index" : "test_doctor",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_routing" : "1",
        "fields" : {
          "parent" : [
            "1"
          ]
        }
      },
      {
        "_index" : "test_doctor",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.0,
        "fields" : {
          "parent" : [
            "5"
          ]
        }
      },
      {
        "_index" : "test_doctor",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.0,
        "_routing" : "5",
        "fields" : {
          "parent" : [
            "5"
          ]
        }
      },
      {
        "_index" : "test_doctor",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_routing" : "1",
        "fields" : {
          "parent" : [
            "1"
          ]
        }
      }
    ]
  }
}
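A hit that has a _routing field is a child document, and its parent field holds the ID of its parent document. A hit without _routing is a parent document, and its parent field points to its own ID.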
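
That reading rule can be applied mechanically. The sketch below groups hits under their parent ID using the _id, _routing and parent values from the response above. The Hit class is a hypothetical, minimal stand-in for a search hit, not a client API type, and the rule only holds for data indexed the way this example does it (parents without routing, children routed by parent ID).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupHitsSketch {

    // Hypothetical view of one hit: _id, _routing (null when absent) and the
    // value of the scripted "parent" field.
    static final class Hit {
        final String id;
        final String routing;
        final String parent;

        Hit(String id, String routing, String parent) {
            this.id = id;
            this.routing = routing;
            this.parent = parent;
        }
    }

    // Maps each parent ID to the IDs of its children, in hit order.
    static Map<String, List<String>> childrenByParent(List<Hit> hits) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Hit hit : hits) {
            if (hit.routing == null) {
                // No _routing: a parent document, so make sure it has an entry.
                grouped.computeIfAbsent(hit.id, k -> new ArrayList<>());
            } else {
                // Has _routing: a child document, filed under its parent's ID.
                grouped.computeIfAbsent(hit.parent, k -> new ArrayList<>()).add(hit.id);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Values copied from the response shown above.
        List<Hit> hits = Arrays.asList(
                new Hit("1", null, "1"),
                new Hit("2", "1", "1"),
                new Hit("3", "1", "1"),
                new Hit("4", "1", "1"),
                new Hit("5", null, "5"),
                new Hit("6", "5", "5"),
                new Hit("7", "1", "1"));
        System.out.println(childrenByParent(hits));
    }
}
```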

Querying child documents by parent_id

Pass the parent document's ID to a parent_id query:
POST test_doctor/_search
{
  "query": {
    "parent_id": { 
      "type": "answer",
      "id": "5"
    }
  }
}


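Java API:

// Child relation name
String child_type = "answer";
// Parent document ID
String id = "5";
// parent_id query
ParentIdQueryBuilder parentIdQueryBuilder = new ParentIdQueryBuilder(child_type, id);
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(parentIdQueryBuilder);
builder.from(0);
builder.size(10);


Fetching a child document by ID requires the routing value (without it the document is not found):
// childId is a placeholder for the ID of the child document to fetch
GetRequest getRequest = new GetRequest(indexName, "_doc", childId);
// The routing value (the parent ID) must be specified
getRequest.routing(id);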

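Querying by child documents: has_child

Use has_child to find parent documents based on the content of their children. The type is the child relation name used when the child documents were indexed.

Finding the parents that have particular children is an expensive query, so use it sparingly. Its standard form is: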
POST test_doctor/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "match": {
          "name": "張三"
        }
      },
      "inner_hits": {} # return the matching child documents along with each parent
    }
  }
}

POST test_doctor/_search
{
    "query": {
        "has_child" : {
            "type" : "answer",
            "query" : {
                "match_all" : {}
            },
            "max_children": 10, // optional: a parent with more matching children than this is excluded
            "min_children": 2, // optional: a parent with fewer matching children than this is excluded
            "score_mode" : "min"
        }
    }
}

To filter on fields of the parent document as well, add a post filter:
POST test_doctor/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "match": {
          "name": "張三"
        }
      },
      "inner_hits": {}
    }
  },
  "post_filter": {
    "bool": {
      "must": [
        {
          "term": {
            "title": {
              "value": "文章",
              "boost": 1
            }
          }
        }
      ]
    }
  }
}

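Java API:
// Query applied to the child documents (a match query, as in the request above)
QueryBuilder matchQuery = QueryBuilders.matchQuery("name", "張三");
// How child scores are combined into the parent score (Total sums them)
ScoreMode scoreMode = ScoreMode.Total;
HasChildQueryBuilder childQueryBuilder = new HasChildQueryBuilder("answer", matchQuery, scoreMode);
childQueryBuilder.innerHit(new InnerHitBuilder());
builder.query(childQueryBuilder);
// boolQueryBuilder holds the filter conditions on the parent documents, built elsewhere
builder.postFilter(boolQueryBuilder);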

Querying by parent documents: has_parent

Use has_parent to find child documents based on their parent.

POST test_doctor/_search
{
  "query": {
    "has_parent": {
      "parent_type":"question",
      "query": {
        "match": {
          "title": "這是一篇文章"
        }
      }
    }
  }
}

// Whether the parent's score is carried over to the matching child documents
boolean score = true;
HasParentQueryBuilder hasParentQueryBuilder = new HasParentQueryBuilder("question", boolQueryBuilder, score);
builder.query(hasParentQueryBuilder);
builder.postFilter(QueryBuilders.termQuery("indextype", "answer")); // filter condition on the child documents
