Counting documents with Elasticsearch

If you want to count the number of documents in an Elasticsearch index, there are (at least?) two options:

  • Using _count directly

    POST my_index/_count

    This should return the number of documents in my_index.

  • Using _search

    Here you can use count as the search_type, or some other type. In both cases, the total count can be read from the field ['hits']['total'].
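A minimal Python sketch of both options, using only the standard library. It assumes a cluster at http://localhost:9200 without authentication and an index called my_index (both placeholders). It only builds the requests; urllib.request.urlopen would send them. The search variant uses size: 0, the form that replaced search_type=count in later Elasticsearch versions.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:9200"  # assumption: local cluster, no auth


def _json_request(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def count_request(index: str, query: Optional[dict] = None) -> urllib.request.Request:
    """Option 1: the _count API; the response looks like {"count": N, ...}."""
    return _json_request(f"/{index}/_count", {"query": query or {"match_all": {}}})


def search_request(index: str, query: Optional[dict] = None) -> urllib.request.Request:
    """Option 2: _search with size 0, so only hits.total comes back, no documents."""
    return _json_request(
        f"/{index}/_search",
        {"query": query or {"match_all": {}}, "size": 0},
    )
```

Passing the same query to both functions is what makes their totals comparable.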

My questions:

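  • What is the difference between these approaches? Which one should I prefer?

  • I'm asking because I'm getting different results depending on which method I choose. I'm in the middle of debugging this, and this question came up.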


Probably _count is a bit faster, since it doesn't have to execute a full query with scoring and result fetching and can simply return the number of matching documents.

It would be interesting to know a bit more about how you are getting different results, though. For that I need more information, such as the exact queries you are sending and whether any indexing is happening on the index at the same time.

But suppose that you do the following:

  1. index some documents
  2. refresh the index

Then _search and _count (with a match_all query) should return the same total. If they don't, that would be very weird.
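To reproduce the scenario above, here is a minimal Python sketch that lists the four requests in order as (method, path, body) tuples without sending them. The function name repro_steps and the use of the _bulk API for step 1 are assumptions for illustration.

```python
import json
from typing import Any, List, Optional, Tuple

# Each step is (HTTP method, path, body); nothing is sent here.
Step = Tuple[str, str, Optional[Any]]


def repro_steps(index: str, docs: List[dict]) -> List[Step]:
    """Index docs, refresh, then ask _count and _search for the match_all total."""
    # _bulk expects newline-delimited JSON: an action line, then the document,
    # and the whole body must end with a newline.
    bulk_lines = []
    for doc in docs:
        bulk_lines.append(json.dumps({"index": {}}))
        bulk_lines.append(json.dumps(doc))
    bulk_body = "\n".join(bulk_lines) + "\n"

    match_all = {"query": {"match_all": {}}}
    return [
        ("POST", f"/{index}/_bulk", bulk_body),                    # 1. index some documents
        ("POST", f"/{index}/_refresh", None),                      # 2. make them visible to search
        ("POST", f"/{index}/_count", match_all),                   # 3a. total via _count
        ("POST", f"/{index}/_search", {**match_all, "size": 0}),   # 3b. total via _search
    ]
```

If steps 3a and 3b disagree right after step 2, something other than refresh timing is going on.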

The two queries return the same result, but _count consumes fewer resources and less bandwidth because it doesn't need to fetch documents or compute scores, and it benefits from other internal optimizations. Setting the search size to 0 should make _search perform very similarly.
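To check that claim on your own cluster, this minimal Python sketch compares an already-parsed _count response with an already-parsed _search response. The function names are hypothetical. It handles both shapes of hits.total: a plain integer before Elasticsearch 7.0, and an object with "value" and "relation" from 7.0 on.

```python
from typing import Union


def search_total(search_response: dict) -> int:
    """Read hits.total from a _search response, old or new format."""
    total: Union[int, dict] = search_response["hits"]["total"]
    # Pre-7.0: an integer. 7.0+: {"value": N, "relation": "eq" or "gte"}.
    return total if isinstance(total, int) else total["value"]


def totals_agree(count_response: dict, search_response: dict) -> bool:
    """True if _count and _search report the same number of matching documents."""
    return count_response["count"] == search_total(search_response)
```

Note that on 7.0+ a mismatch can simply mean hits.total hit its default 10,000 cap, so check "relation" before concluding the totals really differ.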

If you want to count all the records in an index, you can also run a terms aggregation on the "_type" field.
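A minimal Python sketch of that approach, assuming a cluster that still has mapping types (they were deprecated in 7.x and removed in 8.0, so this only applies to older versions). The aggregation name by_type is a placeholder. Because a terms aggregation returns only the top buckets (10 by default), the sketch also adds sum_other_doc_count so no documents are missed.

```python
def type_agg_body(agg_name: str = "by_type") -> dict:
    """Search body that skips hits and buckets documents by _type."""
    return {
        "size": 0,
        "aggs": {agg_name: {"terms": {"field": "_type"}}},
    }


def total_from_type_agg(search_response: dict, agg_name: str = "by_type") -> int:
    """Sum doc_count over the returned buckets plus the documents left out of them."""
    agg = search_response["aggregations"][agg_name]
    in_buckets = sum(bucket["doc_count"] for bucket in agg["buckets"])
    # Documents whose _type did not make it into the top buckets are only
    # reported in aggregate under sum_other_doc_count.
    return in_buckets + agg.get("sum_other_doc_count", 0)
```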

The results should be the same. Before comparing the results, be sure to execute an index refresh.

curl 'http://localhost:9200/_cat/indices?v' gives you the document count (docs.count) and other information for every index in tabular format:

health status index                              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2019.10.09-000001         IS7HBUgRRzO7Rn1puBFUIQ   1   1          0            0       283b           283b
green  open   .kibana_task_manager_1             e4zZcF9wSQGFHB_lzTszrg   1   0          2            0     12.5kb         12.5kb
yellow open   metricbeat-7.4.0-2019.10.09-000001 h_CWzZHcRsakxgyC36-HTg   1   1       6118            0      2.2mb          2.2mb
green  open   .apm-agent-configuration           J6wkUr2CQAC5kF8-eX30jw   1   0          0            0       283b           283b
green  open   .kibana_2                          W2ZETPygS8a83-Xcd6t44Q   1   0       1836           23      1.1mb          1.1mb
green  open   .kibana_1                          IrBlKqO0Swa6_HnVRYEwkQ   1   0          8            0    208.8kb        208.8kb
yellow open   filebeat-7.4.0-2019.10.09-000001   xSd2JdwVR1C9Ahz2SQV9NA   1   1          0            0       283b           283b
green  open   .tasks                             0ZzzrOq0RguMhyIbYH_JKw   1   0          1            0      6.3kb          6.3kb
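If you want those numbers in a script, this minimal Python sketch parses the text output shown above into a mapping from index name to docs.count. The function name is hypothetical, and it assumes every column is filled (closed indices leave docs.count blank, which would break whitespace splitting); requesting _cat/indices?format=json avoids that problem. Also note that docs.count is read from Lucene and can include nested documents, so it may differ from what _count returns.

```python
from typing import Dict


def docs_per_index(cat_indices_output: str) -> Dict[str, int]:
    """Map index name -> docs.count from `_cat/indices?v` text output."""
    lines = [line for line in cat_indices_output.splitlines() if line.strip()]
    # The first line holds the column headers; locate the columns we need.
    header = lines[0].split()
    index_col = header.index("index")
    count_col = header.index("docs.count")
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        counts[fields[index_col]] = int(fields[count_col])
    return counts
```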

Old question, but chipping in because on Elasticsearch 7.0 and later:

  1. _search: returns the matching documents along with a hit count that, by default, is only exact up to 10,000 (the default track_total_hits threshold); beyond that, hits.total is reported as 10,000 with "relation": "gte". e.g.:

    {"took":3,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":10000,"relation":"gte"},"max_score": 0.34027478,"hits":[...]}}

  2. _count: returns the exact number of hits for the query, regardless of that threshold; no documents are returned. e.g.:

    {"count":5703899,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

So, with default settings, _search may report a total of 10,000 (meaning "at least 10,000"), while _count returns the actual count for the same query.
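A minimal Python sketch (hypothetical function name) that reads hits.total from a 7.0+ _search response and reports whether the value is exact, using the two responses from this answer as examples.

```python
from typing import Tuple


def total_hits(search_response: dict) -> Tuple[int, bool]:
    """Return (value, is_exact) for hits.total in a 7.0+ _search response."""
    total = search_response["hits"]["total"]
    # "eq": value is the exact count; "gte": value is only a lower bound.
    return total["value"], total["relation"] == "eq"
```

For the first response above this gives (10000, False); _count is not affected by the threshold.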

If you must use _search instead of _count on Elasticsearch 7.0+, setting size: 0 and track_total_hits: true gives you the same total as _count:

GET my-index/_search
{
  "query": { "term": { "field": { "value": "xyz" } } },
  "size": 0,
  "track_total_hits": true
}

which returns:

{
  "took" : 612,
  "timed_out" : false,
  "_shards" : {
    "total" : 629,
    "successful" : 629,
    "skipped" : 524,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 29349466,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
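The same request body, built in Python, plus a reader that refuses to treat a lower bound as a count. This is a minimal sketch; the function names are hypothetical and the term query is just the placeholder from the example.

```python
from typing import Optional


def exact_count_body(query: Optional[dict] = None) -> dict:
    """_search body that returns no hits but counts every match exactly (7.0+)."""
    return {
        "query": query or {"match_all": {}},
        "size": 0,                 # skip fetching documents
        "track_total_hits": True,  # lift the default 10,000 cap on hits.total
    }


def exact_count(search_response: dict) -> Optional[int]:
    """Return hits.total.value if it is exact, otherwise None."""
    total = search_response["hits"]["total"]
    # With track_total_hits: true the relation should always be "eq".
    return total["value"] if total["relation"] == "eq" else None
```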

See Elasticsearch 7.0 Breaking changes

If you want per-index counts, you can use the following query:

GET _all/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "_index",
        "size": 100000
      }
    }
  }
}
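A minimal Python sketch (hypothetical function name) that turns the response of that query into a mapping from index name to document count. It looks up the aggregation by the name NAME used in the query and also returns sum_other_doc_count, which is non-zero if some indices did not fit into the requested number of buckets.

```python
from typing import Dict, Tuple


def counts_by_index(search_response: dict, agg_name: str = "NAME") -> Tuple[Dict[str, int], int]:
    """Return ({index: doc_count}, documents not covered by the returned buckets)."""
    agg = search_response["aggregations"][agg_name]
    counts = {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
    # If "size" was too small, the remaining indices only appear here, summed.
    leftover = agg.get("sum_other_doc_count", 0)
    return counts, leftover
```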
