筛选数组中包含任何给定值的项

我有一套文件,比如

{
tags:['a','b','c']
// ... a bunch properties
}

正如标题中所述: 是否有一种方法可以使用 Nest 过滤包含任何给定标记的所有文档?

例如,上面的记录将匹配[‘ c’,‘ d’]

或者我应该手动构建多个“ OR”?

202668 次浏览

Edit: The bitset stuff below is maybe an interesting read, but the answer itself is a bit dated. Some of this functionality is changing around in 2.x. Also Slawek points out in another answer that the terms query is an easy way to DRY up the search in this case. Refactored at the end for current best practices. —nz

You'll probably want a Bool Query (or more likely Filter alongside another query), with a should clause.

The bool query has three main properties: must, should, and must_not. Each of these accepts another query, or array of queries. The clause names are fairly self-explanatory; in your case, the should clause may specify a list filters, a match against any one of which will return the document you're looking for.

From the docs:

In a boolean query with no must clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.

Here's an example of what that Bool query might look like in isolation:

{
"bool": {
"should": [
{ "term": { "tag": "c" }},
{ "term": { "tag": "d" }}
]
}
}

And here's another example of that Bool query as a filter within a more general-purpose Filtered Query:

{
"filtered": {
"query": {
"match": { "title": "hello world" }
},
"filter": {
"bool": {
"should": [
{ "term": { "tag": "c" }},
{ "term": { "tag": "d" }}
]
}
}
}
}

Whether you use Bool as a query (e.g., to influence the score of matches), or as a filter (e.g., to reduce the hits that are then being scored or post-filtered) is subjective, depending on your requirements.

It is generally preferable to use Bool in favor of an Or Filter, unless you have a reason to use And/Or/Not (such reasons do exist). The Elasticsearch blog has more information about the different implementations of each, and good examples of when you might prefer Bool over And/Or/Not, and vice-versa.

Elasticsearch blog: All About Elasticsearch Filter Bitsets

Update with a refactored query...

Now, with all of that out of the way, the terms query is a DRYer version of all of the above. It does the right thing with respect to the type of query under the hood, it behaves the same as the bool + should using the minimum_should_match options, and overall is a bit more terse.

Here's that last query refactored a bit:

{
"filtered": {
"query": {
"match": { "title": "hello world" }
},
"filter": {
"terms": {
"tag": [ "c", "d" ],
"minimum_should_match": 1
}
}
}
}

elasticsearch 2.0.1:

There's also terms query which should save you some work. Here example from docs:

{
"terms" : {
"tags" : [ "blue", "pill" ],
"minimum_should_match" : 1
}
}

Under hood it constructs boolean should. So it's basically the same thing as above but shorter.

There's also a corresponding terms filter.

So to summarize your query could look like this:

{
"filtered": {
"query": {
"match": { "title": "hello world" }
},
"filter": {
"terms": {
"tags": ["c", "d"]
}
}
}
}

With greater number of tags this could make quite a difference in length.

Whilst this an old question, I ran into this problem myself recently and some of the answers here are now deprecated (as the comments point out). So for the benefit of others who may have stumbled here:

A term query can be used to find the exact term specified in the reverse index:

{
"query": {
"term" : { "tags" : "a" }
}

From the documenation https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html

Alternatively you can use a terms query, which will match all documents with any of the items specified in the given array:

{
"query": {
"terms" : { "tags" : ["a", "c"]}
}

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html

One gotcha to be aware of (which caught me out) - how you define the document also makes a difference. If the field you're searching in has been indexed as a text type then Elasticsearch will perform a full text search (i.e using an analyzed string).

If you've indexed the field as a keyword then a keyword search using a 'non-analyzed' string is performed. This can have a massive practical impact as Analyzed strings are pre-processed (lowercased, punctuation dropped etc.) See (https://www.elastic.co/guide/en/elasticsearch/guide/master/term-vs-full-text.html)

To avoid these issues, the string field has split into two new types: text, which should be used for full-text search, and keyword, which should be used for keyword search. (https://www.elastic.co/blog/strings-are-dead-long-live-strings)

You should use Terms Query

{
"query" : {
"terms" : {
"tags" : ["c", "d"]
}
}
}

For those looking at this in 2020, you may notice that accepted answer is deprecated in 2020, but there is a similar approach available using terms_set and minimum_should_match_script combination.

Please see the detailed answer here in the SO thread