What does the percolator mean/do in Elasticsearch?

Even though I read the Elasticsearch documentation to understand what a percolator is, I still have difficulty understanding, in simple terms, what it means and where it is used. Can anyone provide more details?


What you usually do is index documents and get them back by querying. In a nutshell, the percolator lets you do the reverse: you index your queries, then percolate documents against those indexed queries to find out which queries each document matches. It's also called reverse search, since it is the opposite of what you are used to.

There are different use cases for the percolator, the first one being any platform that stores users' interests in order to send the right content to the right users as soon as it comes in.

For instance, a user subscribes to a specific topic, and as soon as a new article on that topic comes in, a notification is sent to the interested users. You can express the users' interests as Elasticsearch queries, using the query DSL, and register them in Elasticsearch as if they were documents. Every time a new article is published, you can percolate it, without needing to index it, to find out which users are interested in it. At that point you know who needs to receive a notification containing the article link (sending the notification is not done by Elasticsearch, though). An additional step would be to index the content itself, but that is not required.
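As a rough sketch of the subscription scenario above, these are the request bodies involved, written as Python dictionaries. It assumes a recent Elasticsearch (7.x or later), where queries are stored in a field of type `percolator` and matched with the `percolate` query; older releases, including 1.x, expose percolation differently. The index name `alerts`, the field names `query` and `body`, the document id and the sample text are all made up for illustration, and nothing is sent to a cluster here.

```python
import json

# Hypothetical index name, chosen for illustration only.
INDEX = "alerts"

# 1. Mapping (e.g. PUT /alerts): "query" holds the stored queries,
#    "body" is the article field those queries search.
mapping = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "body": {"type": "text"},
        }
    }
}

# 2. One user's interest, registered like a document
#    (e.g. PUT /alerts/_doc/user-42).
subscription = {"query": {"match": {"body": "elasticsearch"}}}


def percolate_request(article):
    """Build a search body (e.g. GET /alerts/_search) that asks which
    stored queries match `article`. The article itself is not indexed."""
    return {"query": {"percolate": {"field": "query", "document": article}}}


search_body = percolate_request({"body": "Elasticsearch 1.0 released"})
print(json.dumps(search_body, indent=2))
```

Each hit returned by such a search is a stored query, so the hit ids (here `user-42`) tell you which users to notify.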

Have a look at this presentation to see a couple of other use cases and other features available in combination with the percolator starting from Elasticsearch 1.0.

In simple terms, the percolator does this:

User: Hey Percolator! How can you help me?

Percolator: Hi, User! I can help you get alerts that match your interests.

User: That's great! What should I do next?

Percolator: Please let me know your interests in the form of queries indexed in Elasticsearch.

User: I've prepared all my interests as queries and indexed them into Elasticsearch. Is it that simple?

Percolator: Yes! It is that simple! I'll watch all incoming documents and get back to you with any documents that match one of your interests (queries)!

User: That's awesome! I'm just curious: how do you figure out which documents match my interests?

Percolator: That's a good question! The answer is very simple! You indexed your interests as queries in Elasticsearch, right? I take those queries and run all of them (not exactly all, but for simplicity let's assume all) against the incoming documents (these documents need not be indexed; they can simply be sent for percolation!). In fact, this process is called percolation! If a document matches any of your queries, I'll send that result to the client (which could also be you)!

Under the hood, a percolate query will take what you want to percolate (e.g. that news article that you want to alert on) and Elasticsearch will create a tiny in-memory index with that document.

You'd have a bunch of registered queries (e.g. one for each user's preferences). Elasticsearch first pre-filters them down to the queries that are likely to match, then runs only those candidates against the in-memory index, much like Luwak used to do (now Lucene Monitor).
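The two-phase idea (pre-filter, then verify) can be illustrated with a toy model in plain Python. This is not how Elasticsearch implements it: here a stored query is simply a set of terms that must all appear in the document, and the names `StoredQuery`, `build_term_index` and `percolate` are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    user: str
    required_terms: frozenset  # toy semantics: every term must be in the document


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return set(text.lower().split())


def build_term_index(queries):
    """Map each term to the stored queries that mention it (the pre-filter)."""
    index = {}
    for query in queries:
        for term in query.required_terms:
            index.setdefault(term, []).append(query)
    return index


def percolate(document, term_index):
    """Return the users whose stored query matches `document`."""
    doc_terms = tokenize(document)
    # Phase 1: pre-filter. Only queries sharing a term with the document
    # can possibly match, so the rest are never evaluated.
    candidates = set()
    for term in doc_terms:
        candidates.update(term_index.get(term, []))
    # Phase 2: verify. Run each candidate against the document.
    return sorted(q.user for q in candidates if q.required_terms <= doc_terms)


queries = [
    StoredQuery("alice", frozenset({"elasticsearch", "release"})),
    StoredQuery("bob", frozenset({"lucene"})),
    StoredQuery("carol", frozenset({"elasticsearch", "percolator"})),
]
term_index = build_term_index(queries)

print(percolate("new elasticsearch release announced", term_index))  # ['alice']
```

In this example bob's query is discarded by the pre-filter without being run, while carol's is a candidate that fails verification because "percolator" is missing from the document.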

The rule of thumb, for the alerting use-case at least, is:

  • have lots of incoming documents and few queries (e.g. alert on logs)? Simply run queries at a scheduled interval
  • have fewer documents and lots of queries? Then percolate these documents

I've also seen people using percolator to tag documents, but implementing something custom in the indexing pipeline to do that sounds more logical.