Redis vs RabbitMQ as a data broker/messaging system between Logstash and elasticsearch

We are defining an architecture to collect log information: Logstash shippers are installed on various machines, the data is indexed centrally on one elasticsearch server, and Kibana is used as the graphical layer. We need a reliable messaging system between the Logstash shippers and elasticsearch to guarantee delivery. What factors should be considered when choosing Redis over RabbitMQ as the data broker/messaging system between the Logstash shippers and elasticsearch?


Quick questions to ask:

  1. Why do you need a broker? If you're using logstash or logstash-forwarder to read files from these servers, both will simply slow down (rather than drop events) if the pipeline gets congested, so you may not need a broker at all.
  2. Do you have any experience with administering Rabbit or Redis? All other things being equal, the tool you already know how to use is the better tool.

In the realm of opinions, I've run redis as a broker, and hated it. Of course, that could have been my inexperience with redis (not a problem with the product itself), but it was the weakest link in the pipeline and always failed when we needed it most.

I have been wondering the same thing. Earlier recommendations from the Logstash folks favored Redis over RabbitMQ (http://logstash.net/docs/1.1.1/tutorials/getting-started-centralized). That section no longer exists in the current documentation, although there are generic notes on using a broker to deal with spikes here: https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html.

While I am also using RabbitMQ quite happily, I'm currently exploring a Redis broker, since the AMQP protocol is likely overkill for my logging use case.
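
For what it's worth, a Redis-brokered pipeline can be sketched roughly as below. Host names, file paths, and the key are placeholders, and the option names are from memory of the logstash redis and elasticsearch plugins, so check them against the documentation for your version:

    # shipper.conf -- runs on each application server
    input {
      file {
        path => "/var/log/myapp/*.log"            # placeholder path
      }
    }
    output {
      redis {
        host      => "redis-broker.example.com"   # placeholder broker host
        data_type => "list"                       # use a plain Redis list as the queue
        key       => "logstash"                   # list key the indexer reads from
      }
    }

    # indexer.conf -- runs on the central Logstash instance
    input {
      redis {
        host      => "redis-broker.example.com"
        data_type => "list"
        key       => "logstash"
      }
    }
    output {
      elasticsearch {
        hosts => ["es.example.com:9200"]          # placeholder Elasticsearch host
      }
    }

With data_type => "list" the broker is just a single Redis list used as a queue; there are no exchanges or routing keys involved, which is exactly why AMQP can feel like overkill here.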

After evaluating both Redis and RabbitMQ I chose RabbitMQ as our broker for the following reasons:

  1. RabbitMQ gives you a built-in layer of security: you can use SSL certificates to encrypt the data you send to the broker, which means nobody can sniff your traffic and gain access to your vital organizational data (see the configuration sketch below).
  2. RabbitMQ is a very stable product that can handle a large number of events per second and many connections without becoming the bottleneck.
  3. In our organization we were already using RabbitMQ, had good internal knowledge of it, and had an integration with Chef already prepared.

Regarding scaling, RabbitMQ has a built-in cluster implementation that you can use, in addition to a load balancer, to implement a redundant broker environment.

See also: Is my RabbitMQ cluster Active Active or Active Passive?
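
As a concrete illustration of the SSL and clustering points, here is a hedged sketch of a Logstash rabbitmq output that talks to a load balancer sitting in front of the cluster. Hosts, credentials, and exchange names are placeholders, and the option names are recalled from the logstash-output-rabbitmq plugin, so verify them for your version:

    output {
      rabbitmq {
        host          => "rabbitmq-lb.example.com"   # load balancer in front of the RabbitMQ cluster (placeholder)
        port          => 5671                        # AMQPS port; 5672 is the unencrypted AMQP port
        ssl           => true                        # encrypt traffic between the shipper and the broker
        user          => "logstash"                  # placeholder credentials
        password      => "secret"
        exchange      => "logs"                      # placeholder exchange
        exchange_type => "direct"
        key           => "logstash"                  # routing key
      }
    }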

Now to the weaker points of using RabbitMQ:

  1. Most Logstash shippers do not support RabbitMQ, but on the other hand, the best one, named Beaver, has an implementation that sends data to RabbitMQ without a problem.
  2. Beaver's current RabbitMQ implementation is a little slow (for my purposes): it was not able to handle a rate of 3000 events/sec from one server, and from time to time the service crashed.
  3. Right now I am working on a fix that will solve the performance problem for RabbitMQ and make the Beaver shipper more stable. The first solution is to add more processes that can run simultaneously, giving the shipper more throughput. The second is to change Beaver to send data to RabbitMQ asynchronously, which should theoretically be much faster. I hope to finish implementing both solutions by the end of this week.

You can follow the issue here: https://github.com/josegonzalez/python-beaver/issues/323

And check the pull request here: https://github.com/josegonzalez/python-beaver/pull/324

If you have more questions feel free to leave a comment.

Redis was created as a key-value data store, even though it has some basic message broker capabilities.

RabbitMQ was created as a message broker, so it naturally has a rich set of message broker capabilities.

I have been doing some research on this topic. If performance is important and persistence is not, RabbitMQ is a perfect choice. Redis is a technology developed with a different intent.

Following is a list of pros for using RabbitMQ over Redis:

  • RabbitMQ uses the Advanced Message Queuing Protocol (AMQP), which can be configured to use SSL, an additional layer of security.
  • RabbitMQ takes approximately 75% of the time that Redis takes to accept messages.
  • RabbitMQ supports priorities for messages, which workers can use to consume high-priority messages first.
  • There is no chance of losing a message if a worker crashes after consuming it but before acknowledging it, which is not the case with Redis.
  • RabbitMQ has a good routing system to direct messages to different queues (see the sketch after this list).
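
A hedged sketch of the routing point: with a topic exchange, the shipper tags each event with a routing key, and separate indexers bind their own queues to only the keys they care about. Names are placeholders and the option names are from memory of the logstash rabbitmq plugins:

    # shipper side: publish with a per-type routing key
    output {
      rabbitmq {
        host          => "rabbitmq.example.com"   # placeholder
        exchange      => "logs"
        exchange_type => "topic"
        key           => "logs.%{type}"           # e.g. logs.nginx, logs.app
      }
    }

    # indexer side: consume only the nginx logs
    input {
      rabbitmq {
        host     => "rabbitmq.example.com"
        exchange => "logs"
        key      => "logs.nginx"                  # binding key for this queue
        queue    => "nginx-logs"
        durable  => true
      }
    }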

A few cons for using RabbitMQ:

  • RabbitMQ can be a little hard to maintain, and crashes can be hard to debug.
  • Node-name or node-IP fluctuations can cause data loss, but if managed well, durable messages can solve the problem (see the sketch after this list).
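
To make the durability point concrete, here is a hedged sketch of the relevant settings (option names recalled from the logstash rabbitmq plugins; hosts and names are placeholders): durable exchanges and queues survive a broker restart, persistent messages are written to disk, and with acknowledgements enabled a delivery that was never acked is redelivered instead of being lost.

    # publisher side
    output {
      rabbitmq {
        host          => "rabbitmq.example.com"   # placeholder
        exchange      => "logs"
        exchange_type => "direct"
        key           => "logstash"
        durable       => true                     # declare the exchange as durable
        persistent    => true                     # mark messages as persistent (written to disk)
      }
    }

    # consumer side
    input {
      rabbitmq {
        host    => "rabbitmq.example.com"
        queue   => "logstash"
        durable => true                           # durable queue
        ack     => true                           # unacked deliveries are redelivered if the consumer dies
      }
    }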

If you specifically want to send logs from Logstash to Elasticsearch, you might want to use Filebeat instead of either Redis or RabbitMQ. Personally, I use fluent-bit to collect logs to send to Elasticsearch.
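
For example, a broker-less setup where Filebeat ships directly to a Logstash pipeline (which then writes to Elasticsearch) can be sketched as below; Filebeat slows down rather than dropping data when the pipeline is congested, so the broker tier disappears. Host names are placeholders and the Filebeat side (filebeat.yml) is omitted:

    # Logstash pipeline receiving events from Filebeat
    input {
      beats {
        port => 5044                       # default port used by Filebeat's Logstash output
      }
    }
    output {
      elasticsearch {
        hosts => ["es.example.com:9200"]   # placeholder Elasticsearch host
      }
    }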

However, the other answers on this page have a lot of out-of-date information regarding Redis's capabilities. Redis has supported:

  • Pub/sub messaging since Redis 2.0.
  • Redis Streams (a persistent, append-only log with consumer groups and acknowledgements) since Redis 5.0.

But there are some limitations:

  • Redis is still not as focused as RabbitMQ when it comes to message durability and crash recovery.
  • Redis pub/sub is not as scalable as RabbitMQ. Redis pub/sub messages were not sharded by Redis cluster nodes (until relatively recently). Redis Streams are a newer, more scalable API.

The main advantage of RabbitMQ is that it provides these messaging features as out-of-the-box services.