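Kafka, SNS, or something else?

Apologies if this is a newbie question, but I'd like to know which one I should use. As far as I understand, Kafka is:

Apache Kafka is a distributed publish-subscribe messaging system.

SNS is also a pub/sub system.

My goal is to use some queue messaging system on AWS with an application that will be distributed across a few servers (the main language is Python, by the way). Since it's on Amazon, my first thought was to use SNS and SQS. But then I saw a lot of people using Kafka on AWS. What are the advantages of one over the other?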


The use-cases for Kafka and Amazon SQS/Amazon SNS are quite different.

Kafka, as you wrote, is a distributed publish-subscribe system. It is designed for very high throughput, processing thousands of messages per second. Of course, you need to set it up and cluster it yourself. It supports multiple readers, which may "catch up" with the stream of messages at any point (well, as long as the messages are still on disk). You can use it both as a queue (using consumer groups) and as a topic.
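To make the queue-versus-topic point concrete, here is a minimal in-memory sketch. It is not Kafka's API: the `ToyLog` class, the group names and the single-partition layout are all assumptions made for illustration. Real Kafka spreads partitions (not individual messages) across the members of a consumer group.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical single-partition log; reading never removes messages."""

    def __init__(self):
        self.messages = []
        # Each consumer group tracks its own position (next offset to read).
        self.next_offset = defaultdict(int)

    def append(self, message):
        self.messages.append(message)

    def poll(self, group, max_messages=1):
        start = self.next_offset[group]
        batch = self.messages[start:start + max_messages]
        self.next_offset[group] = start + len(batch)
        return batch


log = ToyLog()
for i in range(4):
    log.append(f"m{i}")

# Queue semantics: two workers in the SAME group split the messages.
worker_a = log.poll("billing", max_messages=2)
worker_b = log.poll("billing", max_messages=2)

# Topic semantics: a DIFFERENT group starts from offset 0 and "catches up"
# on everything still retained, regardless of what "billing" has read.
analytics = log.poll("analytics", max_messages=10)

print(worker_a, worker_b, analytics)
```

The only state a reader owns is its offset, which is why adding another reader later costs nothing on the producer side.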

An important characteristic is that you cannot selectively acknowledge messages as "processed"; the only option is acknowledging all messages up to a certain offset.
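A small sketch of what that means in practice, assuming a worker that finishes messages out of order. The function name `committable_offset` is made up for this example; the convention that the committed offset is the next offset to read matches how Kafka consumers track their position.

```python
def committable_offset(committed, finished):
    """Advance the committed offset past every contiguous finished offset.

    `committed` is the next offset to read; `finished` is the set of
    offsets whose processing has completed, possibly out of order.
    """
    next_offset = committed
    while next_offset in finished:
        next_offset += 1
    return next_offset


# Offsets 0, 1 and 3 are done, but 2 is still in progress.
new_committed = committable_offset(0, {0, 1, 3})
print(new_committed)
```

Here the commit can only move to 2. Offset 3 is finished but cannot be acknowledged on its own, so after a crash both 2 and 3 would be delivered again, and the processing has to tolerate that.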

SQS/SNS on the other hand:

  • no setup/no maintenance
  • either a queue (SQS) or a topic (SNS)
  • various limitations (on size, how long a message lives, etc)
  • limited throughput: you can do batch and concurrent requests, but achieving high throughput would still be expensive
  • I'm not sure if the messages are replicated; however, the at-least-once delivery guarantee in SQS would suggest so
  • SNS has notifications for email, SMS, SQS, HTTP built-in. With Kafka, you would probably have to code it yourself
  • no "message stream" concept
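As a rough illustration of the "batch requests" point in the list above, this sketch groups message bodies into entries shaped like the ones boto3's `send_message_batch` accepts. The helper name `to_batches` is hypothetical, and the default of 10 entries per call is an assumption based on the documented SQS batch limit; check the current quotas before relying on it. No AWS call is made here.

```python
def to_batches(bodies, batch_size=10):
    """Split message bodies into lists of SQS-style batch entries."""
    batches = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        # Each entry needs an Id that is unique within its batch.
        entries = [
            {"Id": str(start + i), "MessageBody": body}
            for i, body in enumerate(chunk)
        ]
        batches.append(entries)
    return batches


batches = to_batches([f"event-{n}" for n in range(25)])
print(len(batches), [len(b) for b in batches])

# With boto3 installed and credentials configured, each batch could be sent as:
#   sqs = boto3.client("sqs")
#   sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
# where queue_url is your own queue's URL (not defined in this sketch).
```

Each batch is still one billed request, so batching reduces the request count but does not remove the per-request cost at high volume.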

So overall I would say SQS/SNS are well suited for simpler tasks and workloads with a lower volume of messages.

This is a classic trade-off:

AWS tools (SQS, SNS)

These will be easier for you to set up and integrate with the rest of your architecture, especially if most of it is already running on AWS. They will also probably be cheaper at first, since they have a good pay-as-you-go model, but the cost will not scale as well, so you have to think about that.

Apache Kafka

Here, you're using a highly popular (not just trendy) distributed (this is important if you think you will scale a lot) pub/sub model. Nowadays, this model seems to be much preferred, since running analytics on the data going through the pipes is very common, and with a service-oriented architecture you can usually have a multitude of small services consuming the messages and doing their thing, without the data being removed from the queue. You also get a lot of configuration options, so depending on your use case you can fine-tune it to your needs. This means more work, but a more optimized service down the road.

Summary

This is a classic trade-off between speed and ease of development on one side and a more modular, tailored solution on the other, which has more overhead for the first implementation but scales better.

Personal Advice

If you are prototyping something, favor speed of development, so AWS tools. If your requirements are frozen and require significant scale, definitely take the time to use Kafka. I am also a big believer in using-open-source-makes-the-world-better, but that's not the biggest argument to use.

The points mentioned above are really helpful. In addition:

  1. It's very difficult to do multi-tenancy with SQS/SNS; there is perhaps no way other than creating a separate queue for each tenant (very hard to maintain).
  2. Kafka is clusterable; a cluster connects to apps and databases in real time and provides key/value access to data. The retention period for each message, distribution, and replication are bigger advantages, whereas SQS is more of a black box: the sender sends a message, and the receiver receives it, marks it as processed, and deletes it.