When to use Apache Kafka instead of ActiveMQ

小开

Kafka and ActiveMQ overlap in places, but they were originally designed for different purposes, so comparing them is like comparing apples and oranges.

Kafka

Kafka is a distributed streaming platform with very good horizontal scaling. It lets applications process and re-process streamed data stored on disk. Because of its high throughput, it is commonly used for real-time data streaming.

ActiveMQ

ActiveMQ is a general-purpose message broker that supports several messaging protocols, such as AMQP, STOMP, and MQTT. It supports more complicated message routing patterns as well as the Enterprise Integration Patterns. In general it is mainly used for integration between applications and services, especially in a Service Oriented Architecture.

小开

I think one thing that should be noted in any discussion about which broker to use (and whenever Kafka comes up) is that the frequently referenced Kafka benchmark shows the upper limit of any modern distributed computer. Today's brokers all have about the same total capacity in MB/s. Kafka does extremely well with small messages (10-1024 bytes) compared to other brokers, but it still tops out at around 75 Mb/s (per broker).

There is frequently an apples-to-oranges comparison, especially when talking about "clustering". ActiveMQ and other enterprise brokers cluster both the publishing of messages and the tracking of consumer subscriptions. Kafka clusters the publishing and requires the consumer to track its own subscription. That seems minor, but it's a significant difference.

All brokers have the same back-pressure issues. Kafka can do "lazy persistence", where the producer isn't waiting around for the broker to sync to disk. This is good for a lot of use cases, but probably not for the I-care-about-every-single-message scenario ppatierno mentions in his slide show.
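To make the trade-off concrete, here is a toy model in Python. It is not Kafka or ActiveMQ code; `ToyBroker`, `append`, `flush`, and `crash` are names I made up for illustration. It only shows why acknowledging before the sync to disk is faster for the producer but can lose messages if the broker dies at the wrong moment. Real brokers have more moving parts (replication, for example) that this sketch ignores.

```python
class ToyBroker:
    """Hypothetical broker: messages sit in a memory buffer until flushed."""

    def __init__(self):
        self.buffer = []  # accepted but not yet written to disk
        self.disk = []    # survives a crash

    def append(self, message, sync=False):
        """Accept a message and acknowledge it.

        sync=False: acknowledge right away (lazy persistence).
        sync=True:  write to disk first, then acknowledge.
        """
        self.buffer.append(message)
        if sync:
            self.flush()
        return True  # the acknowledgement the producer sees

    def flush(self):
        """Move everything buffered so far onto the toy 'disk'."""
        self.disk.extend(self.buffer)
        self.buffer.clear()

    def crash(self):
        """Simulate a crash: anything not yet flushed is gone."""
        self.buffer.clear()


lazy = ToyBroker()
lazy.append("order-1")             # acknowledged, but only in memory
lazy.crash()

strict = ToyBroker()
strict.append("order-1", sync=True)  # acknowledged only after the flush
strict.crash()

print(lazy.disk, strict.disk)
```

In the lazy case the producer got its acknowledgement and the message is still lost; in the strict case the producer paid for the flush and the message survives.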

Kafka is really good at horizontal scaling for things like big data processing of small messages. ActiveMQ is better suited to the class of use cases frequently referred to as enterprise messaging (this is just a term; it doesn't mean Kafka isn't good for the enterprise): transacted data (although Kafka is adding this), kiosks, retail stores, store and forward, DMZ traversal, data center-to-data center publishing, etc.

小开

I hear this question every week... While ActiveMQ (like IBM MQ or JMS in general) is used for traditional messaging, Apache Kafka is used as a streaming platform (messaging + distributed storage + processing of data). The two are built for different use cases.

You can use Kafka for "traditional messaging", but you cannot use an MQ for Kafka-specific scenarios.

The article “Apache Kafka vs. Enterprise Service Bus (ESB)—Friends, Enemies, or Frenemies? (https://www.confluent.io/blog/apache-kafka-vs-enterprise-service-bus-esb-friends-enemies-or-frenemies/)” discusses why Kafka is not competitive but complementary to integration and messaging solutions (including ActiveMQ) and how to integrate both.

小开

Kafka's architecture is different from ActiveMQ's.

In Kafka, a producer publishes messages to a topic, which is a stream of messages of a particular type. A consumer subscribes to one or more topics and pulls the data from the brokers.
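A rough way to picture the pull model is the Python sketch below. It is a toy, not the Kafka client API: `TopicLog` and `PullConsumer` are hypothetical names, and a real topic is partitioned and stored on disk. The point is only that the log is append-only and each consumer keeps its own position, so one consumer reading slowly does not change what another one sees.

```python
class TopicLog:
    """Toy in-memory stand-in for a topic: an append-only list of messages."""

    def __init__(self):
        self._messages = []

    def publish(self, message):
        """Append a message and return its offset (its position in the log)."""
        self._messages.append(message)
        return len(self._messages) - 1

    def fetch(self, offset, max_messages):
        """Return up to max_messages starting at offset; the log is not modified."""
        return self._messages[offset:offset + max_messages]


class PullConsumer:
    """Each consumer remembers its own offset; the log keeps no per-consumer state."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=10):
        """Pull the next batch and advance this consumer's offset."""
        batch = self.log.fetch(self.offset, max_messages)
        self.offset += len(batch)
        return batch


log = TopicLog()
for i in range(5):
    log.publish(f"msg-{i}")

fast, slow = PullConsumer(log), PullConsumer(log)
fast_batch = fast.poll(max_messages=5)  # reads everything in one go
slow_batch = slow.poll(max_messages=1)  # reads a single message, at its own pace
```

After this runs, `fast` is at offset 5 and `slow` is at offset 1, and neither had any effect on the other.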

Key differences:

The ActiveMQ broker has to maintain the delivery state of every message, which results in lower throughput. Unlike in ActiveMQ, the Kafka producer doesn't wait for acknowledgements from the broker and sends messages as fast as the broker can handle them. Overall throughput will be high if the broker can handle messages as fast as the producer sends them.
Kafka has a more efficient storage format. On average, each message has an overhead of 9 bytes in Kafka, versus 144 bytes in ActiveMQ.
In ActiveMQ, the producer sends a message to the broker and the broker pushes messages to all consumers. The producer is responsible for ensuring that the message has been delivered. In Kafka, the consumer pulls messages from the broker at its own pace. It is the consumer's responsibility to consume the messages it is supposed to consume.
Slow consumers in ActiveMQ can cause problems on non-durable topics, since they can force the broker to keep old messages in RAM; once RAM fills up, the broker is forced to slow down producers, which in turn slows down the fast consumers. A slow consumer in Kafka does not impact other consumers.
In Kafka, a consumer can rewind to an old offset and re-consume data. This is useful when you fix an issue and decide to replay the old messages after the fix.
In ActiveMQ, the performance of queues and topics may degrade as more consumers are added. Kafka does not have that disadvantage when consumers are added.
Kafka is highly scalable due to replication of partitions. It can ensure that messages are delivered in sequence within a partition.
ActiveMQ is a traditional messaging system, whereas Kafka is meant for distributed processing of huge amounts of data and is effective for stream processing.
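Two of the points above, ordering within a partition and rewinding to an old offset, can be sketched with a small Python toy. This is not how Kafka is implemented: `PartitionedTopic` is a made-up class, and I'm assuming a CRC32 hash of the key picks the partition purely so the example is deterministic (Kafka's own partitioner uses a different hash).

```python
import zlib


class PartitionedTopic:
    """Toy topic split into partitions; each partition is an append-only list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        """Same key always maps to the same partition (assumed CRC32 hashing)."""
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, key, value):
        """Append to the key's partition and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition, offset):
        """Read from a given offset to the end; nothing is removed by reading."""
        return self.partitions[partition][offset:]


topic = PartitionedTopic(num_partitions=3)
for i in range(3):
    topic.publish("order-42", f"event-{i}")
    topic.publish("order-7", f"event-{i}")

p = topic.partition_for("order-42")

# First pass: events for one key come back in the order they were published.
first_pass = [v for k, v in topic.read(p, 0) if k == "order-42"]

# "Rewind": read again from offset 0 and get exactly the same data back.
replay = [v for k, v in topic.read(p, 0) if k == "order-42"]
```

Ordering here is only guaranteed per partition. Events for different keys may land in different partitions, and nothing in the sketch orders those relative to each other.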

Because of the efficiencies above, Kafka's throughput is higher than that of conventional messaging systems like ActiveMQ and RabbitMQ.

More details can be read at notes.stephenholiday.com

EDIT: For anyone who thinks the ActiveMQ producer does not wait for acknowledgement from the broker, the ActiveMQ documentation page says:

The ProducerWindowSize is the maximum number of bytes of data that a producer will transmit to a broker before waiting for acknowledgment messages from the broker that it has accepted the previously sent messages.
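As I read that description, the behaviour is roughly the byte-counting window below. This Python sketch is my own toy model, not ActiveMQ client code; `WindowedProducer`, `try_send`, and `on_ack` are hypothetical names, and the real client's accounting may differ in detail.

```python
class WindowedProducer:
    """Toy producer that stops sending once too many bytes are unacknowledged."""

    def __init__(self, window_size):
        self.window_size = window_size  # max unacknowledged bytes allowed in flight
        self.unacked_bytes = 0

    def try_send(self, payload):
        """Send if the payload fits in the window; otherwise the caller must wait."""
        if self.unacked_bytes + len(payload) > self.window_size:
            return False  # window full: wait for the broker to acknowledge
        self.unacked_bytes += len(payload)
        return True

    def on_ack(self, acked_bytes):
        """Broker confirmed it accepted acked_bytes; free that much of the window."""
        self.unacked_bytes = max(0, self.unacked_bytes - acked_bytes)


producer = WindowedProducer(window_size=10)
sent_first = producer.try_send(b"12345")   # 5 of 10 bytes in flight
sent_second = producer.try_send(b"12345")  # 10 of 10 bytes in flight
sent_third = producer.try_send(b"1")       # would exceed the window
producer.on_ack(5)                         # broker acknowledges the first message
sent_retry = producer.try_send(b"1")       # now there is room again
```

So the producer does not wait after every single message, but it does wait once the window is full.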