1 Kafka provides seamless integration between producers and consumers of information without blocking the producers, and without requiring producers to know who the final consumers are.


2 Kafka performance

• Persistent messaging: To derive real value from big data, no kind of information loss can be afforded. Apache Kafka is designed with O(1) disk structures that provide constant-time performance even with very large volumes of stored messages, on the order of TB.
• High throughput: Keeping big data in mind, Kafka is designed to work on commodity hardware and to support millions of messages per second.
• Distributed: Apache Kafka explicitly supports partitioning messages over Kafka servers and distributing consumption over a cluster of consumer machines, while maintaining per-partition ordering semantics (see the producer sketch after this list).
• Multiple client support: The Apache Kafka system supports easy integration of clients from different platforms such as Java, .NET, PHP, Ruby, and Python.
• Real time: Messages produced by producer threads should be immediately visible to consumer threads; this feature is critical to event-based systems such as Complex Event Processing (CEP) systems.
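
As a rough illustration of the per-partition ordering mentioned in the Distributed point, the sketch below uses the Kafka 0.8 Java producer API to send keyed messages; with the default partitioner, every message carrying the same key is hashed to the same partition and is therefore consumed in order. The broker list and topic name match the single-node, multi-broker example later in this post, and the key is a placeholder.

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class OrderedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker list; point this at your own brokers.
        props.put("metadata.broker.list", "localhost:9092,localhost:9093");
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        for (int i = 0; i < 10; i++) {
            // Same key ("user-42") => same partition => these messages stay in order.
            producer.send(new KeyedMessage<String, String>("othertopic", "user-42", "event-" + i));
        }
        producer.close();
    }
}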


3 By default, a producer's send request is blocked until the message is committed to all active replicas; however, producers can also be configured to commit messages to a single broker.
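
A minimal sketch of this trade-off with the Kafka 0.8 producer API: request.required.acks set to -1 makes the send wait until the message is committed to all in-sync replicas, while 1 waits only for the leader broker. The broker address and topic name are placeholders taken from the example setup later in this post.

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class AcksExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");   // placeholder broker address
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // -1: block until the message is committed to all in-sync replicas.
        //  1: return as soon as the leader broker has written the message.
        props.put("request.required.acks", "-1");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("othertopic", "hello"));
        producer.close();
    }
}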


4 Like Kafka producers, Kafka consumers' polling model was changed (as of 0.8) to a long-pull model: the consumer blocks until a committed message is available from the producer, which avoids frequent polling.
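
As a rough sketch of this blocking behavior, the Kafka 0.8 high-level consumer below iterates over a message stream whose hasNext() call blocks until a committed message becomes available; the ZooKeeper address, group name, and topic are placeholder values.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class BlockingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");   // placeholder ZooKeeper address
        props.put("group.id", "example-group");              // placeholder consumer group name

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("othertopic", 1);                   // one stream (thread) for the topic
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

        ConsumerIterator<byte[], byte[]> it = streams.get("othertopic").get(0).iterator();
        while (it.hasNext()) {                                // blocks until a committed message arrives
            System.out.println(new String(it.next().message()));
        }
        connector.shutdown();
    }
}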


5 Kafka single-node, multi-broker configuration

  


   Each broker needs its own configuration file. For three brokers, use server-1.properties, server-2.properties, and server-3.properties. The following parameters must be different in each of the three files:

               • broker.id
               • port
               • log.dir

For example, server-1.properties:

               • broker.id=1
               • port=9092
               • log.dir=/tmp/kafka8-logs/broker1

 Next, we can start each broker in a separate terminal. The commands are as follows:

    

[root@localhost kafka-0.8]# env JMX_PORT=9999 bin/kafka-server-start.sh \
config/server-1.properties
[root@localhost kafka-0.8]# env JMX_PORT=10000 bin/kafka-server-start.sh \
config/server-2.properties

    

Creating topics

   At this point, we only need to specify the ZooKeeper connection, the replication factor, the number of partitions, and the topic name:

[root@localhost kafka-0.8]# bin/kafka-create-topic.sh --zookeeper \
localhost:2181 --replica 2 --partition 2 --topic othertopic

 

Starting a producer to send messages

 To use a single producer to connect to all the brokers, pass an initial broker list as a parameter:


[root@localhost kafka-0.8]# bin/kafka-console-producer.sh --broker-list \
localhost:9092,localhost:9093 --topic othertopic

  To start multiple producers, each connecting to a different combination of brokers, pass a different broker-list parameter to each producer and start them in the same way as above.


6 The consumer group name is unique and global across the Kafka cluster, and any new consumer started with an in-use consumer group name may cause ambiguous behavior in the system. When a new process is started with an existing consumer group name, Kafka triggers a rebalance between the new and the existing process threads for that consumer group. After the rebalance, some messages that were intended for a new process may go to an old process, causing unexpected results. To avoid this, any existing consumers should be shut down before starting new consumers for an existing consumer group name.
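
A hedged sketch of that shutdown step with the Kafka 0.8 high-level consumer API: calling shutdown() on the existing ConsumerConnector releases its partitions before a new consumer is started with the same group.id. The ZooKeeper address and group name below are placeholders.

import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class GroupShutdown {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");   // placeholder ZooKeeper address
        props.put("group.id", "othertopic-group");          // group name that is already in use

        ConsumerConnector existing =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // ... consume messages with this connector ...

        // Shut the existing consumer down first, so the subsequent rebalance
        // only involves the new consumers started for this group.
        existing.shutdown();
    }
}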