In this tutorial, we will see how you can install Kafka on Red Hat Enterprise Linux Server release 7.3 (Maipo) and bring up a single ZooKeeper and broker instance along with a producer and a consumer.
Note: To find out which Red Hat release you are on, use the command cat /etc/redhat-release
Prerequisites for basic Kafka installation
1) Linux operating system
2) Java 8 JDK
3) Scala 2.11.x (Apache Kafka is written largely in Scala, so you need its runtime)
To install Java, follow the instructions on the site below:
http://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/
To install Scala, follow the instructions on the site below:
http://backtobazics.com/scala/4-steps-to-setup-scala-on-centos/
Telnet installation
You might need telnet later to check the status of ZooKeeper. To install it, follow the instructions on the site below:
https://www.unixmen.com/installing-telnet-centosrhelscientific-linux-6-7/
Wget installation
You will need wget in a moment, so install it with yum using the following command:
sudo yum install wget
Kafka installation
wget http://apache.cs.utah.edu/kafka/0.10.0.1/kafka_2.11-0.10.0.1.tgz
tar -xvf kafka_2.11-0.10.0.1.tgz
cd kafka_2.11-0.10.0.1
To run ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
To check the status of ZooKeeper
telnet localhost 2181
The stat command will give the following status:
stat
Received: 2606
Sent: 2608
Connections: 6
Outstanding: 0
Zxid: 0x37
Mode: standalone
Node count: 33
From the stat output you can see that we are running in standalone mode, meaning only a single instance is running, which is fine for testing and development purposes.
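Since the stat reply is plain text, it is easy to check from a script. Below is a small Python sketch that parses the reply shown above; in a live setup you would capture the text from the telnet session (or from something like echo stat | nc localhost 2181) instead of hard-coding it:

```python
# Parse a ZooKeeper `stat` reply into a dict.
# SAMPLE is the reply shown above, hard-coded here for illustration.
SAMPLE = """\
Received: 2606
Sent: 2608
Connections: 6
Outstanding: 0
Zxid: 0x37
Mode: standalone
Node count: 33
"""

def parse_stat(text):
    stats = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            stats[key.strip()] = value.strip()
    return stats

stats = parse_stat(SAMPLE)
print(stats["Mode"])        # standalone -> single dev/test instance
print(stats["Node count"])  # 33
```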
With ZooKeeper started, we can now run a single Kafka broker.
To run the Kafka server
bin/kafka-server-start.sh config/server.properties
If you get the following error when starting the Kafka server, resolve it with the instructions below.
Errors
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c0000000, 1073741824, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1073741824 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/ec2-user/kafka_2.11-0.10.0.1/hs_err_pid19313.log
To resolve the error, type the command below and then start the broker again. Lowering the initial heap (-Xms) means the JVM no longer has to reserve 1 GB of memory up front.
export KAFKA_HEAP_OPTS="-Xmx2048m -Xms256m"
References to learn about the error in detail:
http://stackoverflow.com/questions/34966739/kafka-failed-to-map-1073741824-bytes-for-committing-reserved-memory
http://stackoverflow.com/questions/27681511/how-do-i-set-the-java-options-for-kafka
To create a topic
bin/kafka-topics.sh --create --topic my_topic --zookeeper localhost:2181 --replication-factor 1 --partitions 1
Here replication-factor 1 means the topic will be available on only one broker. For fault tolerance it is good practice to replicate the data across multiple brokers.
You have to specify the ZooKeeper instance here because there could be multiple ZooKeeper instances, each managing its own independent cluster. By naming the ZooKeeper server you are saying: I want the topic created in the cluster managed by this specific ZooKeeper.
Remember, leader assignment for the topic is coordinated through ZooKeeper: the controller broker, elected via ZooKeeper, picks a broker to lead each partition.
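As a rough mental model, leader assignment spreads a topic's partitions over the registered brokers. Below is a toy Python sketch of round-robin placement; this is purely illustrative, since Kafka's actual controller logic also handles replica placement, preferred leaders, and failover:

```python
# Toy illustration: spread a topic's partition leaders over the live
# broker ids registered in ZooKeeper, round-robin. Kafka's real controller
# logic is more involved -- this only shows the core idea.
def assign_leaders(num_partitions, broker_ids):
    return {p: broker_ids[p % len(broker_ids)] for p in range(num_partitions)}

print(assign_leaders(3, [0]))        # single broker leads every partition
print(assign_leaders(4, [0, 1, 2]))  # leaders rotate across three brokers
```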
When the topic is created, a couple of interesting things happen behind the scenes:
1) ZooKeeper's registry of brokers is consulted and a leader is assigned for the topic
2) In the broker's log directory, /tmp/kafka-logs, a folder for the topic partition appears containing two files, 00000000000000000000.index and 00000000000000000000.log, which keep the record of the messages (the commit log)
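Conceptually, the .log file is an append-only sequence of messages and the .index file maps each offset to a byte position in it. Here is a stripped-down Python model of that pairing, purely as an illustration; Kafka's real on-disk format is binary and considerably more elaborate:

```python
# Minimal model of a Kafka log segment: an append-only message log plus an
# offset -> byte-position index, mirroring the .log / .index file pair.
class Segment:
    def __init__(self):
        self.log = bytearray()   # the .log file: concatenated messages
        self.index = []          # the .index file: byte position per offset

    def append(self, message: bytes) -> int:
        offset = len(self.index)          # offsets are dense and ordered
        self.index.append(len(self.log))  # remember where this message starts
        self.log += message
        return offset

    def read(self, offset: int) -> bytes:
        start = self.index[offset]
        end = self.index[offset + 1] if offset + 1 < len(self.index) else len(self.log)
        return bytes(self.log[start:end])

seg = Segment()
seg.append(b"hello")
seg.append(b"kafka")
print(seg.read(1))  # b'kafka'
```

This also explains why consumers can replay from the beginning: nothing is deleted on read, messages just accumulate in the log.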
To list the topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
To describe the topic
bin/kafka-topics.sh --describe --topic my_topic --zookeeper localhost:2181
Output:
Topic:my_topic PartitionCount:1 ReplicationFactor:1 Configs:
Topic: my_topic Partition: 0 Leader: 0 Replicas: 0 Isr: 0
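The describe output is just whitespace-separated key:value columns, so it is easy to script against if you want automated health checks. A minimal Python sketch, using the partition line copied from the output above (real output may list several comma-separated replicas per field):

```python
# Parse one partition line of `kafka-topics.sh --describe` output
# into a dict of column name -> value.
LINE = "Topic: my_topic Partition: 0 Leader: 0 Replicas: 0 Isr: 0"

def parse_partition(line):
    parts = line.split()
    # parts alternate between "Name:" tokens and their values
    return {parts[i].rstrip(":"): parts[i + 1] for i in range(0, len(parts), 2)}

info = parse_partition(LINE)
print(info["Leader"])  # 0 -> broker id 0 leads this partition
print(info["Isr"])     # 0 -> in-sync replica set is just broker 0
```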
To start the producer
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic
Then type a message and press Enter.
To start the consumer
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic my_topic --from-beginning
To see that the messages are persisted in the log:
cd /tmp/kafka-logs/my_topic-0/
cat *.log
Finally, to list the brokers registered to ZooKeeper, use the command
bin/zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"
To test the performance of the producer
bin/kafka-producer-perf-test.sh --topic my_topic --num-records 50 --record-size 1 --throughput 10 --producer-props bootstrap.servers=localhost:9092 key.serializer=org.apache.kafka.common.serialization.StringSerializer value.serializer=org.apache.kafka.common.serialization.StringSerializer
Here we are producing 50 records of 1 byte each at a throughput of 10 records per second, so it takes about 5 seconds to send all the messages.
Output:
50 records sent, 10.052272 records/sec (0.00 MB/sec), 8.46 ms avg latency, 270.00 ms max latency, 2 ms 50th, 15 ms 95th, 270 ms 99th, 270 ms 99.9th.
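The headline numbers in that report follow from simple arithmetic on the flags, so you can sanity-check a run before launching it. A quick Python sketch:

```python
# Estimate the duration and data volume of a kafka-producer-perf-test run
# from its flags: --num-records, --record-size, --throughput.
def perf_estimate(num_records, record_size, throughput):
    seconds = num_records / throughput   # --throughput caps records/sec
    total_bytes = num_records * record_size
    return seconds, total_bytes

seconds, total = perf_estimate(num_records=50, record_size=1, throughput=10)
print(seconds)  # 5.0 -> matches the ~10 records/sec, ~5 s run above
print(total)    # 50 bytes total, which is why the report shows 0.00 MB/sec
```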
To alter a topic
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my_topic --partitions 2
To view the offsets topic
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic __consumer_offsets
By default it has 50 partitions.
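Each consumer group's offsets are stored in one of those 50 partitions, chosen as abs(groupId.hashCode()) % 50, where hashCode is Java's String.hashCode. A Python sketch of that mapping (the group name "my-group" is just a made-up example):

```python
# Which __consumer_offsets partition stores a given consumer group's offsets.
def java_string_hashcode(s: str) -> int:
    # Java's String.hashCode: h = 31*h + char, in signed 32-bit arithmetic.
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def offsets_partition(group_id: str, num_partitions: int = 50) -> int:
    return abs(java_string_hashcode(group_id)) % num_partitions

print(offsets_partition("my-group"))  # a stable value in 0..49
```

Because the mapping depends only on the group id, all offset commits for one group always land in the same partition, and hence on the same broker.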
TODO
Script the ZooKeeper status check