Saturday, 7 January 2017

Apache Kafka installation on Red Hat 7

In this tutorial, we will see how to install Kafka on Red Hat Enterprise Linux Server release 7.3 (Maipo) and bring up a single ZooKeeper instance and a single broker, along with a console producer and consumer.

Note: To find out which Red Hat release you are running, use the command cat /etc/redhat-release
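If you script the installation, you may want it to stop early on anything other than release 7. A minimal sketch in bash; the function name rhel_major is hypothetical, not a system tool, and it assumes the file contains the word "release" followed by the version number:

```shell
# Hypothetical helper: print the major release number from a
# /etc/redhat-release style string, e.g. "... release 7.3 (Maipo)" -> 7
rhel_major() {
  printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\).*/\1/p'
}
```

Usage: rhel_major "$(cat /etc/redhat-release)" should print 7 on this system.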

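Prerequisites for a basic Kafka installation

1) Linux operating system
2) Java 8 JDK (Kafka runs on the JVM)
3) Scala 2.11.x (optional: Kafka's broker is written largely in Scala, but the binary release already bundles the Scala library it was built against in its libs directory, so a separate Scala install is not strictly required)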

To install Java, follow the instructions on the site below:
http://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/#
To install Scala, follow the instructions on the site below:
http://backtobazics.com/scala/4-steps-to-setup-scala-on-centos/
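Once Java is installed, it is worth confirming that the java on your PATH really is version 8. A minimal sketch in bash; java_major is a hypothetical helper that reads the quoted version string from java -version output (Java 8 reports it as "1.8.0_xxx", later releases as "11.0.x" and so on):

```shell
# Hypothetical helper: print the Java major version from `java -version` output
java_major() {
  local v
  # Extract the quoted version, e.g. 1.8.0_111 or 11.0.2
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    1.*) v=${v#1.}; printf '%s\n' "${v%%.*}" ;;   # old scheme: 1.8 -> 8
    *)   printf '%s\n' "${v%%[._]*}" ;;           # new scheme: 11.0.2 -> 11
  esac
}
```

Usage: java -version writes to stderr, so call it as java_major "$(java -version 2>&1)"; it should print 8.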
Telnet installation

You might need telnet later to check the status of ZooKeeper. To install it, follow the instructions on the site below:
https://www.unixmen.com/installing-telnet-centosrhelscientific-linux-6-7/
Wget installation

You will need wget to download Kafka, so install it with the following command:
sudo yum install wget
Kafka installation

wget http://apache.cs.utah.edu/kafka/0.10.0.1/kafka_2.11-0.10.0.1.tgz
tar -xvf kafka_2.11-0.10.0.1.tgz
cd kafka_2.11-0.10.0.1
Run all the remaining commands from this directory.
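The release file name encodes two versions: kafka_<Scala version>-<Kafka version>.tgz, so kafka_2.11-0.10.0.1 is Kafka 0.10.0.1 built against Scala 2.11, which is where the Scala 2.11 prerequisite comes from. A small bash sketch that splits the name; kafka_versions is a hypothetical helper, not part of Kafka:

```shell
# Hypothetical helper: split a Kafka release file name or URL into its versions
kafka_versions() {
  local name=${1##*/}   # drop any URL or directory prefix
  name=${name%.tgz}     # kafka_2.11-0.10.0.1
  name=${name#kafka_}   # 2.11-0.10.0.1
  printf 'scala=%s kafka=%s\n' "${name%%-*}" "${name#*-}"
}
```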

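To run ZooKeeper (it runs in the foreground, so use a separate terminal for the remaining steps)
bin/zookeeper-server-start.sh config/zookeeper.properties
To check the status of ZooKeeper, connect with telnet and type stat
telnet localhost 2181
stat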
The stat command prints statistics such as the following:
Received: 2606
Sent: 2608
Connections: 6
Outstanding: 0
Zxid: 0x37
Mode: standalone
Node count: 33
The Mode line shows standalone, which means a single ZooKeeper instance is running; this setup is meant for testing and development.
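The same check can be scripted without an interactive telnet session. A minimal sketch, assuming bash (for its built-in /dev/tcp redirection) and a ZooKeeper that answers four-letter commands such as stat, as the one started above does; zk_cmd and zk_field are hypothetical helper names:

```shell
# Send a ZooKeeper four-letter command (stat, ruok, ...) without telnet.
# Runs in a subshell so the socket is closed when the function returns.
zk_cmd() (
  exec 3<>"/dev/tcp/${2:-localhost}/${3:-2181}" || exit 1
  printf '%s' "$1" >&3
  cat <&3   # ZooKeeper closes the connection after answering
)

# Print the value of one "Name: value" line from stat output on stdin
zk_field() {
  sed -n "s/^$1: //p"
}
```

Usage: zk_cmd stat | zk_field Mode should print standalone for the setup in this tutorial.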

With ZooKeeper started, we can now run a single Kafka broker.

To run the Kafka server (broker), again in its own terminal
bin/kafka-server-start.sh config/server.properties
If you get the following error when starting the Kafka server, resolve it as described below.

Errors
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c0000000, 1073741824, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1073741824 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/ec2-user/kafka_2.11-0.10.0.1/hs_err_pid19313.log
To resolve the error

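When KAFKA_HEAP_OPTS is not set, kafka-server-start.sh starts the broker with a 1 GB heap (-Xmx1G -Xms1G), and the 1073741824 bytes in the error are that initial heap, which the machine could not provide. Set a smaller initial heap with the command below and then start the broker (on a machine with little RAM, also keep -Xmx below the memory that is actually free):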
export KAFKA_HEAP_OPTS="-Xmx2048m -Xms256m"
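To compare the heap settings with the memory on the machine before starting the broker, it helps to convert them to megabytes. A minimal bash sketch; jvm_size_mb and heap_opt_mb are hypothetical helpers, and the conversion assumes the JVM's binary units (1g = 1024m):

```shell
# Convert a JVM size such as 256m, 2048M, 1g or a plain byte count to MB
jvm_size_mb() {
  local n=${1%[mMgGkK]} unit=${1: -1}
  case "$unit" in
    g|G) echo $(( n * 1024 )) ;;
    m|M) echo "$n" ;;
    k|K) echo $(( n / 1024 )) ;;
    *)   echo $(( $1 / 1024 / 1024 )) ;;   # plain bytes, e.g. from the error
  esac
}

# Print one option (-Xms or -Xmx) from a heap-options string, in MB
heap_opt_mb() {
  local opt
  for opt in $2; do
    case "$opt" in "$1"*) jvm_size_mb "${opt#"$1"}"; return ;; esac
  done
  return 1
}
```

Usage: heap_opt_mb -Xms "$KAFKA_HEAP_OPTS" prints the initial heap in MB, which you can compare with the free column of free -m.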
For more detail about this error, see:
http://stackoverflow.com/questions/34966739/kafka-failed-to-map-1073741824-bytes-for-committing-reserved-memory

http://stackoverflow.com/questions/27681511/how-do-i-set-the-java-options-for-kafka

To create a topic
bin/kafka-topics.sh --create --topic my_topic --zookeeper localhost:2181  --replication-factor 1 --partitions 1
Here --replication-factor 1 means the topic's data is stored on only one broker. For fault tolerance, it is good practice to replicate the data across multiple brokers.
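A partition with replication factor r keeps a copy on r different brokers, so it can lose up to r - 1 of them and still have a copy, and kafka-topics.sh refuses a replication factor larger than the number of live brokers. A small bash sketch of that arithmetic as a pre-check; rf_ok and tolerated_failures are hypothetical helpers, not Kafka commands:

```shell
# Hypothetical pre-check: $1 = replication factor, $2 = number of live brokers
rf_ok() {
  [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

# Broker failures a partition survives while still keeping one copy
tolerated_failures() {
  echo $(( $1 - 1 ))
}
```

On the single-broker setup in this tutorial, only a replication factor of 1 passes the check.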

You have to specify the ZooKeeper instance here because there could be multiple ZooKeeper instances, each managing its own independent cluster. By specifying the ZooKeeper server, you are saying that the topic should be created in the cluster managed by that specific ZooKeeper.

Remember that ZooKeeper holds the cluster metadata, and the broker acting as the controller (elected through ZooKeeper) is responsible for choosing a leader broker for each partition of the topic.

When the topic is created, a couple of interesting things happen behind the scenes:

1) The topic's partition is assigned to a broker taken from the registry of brokers in ZooKeeper, and a leader is elected for that partition.

2) On the broker, the log directory (log.dirs in config/server.properties, /tmp/kafka-logs by default) gets one subdirectory per partition, here my_topic-0, containing an index file and a log file: 00000000000000000000.index and 00000000000000000000.log

The .log file is the commit log that stores the messages, and the .index file maps message offsets to positions in the .log file.
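The names follow a fixed pattern: the partition directory is <topic>-<partition>, and each segment file is named after the offset of the first message it holds, zero-padded to 20 digits (a new segment is started when the active one reaches log.segment.bytes, 1 GB by default). A bash sketch that rebuilds the paths; partition_dir and segment_base are hypothetical helpers:

```shell
# <log.dirs>/<topic>-<partition>
partition_dir() {
  printf '%s/%s-%s\n' "$1" "$2" "$3"
}

# Segment base offset, zero-padded to 20 digits as in the file names
segment_base() {
  printf '%020d\n' "$1"
}
```

Usage: ls "$(partition_dir /tmp/kafka-logs my_topic 0)/$(segment_base 0)".* should list the .index and .log files of the first segment.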

To list the topics
bin/kafka-topics.sh --list --zookeeper localhost:2181
To describe the topic
bin/kafka-topics.sh --describe --topic my_topic --zookeeper localhost:2181
Output:
Topic:my_topic PartitionCount:1 ReplicationFactor:1 Configs:
Topic: my_topic Partition: 0 Leader: 0 Replicas: 0 Isr: 0
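In the per-partition line, Leader is the ID of the broker that leads the partition, Replicas lists the brokers holding copies, and Isr lists the replicas that are in sync. When a topic has many partitions, a short awk filter can reduce the output to those fields. A sketch, assuming the describe output format shown above; partition_leaders is a hypothetical helper:

```shell
# Print "partition leader isr" for each partition line of --describe output
partition_leaders() {
  awk '{
    p = l = r = ""
    for (i = 1; i < NF; i++) {
      if ($i == "Partition:") p = $(i + 1)
      else if ($i == "Leader:") l = $(i + 1)
      else if ($i == "Isr:") r = $(i + 1)
    }
    if (p != "") print p, l, r   # header line has no "Partition:" field
  }'
}
```

Usage: bin/kafka-topics.sh --describe --topic my_topic --zookeeper localhost:2181 | partition_leaders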

To start the producer 
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic
Then type a message and press Enter; each line you type is sent to the topic as one message.
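Because the console producer reads messages line by line from standard input, you can also feed it test data non-interactively instead of typing. A bash sketch; make_messages is a hypothetical generator, and the message-N format is just an example:

```shell
# Hypothetical test-data generator: prints message-1 ... message-N, one per line
make_messages() {
  local i
  for (( i = 1; i <= $1; i++ )); do
    printf 'message-%d\n' "$i"
  done
}
```

Usage: make_messages 10 | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic sends ten messages and exits when its input ends.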

To start the consumer
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic my_topic --from-beginning
To see the messages on disk, look at the partition's log file:
cd /tmp/kafka-logs/my_topic-0/
cat *.log
The messages you produced are persisted in the log file, surrounded by binary metadata.
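Since the .log file mixes message payloads with binary headers, cat prints some unreadable bytes. To check that a particular message was persisted, you can search the file as text instead. A sketch, assuming GNU grep (for -a, which treats binary input as text) and messages that were not compressed by the producer; count_in_log is a hypothetical helper:

```shell
# Count occurrences of a literal string in a (binary) segment file
count_in_log() {
  grep -a -o -F -- "$1" "$2" | wc -l | tr -d ' '
}
```

Usage: count_in_log hello /tmp/kafka-logs/my_topic-0/00000000000000000000.log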

To list the IDs of the brokers registered with ZooKeeper (the number of IDs is the number of brokers), use the command

bin/zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"
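zookeeper-shell.sh prints some connection messages before the answer, which is a bracketed list of IDs such as [0]. To get just the broker count in a script, you can keep the last bracketed line and count its entries. A bash/awk sketch; broker_count is a hypothetical helper and assumes the list is printed on a line of its own:

```shell
# Count broker IDs in the output of: zookeeper-shell.sh ... <<< "ls /brokers/ids"
broker_count() {
  awk '/^\[.*\]$/ { ids = $0 }          # remember the last [..] line
       END {
         gsub(/\[/, "", ids); gsub(/\]/, "", ids); gsub(/ /, "", ids)
         print (ids == "" ? 0 : split(ids, parts, ","))
       }'
}
```

Usage: bin/zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids" | broker_count should print 1 for the single broker in this tutorial.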
To test the performance of the producer 

bin/kafka-producer-perf-test.sh --topic my_topic --num-records 50 --record-size 1 --throughput 10 --producer-props bootstrap.servers=localhost:9092 key.serializer=org.apache.kafka.common.serialization.StringSerializer value.serializer=org.apache.kafka.common.serialization.StringSerializer

Here we produce 50 records of 1 byte each at a throughput of 10 records per second, so it takes about 5 seconds to send all of them.
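The expected numbers follow from the parameters: duration is records divided by throughput, and the data rate is record size times throughput. 10 bytes per second is why the output rounds to 0.00 MB/sec. A bash/awk sketch of that arithmetic, assuming MB means 1024 × 1024 bytes; perf_estimate is a hypothetical helper:

```shell
# Estimate a producer perf run: $1 = records, $2 = record size (bytes), $3 = records/sec
perf_estimate() {
  awk -v n="$1" -v s="$2" -v t="$3" 'BEGIN {
    printf "seconds=%.1f bytes=%d MB/sec=%.2f\n", n / t, n * s, s * t / 1048576
  }'
}
```

For the run above, perf_estimate 50 1 10 gives the 5-second duration described in the text.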

Output:

50 records sent, 10.052272 records/sec (0.00 MB/sec), 8.46 ms avg latency, 270.00 ms max latency, 2 ms 50th, 15 ms 95th, 270 ms 99th, 270 ms 99.9th.


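To alter a topic, for example to increase its partition count to 2 (the partition count can only be increased, not reduced)
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my_topic --partitions 2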
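Adding partitions has a side effect for keyed messages: with hash-based partitioning, a key goes to hash(key) modulo the partition count, so changing the count can move a key to a different partition than its earlier messages. The bash sketch below illustrates that rule; note that it uses cksum purely as a stand-in hash (Kafka's Java producer hashes keys with murmur2), and both function names are hypothetical:

```shell
# Illustration only: partition = hash(key) % partition count, with cksum as the hash
toy_partition() {
  local h
  h=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( h % $2 ))
}

# Guard mirroring the rule that partitions can only be added: $1 = current, $2 = new
partitions_can_grow() {
  [ "$2" -gt "$1" ]
}
```

Comparing toy_partition user-42 2 with toy_partition user-42 3 shows how the same key can land elsewhere after a change.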
To view the internal topic that stores consumer offsets
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic __consumer_offsets

By default it has 50 partitions (set by the broker property offsets.topic.num.partitions).
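To check the partition count of any topic in a script, you can read it from the summary line of the describe output. A sed sketch, assuming the "PartitionCount:N" field shown in the describe output earlier; partition_count is a hypothetical helper:

```shell
# Print the PartitionCount value from kafka-topics.sh --describe output on stdin
partition_count() {
  sed -n 's/.*PartitionCount:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Usage: bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic __consumer_offsets | partition_count should print 50 with the default setting.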



TODO

check the status of zookeeper script
