I have been testing kafka for the past week or so and figured I would share
my results so far.
I am not sure if the formatting will survive email, but here are the results
in a Google doc... all 1,100 of them:
https://docs.google.com/spreadsheets/d/1UL-o2MiV0gHZtL4jFWNyqRTQl41LFdM0upjRIwCWNgQ/edit?usp=sharing
One thing I found is that there appears to be a bottleneck in
kafka-producer-perf-test.sh.
The servers I used for testing have 12 7.2K drives and 16 cores. I was NOT
able to scale the broker past 350 MB/sec when adding drives, even though I
was able to get 150 MB/sec from a single drive. I wanted to determine the
source of the low utilization.
I tried changing the following (a sketch of these settings follows the list):
· log.flush.interval.messages on the broker
· log.flush.interval.ms on the broker
· num.io.threads on the broker
· thread settings on the producer
· producer message sizes
· producer batch sizes
· different numbers of topics (which impacts the number of drives used)
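For reference, the broker-side settings above were varied in server.properties, roughly like this (the property names are the ones listed above; the values shown are only examples, not a recommendation):

   # server.properties - example values only
   num.io.threads=8
   log.flush.interval.messages=10000
   log.flush.interval.ms=1000

The producer-side knobs (threads, message size, batch size) are command-line options of kafka-producer-perf-test.sh; an example invocation is further down.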
None of the above had any impact. The last thing I tried was running
multiple producers, which had a very noticeable impact. As previously
mentioned, I had already tested the producer thread setting and found it to
scale when increasing the thread count through 1, 2, 4 and 8; after that it
plateaued, so I had been using 8 threads for each test. To show the impact
of the number of producers, I created 12 topics with partition counts from
1 to 12. I used a single broker with no replication and configured the
producer(s) to send 10 million 2200-byte messages in batches of 400 with no
acks.
Running with three producers gave almost double the throughput of a single
producer.
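If anyone wants to reproduce the multi-producer test: each producer was an instance of the perf test tool, invoked roughly like the sketch below. The flag names are from the 0.8-era kafka-producer-perf-test.sh (kafka.tools.ProducerPerformance) and may differ in other versions, so treat this as a sketch rather than the exact command; the broker address and topic name are placeholders.

   # One of these per producer machine; topic and partition counts varied per test.
   # broker1:9092 and the topic name are placeholders.
   bin/kafka-producer-perf-test.sh \
     --broker-list broker1:9092 \
     --topics perf-test \
     --messages 10000000 \
     --message-size 2200 \
     --batch-size 400 \
     --threads 8 \
     --request-num-acks 0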
Other key points learned so far:
· Ensure you are using the correct network interface (use
advertised.host.name if the servers have multiple interfaces).
· Use batching on the producer. With a single broker, sending 2200-byte
messages in batches of 200 resulted in 283 MB/sec, vs. 44 MB/sec with a
batch size of 1 (see the sample producer settings at the end of this mail).
· The message size, the configuration of request.required.acks and
the number of replicas (only when ack is set to all) had the most influence
on the overall throughput.
· The following table shows results of testing with message sizes
of 200, 300, 1000 and 2200 bytes on a three-node cluster. Each message
size was tested with the three available ack modes (NONE, LEADER and ALL)
and with replication of two and three copies. Having three copies of data
is recommended; however, both are included for reference. The "Per Server"
columns are the Replica=3 totals divided across the three brokers.
message.size  acks    Replica=2          Replica=3
                      MB.sec  nMsg.sec   MB.sec  nMsg.sec   Per Server MB.sec  Per Server nMsg.sec
200           NONE    251     1,313,888  237     1,242,390  79                 414,130
300           NONE    345     1,204,384  320     1,120,197  107                373,399
1000          NONE    522     546,896    515     540,541    172                180,180
2200          NONE    368     175,165    367     174,709    122                58,236
200           LEADER  115     604,376    141     739,754    47                 246,585
300           LEADER  186     650,280    192     670,062    64                 223,354
1000          LEADER  340     356,659    328     343,808    109                114,603
2200          LEADER  310     147,846    293     139,729    98                 46,576
200           ALL     74      385,594    58      304,386    19                 101,462
300           ALL     105     367,282    78      272,316    26                 90,772
1000          ALL     203     212,400    124     130,305    41                 43,435
2200          ALL     212     100,820    136     64,835     45                 21,612
Some observations from the above table:
· Increasing the number of replicas has only a limited impact on
overall performance when request.required.acks is none or leader
(additional resources are required to replicate the data, but during these
tests this did not reduce producer throughput).
· Compression is not shown because the data generated for the test is
not representative of a production workload (GZIP compressed the data
300:1, which is unrealistic).
· For some reason a message size of 1000 bytes performed best. I need
to look into this more.
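For completeness, the producer-side settings behind the batching and ack comparisons map onto the old (0.8-style) producer properties roughly as follows. Example values only; the broker address is a placeholder, and 0/1/-1 correspond to the NONE/LEADER/ALL modes in the table.

   # producer.properties - example values only
   metadata.broker.list=broker1:9092
   # batching requires the async producer
   producer.type=async
   # batch size compared above: 200 vs. 1
   batch.num.messages=200
   # 0 = NONE, 1 = LEADER, -1 = ALL
   request.required.acks=0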
Thanks
Bert