Channel: Kafka Timeline

spikes in producer requests/sec

We're seeing periodic spikes in req/sec rates across our nodes. Our
cluster is 10 nodes, and the topic has a replication factor of 3. We
push around 200k messages / sec into Kafka.

The machines are running the most recent version of Kafka and we're
connecting via librdkafka. pingstream02-10 are using the CMS garbage
collector, but I switched pingstream01 to G1GC on the theory that these
might be GC pauses. Judging by the graph, that likely didn't improve
the situation.

My next thought is that maybe this is the effect of log rolling.
Checking in the logs, I see a lot of this:

[2014-11-11 13:46:45,836] 72952071 [ReplicaFetcherThread-0-7] INFO kafka.log.Log - Rolled new log segment for 'pings-342' in 3 ms.
[2014-11-11 13:46:47,116] 72953351 [kafka-request-handler-0] INFO kafka.log.Log - Rolled new log segment for 'pings-186' in 2 ms.
[2014-11-11 13:46:48,155] 72954390 [ReplicaFetcherThread-0-8] INFO kafka.log.Log - Rolled new log segment for 'pings-253' in 3 ms.
[2014-11-11 13:46:48,408] 72954643 [ReplicaFetcherThread-0-4] INFO kafka.log.Log - Rolled new log segment for 'pings-209' in 3 ms.
[2014-11-11 13:46:48,436] 72954671 [ReplicaFetcherThread-0-4] INFO kafka.log.Log - Rolled new log segment for 'pings-299' in 2 ms.
[2014-11-11 13:46:48,687] 72954922 [kafka-request-handler-0] INFO kafka.log.Log - Rolled new log segment for 'pings-506' in 2 ms.
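
To see how tightly these rolls cluster in time, a rough sketch along
the following lines could tally them per second and add up the reported
roll durations. The regex is only inferred from the six lines above, and
the names ROLL_RE, summarize_rolls and SAMPLE are made up for
illustration; a real broker log may need a different pattern.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt above (an assumption, not an
# official Kafka log format). \s+ tolerates lines wrapped after "INFO".
ROLL_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]\s+\d+\s+"
    r"\[([^\]]+)\]\s+INFO\s+kafka\.log\.Log - "
    r"Rolled new log segment for '([^']+)' in (\d+) ms\."
)


def summarize_rolls(log_text):
    """Return (rolls per wall-clock second, total reported roll time in ms)."""
    per_second = Counter()
    total_ms = 0
    for second, _thread, _partition, ms in ROLL_RE.findall(log_text):
        per_second[second] += 1
        total_ms += int(ms)
    return per_second, total_ms


# The six lines quoted above, used as sample input.
SAMPLE = """\
[2014-11-11 13:46:45,836] 72952071 [ReplicaFetcherThread-0-7] INFO
kafka.log.Log - Rolled new log segment for 'pings-342' in 3 ms.
[2014-11-11 13:46:47,116] 72953351 [kafka-request-handler-0] INFO
kafka.log.Log - Rolled new log segment for 'pings-186' in 2 ms.
[2014-11-11 13:46:48,155] 72954390 [ReplicaFetcherThread-0-8] INFO
kafka.log.Log - Rolled new log segment for 'pings-253' in 3 ms.
[2014-11-11 13:46:48,408] 72954643 [ReplicaFetcherThread-0-4] INFO
kafka.log.Log - Rolled new log segment for 'pings-209' in 3 ms.
[2014-11-11 13:46:48,436] 72954671 [ReplicaFetcherThread-0-4] INFO
kafka.log.Log - Rolled new log segment for 'pings-299' in 2 ms.
[2014-11-11 13:46:48,687] 72954922 [kafka-request-handler-0] INFO
kafka.log.Log - Rolled new log segment for 'pings-506' in 2 ms.
"""

per_second, total_ms = summarize_rolls(SAMPLE)
for second, count in sorted(per_second.items()):
    print(second, count)
print("total roll time (ms):", total_ms)
```

Run against a full broker log, the per-second counts could then be
lined up with the timestamps of the spikes in the graphs.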

The "pings" topic in question has 512 partitions, so each round of
rolling produces 512 of these messages. We have an effective retention
period of a bit less than 30 min, so rolling happens pretty frequently.
Still, if I assume the worst case, that each roll locks up the process
for 2 ms and all 512 rolls happen back to back every few minutes, I'd
expect stalls of about a second at a time (512 x 2 ms is roughly 1 s).
The graphs seem to indicate much longer dips, but it's hard for me to
know if I'm looking at real data or some sort of artifact.
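
The worst-case estimate above can be written out as a small sketch. The
2 ms per roll and the idea that all 512 rolls run back to back are the
assumptions from that estimate; the 180-second interval is only a
placeholder for "every few minutes", and the function names are
hypothetical.

```python
def worst_case_stall_ms(partitions, ms_per_roll):
    """Total stall if every roll blocks the process and rolls run serially."""
    return partitions * ms_per_roll


def stall_fraction(partitions, ms_per_roll, interval_s):
    """Share of wall-clock time spent rolling, given one round per interval."""
    return worst_case_stall_ms(partitions, ms_per_roll) / 1000.0 / interval_s


stall_ms = worst_case_stall_ms(512, 2)
print("worst-case stall per round (ms):", stall_ms)

# 180 s is a placeholder interval, not a measured value.
print("fraction of time stalled:", stall_fraction(512, 2, 180))
```

Even at 3 ms per roll the total stays around a second and a half, which
is why dips much longer than that don't fit this estimate.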

FWIW, the producers are not reporting any errors, so it does not seem
like we're losing data.

I'm new to Kafka. Should I be worried? If so, how should I be debugging
this?

Thanks,
Wes
