I have a single-broker test Kafka instance that was running fine on Friday
(basically the out-of-the-box configuration with 2 partitions). Now I come back
on Monday and producers are unable to send messages.
What else can I look at to debug this and prevent it from happening again?
I know how to recover by removing the data directories for Kafka and ZooKeeper
and starting fresh. But this isn't the first time this has happened, so I
would like to understand it better and feel more comfortable with Kafka.
===================
Producer error (from console producer)
===================
[2014-08-11 19:32:49,781] WARN Error while fetching metadata
[{TopicMetadata for topic mytopic ->
No partition metadata for topic mytopic due to
kafka.common.LeaderNotAvailableException}] for topic [mytopic]: class
kafka.common.LeaderNotAvailableException
(kafka.producer.BrokerPartitionInfo)
[2014-08-11 19:32:49,782] ERROR Failed to collate messages by topic,
partition due to: Failed to fetch topic metadata for topic: mytopic
(kafka.producer.async.DefaultEventHandler)
===============
state-change.log
===============
[2014-08-11 19:12:45,312] TRACE Controller 0 epoch 3 started leader
election for partition [mytopic,0] (state.change.logger)
[2014-08-11 19:12:45,321] ERROR Controller 0 epoch 3 initiated state change
for partition [mytopic,0] from OfflinePartition to OnlinePartition failed
(state.change.logger)
kafka.common.NoReplicaOnlineException: No replica for partition [mytopic,0]
is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
[2014-08-11 19:12:45,312] TRACE Controller 0 epoch 3 started leader
election for partition [mytopic,1] (state.change.logger)
[2014-08-11 19:12:45,321] ERROR Controller 0 epoch 3 initiated state change
for partition [mytopic,1] from OfflinePartition to OnlinePartition failed
(state.change.logger)
kafka.common.NoReplicaOnlineException: No replica for partition [mytopic,1]
is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
===============
controller.log
===============
[2014-08-11 19:12:45,308] DEBUG [OfflinePartitionLeaderSelector]: No broker
in ISR is alive for [mytopic,1]. Pick the leader from the alive assigned
replicas: (kafka.controller.OfflinePartitionLeaderSelector)
[2014-08-11 19:12:45,321] DEBUG [OfflinePartitionLeaderSelector]: No broker
in ISR is alive for [mytopic,0]. Pick the leader from the alive assigned
replicas: (kafka.controller.OfflinePartitionLeaderSelector)
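
To make the state-change.log excerpt easier to read at a glance, here is a
minimal Python sketch that pulls the partition, live broker set, and assigned
replicas out of each NoReplicaOnlineException message. The function names and
the SAMPLE text are my own for illustration (SAMPLE is copied from the log
above); this only parses the log text and does not talk to Kafka or ZooKeeper.

```python
import re

# Matches the NoReplicaOnlineException message as it appears in the log above.
# \s+ tolerates the message being wrapped across lines by the mail client.
PATTERN = re.compile(
    r"No replica for partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\]\s+"
    r"is alive\.\s+Live brokers are: \[Set\((?P<live>[^)]*)\)\],\s+"
    r"Assigned replicas are: \[List\((?P<assigned>[^)]*)\)\]"
)


def parse_ids(raw):
    """Turn a comma-separated id string such as '0, 1' into a list of ints."""
    return [int(part) for part in raw.split(",") if part.strip()]


def summarize(log_text):
    """Return one dict per NoReplicaOnlineException found in log_text."""
    return [
        {
            "topic": m.group("topic"),
            "partition": int(m.group("partition")),
            "live": parse_ids(m.group("live")),
            "assigned": parse_ids(m.group("assigned")),
        }
        for m in PATTERN.finditer(log_text)
    ]


# Sample text copied from the state-change.log excerpt in this message.
SAMPLE = """kafka.common.NoReplicaOnlineException: No replica for partition [mytopic,0]
is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
kafka.common.NoReplicaOnlineException: No replica for partition [mytopic,1]
is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
"""

if __name__ == "__main__":
    for entry in summarize(SAMPLE):
        print(entry)
```

For both partitions the live broker list comes back empty while the assigned
replica list contains broker 0, which is the part of the log I am trying to
understand.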