Hi,
I've noticed an interesting behaviour which I hope someone can fully
explain.
I have a 3-node Kafka cluster with log.retention.hours=168 (7
days) and log.segment.bytes=536870912.
I recently restarted one of the nodes, so its uptime is now 3 days behind
that of the other 2.
After about 7 days I noticed that the other 2 nodes cleared out an equal
amount of stale logs/data,
but the restarted node didn't clear out the same amount. The restarted
node only cleared out a
similar amount 3 days later. In general, the restarted node now seems to be
3 days behind the other 2
in terms of free space.
I noticed that certain partition log and index files on the restarted node
differ from those on the other 2.
Below is an example (Node C is the restarted one):
Node A - Topic Z Partition 12
[ 488 Sep 26 11:47] 00000000000000046460.index
[ 1781829 Sep 24 12:09] 00000000000000046460.log
[ 10485760 Sep 28 22:23] 00000000000000046522.index
[ 1536693 Sep 28 22:23] 00000000000000046522.log
Node B - Topic Z Partition 12
[ 488 Sep 26 11:47] 00000000000000046460.index
[ 1781829 Sep 24 12:09] 00000000000000046460.log
[ 10485760 Sep 28 22:23] 00000000000000046522.index
[ 1536693 Sep 28 22:23] 00000000000000046522.log
Node C - Topic Z Partition 12
[ 10485760 Sep 28 22:23] 00000000000000046485.index
[ 2277311 Sep 28 22:23] 00000000000000046485.log
I can see that Node C's base offset (the log file name prefix) falls between
the base offsets of the two segments on Nodes A and B. Does that
suggest that some partition 12 messages are on Nodes A and B but not on
Node C?
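To make the comparison concrete, here is a minimal Python sketch. It relies on the fact that a Kafka segment file name is the offset of the first message in that segment; the helper name base_offsets is hypothetical, and the file names are copied from the listings above. It only compares where each node's on-disk log starts and says nothing about the newest offsets or about what a consumer can actually read.

```python
def base_offsets(file_names):
    """Return the sorted, de-duplicated base offsets encoded in segment file names."""
    offsets = set()
    for name in file_names:
        stem, _, ext = name.partition(".")
        if ext in ("log", "index"):
            offsets.add(int(stem))  # zero-padded decimal offset
    return sorted(offsets)


# File names taken from the directory listings for Topic Z, partition 12.
node_a = [
    "00000000000000046460.index",
    "00000000000000046460.log",
    "00000000000000046522.index",
    "00000000000000046522.log",
]
node_c = [
    "00000000000000046485.index",
    "00000000000000046485.log",
]

first_a = base_offsets(node_a)[0]
first_c = base_offsets(node_c)[0]

# Size of the offset span that starts on Node A before Node C's first segment.
offsets_only_on_a = max(0, first_c - first_a)

print("Node A segments start at:", base_offsets(node_a))
print("Node C segments start at:", base_offsets(node_c))
print("Offsets below Node C's first segment:", offsets_only_on_a)
```

If this reasoning holds, the offsets from 46460 up to 46484 can only be in Node A's and Node B's files, since Node C's only segment begins at 46485.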
I was hoping someone could help me figure out what's happening.
Thanks
Dayo