Hi,
I have a question about mirroring. I would like to create a highly
available Kafka service that runs on AWS and can survive an AZ failure.
Based on what I've read, I plan to create a Kafka cluster in each AZ and
use MirrorMaker to replicate one cluster to the other. I'll call the two
clusters in their respective availability zones A and B. A is the primary,
which is replicated to B. Normally, all consumers consume from A and
record their current offset in a persistent store that is replicated across
A and B (like Dynamo). If I detect that A has failed, producers and
consumers will fail over to B. That's the basic idea.
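To make the plan concrete, here is a rough sketch of the failover logic I have in mind. It is an in-memory stand-in only: OffsetStore, choose_cluster and resume_position are hypothetical names I made up for illustration, not Kafka or Dynamo APIs, and the sketch simply assumes the saved offset can be reused on B, which is exactly what I am asking about below.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Key for a saved position: (consumer group, topic, partition).
Key = Tuple[str, str, int]


@dataclass
class OffsetStore:
    """Hypothetical stand-in for the replicated persistent store (e.g. Dynamo)."""
    _offsets: Dict[Key, int] = field(default_factory=dict)

    def save(self, group: str, topic: str, partition: int, offset: int) -> None:
        # Record the last offset this consumer group has processed.
        self._offsets[(group, topic, partition)] = offset

    def load(self, group: str, topic: str, partition: int, default: int = 0) -> int:
        # Return the saved offset, or the default if nothing was saved yet.
        return self._offsets.get((group, topic, partition), default)


def choose_cluster(a_healthy: bool) -> str:
    """Use primary cluster A while it is healthy, otherwise fail over to B."""
    return "A" if a_healthy else "B"


def resume_position(store: OffsetStore, group: str, topic: str,
                    partition: int, a_healthy: bool) -> Tuple[str, int]:
    """Pick the cluster to consume from and the offset to resume at.

    Assumption (unverified): the offset recorded while consuming from A
    is reused unchanged when failing over to B.
    """
    cluster = choose_cluster(a_healthy)
    offset = store.load(group, topic, partition)
    return cluster, offset
```

For example, after store.save("g", "events", 0, 42), calling resume_position with a_healthy=False would pick cluster B and try to resume at offset 42.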
Now, the question: Can I rely on the offset that is being stored in the
persistent store to refer to the same event in each cluster? Or is it
possible for the two to get out of sync over time - I don't know why,
failures of some kind maybe - in which case the offset from A might not
really be valid with respect to the replica B. If that is possible, then
I'm wondering what I can/should do about it to achieve a clean failover.
I realize that the replication may lag behind, so some events from A may
be lost when there is a failover. That is okay.
I've been told that creating a single cluster that spans AZs and relying
on the new replication functionality in 0.8 is a bad idea, as ZooKeeper
isn't well behaved in that case. Hence my alternative design.
Thanks in advance.
Seth