Hello,
According to the Kafka FAQ "How do I choose the number of partitions for a
topic", clusters with more than 10K partitions are not tested. I am looking
for advice on how to scale the number of partitions beyond that. My use
case is to publish messages to 1 million users, each with a unique user
id. Users are not always connected, but each user must receive published
messages in order.
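To make the ordering requirement concrete, here is a minimal sketch of the key-to-partition mapping I understand Kafka to apply when producing with the user id as the message key. The partition count and md5 hash are stand-ins for illustration only (Kafka's default partitioner actually uses murmur2); the point is that a deterministic mapping sends all of one user's messages to one partition, which is what gives per-user ordering:

```python
# Sketch of key-based partitioning: all messages for a given user id
# map to the same partition, so a consumer of that partition sees the
# user's messages in publish order.
# NOTE: md5 here is an illustrative stand-in for Kafka's murmur2 hash,
# and NUM_PARTITIONS is a hypothetical value, not a recommendation.
import hashlib

NUM_PARTITIONS = 100  # hypothetical; far fewer than 1 million

def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    # Interpret the first 4 bytes as an unsigned int, then mod by the
    # partition count, exactly as a hash-based partitioner would.
    return int.from_bytes(digest[:4], "big") % num_partitions

# The mapping is deterministic: the same user id always yields the
# same partition, preserving that user's message ordering.
assert partition_for("user-42") == partition_for("user-42")
```

With a scheme like this, ordering per user holds even if many users share a partition, which is why I'm unsure whether 1 million partitions is actually required.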
What is the best way to divide topics and partitions for this use case? Do
I need 1 million partitions? The FAQ seems to suggest so, i.e. "if we were
storing notifications for users we would encourage a design with a single
notifications topic partitioned by user id". But the FAQ also strongly
implies that 1 million partitions may wreak havoc on ZooKeeper, because
they lead to X million znodes that have to be stored in memory. Any
suggestions?
Thanks,
mission