Kafka 0.8.1.1 replication issues

Hi Kafka users!

I was migrating a cluster of 3 brokers from one set of EC2 instances to
another and ran into replication problems. The migration method is to stop
one broker and start a replacement broker on a new instance with the same
broker.id. Replication started, but after roughly 4 of the ~15 GB had been
copied it stalled, and the following errors were logged every ~500 ms.
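
For context, this is roughly how I checked that the replacement instance
re-registered under the reused broker.id, using the 0.8.x ZkUtils helpers
(the ZooKeeper connect string below is a placeholder for our own):

import org.I0Itec.zkclient.ZkClient
import kafka.utils.{ZKStringSerializer, ZkUtils}

object BrokerCheck {
  def main(args: Array[String]): Unit = {
    // "zk-host:2181" is a placeholder for the cluster's ZooKeeper connect string.
    val zkClient = new ZkClient("zk-host:2181", 30000, 30000, ZKStringSerializer)
    try {
      // List the brokers currently registered in ZooKeeper, so the reused
      // broker.id can be matched against the new instance's host.
      ZkUtils.getAllBrokersInCluster(zkClient).foreach { b =>
        println("broker " + b.id + " at " + b.host + ":" + b.port)
      }
    } finally {
      zkClient.close()
    }
  }
}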

On the new broker (the fetcher):

[2014-11-04 17:02:33,762] ERROR [ReplicaFetcherThread-0-1926078608], Error
in fetch Name: FetchRequest; Version: 0; CorrelationId: 1523; ClientId:
ReplicaFetcherThread-0-1926078608; ReplicaId: 544181083; MaxWait: 500 ms;
MinBytes: 1 bytes; RequestInfo: [qa.mx-error,302] ->
PartitionFetchInfo(0,10485760),[qa.xl-msg,46] ->
PartitionFetchInfo(101768,10485760),[qa.xl-error,202] ->
PartitionFetchInfo(0,10485760),[qa.mx-msg,177] ->
... total of 700+ partitions
-> PartitionFetchInfo(0,10485760) (kafka.server.ReplicaFetcherThread)
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
at kafka.utils.Utils$.read(Utils.scala:376)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:81)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:109)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
[2014-11-04 17:02:33,765] WARN Reconnect due to socket error: null (kafka.consumer.SimpleConsumer)
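
In case it helps narrow things down, this is a small probe I can run against
the old broker, fetching just one of the partitions from the failing request
via the 0.8 SimpleConsumer API (topic, partition and offset are copied from
the log above; the host name and client id are placeholders):

import kafka.api.FetchRequestBuilder
import kafka.consumer.SimpleConsumer

object FetchProbe {
  def main(args: Array[String]): Unit = {
    // "old-broker-host" is a placeholder for the old broker's address.
    val consumer = new SimpleConsumer("old-broker-host", 9092, 30000, 64 * 1024, "fetch-probe")
    try {
      // Fetch a single partition/offset from the failing replica fetch,
      // with the same 10 MB per-partition fetch size.
      val request = new FetchRequestBuilder()
        .clientId("fetch-probe")
        .addFetch("qa.xl-msg", 46, 101768L, 10485760)
        .build()
      val response = consumer.fetch(request)
      if (response.hasError)
        println("error code: " + response.errorCode("qa.xl-msg", 46))
      else
        println("fetched " + response.messageSet("qa.xl-msg", 46).sizeInBytes + " bytes")
    } finally {
      consumer.close()
    }
  }
}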

On one of the two old brokers (presumably the one serving the data):

[2014-11-04 17:03:28,030] ERROR Closing socket for /10.145.135.246 because of error (kafka.network.Processor)
kafka.common.KafkaException: This operation cannot be completed on a complete request.
at kafka.network.Transmission$class.expectIncomplete(Transmission.scala:34)
at kafka.api.FetchResponseSend.expectIncomplete(FetchResponse.scala:191)
at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:214)
at kafka.network.Processor.write(SocketServer.scala:375)
at kafka.network.Processor.run(SocketServer.scala:247)
at java.lang.Thread.run(Thread.java:745)

It looks similar to this previous thread, but that discussion doesn't seem
to have reached a resolution:
http://thread.gmane.org/gmane.comp.apache.kafka.user/1153

There is also this one, but again no resolution.
http://thread.gmane.org/gmane.comp.apache.kafka.user/3804

Does anyone have any clues as to what might be going on here, or any
suggestions for a fix?

Thanks,
Christofer
