Hi All,
I have a situation where one producer/consumer is causing timeout errors on
the Kafka broker. The exception in the broker log looks like this:
[2013-12-16 17:32:25,992] ERROR Closing socket for /10.236.67.30 because of error (kafka.network.Processor)
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
at sun.nio.ch.IOUtil.read(IOUtil.java:175)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at kafka.utils.Utils$.read(Utils.scala:395)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Processor.read(SocketServer.scala:347)
at kafka.network.Processor.run(SocketServer.scala:245)
at java.lang.Thread.run(Thread.java:662)
When this happens, a _separate_ producer hangs for about 2-5 seconds.
In that producer's log I get this exception:
[2013-12-16 16:32:14,961] INFO Disconnecting from qa-hermes004.phx.qa.com:9092 (kafka.producer.SyncProducer)
[2013-12-16 16:32:14,982] WARN Failed to send producer request with correlation id 3290 to broker 1 with data for partitions [ktr3,4] (kafka.producer.async.DefaultEventHandler)
java.io.IOException: An established connection was aborted by the software in your host machine.
at sun.nio.ch.SocketDispatcher.writev0(Native Method)
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:49)
at sun.nio.ch.IOUtil.write(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.write0(SocketChannelImpl.java:377)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:400)
at java.nio.channels.SocketChannel.write(SocketChannel.java:371)
at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:56)
at kafka.network.Send$class.writeCompletely(Transmission.scala:75)
at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:26)
at kafka.network.BlockingChannel.send(BlockingChannel.scala:92)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:72)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:71)
at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SyncProducer.scala:102)
at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
at kafka.producer.SyncProducer$$anonfun$send$1$$anonfun$apply$mcV$sp$1.apply(SyncProducer.scala:102)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.producer.SyncProducer$$anonfun$send$1.apply$mcV$sp(SyncProducer.scala:101)
at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
at kafka.producer.SyncProducer$$anonfun$send$1.apply(SyncProducer.scala:101)
at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
at kafka.producer.SyncProducer.send(SyncProducer.scala:100)
at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$send(DefaultEventHandler.scala:245)
at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$1.apply(DefaultEventHandler.scala:107)
at kafka.producer.async.DefaultEventHandler$$anonfun$dispatchSerializedData$1.apply(DefaultEventHandler.scala:101)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.Iterator$class.foreach(Iterator.scala:631)
at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
at kafka.producer.async.DefaultEventHandler.dispatchSerializedData(DefaultEventHandler.scala:97)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:73)
at kafka.producer.Producer.send(Producer.scala:74)
at kafka.javaapi.producer.Producer.send(Producer.scala:32)
at producer.MessageProducer.send(MessageProducer.java:33)
at producer.MessageProducer.main(MessageProducer.java:103)
[2013-12-16 16:32:14,987] INFO Back off for 100 ms before retrying send. Remaining retries = 3 (kafka.producer.async.DefaultEventHandler)
[2013-12-16 16:32:15,088] INFO Fetching metadata from broker id:0,host:qa-hermes003.phx.qa.com,port:9092 with correlation id 3291 for 1 topic(s) Set(ktr3) (kafka.client.ClientUtils$)
[2013-12-16 16:32:15,134] INFO Connected to qa-hermes003.phx.qa.com:9092 for producing (kafka.producer.SyncProducer)
[2013-12-16 16:32:15,185] INFO Disconnecting from qa-hermes003.phx.qa.com:9092 (kafka.producer.SyncProducer)
[2013-12-16 16:32:15,185] INFO Disconnecting from qa-hermes003.phx.qa.com:9092 (kafka.producer.SyncProducer)
[2013-12-16 16:32:15,232] INFO Connected to qa-hermes004.phx.qa.com:9092 for producing (kafka.producer.SyncProducer)
[2013-12-16 16:32:15,279] INFO Connected to qa-hermes003.phx.qa.com:9092 for producing (kafka.producer.SyncProducer)
It's almost as if the error from the other producer/consumer causes the
broker itself to hang, which in turn gets my producer's connection aborted.
Has anyone else seen anything like this before? Any idea what's going on?
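In case it's relevant, the affected producer is set up roughly like this. This is a minimal sketch against the 0.8 Java producer API: the broker hosts and topic come from the log above, the retry/backoff values match the "Back off for 100 ms ... Remaining retries = 3" line, and the remaining properties are assumptions, not my exact code.

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class MessageProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker hosts taken from the log lines above
        props.put("metadata.broker.list",
                  "qa-hermes003.phx.qa.com:9092,qa-hermes004.phx.qa.com:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Sync mode, consistent with DefaultEventHandler running on the calling thread
        props.put("producer.type", "sync");
        // Matches "Remaining retries = 3" and "Back off for 100 ms" in the log
        props.put("message.send.max.retries", "3");
        props.put("retry.backoff.ms", "100");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        // Topic ktr3 as seen in the metadata fetch above; the payload is just an example
        producer.send(new KeyedMessage<String, String>("ktr3", "test message"));
        producer.close();
    }
}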