
Issues with metrics collection

Hello,

We're collecting JMX metrics from the Kafka brokers and are seeing a
couple of issues. Could someone please shed some light if you've come
across something similar?

1) We have a 3-broker Kafka cluster, and when we collect metrics like
messages in per sec, bytes in per sec, etc., we get values of 0 for one of
the three brokers. But we get proper values for metrics like heap memory
usage for all the brokers. When we restart the cluster, the same or some
other broker behaves the same way.

We're seeing similar behavior in another cluster as well.

2) We're logging the time it takes to collect the metrics. The time to
collect seems to increase over time and crosses a minute within a couple of
days. It's on the order of 1 or 2 seconds when we start the cluster.
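
For reference, our collection loop is essentially the sketch below. The JMX service URL and the MBean name are examples from memory (the exact object name varies by Kafka version), so treat them as placeholders:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerMetricsPoller {
    public static void main(String[] args) throws Exception {
        // Example address; each broker is polled on its own JMX port.
        String url = "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi";
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url));
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Example object name for the broker-wide messages-in meter; adjust to your Kafka version.
            ObjectName messagesIn = new ObjectName(
                "\"kafka.server\":type=\"BrokerTopicMetrics\",name=\"AllTopicsMessagesInPerSec\"");
            long start = System.currentTimeMillis();
            Object oneMinuteRate = mbsc.getAttribute(messagesIn, "OneMinuteRate");
            long elapsed = System.currentTimeMillis() - start;   // the collection time we log (issue 2)
            System.out.println("MessagesInPerSec 1m rate = " + oneMinuteRate
                + " (collected in " + elapsed + " ms)");
        } finally {
            connector.close();
        }
    }
}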

Thanks.

Most common Kafka client consumer implementations?

Curious about a couple of questions...

Are most people (are you?) using the simple consumer or the high level
consumer in production?

What is the common processing paradigm for maintaining a full pipeline for
kafka consumers for at-least-once messaging? E.g. you pull a batch of 1000
messages and:

option 1.
you wait for the slowest worker to finish working on its message; when you
get back 1000 acks internally, you commit your offset and pull another
batch (roughly what the sketch after option 3 shows)

option 2.
you feed your workers n msgs at a time in sequence and move your offset up
as you work through your batch

option 3.
you maintain a full stream of 1000 messages ideally, and as you get acks
back from your workers you see if you can move your offset up in the stream
and pull n more messages to fill up your pipeline, so you're not blocked by
the slowest consumer (probability-wise)
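
To make option 1 concrete, here is roughly what I mean (a sketch only; pullBatch/process and the worker wiring are my placeholders, not a real API):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;

import kafka.consumer.ConsumerIterator;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class BatchThenCommit {

    // Pull up to `max` messages, hand them to workers, wait for every ack, then commit.
    static void runOneBatch(ConsumerConnector consumer,
                            ConsumerIterator<byte[], byte[]> it,
                            ExecutorService workers) throws InterruptedException {
        List<MessageAndMetadata<byte[], byte[]>> batch = pullBatch(it, 1000);
        final CountDownLatch acks = new CountDownLatch(batch.size());
        for (final MessageAndMetadata<byte[], byte[]> msg : batch) {
            workers.submit(new Runnable() {
                public void run() {
                    process(msg);        // application logic
                    acks.countDown();    // internal "ack"
                }
            });
        }
        acks.await();              // blocked by the slowest worker
        consumer.commitOffsets();  // only now advance the group's offset; then pull the next batch
    }

    static List<MessageAndMetadata<byte[], byte[]>> pullBatch(ConsumerIterator<byte[], byte[]> it, int max) {
        List<MessageAndMetadata<byte[], byte[]>> batch = new ArrayList<MessageAndMetadata<byte[], byte[]>>();
        while (batch.size() < max && it.hasNext()) {   // hasNext() blocks until a message arrives
            batch.add(it.next());
        }
        return batch;
    }

    static void process(MessageAndMetadata<byte[], byte[]> msg) { /* worker logic */ }
}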

any good docs or articles on the subject would be great, thanks!

Unable to read from the beginning using High level consumer API

Kafka Team,

I am using high level consumer API as shown below to read contents from the topic.

Properties props = new Properties();
props.put("zookeeper.connect", "localhost:2181");
props.put("zookeeper.session.timeout.ms", "10000");
props.put("zookeeper.sync.time.ms", "200");
props.put("auto.commit.interval.ms", "1000");
props.put("consumer.timeout.ms", "120000");
props.put("group.id", "TEST123");
ConsumerConfig config = new ConsumerConfig(props);

ConsumerConnector consumer = kafka.consumer.Consumer
    .createJavaConsumerConnector(config);

Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put("TEST", new Integer(1));
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
List<KafkaStream<byte[], byte[]>> streams = consumerMap.get("TEST");

// now launch all the threads
ThreadPoolExecutor executor = resource.getExecutor();
// now create an object to consume the messages

for (final KafkaStream<byte[], byte[]> stream : streams) {
    TestTask task = new TestTask(stream);
    executor.submit(task);
}

And the TestTask just prints the messages.

The kafka logger shows the below statement

Consumer APP51_DFGHSFV1-1406836437053-9ed3b6a7 selected partitions : YYYY:0: fetched offset = -1: consumed offset = -1,YYYY:1: fetched offset = -1: consumed offset = -1
- [APP51_DFGHSFV1-1406836437053-9ed3b6a7],

Even though the fetched and consumed offsets display -1, I am not getting the messages from the beginning.
The retention window policy is set as log.retention.hours=168.

If I produce new messages, then those messages are consumed and I can see the logged statements

If I use the simple consumer API and specify the starting offset as 0, then I am able to read from the beginning

Are there any settings that would enable a new consumer group to read messages from the beginning?
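
One setting I came across is auto.offset.reset; my understanding is that "smallest" makes a consumer group with no committed offsets start from the earliest available offset rather than the latest (please correct me if that's wrong), e.g. adding to the properties above:

props.put("auto.offset.reset", "smallest"); // default is "largest", which skips a new group to the end of the log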

Thanks,
Srividhya


Zookeeper offset

Kafka Team,

In the integration environment, Kafka and ZooKeeper are running under supervision. Once in a while, when ZooKeeper and Kafka are shut down and started back up again, the consumers are not able to read the data from the topic. I am not seeing any exceptions in the log. The consumer offset checker utility does show a lag for the consumer group.

Does that mean that when Kafka/ZooKeeper are shut down abruptly, it's possible that the ZooKeeper data directories are not committed with the proper offset, or got corrupted? I tried with new consumer groups too, and with simple consumers. After that point, I am not able to retrieve the data from the topic. How do I recover the data?

This is a critical problem and any help is really appreciated.

Thanks!


undesirable log retention behavior

It seems that log retention is purely based on the last touch/modified
timestamp. This is undesirable for code pushes in AWS/cloud.

E.g. let's say the retention window is 24 hours, disk size is 1 TB, and disk
util is 60% (600GB). When a new instance comes up, it will fetch log files
(600GB) from peers. Those log files all have newer timestamps, so they won't
be purged until 24 hours later. Note that during the first 24 hours, new msgs
(another 600GB) continue to come in. This can cause a disk-full problem
without any intervention. With this behavior, we have to keep disk util
under 50%.

Can the last modified timestamp be inserted into the file name when rolling
over log files? Then Kafka could check the file name for the timestamp. Does
this make sense?

Thanks,
Steven

Consume more than produce

Hey,

After a year or so of having Kafka as the streaming layer in my production, I decided it was time to audit, and to test how many events I lose, if I lose events at all.

I discovered something interesting which I can't explain.

The producer produces fewer events than the consumer group consumes.

It is not much more, only about 0.1% more events.

I use the Consumer API (not the simple consumer API)

I was thinking I might have had rebalancing going on in my system, but it doesn't look like that.

Has anyone seen such behaviour?

In order to audit, I calculated for each event the minute it arrived and assigned this value to the event; I used statsd to count all events from all of my producer cluster and all of my consumer group cluster.

I must say that it is not happening every minute.
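
For reference, the per-minute counting I described is roughly the sketch below (the statsd client is replaced by a stand-in interface so the sketch is self-contained; all names are placeholders):

public class AuditCounter {

    // Stand-in for whatever statsd client is in use.
    public interface Counter { void increment(String name); }

    private final Counter statsd;

    public AuditCounter(Counter statsd) { this.statsd = statsd; }

    // Bucket the event by the minute it arrived and bump a counter for that bucket;
    // the same counting is done on the producer side and the consumer side, then compared.
    public void count(String side, long eventTimestampMs) {
        long minuteBucket = (eventTimestampMs / 60000L) * 60000L;   // truncate to the minute
        statsd.increment("audit." + side + ".events." + minuteBucket);
    }
}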

Thanks, Guy

Reading messages offset in Apache Kafka

$
0
0
I am very new to Kafka, and we are using Kafka 0.8.1.

What I need to do is consume a message from a topic. For that, I will have
to write a consumer in Java which will consume a message from the topic and
then save that message to a database. After a message is saved, an
acknowledgement will be sent to the Java consumer. If the acknowledgement is
true, then the next message should be consumed from the topic. If the
acknowledgement is false (which means that, due to some error, the message
read from the topic couldn't be saved into the database), then that message
should be read again.

I think I need to use the Simple Consumer, to have control over the message
offset, and I have gone through the Simple Consumer example given at this link:
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

In this example, the offset is evaluated in the run method as 'readOffset'. Do I
need to play with that? For example, I can use LatestTime() instead of
EarliestTime(), and in case of a false acknowledgement, I will reset the offset
to the previous one using offset - 1.

Is this how I should proceed? Or can the same be done using High Level API?
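
To make the workflow concrete, here is a rough sketch of what I want, using the high level consumer with auto commit disabled; saveToDatabase() is a placeholder for my database call and returns the acknowledgement:

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class SaveThenAdvance {

    // Assumes auto.commit.enable=false so the offset only moves when we say so.
    static void consume(ConsumerConnector consumer, KafkaStream<byte[], byte[]> stream) {
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> msg = it.next();
            while (!saveToDatabase(msg.message())) {
                // acknowledgement was false: retry the same message instead of moving on
            }
            consumer.commitOffsets();  // advance the stored offset only after a successful save
        }
    }

    static boolean saveToDatabase(byte[] payload) { /* placeholder for the DB call */ return true; }
}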

Delete message after consuming it

I want to delete a message from a Kafka broker after consuming it (Java
consumer). How can I do that?

Data is being written to only 1 partition

Hi all!

I think I already saw this question on the mailing list, but I'm not able
to find it again...

I'm using Kafka 0.8.1.1; I have 3 brokers, a default replication
factor of 2, and a default partitioning factor of 2.

My partitions are distributed fairly across all the brokers.

My problem is that for all the topics I have, my data is only sent to
either partition 0 or partition 1 (and correctly replicated). All
my brokers are in sync and the data is on every broker (depending on where
the partitions are).

How can I make my producer / brokers write to the other partition?
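
For reference, a simplified sketch of how we currently produce; I am guessing the relevant detail is that we don't set a message key (broker list and topic are placeholders):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class UnkeyedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        // No key is set here -- would a key (or a custom partitioner.class) spread writes across partitions?
        producer.send(new KeyedMessage<String, String>("my-topic", "some payload"));
        producer.close();
    }
}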

Thanks!

François Langelier
Software Engineering student - École de Technologie Supérieure
<http://www.etsmtl.ca/>
Captain, Club Capra <http://capra.etsmtl.ca/>
VP Communications - CS Games <http://csgames.org> 2014
Jeux de Génie <http://www.jdgets.com/> 2011 to 2014
Treasurer, Fraternité du Piranha <http://fraternitedupiranha.com/> 2012-2014
Organizing Committee, Olympiades ÉTS 2012
Compétition Québécoise d'Ingénierie 2012 - Senior Competition

Request: Adding us to the "Powered By" list

Dear Kafka team,

Would you mind adding us at
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By ?
We're using Kafka as part of the ticket sequencing system for our helpdesk
software.

Find topic partition count through the SimpleConsumer API

Hi,

What's the way to find a topic's partition count dynamically using the
SimpleConsumer API?

If I use one seed broker within a cluster of 10 brokers, and add a list of
topic names to the SimpleConsumer request to find the topics' metadata, when
it returns, is the size of partitionsMetadata per topicMetadata the same as
the number of partitions for the given topic? Also, for retrieval, do I need
more than 1 seed broker to get all the metadata info for a topic, or is only
1 seed broker enough?
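
For context, this is the kind of lookup I have in mind, based on the 0.8 SimpleConsumer example (the seed broker host/port and topic name are placeholders):

import java.util.Collections;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class PartitionCountLookup {
    public static void main(String[] args) {
        SimpleConsumer consumer = new SimpleConsumer("seed-broker", 9092, 100000, 64 * 1024, "partitionCountLookup");
        try {
            TopicMetadataRequest request = new TopicMetadataRequest(Collections.singletonList("my-topic"));
            TopicMetadataResponse response = consumer.send(request);
            for (TopicMetadata metadata : response.topicsMetadata()) {
                // Is this size guaranteed to equal the topic's partition count?
                System.out.println(metadata.topic() + " -> "
                    + metadata.partitionsMetadata().size() + " partitions");
            }
        } finally {
            consumer.close();
        }
    }
}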

Thanks,

Weide

Updated Kafka Roadmap?

Howdy,

I was wondering if it would be possible to update the release plan:

https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan

aligned with the feature roadmap:

https://cwiki.apache.org/confluence/display/KAFKA/Index

We have several projects actively using or planning to use Kafka, and any current guidance on the new releases related to ZK dependence and producer and consumer API/client timing would be very helpful. For example, is 0.8.2 possible in August, or is September more likely?

Also, any chance something like:

https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer

…might make it into 0.9?

Thanks!

Kafka 0.8 automatically changing leadership

Hi,

We have a Kafka 0.8 cluster in a test environment (in this case, on AWS EC2
nodes). Even though we've tried to run very little load on this test
cluster, it seems like the instances can't even keep up with that.
Leadership moves automatically for at least a few of the topics, which
never happens when we run them on our prod, non-AWS hardware. This causes
us to eventually have to rebalance the topics on those test clusters, which
is annoying.

Can any of you point me to the set of conditions/thresholds that have to be
met for the Kafka cluster to decide to automatically move leadership of a
topic/partition to another replica in the ISR? I'd like to understand how
exactly Kafka does this, to see if we can provision an instance type for
those test Kafka clusters that can handle the load without moving
leadership around.

Thanks,

Marcos Juarez

kafka consumer fail over

Hi,

I have a use case for a master/slave cluster where the logic inside the master
needs to consume data from Kafka and publish some aggregated data back to
Kafka. When the master dies, the slave needs to take the latest committed
offset from the master and continue consuming the data from Kafka and doing
the push.

My question is: what would be the easiest Kafka consumer design for this
scenario to work? I was thinking about using the SimpleConsumer and doing
manual consumer offset syncing between the master and slave. That seems to
solve the problem, but I was wondering if it can be achieved using the high
level consumer client?

Thanks,

Weide

Exception in thread "main" kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries

Hi Team,

I am trying the Kafka client on a Windows 7 64-bit corporate PC, which is behind a proxy, and Kafka is hosted on Ubuntu 12.04. This is my code:

Properties props = new Properties();
props.put("metadata.broker.list", "10.10.10.10:9092"); //Example IP
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");

ProducerConfig config = new ProducerConfig(props);

Producer<String, String> producer = new Producer<String, String>(config);

But I am getting this error:

log4j:WARN No appenders could be found for logger (kafka.utils.VerifiableProperties).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
at kafka.producer.Producer.send(Producer.scala:76)
at kafka.javaapi.producer.Producer.send(Producer.scala:33)
at com.wipro.bos.KafkaProducer.main(KafkaProducer.java:26)

I am not sure what the error is. I can see this log in the Kafka log on the Ubuntu machine:

[2014-08-02 11:31:22,275] INFO Closing socket connection to /10.10.10.10. (kafka.network.Processor)
[2014-08-02 11:31:23,452] INFO Closing socket connection to /10.10.10.10. (kafka.network.Processor)
[2014-08-02 11:31:24,572] INFO Closing socket connection to /10.10.10.10. (kafka.network.Processor)
[2014-08-02 11:31:25,691] INFO Closing socket connection to /10.10.10.10. (kafka.network.Processor)
[2014-08-02 11:31:26,811] INFO Closing socket connection to /10.10.10.100. (kafka.network.Processor)

Can anyone please guide me? I don't understand what this error means.

Thanks,
Pradeep Simha
Technical Lead


Socket is not connected error while consuming messages using kafka

Hi Team,

I am trying to consume a message from Kafka hosted on an Ubuntu server, using this example: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example but whenever I run it I get this exception:

java.net.SocketException: Socket is not connected
at sun.nio.ch.Net.translateToSocketException(Net.java:149) ~[na:1.7.0_25]
at sun.nio.ch.Net.translateException(Net.java:183) ~[na:1.7.0_25]
at sun.nio.ch.Net.translateException(Net.java:189) ~[na:1.7.0_25]
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:423) ~[na:1.7.0_25]
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1232) [zookeeper-3.3.4.jar:3.3.3-1203054]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1211) [zookeeper-3.3.4.jar:3.3.3-1203054]
Caused by: java.nio.channels.NotYetConnectedException: null
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:773) ~[na:1.7.0_25]
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:421) ~[na:1.7.0_25]
... 2 common frames omitted

I have been trying to solve this issue for hours; I looked in the config file, properties file, etc., but still couldn't resolve it. If someone could help me with this it would be highly appreciated, as I am a beginner and have no previous experience working with this.

Thanks,
Pradeep Simha
Technical Lead


How to use Kafka as a Flume source

Hi,

We are planning to use Kafka as a *flume source*. Please advise me on how to
use Kafka as a source in Flume.

Please share if there is any good example of a *flume - kafka source - hdfs
sink* setup.

Regards,

Rafeeq S
*(“What you do is what matters, not what you think or say or plan.” )*

offset commit api

Hi,

I'm reading about offset management at the API link below.
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommit/FetchAPI

I have a couple of questions regarding using the offset fetch and commit
APIs in 0.8.1.1.

1. Is the new offset commit and fetch API usable in 0.8.1.1? Does
0.8.1.1 already support the offset coordinator?

2. What's the difference between the old OffsetRequest and the new
OffsetFetchRequest? It seems to me that the new API supports per-consumer-group
offset fetching while the old API doesn't. Also, what's the
purpose of the timestamp parameter in the fetch request?

3. In 0.8.1.1, the OffsetCommitRequest uses OffsetMetadataAndError; could
you tell me the purpose of the error parameter and the metadata
parameter in the request?

4. Can I assume that offset management is purely independent of message
consumption? In other words, if I use a simple consumer to fetch messages
with a random client id, can I still manually set some consumer group along
with an offset in the offset commit message? Is that allowed?

Thanks,

Weide

Graceful shutdown without using jmx

Hi,

I've installed Kafka 0.8.1.1 on Linux, but the Linux kafka-server-start.sh
doesn't set a default JMX port, so I cannot change the leader of some partitions.
How can I shut down without data loss or duplication? I've already tested this,
and I found that Kafka loses data without a graceful shutdown.

Regards,
JL

Performance

Hello,

Is there an official benchmark that compares the various clients (Java, C#,
C++, etc.)?