Channel: Kafka Timeline

Some doubts regarding kafka config parameters

Hi,

I have the following doubts regarding some Kafka config parameters:

For example, if I have a Throughput topic with replication factor 1 and a
single partition 0, then I will see the following files under
/tmp/kafka-logs/Throughput_0:

00000000000000000000.index
00000000000000000000.log

00000000000070117826.index
00000000000070117826.log

1) *log.delete.delay.ms:*

The period of time we hold log files around after they are removed from the
in-memory segment index.

In the above description, does "*index*" refer to the in-memory
segment list and not the 00000****.index file (in the example above)?

As per the documentation, Kafka maintains an in-memory segment list:

To enable read operations, kafka maintains an in-memory range(segment

2) *socket.request.max.bytes:* The maximum request size the server will
allow.

How is this different from message.max.bytes (the maximum size of a message
that the server can receive)?

3) *fetch.wait.max.ms:* The maximum amount of time the server will block
before answering the fetch request if there isn't sufficient data to
immediately satisfy fetch.min.bytes.

Does the server above refer to the kafka consumer, which will block for
fetch.wait.max.ms? How is fetch.wait.max.ms different from
*consumer.timeout.ms*?

4) Is there any correlation between a producer's
*queue.buffering.max.messages* and *send.buffer.bytes*?

5) Will batching not happen if producer.type=async and
request.required.acks=1 or -1, since the next message will only be sent
after an ack is received from the leader/all ISR replicas? (See the config
sketch after question 6.)

6) *topic.metadata.refresh.interval.ms:*
After every 10 minutes I see the following on my producer side:

1200483 [main] INFO kafka.client.ClientUtils$ - Fetching metadata from
broker id:0,host:localhost,port:9092 with correlation id 15078270 for 1
topic(s) Set(Throughput)

1200484 [main] INFO kafka.producer.SyncProducer - Connected to
localhost:9092 for producing

1200486 [main] INFO kafka.producer.SyncProducer - Disconnecting from
localhost:9092

1200486 [main] INFO kafka.producer.SyncProducer - Disconnecting from
sdp08:9092

1200487 [main] INFO kafka.producer.SyncProducer - Connected to sdp08:9092
for producing

Why is there a disconnection and re-connection happening on each metadata
refresh even though the leader is alive? I have noticed that I lose some
messages when this happens (with request.required.acks=0).
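
To make questions 4 and 5 concrete, here is a minimal sketch of an async
producer configuration using the 0.8 Java producer API. The broker address
and values are illustrative assumptions, not recommendations:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class AsyncProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092"); // assumed broker
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");                // Q5: batching only applies in async mode
        props.put("batch.num.messages", "200");             // messages per async batch
        props.put("request.required.acks", "1");            // acks apply per produce request (a whole batch), not per message
        props.put("queue.buffering.max.messages", "10000"); // Q4: bounds the in-memory async queue (a message count)
        props.put("send.buffer.bytes", "102400");           // Q4: TCP socket buffer (bytes); unrelated to the queue above

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("Throughput", "test-message"));
        producer.close();
    }
}

With producer.type=async, sends are enqueued and shipped in batches of
batch.num.messages; the ack setting governs when a produce request is
considered complete, not whether batching happens.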

thank you,
shweta

Improving the Kafka client ecosystem

A question was asked in another thread about what was an effective way
to contribute to the Kafka project for people who weren't very
enthusiastic about writing Java/Scala code.

I wanted to kind of advocate for an area I think is really important
and not as good as it could be--the client ecosystem. I think our goal
is to make Kafka effective as a general purpose, centralized, data
subscription system. This vision only really works if all your
applications are able to integrate easily, whatever language they are
in.

We have a number of pretty good non-java producers. We have been
lacking the features on the server side to make writing non-java
consumers easy. We are fixing that as part of the consumer work going
on right now (which moves a lot of the functionality in the java
consumer to the server side).

But apart from this I think there may be a lot more we can do to make
the client ecosystem better.

Here are some concrete ideas. If anyone has additional ideas please
reply to this thread and share them. If you are interested in picking
any of these up, please do.

1. The most obvious way to improve the ecosystem is to help work on
clients. This doesn't necessarily mean writing new clients, since in
many cases we already have a client in a given language. I think any
way we can incentivize fewer, better clients rather than many
half-working clients, we should do. However, we are now working on
server-side consumer co-ordination, so it should be possible to
write much simpler consumers.

2. It would be great if someone put together a mailing list just for
client developers to share tips, tricks, problems, and so on. We can
make sure all the main contributors are on this too. I think this could
be a forum for directing improvements in this area.

3. Help improve the documentation on how to implement a client. We
have tried to make the protocol spec not just a dry document but also
have it share best practices, rationale, and intentions. I think this
could potentially be even better as there is really a range of options
from a very simple quick implementation to a more complex highly
optimized version. It would be good to really document some of the
options and tradeoffs.

4. Come up with a standard way of documenting the features of clients.
In an ideal world it would be possible to get the same information
(author, language, feature set, download link, source code, etc) for
all clients. It would be great to standardize the documentation for
the client as well. For example having one or two basic examples that
are repeated for every client in a standardized way. This would let
someone come to the Kafka site who is not a java developer, and click
on the link for their language and view examples of interacting with
Kafka in the language they know using the client they would eventually
use.

5. Build a Kafka Client Compatibility Kit (KCCK) :-) The idea is this:
anyone who wants to implement a client would implement a simple
command line program with a set of standardized options. The
compatibility kit would be a standard set of scripts that run their
client using this command line driver and validate its behavior. E.g.
for a producer it would test that the client can correctly send
messages, that ordering is retained, that the client correctly handles
reconnection and metadata refresh, and that compression works. The
output would be a list of the features that passed and are certified,
and perhaps basic performance information. This would be an easy way to
help client developers write correct clients, as well as a standardized
way to verify that clients work correctly.
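
As a very rough illustration of the idea in (5), a producer-side driver
might look like the sketch below. The option names, the topic, and the
"SENT n" output contract are all hypothetical — nothing here is a defined
standard:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class KcckProducerDriver {
    // Hypothetical standardized options: --broker-list, --topic, --messages
    public static void main(String[] args) {
        String brokerList = "localhost:9092";
        String topic = "kcck-test";
        int messages = 1000;
        for (int i = 0; i < args.length - 1; i++) {
            if ("--broker-list".equals(args[i])) brokerList = args[++i];
            else if ("--topic".equals(args[i])) topic = args[++i];
            else if ("--messages".equals(args[i])) messages = Integer.parseInt(args[++i]);
        }

        Properties props = new Properties();
        props.put("metadata.broker.list", brokerList);
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

        // Sequential payloads let a harness verify ordering and completeness.
        for (int i = 0; i < messages; i++) {
            producer.send(new KeyedMessage<String, String>(topic, Integer.toString(i)));
        }
        producer.close();
        System.out.println("SENT " + messages); // hypothetical line the harness would parse
    }
}

The compatibility scripts would then run such a driver, consume the topic
back, and check the count and ordering of what was produced.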

-Jay

Switch Apache logo URL to HTTPS

Hi,

Reading the Kafka webpage/documentation over HTTPS (e.g.
https://kafka.apache.org/documentation.html) I spotted that it generates a
warning (in Firefox and Chromium) about a reference to unencrypted
resources. In fact the Apache logo is always loaded using HTTP -
http://www.apache.org/images/feather-small.png

As that image is also available via HTTPS, I think it would be painless to
switch the "src" of the "img" tag in
https://svn.apache.org/repos/asf/kafka/site/includes/footer.html to
HTTPS, or to use the protocol-relative double-slash syntax
(src="//www.apache.org/images/feather-small.png"), which was defined in
the 15+ year old RFC 1808 and should be supported by recent browsers -
http://stackoverflow.com/a/9632363/313516

Marcin

Kafka cluster setup

Sorry for the spam.

I am new to Apache Kafka. Can someone point me to a Kafka cluster installation document? I am planning to set up a 3-node ZK quorum and a 10-broker cluster.
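
For reference, a minimal broker server.properties for a cluster like this
might look roughly as follows (host names and paths are made-up
placeholders; each of the 10 brokers needs its own broker.id):

# server.properties for one broker; repeat on each machine with a unique broker.id
broker.id=0
port=9092
log.dirs=/var/kafka-logs
# the 3-node ZooKeeper quorum (hypothetical host names)
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181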

Thanks,
Raj Tanneru

question about compression

In trying to better understand compression I came across the following

http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/

“in Kafka 0.8, messages for a partition are served by the leader broker.
The leader assigns these unique logical offsets to every message it appends
to its log. Now, if the data is compressed, the leader has to decompress
the data in order to assign offsets to the messages inside the compressed
message. So the leader decompresses data, assigns offsets, compresses it
again and then appends the re-compressed data to disk”

I am assuming that when the data is re-compressed on the broker the same
rows are batched together. For example, say I am using a batch size of 400
from the producer: these messages would be saved compressed on disk in
batches of 400. Does this imply that consumers need to ensure they set the
same batch size so that their requests align with the stored batch size?
For example, if the consumer is set to use batches of 100 and the producer
used 400, would the consumer then read 400 messages for each batch of 100
messages, only to go back and request many of the same rows on the next
batch?
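
For context, compression in the 0.8 producer is controlled by the
compression.codec property, and producer-side batching by
batch.num.messages in async mode. A minimal sketch (broker address, topic,
and values are assumptions):

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class CompressedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092"); // assumed broker
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("compression.codec", "snappy");  // or "gzip"; "none" disables compression
        props.put("producer.type", "async");       // batching requires async mode
        props.put("batch.num.messages", "400");    // the producer-side batch size discussed above

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        // messages are enqueued and compressed together in batches of up to 400
        for (int i = 0; i < 1000; i++) {
            producer.send(new KeyedMessage<String, String>("example-topic", "message-" + i));
        }
        producer.close();
    }
}

Note the consumer has no matching batch-count setting in this sense; its
fetch requests are sized in bytes (fetch.message.max.bytes), not in
messages.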

Bert

much reduced io utilization after upgrade from 0.8.0 to 0.8.1.1

I recently upgraded some of our kafka clusters to use 0.8.1.1 (from 0.8.0).
It's all looking good so far. One thing I notice though (seems like a
good thing) is that the iostat utilization has gone way down after the
upgrade.

I'm not sure I know exactly what could be responsible for this. Is this an
expected result?

Is it possibly related to: https://issues.apache.org/jira/browse/KAFKA-615

Thanks,

Jason

Kafka consumer per topic

Hello All,

I hope that this is the right place for this question. I am trying to determine: if I have a separate connection per Kafka topic that I want to consume, would that cause any performance or usage problems for my Kafka servers or the clients?
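
For what it's worth, the high-level consumer can consume several topics
over a single ConsumerConnector by passing a topic-count map, so one
connection per topic is not required. A minimal sketch (the ZooKeeper
address, group id, and topic names are assumptions):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class MultiTopicConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // assumed ZK address
        props.put("group.id", "example-group");           // hypothetical group id
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // one connector, two topics: a single connection serving multiple topics
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("topicA", 1); // hypothetical topic names
        topicCountMap.put("topicB", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);
        // each list of streams can then be handed off to consumer threads
    }
}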

Thank you,

Nick


Partitions per Machine for a topic

Hi,
Is the maximum number of partitions for a topic dependent on the number of
machines in a Kafka cluster?
E.g., if I have 3 machines in a cluster, can I have 5 partitions, with the
caveat that one machine can host multiple partitions for a given topic?
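
For illustration, creating such a topic programmatically with AdminUtils
(the same API used elsewhere in this digest) might look like this sketch;
the ZooKeeper address and topic name are assumptions. With 5 partitions on
3 brokers, some brokers simply lead more than one partition:

import java.util.Properties;

import kafka.admin.AdminUtils;
import kafka.utils.ZKStringSerializer$;
import org.I0Itec.zkclient.ZkClient;

public class FivePartitionsOnThreeBrokers {
    public static void main(String[] args) {
        ZkClient zkClient = new ZkClient("localhost:2181", 10000, 10000,
                ZKStringSerializer$.MODULE$); // assumed ZK address and timeouts
        // 5 partitions, replication factor 1: valid on a 3-broker cluster,
        // since one broker can host several partitions of the same topic
        AdminUtils.createTopic(zkClient, "example-topic", 5, 1, new Properties());
        zkClient.close();
    }
}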

Regards,
Kashyap

Kafka on yarn

Hi guys,

Kafka is getting more and more popular, and in most cases people run Kafka
as a long-term service in the cluster. Is there a discussion of running
Kafka on a YARN cluster, so that we can utilize its convenient
configuration/resource management and HA? I think there is big potential
and demand for that.
I found the project https://github.com/kkasravi/kafka-yarn. But is there an
official roadmap/plan for this?

Thank you very much!

Best,
Siyuan

num.partitions vs CreateTopicCommand.main(args)

Hi All,

In kafka.properties, I put (and forgot to change):

num.partitions=1

While I create topics programmatically:

String[] args = new String[]{
        "--zookeeper", config.getString("zookeeper"),
        "--topic", config.getString("topic"),
        "--replica", config.getString("replicas"),
        "--partition", config.getString("partitions")  // 4 partitions configured here
};

CreateTopicCommand.main(args);

The performance engineer told me only one consumer thread is actively
working even though I have 4 consumer threads started (visible when
debugging or in a thread dump) and 4 partitions configured via the args.

It seems that num.partitions is still controlling the parallelism. Do I
need to change num.partitions accordingly? Could I remove it? What if I
have different parallelism requirements for different topics? (See the
consumer sketch below.)
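
A sketch of wiring one thread per stream with the high-level consumer
follows; the ZooKeeper address, group id, and topic name are assumptions.
Note that num.partitions is only the default for auto-created topics, so a
topic created explicitly with --partition 4 should yield 4 usable streams
regardless of that setting:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class FourThreadConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // assumed ZK address
        props.put("group.id", "perf-group");              // hypothetical group id
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("example-topic", 4); // 4 streams for a 4-partition topic
        final List<KafkaStream<byte[], byte[]>> streams =
                connector.createMessageStreams(topicCountMap).get("example-topic");

        // one thread per stream; with fewer partitions than streams, the
        // surplus threads sit idle (a single busy thread is the classic
        // symptom of a 1-partition topic)
        ExecutorService pool = Executors.newFixedThreadPool(streams.size());
        for (final KafkaStream<byte[], byte[]> stream : streams) {
            pool.submit(new Runnable() {
                public void run() {
                    ConsumerIterator<byte[], byte[]> it = stream.iterator();
                    while (it.hasNext()) {
                        System.out.println("got message from partition "
                                + it.next().partition());
                    }
                }
            });
        }
    }
}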

Thank you in advance!

Best Regards,
Mingtao

ConsumerConnector not processing partitions on a particular kafka broker.

Hello all

Some background.

I have 3 Kafka brokers A, B and C, and a Kafka topic called topic
with 20 partitions (no replicas).

Everything had been working fine for about a week when suddenly all the
data sent to partitions belonging to broker C stopped being seen by the
consumer. The consumer uses the high-level consumer and does not look much
different from the sample provided in the documentation.

When I inspect the topic I can see that all the partitions are lagging
behind. A restart (of the consumer) seems to sort it out, but I am stumped
as to what's going on; any help appreciated.

Thanks
Pablo

Durability

Hi,

I have come back to looking at Kafka after a while.

Is it really the case that messages can be lost if the producer is
disconnected from the broker, as described in KAFKA-789
<https://issues.apache.org/jira/browse/KAFKA-789>, and touched on with some
elaboration in KAFKA-156 <https://issues.apache.org/jira/browse/KAFKA-156>?
Since it seems mildly related, could anyone please also remind me of the
*practical* considerations for co-locating the producer and broker on
the same machine? I think my cloud architecture would be "cleaner" and
meaner in lifecycle terms if brokers were separate machines in the same
cloud zone.

I am currently looking at Kafka as a means of offloading the writing of a
lot of data from my application. My application creates a lot of data, and
I wish for it to become "send and forget" in the sense that my main
application does not concern itself with storing the data to various data
stores, but rather only spits out the data and lets other small services
handle its persistence. So it's a bit like logging, in a way, but it
is more important that data is not lost.

Thanks in advance,
Matan

Serious Bug? Segment getting deleted as soon as it is rolled over

We just noticed that one of our topics has been horribly misbehaving.

*retention.ms* for the topic is set to 1209600000 ms

However, segments are getting scheduled for deletion as soon as a new one
is rolled over, and naturally consumers are running into a
kafka.common.OffsetOutOfRangeException whenever this happens.

Is this a known bug? It is incredibly serious. We seem to have lost about
40 million messages on a single topic and have yet to figure out which
other topics are affected.

I thought of restarting Kafka but figured I'd leave it untouched while I
figure out what I can capture for finding the root cause.

Meanwhile, in order to keep from losing any more data, I have a periodic job
that is doing a *'cp -al'* of the partitions into a separate folder. That
way Kafka goes ahead and deletes the segment but the data is not lost from
the filesystem.

If this is an unseen bug, what should I save from the running instance?

By the way, this has affected all partitions and replicas of the topic,
not just those on a specific host.

Lost messages during leader election

Hi,

I have a test that continuously sends messages to one broker, brings up
another broker, and adds it as a replica for all partitions, with it being
the preferred replica for some. I have auto.leader.rebalance.enable=true,
so replica election gets triggered. Data is being pumped to the old broker
all the while. It seems that some data gets lost while switching over to
the new leader. Is this a bug, or do I have something misconfigured? I also
have request.required.acks=-1 on the producer.

Here's what I think is happening:

1. Producer writes message to broker 0, [EventServiceUpsertTopic,13], w/
broker 0 currently leader, with ISR=(0), so write returns successfully,
even when acks = -1. Correlation id 35836

Producer log:

[2014-07-24 14:44:26,991] [DEBUG] [dw-97 - PATCH
/v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1]
[kafka.producer.BrokerPartitionInfo] Partition
[EventServiceUpsertTopic,13] has leader 0

[2014-07-24 14:44:26,993] [DEBUG] [dw-97 - PATCH
/v1/events/type_for_test_bringupNewBroker_shouldRebalance_shouldNotLoseData/event?_idPath=idField_mergeFields=field1]
[k.producer.async.DefaultEventHandler] Producer sent messages with
correlation id 35836 for topics [EventServiceUpsertTopic,13] to broker 0 on
localhost:56821

2. Broker 1 is still catching up

Broker 0 Log:

[2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3]
[kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker
0: Old hw for partition [EventServiceUpsertTopic,13] is 971. New hw is 971.
All leo's are 975,971

[2014-07-24 14:44:26,992] [DEBUG] [kafka-request-handler-3]
[kafka.server.KafkaApis] [KafkaApi-0] Produce to local log in 0 ms

[2014-07-24 14:44:26,992] [DEBUG] [kafka-processor-56821-0]
[kafka.request.logger] Completed request:Name: ProducerRequest; Version:
0; CorrelationId: 35836; ClientId: ; RequiredAcks: -1; AckTimeoutMs: 10000
ms from client /127.0.0.1:57086
;totalTime:0,requestQueueTime:0,localTime:0,remoteTime:0,responseQueueTime:0,sendTime:0

3. Leader election is triggered by the scheduler:

Broker 0 Log:

[2014-07-24 14:44:26,991] [INFO ] [kafka-scheduler-0]
[k.c.PreferredReplicaPartitionLeaderSelector]
[PreferredReplicaPartitionLeaderSelector]: Current leader 0 for partition [
EventServiceUpsertTopic,13] is not the preferred replica. Trigerring
preferred replica leader election

[2014-07-24 14:44:26,993] [DEBUG] [kafka-scheduler-0]
[kafka.utils.ZkUtils$] Conditional update of path
/brokers/topics/EventServiceUpsertTopic/partitions/13/state with value
{"controller_epoch":1,"leader":1,"version":1,"leader_epoch":3,"isr":[0,1]}
and expected version 3 succeeded, returning the new version: 4

[2014-07-24 14:44:26,994] [DEBUG] [kafka-scheduler-0]
[k.controller.PartitionStateMachine] [Partition state machine on
Controller 0]: After leader election, leader cache is updated to
Map(<Snipped>(Leader:1,ISR:0,1,LeaderEpoch:3,ControllerEpoch:1),<EndSnip>)

[2014-07-24 14:44:26,994] [INFO ] [kafka-scheduler-0]
[kafka.controller.KafkaController] [Controller 0]: Partition [
EventServiceUpsertTopic,13] completed preferred replica leader election.
New leader is 1

4. Broker 1 is still behind, but it sets the high water mark to 971!!!

Broker 1 Log:

[2014-07-24 14:44:26,999] [INFO ] [kafka-request-handler-6]
[kafka.server.ReplicaFetcherManager] [ReplicaFetcherManager on broker 1]
Removed fetcher for partitions [EventServiceUpsertTopic,13]

[2014-07-24 14:44:27,000] [DEBUG] [kafka-request-handler-6]
[kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker
1: Old hw for partition [EventServiceUpsertTopic,13] is 970. New hw is -1.
All leo's are -1,971

[2014-07-24 14:44:27,098] [DEBUG] [kafka-request-handler-3]
[kafka.server.KafkaApis] [KafkaApi-1] Maybe update partition HW due to
fetch request: Name: FetchRequest; Version: 0; CorrelationId: 1; ClientId:
ReplicaFetcherThread-0-1; ReplicaId: 0; MaxWait: 500 ms; MinBytes: 1 bytes;
RequestInfo: [EventServiceUpsertTopic,13] ->
PartitionFetchInfo(971,1048576), <Snipped>

[2014-07-24 14:44:27,098] [DEBUG] [kafka-request-handler-3]
[kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker
1: Recording follower 0 position 971 for partition [
EventServiceUpsertTopic,13].

[2014-07-24 14:44:27,100] [DEBUG] [kafka-request-handler-3]
[kafka.cluster.Partition] Partition [EventServiceUpsertTopic,13] on broker
1: Highwatermark for partition [EventServiceUpsertTopic,13] updated to 971

5. Consumer is none the wiser. All data that was in offsets 972-975 doesn't
show up!

I tried this with 2 initial replicas, and adding a 3rd which is supposed to
be the leader for some new partitions, and this problem also happens there.
The log on the old leader gets truncated to the offset on the new leader.
What's the solution? Can I make a new broker the leader for partitions that
are currently active without losing data?

Thanks,
Jad.

KAFKA-1477 (authentication layer) and 0.8.2

Hi guys,

This JIRA (https://issues.apache.org/jira/browse/KAFKA-1477) leads me to
believe that an authentication layer implementation is planned as part of
the 0.8.2 release. I was wondering if this is still the case?

There was an earlier thread talking about security, but there hasn't been
activity on it in a while.

I grabbed Joe's fork and it works, but I was wondering about it getting
merged back into the official 0.8.2 codebase, or is this more likely
something that will be in 0.9?

Thanks!

kafka support in collectd and syslog-ng

Hi list,

Just a quick note to let you know that kafka support has now been merged in
collectd, which means that system and application metrics can directly be
produced on a topic from the collectd daemon.

Additionally, syslog-ng will soon ship with a kafka producing module as
well, it will be part of the next release of the syslog-ng-incubator module
collection: https://github.com/balabit/syslog-ng-incubator.

What this means is that there is now a very lightweight way to create an
infrastructure event stream on top of Kafka, with known tools in the ops
world.

I relied on the librdkafka library to provide the Kafka producer support,
which I can recommend for C needs.

- pyr

MaxLag MBean for Kafka Consumer

The MaxLag MBean is defined as "Number of messages the consumer lags
behind the producer". Now when I read the MBean value it gives me the count
as 0 (and occasionally some value like 130 or 340):

ConsumerFetcherManager.test-consumer-group-MaxLag count = 0

But when I use the kafka.tools.ConsumerOffsetChecker I get the following
as the Lag value:

Group                Topic       Pid  Offset     logSize    Lag     Owner
test-consumer-group  kafka-test  0    275985215  276195685  210470  none

Are the two lags not related? Or am I capturing MaxLag incorrectly?
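
For reference, this is roughly how an MBean can be read over JMX from Java.
The service URL, the ObjectName, and the attribute name below are
assumptions — copy the exact name your consumer JVM registers (visible in
JConsole) before relying on this:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MaxLagReader {
    public static void main(String[] args) throws Exception {
        // assumes the consumer JVM was started with
        // -Dcom.sun.management.jmxremote.port=9999 (auth/ssl disabled)
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // hypothetical ObjectName modeled on the output above
            ObjectName name = new ObjectName(
                    "\"ConsumerFetcherManager\":name=\"test-consumer-group-MaxLag\"");
            // the gauge attribute name is also an assumption
            Object lag = mbsc.getAttribute(name, "Value");
            System.out.println("MaxLag = " + lag);
        } finally {
            connector.close();
        }
    }
}

Note also that the two numbers measure different things: MaxLag reflects how
far the consumer's fetcher threads are behind the log end at the moment it
is sampled, while ConsumerOffsetChecker computes logSize minus the last
committed offset, so a consumer that fetches quickly but commits slowly can
show MaxLag near 0 with a large checker Lag.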

request.required.acks in Async Mode

Hi,

I wanted to know if request.required.acks has a meaning in async mode,
and if so, what value it should be set to.

Thanks and regards,
Harshvardhan Chauhan

Issue with unit testing Kafka on 0.8.1.1 and scala 2.9.2

Hi,

I have been trying to run the Kafka server using TestUtils for my unit
tests. While the topic gets created, I'm getting the following error:

error when handling request
Name:LeaderAndIsrRequest;Version:0;Controller:0;ControllerEpoch:1;CorrelationId:9;ClientId:id_0-host_localhost-port_9000;Leaders:id:0,host:localhost,port:9000;PartitionState:(netopic,0)

(LeaderAndIsrInfo:(Leader:0,ISR:0,1,LeaderEpoch:0,ControllerEpoch:1),ReplicationFactor:2),AllReplicas:0,1)
(kafka.server.KafkaApis:103)

It creates topics and I can check for their existence.

Output from my program:
All topics Map(new-topic -> {})
topic exists true

Here's how I'm creating the ZooKeeper client and topic:

// ZkClient with the serializer the Kafka admin utilities expect
zkClient = new ZkClient(zookeeper.connectString(), zkSessionTimeout,
        zkConnectionTimeout, ZKStringSerializer$.MODULE$);

// 1 partition, replication factor 2
AdminUtils.createTopic(zkClient, topic, 1, 2,
        AdminUtils.createTopic$default$5());

Kafka version:

<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.9.2</artifactId>
<version>0.8.1.1</version>

Let me know if anyone has faced this issue and found a resolution.

Regards,
Sathya

Apache Kafka error on Windows - Could not find or load main class QuorumPeerMain

0
0
Hi Team,

I just downloaded Kafka 2.8.0 from the Apache website, and I am trying to set it up using the instructions given on the website. But when I try to start the zookeeper server, I am getting the below error:

Error: Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain

My environment is Windows 7 64-bit. I tried to follow the below e-mail chain: [Apache Email Chain][1]. But it still has the same issue. Can anyone guide me on this? I am very new to this and couldn't find much information on Google/the Apache Kafka email chain.

[1]: http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3CCALUpvHXCtTKQFe59h_SN-6jZRScqWrgv0TWsjZ-HjV7AO19GOA [ at ] mail.gmail.com%3E

Thanks,
Pradeep Simha
Technical Lead
Cell: +91-8884382615
E-mail: pradeep.simha [ at ] wipro.com
