Channel: Kafka Timeline

OffsetOutOfRangeException and getOffsetsBefore

Hi,
We are currently using Kafka 0.7.1.
I have two questions:
1. We use SimpleConsumer to aggregate messages into log files, and there is no ZooKeeper. Sometimes we see kafka.common.OffsetOutOfRangeException.
This exception happens when we start our consumer program, and we do not know why.
How can I get a valid latest message offset in Kafka 0.7.1 when this exception happens?
2. Before we start the consumer, we call the getOffsetsBefore function to get a list of valid offsets (up to a maximum number) before the given time.
How should we interpret this list?
For example, the function returns an array [offset1, offset2].
Does this mean that offsets from offset1 to offset2 are valid, and that offsets from offset2 to the current offset are valid? We are confused about the meaning of this array.
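For what it's worth, a minimal sketch of one common recovery pattern against the 0.7 SimpleConsumer Java API (host, port, topic and partition below are placeholders): when the stored offset falls outside the broker's log, ask the broker for a valid offset via getOffsetsBefore and resume from there. As I understand it, the returned array is ordered newest-first, and each element is a position (a log-segment boundary, plus the log end when LatestTime is requested) from which a fetch can legally start.

import kafka.api.FetchRequest;
import kafka.api.OffsetRequest;
import kafka.common.OffsetOutOfRangeException;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.MessageAndOffset;

public class OffsetRecoverySketch {
    public static void main(String[] args) {
        SimpleConsumer consumer = new SimpleConsumer("broker-host", 9092, 30000, 1024 * 1024);
        String topic = "my-topic";
        int partition = 0;
        long offset = 0L; // whatever offset the consumer last persisted
        try {
            // The exception may also surface while iterating the returned set.
            ByteBufferMessageSet messages =
                consumer.fetch(new FetchRequest(topic, partition, offset, 300 * 1024));
            for (MessageAndOffset mo : messages) {
                // process mo.message(), then advance to the next fetch offset
                offset = mo.offset();
            }
        } catch (OffsetOutOfRangeException e) {
            // The stored offset no longer exists on the broker (e.g. the segment was
            // deleted by retention). LatestTime() yields the current end of the log,
            // EarliestTime() the oldest offset still available.
            long[] valid = consumer.getOffsetsBefore(
                topic, partition, OffsetRequest.LatestTime(), 1);
            offset = valid[0];
        }
        consumer.close();
    }
}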

-- Regards
Sining Ma

multiple consumers on a topic

I am very new to Kafka, so I'll apologize in advance for any stupid
questions...

That being said, is it possible within Kafka to have multiple consumers on a
single topic? I had assumed the answer was yes, but I am running into some
issues setting this up. Any information would be greatly appreciated.
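For what it's worth, a minimal sketch of the usual setup with the high-level consumer, written against the 0.8 Java API (the ZooKeeper address, group id and topic are placeholders). Consumers that share a group id divide the topic's partitions among themselves; consumers with different group ids each receive every message.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class TwoConsumersSketch {
    static ConsumerConnector connect(String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", groupId);
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }

    public static void main(String[] args) {
        // Two connectors in the same group: the partitions of "my-topic" are split between them.
        ConsumerConnector a = connect("group-a");
        ConsumerConnector b = connect("group-a");
        Map<String, List<KafkaStream<byte[], byte[]>>> streamsA =
            a.createMessageStreams(Collections.singletonMap("my-topic", 1));
        Map<String, List<KafkaStream<byte[], byte[]>>> streamsB =
            b.createMessageStreams(Collections.singletonMap("my-topic", 1));
        // Iterate streamsA.get("my-topic").get(0) and streamsB.get("my-topic").get(0)
        // in separate threads; a third connector with a *different* group id would
        // independently see every message on the topic.
    }
}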

is 0.8 stable?

Hi,

Sorry if I missed the announcement, but is 0.8 stable/production-worthy yet?

Is anyone using it in the wild?

MirrorMaker consumer does not use broker.list property

Hi,

I'd like to bump this issue:
https://mail-archives.apache.org/mod_mbox/kafka-users/201212.mbox/%3CFA0E8A0482D176408729D604142248F319D22CA7%40EXCHANGE14.actuate.com%3E
as I'm encountering the same problem.

It seems that MirrorMaker does the following:
1) if zk.connect is defined for the consumer, the MirrorMaker script ignores
the broker.list value
2) if zk.connect is not defined for the consumer, MirrorMaker fails with
an error message:

[2013-05-28 11:24:57,457] INFO group1_ip-<hostname> Connecting to zookeeper
instance at null (kafka.consumer.ZookeeperConsumerConnector)
[2013-05-28 11:24:57,457] INFO Initiating client connection,
connectString=null sessionTimeout=6000
watcher=org.I0Itec.zkclient.ZkClient@c820344 (org.apache.zookeeper.ZooKeeper)
[2013-05-28 11:24:57,461] INFO Starting ZkClient event thread.
(org.I0Itec.zkclient.ZkEventThread)
Exception in thread "main" java.lang.NullPointerException
at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:361)
at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:332)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:383)
at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:64)
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:872)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
at kafka.consumer.ZookeeperConsumerConnector.connectZk(ZookeeperConsumerConnector.scala:152)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:122)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:129)
at kafka.tools.MirrorMaker$$anonfun$3.apply(MirrorMaker.scala:102)
at kafka.tools.MirrorMaker$$anonfun$3.apply(MirrorMaker.scala:102)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
at scala.collection.immutable.List.map(List.scala:45)
at kafka.tools.MirrorMaker$.main(MirrorMaker.scala:102)
at kafka.tools.MirrorMaker.main(MirrorMaker.scala)

The command I run is

$KAFKA_ROOT/bin/kafka-run-class.sh kafka.tools.MirrorMaker
--consumer.config ./mirror_consumer.properties --producer.config
./mirror_producer.properties --whitelist=".*" --num.streams 1

The ./mirror_consumer.properties file contains:

broker.list=1:localhost:59092,2:localhost:59093
groupid=group1
shallowiterator.enable=true

When broker.list is commented out and zk.connect is defined instead (or
when both are defined), mirroring happens successfully.
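For anyone hitting the same thing, this is the shape of the consumer config that ends up working here (the ZooKeeper address is a placeholder), matching the observation above that the MirrorMaker consumer only looks at zk.connect:

zk.connect=localhost:2181
groupid=group1
shallowiterator.enable=true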

Any leads?

Thank you,
Aman

Deadline Extension: 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)

we apologize if you receive multiple copies of this message

===================================================================

CALL FOR PAPERS

2013 Workshop on

Middleware for HPC and Big Data Systems

MHPC '13

as part of Euro-Par 2013, Aachen, Germany

===================================================================

Date: August 27, 2013

Workshop URL: http://m-hpc.org

Springer LNCS

SUBMISSION DEADLINE:

June 10, 2013 - LNCS Full paper submission (extended)
June 28, 2013 - Lightning Talk abstracts

SCOPE

Extremely large, diverse, and complex data sets are generated from
scientific applications, the Internet, social media and other applications.
Data may be physically distributed and shared by an ever larger community.
Collecting, aggregating, storing and analyzing large data volumes
presents major challenges. Processing such amounts of data efficiently
has become an obstacle to scientific discovery and technological
advancement. In addition, making the data accessible, understandable and
interoperable involves unsolved problems. Novel middleware architectures,
algorithms, and application development frameworks are required.

In this workshop we are particularly interested in original work at the
intersection of HPC and Big Data with regard to middleware handling
and optimizations. The scope covers existing and proposed middleware for HPC
and big data, including analytics libraries and frameworks.

The goal of this workshop is to bring together software architects,
middleware and framework developers, data-intensive application developers
as well as users from the scientific and engineering community to exchange
their experience in processing large datasets and to report their scientific
achievement and innovative ideas. The workshop also offers a dedicated forum
for these researchers to access the state of the art, to discuss problems
and requirements, to identify gaps in current and planned designs, and to
collaborate in strategies for scalable data-intensive computing.

The workshop will be one day in length, composed of 20-minute paper
presentations, each followed by a 10-minute discussion.
Presentations may be accompanied by interactive demonstrations.

TOPICS

Topics of interest include, but are not limited to:

- Middleware including: Hadoop, Apache Drill, YARN, Spark/Shark, Hive, Pig, Sqoop,
HBase, HDFS, S4, CIEL, Oozie, Impala, Storm and Hyracks
- Data intensive middleware architecture
- Libraries/Frameworks including: Apache Mahout, Giraph, UIMA and GraphLab
- NG Databases including Apache Cassandra, MongoDB and CouchDB/Couchbase
- Schedulers including Cascading
- Middleware for optimized data locality/in-place data processing
- Data handling middleware for deployment in virtualized HPC environments
- Parallelization and distributed processing architectures at the
middleware level
- Integration with cloud middleware and application servers
- Runtime environments and system level support for data-intensive computing
- Skeletons and patterns
- Checkpointing
- Programming models and languages
- Big Data ETL
- Stream processing middleware
- In-memory databases for HPC
- Scalability and interoperability
- Large-scale data storage and distributed file systems
- Content-centric addressing and networking
- Execution engines, languages and environments including CIEL/Skywriting
- Performance analysis, evaluation of data-intensive middleware
- In-depth analysis and performance optimizations in existing data-handling
middleware, focusing on indexing/fast storing or retrieval between compute
and storage nodes
- Highly scalable middleware optimized for minimum communication
- Use cases and experience for popular Big Data middleware
- Middleware security, privacy and trust architectures

DATES

Papers:
Rolling abstract submission
June 10, 2013 - Full paper submission (extended)
July 8, 2013 - Acceptance notification
October 3, 2013 - Camera-ready version due

Lightning Talks:
June 28, 2013 - Deadline for lightning talk abstracts
July 15, 2013 - Lightning talk notification

August 27, 2013 - Workshop Date

TPC

CHAIR

Michael Alexander (chair), TU Wien, Austria
Anastassios Nanos (co-chair), NTUA, Greece
Jie Tao (co-chair), Karlsruhe Institute of Technology, Germany
Lizhe Wang (co-chair), Chinese Academy of Sciences, China
Gianluigi Zanetti (co-chair), CRS4, Italy

PROGRAM COMMITTEE

Amitanand Aiyer, Facebook, USA
Costas Bekas, IBM, Switzerland
Jakob Blomer, CERN, Switzerland
William Gardner, University of Guelph, Canada
José Gracia, HPC Center of the University of Stuttgart, Germany
Zhenghua Guo, Indiana University, USA
Marcus Hardt, Karlsruhe Institute of Technology, Germany
Sverre Jarp, CERN, Switzerland
Christopher Jung, Karlsruhe Institute of Technology, Germany
Andreas Knüpfer, Technische Universität Dresden, Germany
Nectarios Koziris, National Technical University of Athens, Greece
Yan Ma, Chinese Academy of Sciences, China
Martin Schulz, Lawrence Livermore National Laboratory, USA
Viral Shah, MIT Julia Group, USA
Dimitrios Tsoumakos, Ionian University, Greece
Zhifeng Yun, Louisiana State University, USA

PAPER PUBLICATION

Accepted full papers will be published in the Springer LNCS series.

The best papers of the workshop -- after extension and revision -- will be
published in a Special Issue of the Springer Journal of Scalable Computing.

PAPER SUBMISSION

Papers submitted to the workshop will be reviewed by at least two
members of the program committee and external reviewers. Submissions
should include an abstract, keywords, and the e-mail address of the
corresponding author, and must not exceed 10 pages, including tables
and figures, at a main font size no smaller than 11 point. Submission
of a paper should be regarded as a commitment that, should the paper
be accepted, at least one of the authors will register and attend the
conference to present the work.

The format must be according to the Springer LNCS Style. Initial
submissions are in PDF; authors of accepted papers will be requested
to provide source files.

Format Guidelines:
http://www.springer.de/comp/lncs/authors.html

Style template:
ftp://ftp.springer.de/pub/tex/latex/llncs/latex2e/llncs2e.zip

Abstract Registration - Submission Link:
http://edas.info/newPaper.php?c=14763

LIGHTNING TALKS

Talks are strictly limited to 5 minutes. They can be used to gain early
feedback on ongoing research, for demonstrations, to present research
results, early research ideas, perspectives and positions of interest to
the community. Lightning talks should spark discussion with presenters
making themselves available following the lightning talk track.

DURATION: Workshop Duration is one day.

GENERAL INFORMATION

The workshop will be held as part of Euro-Par 2013.

Euro-Par 2013: http://www.europar2013.org

one consumerConnector or many?

In thinking about the design of consumption, we have in mind a generic
consumer server which would consume from more than one message type. The
handling of each type of message would be different. I suppose we could
have upwards of say 50 different message types, eventually, maybe 100+
different types. Which of the following designs would be best and why would
the other options be bad?

1) Have all message types go through one topic and use a dispatcher
pattern to select the correct handler. Use one consumerConnector.

2) Use a different topic for each message type, but still use one
consumerConnector and a dispatcher pattern.

3) Use a different topic for each message type and have a separate
consumerConnector for each topic.

I am struggling with whether my assumptions are correct. It seems that a
single connector for a topic would establish one socket to each broker, as
rebalancing assigns various partitions to that thread. Option 2 would pull
messages from more than one topic through a single socket to a particular
broker, is that right? Would option 3 be reasonable, establishing upwards of 100
sockets per broker?

I am guesstimating that option 2 is the right way forward, to bound socket
use, and we'll need to figure out a way to parameterize stream consumption
with the right handlers for a particular message type. If we add a topic, do
you think we should create a new connector or restart the original connector
with the new topic in the map?
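For illustration, a rough sketch of option 2 against the 0.8 high-level consumer API: one connector, several topics in the topicCountMap, and a dispatch step keyed on the topic. The topic names, group id and the dispatch() helper are placeholders, not anything from Kafka itself.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class DispatchingConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "generic-consumer");        // placeholder
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("orders", 1);   // one stream (and thread) per topic
        topicCountMap.put("payments", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(topicCountMap);

        for (Map.Entry<String, List<KafkaStream<byte[], byte[]>>> entry : streams.entrySet()) {
            final String topic = entry.getKey();
            final KafkaStream<byte[], byte[]> stream = entry.getValue().get(0);
            new Thread(new Runnable() {
                public void run() {
                    ConsumerIterator<byte[], byte[]> it = stream.iterator();
                    while (it.hasNext()) {
                        MessageAndMetadata<byte[], byte[]> record = it.next();
                        dispatch(topic, record.message()); // route to this topic's handler
                    }
                }
            }).start();
        }
    }

    // Placeholder for whatever handler lookup the generic consumer server uses.
    static void dispatch(String topic, byte[] payload) { }
}

This keeps a single connector (and so one set of broker connections) no matter how many topics are in the map. As far as I know, createMessageStreams can only be called once per connector, so adding a topic later would mean shutting the connector down and recreating it with the enlarged map.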

Thanks,

rob

SimpleConsumer and message offsets

Hi,
I am using a program that reads data from a stream using SimpleConsumer.
Is there a way for SimpleConsumer to remember the last offset read? I want
the program to continue from the last offset read when it is restarted.
Thanks in advance.
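Not out of the box, as far as I know: SimpleConsumer leaves offset tracking entirely to the caller. A minimal sketch of one way to do it, persisting the last processed offset to a local file (the path is up to you; ZooKeeper or a database are common choices in production):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

public class OffsetStore {
    private final File file;

    public OffsetStore(String path) {
        this.file = new File(path);
    }

    // Returns the saved offset, or defaultOffset if nothing has been saved yet.
    public long load(long defaultOffset) throws IOException {
        if (!file.exists()) return defaultOffset;
        BufferedReader reader = new BufferedReader(new FileReader(file));
        try {
            String line = reader.readLine();
            return (line == null) ? defaultOffset : Long.parseLong(line.trim());
        } finally {
            reader.close();
        }
    }

    // Overwrites the file with the offset of the next message to fetch.
    public void save(long offset) throws IOException {
        Writer writer = new FileWriter(file);
        try {
            writer.write(Long.toString(offset));
        } finally {
            writer.close();
        }
    }
}

Call load() before building the first FetchRequest and save() after each batch has been processed; on restart the program resumes from the saved value.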

-Arathi

InvalidMessageException

Hi,

I am using Kafka 0.7.2. Have you seen this exception? What could the
reason be?

2013/05/29 19:18:19.325 ERROR [KafkaRequestHandlers] [] Error processing
ProduceRequest on gallery:0
kafka.message.InvalidMessageException: message is invalid, compression
codec: NoCompressionCodec size: 222 curr offset: 0 init offset: 0
at kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:130)
at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:160)
at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:100)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:59)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:51)
at kafka.message.ByteBufferMessageSet.verifyMessageSize(ByteBufferMessageSet.scala:89)
at kafka.log.Log.append(Log.scala:218)
at kafka.server.KafkaRequestHandlers.kafka$server$KafkaRequestHandlers$$handleProducerRequest(KafkaRequestHandlers.scala:69)
at kafka.server.KafkaRequestHandlers.handleProducerRequest(KafkaRequestHandlers.scala:53)
at kafka.server.KafkaRequestHandlers$$anonfun$handlerFor$1.apply(KafkaRequestHandlers.scala:38)
at kafka.server.KafkaRequestHandlers$$anonfun$handlerFor$1.apply(KafkaRequestHandlers.scala:38)
at kafka.network.Processor.handle(SocketServer.scala:296)
at kafka.network.Processor.read(SocketServer.scala:319)
at kafka.network.Processor.run(SocketServer.scala:214)
at java.lang.Thread.run(Thread.java:679)

kafka.common.LeaderNotAvailableException: No leader for any partition

I am working on deploying Kafka 0.8 to our test cluster, which consists of 2
Kafka servers and 2 ZooKeeper nodes.
I created the topic using the following command:

bin/kafka-topics.sh --create --topic junit1_analytics_data_log --zookeeper
danalyticspubsubzoo02:2181 --partitions=36 --replication-factor=2

After using the producer to post messages to the topic, I am getting the
following errors. Can someone please advise?
[2013-05-29 17:36:46,775] WARN Error while fetching metadata
PartitionMetadata(29,None,Vector(),Vector(),5) for topic partition
[junit1_analytics_data_log,29]: [class
kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)
[2013-05-29 17:36:46,775] WARN Error while fetching metadata
PartitionMetadata(30,None,Vector(),Vector(),5) for topic partition
[junit1_analytics_data_log,30]: [class
kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)
[2013-05-29 17:36:46,775] WARN Error while fetching metadata
PartitionMetadata(31,None,Vector(),Vector(),5) for topic partition
[junit1_analytics_data_log,31]: [class
kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)
[2013-05-29 17:36:46,776] WARN Error while fetching metadata
PartitionMetadata(32,None,Vector(),Vector(),5) for topic partition
[junit1_analytics_data_log,32]: [class
kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)
[2013-05-29 17:36:46,776] WARN Error while fetching metadata
PartitionMetadata(33,None,Vector(),Vector(),5) for topic partition
[junit1_analytics_data_log,33]: [class
kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)
[2013-05-29 17:36:46,776] WARN Error while fetching metadata
PartitionMetadata(34,None,Vector(),Vector(),5) for topic partition
[junit1_analytics_data_log,34]: [class
kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)
[2013-05-29 17:36:46,777] WARN Error while fetching metadata
PartitionMetadata(35,None,Vector(),Vector(),5) for topic partition
[junit1_analytics_data_log,35]: [class
kafka.common.LeaderNotAvailableException]
(kafka.producer.BrokerPartitionInfo)
[2013-05-29 17:36:46,777] WARN Failed to collate messages by
topic,partition due to (kafka.producer.async.DefaultEventHandler)
kafka.common.LeaderNotAvailableException: No leader for any partition
at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$getPartition(DefaultEventHandler.scala:212)
at kafka.producer.async.DefaultEventHandler$$anonfun$partitionAndCollate$1.apply(DefaultEventHandler.scala:150)
at kafka.producer.async.DefaultEventHandler$$anonfun$partitionAndCollate$1.apply(DefaultEventHandler.scala:148)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at kafka.producer.async.DefaultEventHandler.partitionAndCollate(DefaultEventHandler.scala:148)
at kafka.producer.async.DefaultEventHandler.dispatchSerializedData(DefaultEventHandler.scala:94)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:72)
at kafka.producer.Producer.send(Producer.scala:74)
at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:159)
at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
[2013-05-29 17:36:46,880] INFO Fetching metadata with correlation id 6 for
1 topic(s) Set(junit1_analytics_data_log) (kafka.client.ClientUtils$)
[2013-05-29 17:36:46,881] INFO Connected to localhost:9092 for producing
(kafka.producer.SyncProducer)
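A first debugging step, using the same 0.8 tooling as the create command above (host and topic copied from it), is to describe the topic and confirm that every partition actually has a leader and in-sync replicas before producing:

bin/kafka-topics.sh --describe --topic junit1_analytics_data_log --zookeeper danalyticspubsubzoo02:2181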

Issue during commitOffsets using SimpleConsumer

Hi,
I get the following error on running SimpleConsumer.commitOffsets(). Could
you tell me what the issue is?

java.io.EOFException: Received -1 when reading from channel, socket has
likely been closed.
at kafka.utils.Utils$.read(Utils.scala:375)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:83)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:73)
at kafka.consumer.SimpleConsumer.commitOffsets(SimpleConsumer.scala:134)
at kafka.javaapi.consumer.SimpleConsumer.commitOffsets(SimpleConsumer.scala:89)
at KafkaConsumer2.main(KafkaConsumer2.java:226)

Thanks
Arathi

async producer and new ack levels

With 0.8, we now have ack levels when sending messages. I'm wondering how
this applies when sending messages in async mode. Are there any guarantees
at least that each async batch will wait for the requested ack level before
sending the next batch?

I assume there is still a disconnect between sending a message and really
knowing if it was delivered, in async mode. Is it necessary to create an
eventHandler to try to manage things in this case?

Perhaps, if acknowledgements are desired, with the efficiency of batch
sending, it would make more sense to use a synchronous producer, and use
the batch sending mode (e.g. send a list of messages).
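For illustration, a rough sketch of that last approach against the 0.8 Java producer API. The broker list, topic and payloads are placeholders, and the broker-list property has gone by broker.list in some 0.8 pre-releases, so check the config docs for your build.

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class SyncBatchSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "sync");       // block in the calling thread
        props.put("request.required.acks", "1");  // wait for the leader to acknowledge

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));

        List<KeyedMessage<String, String>> batch = new ArrayList<KeyedMessage<String, String>>();
        for (int i = 0; i < 100; i++) {
            batch.add(new KeyedMessage<String, String>("my-topic", "message-" + i));
        }
        // The whole list goes out batched; because the producer is synchronous,
        // send() does not return until the requested ack level is met, and it
        // throws if the broker reports a failure.
        producer.send(batch);
        producer.close();
    }
}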

Jason

question about usage of SimpleConsumer

In Kafka, consumers are responsible for maintaining state (the offset) about
what has been consumed (see the Kafka design page). The high-level consumer
API stores its consumption state in ZooKeeper, while a SimpleConsumer has to
handle this itself.
My question is: what happens when I call getOffsetsBefore(topic, partition,
OffsetRequest.LatestTime(), maxNumOffsets)? Where does it fetch the offset
from, given that I never stored one? It looks as though Kafka itself maintains
the offset. Can anybody explain?

public void open(Map conf, TopologyContext context,
                 SpoutOutputCollector collector) {
    _collector = collector;
    _consumer = new SimpleConsumer(host, port, soTimeout, buffersize);
    // Ask the broker for the latest valid offsets; offsets[0] is the most recent one.
    long[] offsets = _consumer.getOffsetsBefore(topic, partition,
            OffsetRequest.LatestTime(), maxNumOffsets);
    offset = offsets[0];
    new StringScheme(); // created but not stored in the snippet as posted
}

@Override
public void nextTuple() {
    FetchRequest fetch = new FetchRequest(topic, partition, offset, maxSize);
    ByteBufferMessageSet msgSet = _consumer.fetch(fetch);
    for (MessageAndOffset msgAndOffset : msgSet) {
        String msg = getMessage(msgAndOffset.message());
        // log spout process time
        Debug.log(this.getClass().getSimpleName(), msg);
        Debug.incr(topic + "_" + this.getClass().getSimpleName(), 1);
        _collector.emit(new Values(msg), new KafkaMessageId(msg, offset, 1));
        // In 0.7, msgAndOffset.offset() is the offset to fetch from next.
        offset = msgAndOffset.offset();
    }
}

What exactly happens if fetch size is smaller than the next batch (0.7.2 and high-level consumer)

Hello -- I'll try to look at the code, but I'm seeing something here
and I want to be *sure* I'm correct.

Say a batch sitting in a 0.7.2 partition is, say, 5MB in size. An
instance of a high-level consumer has a configured fetch size of
300KB. This actually becomes the "maxSize" value, right, in
FetchRequest? So in this example does the high-level consumer stall?
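As far as I understand the 0.7.x high-level consumer, yes: the configured fetch.size (default 300KB) is used as the maxSize of the underlying FetchRequest, so a compressed batch bigger than that can never be returned and the consumer cannot make progress past it (depending on the version it either appears to stall or raises a message-size error). The usual remedy is to raise fetch.size past the largest batch, roughly like this fragment (values are placeholders, usual kafka.consumer imports assumed):

Properties props = new Properties();
props.put("zk.connect", "localhost:2181");                 // placeholder
props.put("groupid", "my-group");                          // 0.7-era property names
props.put("fetch.size", String.valueOf(6 * 1024 * 1024));  // larger than the 5MB batch
ConsumerConnector connector =
    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));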

Thanks,

Philip

Send multiple messages to multiple topics in one request

Version 0.7 of Kafka allowed sending messages to multiple topics within one
request. I assume I can do the same thing in version 0.8, am I correct?
Does my code look correct?
List<KeyedMessage<String, String>> topicsMsgsList =
    new ArrayList<KeyedMessage<String, String>>();
KeyedMessage<String, String> data1 = new KeyedMessage<String, String>(topic1, msg);
KeyedMessage<String, String> data2 = new KeyedMessage<String, String>(topic2, msg);
topicsMsgsList.add(data1);
topicsMsgsList.add(data2);
producer.send(topicsMsgsList);

Thanks,
Vadim

Code Example for createMessageStreams

Good evening. Would it be possible to get sample code where the API below
is used in the consumer, along with a sample Decoder?
public <K,V> Map<String, List<KafkaStream<K,V>>>
createMessageStreams(Map<String, Integer> topicCountMap, Decoder<K>
keyDecoder, Decoder<V> valueDecoder);
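Here is a hedged sketch of that API in use with the built-in StringDecoder (which implements Decoder<String>); the ZooKeeper address, group id and topic are placeholders:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;
import kafka.serializer.StringDecoder;
import kafka.utils.VerifiableProperties;

public class StringStreamSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "string-example");          // placeholder
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // StringDecoder turns the raw bytes of both key and value into Strings.
        StringDecoder decoder = new StringDecoder(new VerifiableProperties());
        Map<String, List<KafkaStream<String, String>>> streams =
            connector.createMessageStreams(
                Collections.singletonMap("my-topic", 1), decoder, decoder);

        ConsumerIterator<String, String> it = streams.get("my-topic").get(0).iterator();
        while (it.hasNext()) {
            MessageAndMetadata<String, String> record = it.next();
            System.out.println(record.key() + " -> " + record.message());
        }
    }
}

A custom Decoder<V> just implements a single fromBytes(byte[]) method and is passed in the same way as the StringDecoder above.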

Thanks so much in advance,
Vadim

Help with kafka-hadoop loader pipeline ???

Hi

We were trying to use the kafka-hadoop loader for loading messages from
Kafka into the Hadoop ecosystem:
https://github.com/michal-harish/kafka-hadoop-loader

Our consumer is Hadoop in this case. The code runs, reports the job as
successful, and a _SUCCESS file is created in HDFS.

But no output file is created in the given HDFS path; we tried sending both
JSON and protobuf messages through Kafka.

Please help if anyone has any idea on loading protobuf files into Hadoop.

Kafka Hadoop Consumer for multiple brokers

Hi,

I was going through the hadoop-consumer in the contrib folder. There is a
property that asks for the Kafka server URI. This might sound silly, but
from looking at it, it seems to be only for a single Kafka broker.

When we have multiple brokers, how do we implement the hadoop-consumer for
them?

Regards,
Samir

kafka 0.8

Hello,

Just wanted to know where we are with the beta release for 0.8? More
importantly, is 0.8 going to be publicly available from a Maven repository?
How about different versions of 0.8 built for different versions of Scala
(for example, Scala 2.8 vs 2.9, etc.)?

Much appreciated.

Soby Chacko

facing the same problem

http://mail-archives.apache.org/mod_mbox/kafka-users/201301.mbox/%3CCAAG86fpamV2XeB5=XX19XC81q1F93AvN9G3r3hYjqZ-LnxxSow [ at ] mail.gmail.com%3E

Regards,
Gaurang Jhawar
USC
www.linkedin.com/pub/gaurang-jhawar/63/a24/6a6/

sync and async

Hi

What is the difference between an async producer and a sync producer
with request.required.acks=0? Is there any case where a
sync producer with request.required.acks=0 is used?
Thanks.
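As I understand it, the two configurations below behave quite differently even though neither waits for a broker acknowledgement: with producer.type=sync and acks=0, each send() still runs in the calling thread and connection-level errors surface to the caller, which gives natural back-pressure; with producer.type=async, send() only enqueues to an in-memory buffer that a background thread batches and ships, and messages can be dropped silently if delivery fails. A rough fragment against the 0.8 producer properties (broker address is a placeholder, and the broker-list property name has varied across 0.8 pre-releases):

Properties sync = new Properties();
sync.put("metadata.broker.list", "localhost:9092"); // placeholder
sync.put("producer.type", "sync");                  // send() runs in the caller's thread
sync.put("request.required.acks", "0");             // but no broker acknowledgement is awaited

Properties async = new Properties();
async.put("metadata.broker.list", "localhost:9092"); // placeholder
async.put("producer.type", "async");                 // send() enqueues to a buffer
async.put("queue.buffering.max.ms", "5000");         // a background thread batches and ships it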

Regards,

Libo