Kafka Basics: Introduction to Kafka Consumer Group & How Can it Be Used?

What is Apache Kafka?

Apache Kafka is a distributed data store designed for ingesting and processing data in real time. Streaming data is data generated continuously by hundreds of data sources, which typically send their records simultaneously. A streaming platform must cope with this constant influx and process it sequentially and incrementally.

Users can utilize Kafka for three different purposes:

Publish and subscribe to streams of records.
Store streams of records durably, in the order they were created.
Process streams of records in real time.
Kafka is most commonly used to build real-time streaming data pipelines and applications that react to data streams. It combines messaging, storage, and stream processing to provide storage and analysis of both historical and real-time data.

Consumer group

A Kafka consumer group is a set of consumers that cooperate to consume data from one or more topics. The partitions of each topic are divided among the group’s consumers. As new members join the group and old ones leave, the partitions are reassigned so that each member receives a proportional share. This process is called rebalancing the group.
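The partition-sharing idea can be sketched with a toy round-robin assignor. This is an illustration only, not Kafka’s actual group protocol; the consumer names and partition count are invented:

```python
# Toy round-robin assignment of a topic's partitions across the members of
# a consumer group. Kafka's coordinator performs a comparable redistribution
# on every rebalance, so adding or removing a member moves partitions around.

def assign_partitions(members, num_partitions):
    """Spread partitions 0..num_partitions-1 across members as evenly as possible."""
    members = sorted(members)
    assignment = {m: [] for m in members}
    for p in range(num_partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment

# A topic with 6 partitions and two consumers in the group:
print(assign_partitions(["c1", "c2"], 6))        # {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
# A third consumer joins, the group rebalances, and partitions move:
print(assign_partitions(["c1", "c2", "c3"], 6))  # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

Notice that each member always ends up with a proportional share, which is exactly what a rebalance guarantees.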

The fundamental difference between the old “high-level” consumer and the new consumer is that the former relied on ZooKeeper for group management, whereas the latter uses a group protocol built into Kafka itself. In this protocol, one of the brokers is designated as the group’s coordinator and is responsible for managing the group’s members as well as their partition assignments.

Each group’s coordinator is chosen from among the leaders of the internal offsets topic __consumer_offsets, which stores committed offsets. In essence, the group’s ID is hashed to one of that topic’s partitions, and the leader of that partition is chosen as the coordinator. As a result, consumer group administration is distributed fairly evenly across all brokers in the cluster, allowing the number of groups to scale with the number of brokers.
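The hashing step can be sketched as follows. This is a stand-in illustration: Kafka actually uses Java’s String.hashCode over the group ID, and md5 here is just a convenient stable hash; the partition count of 50 is the common default for __consumer_offsets:

```python
# Hypothetical illustration of locating a group's coordinator: hash the
# group ID onto one of the __consumer_offsets partitions; the leader of
# that partition acts as the coordinator for the group.
import hashlib

def coordinator_partition(group_id, num_offsets_partitions=50):
    digest = hashlib.md5(group_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_offsets_partitions

# Different group IDs hash to different partitions, so coordination load
# spreads across the brokers that lead those partitions:
print(coordinator_partition("foo"))
print(coordinator_partition("bar"))
```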

Kafka Consumer Configuration

A few of the most important configuration settings are outlined below, along with how they affect consumer behavior.

Default settings

The only mandatory setting is bootstrap.servers, but you should also set a client.id so you can easily tie requests to the client instance that made them. Typically, all consumers in a given group share the same client ID so that client quotas can be enforced for the group as a whole.
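As a minimal sketch, the two settings above can be collected in a plain dictionary. The keys are the real Kafka property names, but the broker addresses and client ID are invented for illustration, and no client library is assumed:

```python
# Minimal consumer configuration sketch: bootstrap.servers is the only
# mandatory setting; a shared client.id lets brokers attribute traffic
# (and apply quotas) to the whole group of consumers.
base_config = {
    "bootstrap.servers": "broker1:9092,broker2:9092",  # initial contact points
    "client.id": "inventory-service",                   # shared by the group
}
print(base_config)
```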

Setting up a Group

You should always configure group.id unless you are using the simple assignment API and do not need to store offsets in Kafka. The session timeout can be adjusted by overriding session.timeout.ms. In the C/C++ and Java clients, the default timeout is 10 seconds, but you can increase it to avoid excessive rebalancing caused by poor network connectivity or long GC pauses, for example.

The main drawback of a longer session timeout is that the coordinator takes longer to detect when a consumer instance has failed, which means it also takes longer for another consumer in the group to take over its partitions. Clean shutdowns, on the other hand, are triggered by the consumer sending an explicit request to the coordinator to leave the group, which causes an immediate rebalance.

max.poll.interval.ms is another property that can affect excessive rebalancing. It sets the maximum amount of time allowed between calls to the poll method before the consumer process is considered failed. If your application needs more than the default of 5 minutes (300 seconds) to process messages, you can safely increase this value.
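The rebalance-related overrides discussed above can be sketched the same way. The raised values are arbitrary examples, not recommendations; only the property names and the defaults mentioned in the text are taken from the source:

```python
# Rebalance-related overrides, expressed in milliseconds. The defaults
# noted in the comments are the ones described in the text.
rebalance_config = {
    "session.timeout.ms": 30000,     # default 10 s; raised to ride out GC pauses
    "max.poll.interval.ms": 600000,  # default 5 min; raised for slow message processing
}
print(rebalance_config)
```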

Management of the Offset

The offset reset policy and whether auto-commit is enabled are the two most important settings affecting offset management. If enable.auto.commit is set to true (the default), the consumer automatically commits offsets at the interval defined by auto.commit.interval.ms, which defaults to 5 seconds.
With auto-commit you get “at least once” delivery: Kafka guarantees that no messages will be lost, but duplicates are possible. If the consumer crashes, every partition it held is reset to its last committed offset after a restart or rebalance, and the last committed position can be as stale as the auto-commit interval itself. Any messages received since the previous commit must then be read again.
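The at-least-once behaviour can be demonstrated with a toy simulation that involves no Kafka at all. Offsets are “committed” every few messages, a crash is simulated mid-stream, and consumption resumes from the last commit:

```python
# Toy simulation of auto-commit semantics: a crash between commits replays
# every message received since the last committed offset, so nothing is
# lost but duplicates can appear ("at least once" delivery).

def consume(messages, crash_after, commit_interval):
    committed = 0      # last committed offset
    processed = []
    for offset, msg in enumerate(messages):
        if offset == crash_after:        # simulate a crash mid-stream
            break
        processed.append(msg)
        if (offset + 1) % commit_interval == 0:
            committed = offset + 1       # periodic "auto-commit"
    # After restart/rebalance, consumption resumes from the last commit:
    processed += messages[committed:]
    return processed

out = consume(["m0", "m1", "m2", "m3", "m4"], crash_after=3, commit_interval=2)
print(out)  # ['m0', 'm1', 'm2', 'm2', 'm3', 'm4'] -> m2 is delivered twice
```

Every message still appears at least once in the output, which is exactly the guarantee (and the caveat) described above.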

Kafka Consumer Group Command Tool

Image credit: Unsplash

The status of consumer groups can be viewed using a Kafka admin utility.

Listing Groups

The kafka-consumer-groups tool included with the Kafka distribution can be used to obtain a list of the active groups in the cluster. On a large cluster this can take some time, because the tool collects the list by inspecting every broker.
bin/kafka-consumer-groups --bootstrap-server host:9092 --list

Group Description

The kafka-consumer-groups tool can also be used to collect information about a specific group. For instance, run the following command to see the current assignments for the group foo:
bin/kafka-consumer-groups --bootstrap-server host:9092 --describe --group foo

Conclusion

This article introduced the Kafka consumer group, the configuration settings that govern its behavior, and the command-line tooling used to inspect it.

Source: Plato Data Intelligence: PlatoData.io
