com.groupon.dse:baryon

A library for building Spark streaming applications that consume data from Kafka.

Group: com.groupon.dse
Artifact: baryon
Latest version: 1.0
Type: jar
Website: https://github.com/groupon/baryon
Source control: https://github.com/groupon/baryon


How to add the latest version

Maven:

<!-- https://jarcasting.com/artifacts/com.groupon.dse/baryon/ -->
<dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>baryon</artifactId>
    <version>1.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.groupon.dse/baryon/
implementation 'com.groupon.dse:baryon:1.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.groupon.dse/baryon/
implementation ("com.groupon.dse:baryon:1.0")

Buildr:

'com.groupon.dse:baryon:jar:1.0'

Ivy:

<dependency org="com.groupon.dse" name="baryon" rev="1.0">
  <artifact name="baryon" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.groupon.dse', module='baryon', version='1.0')
)

sbt:

libraryDependencies += "com.groupon.dse" % "baryon" % "1.0"

Leiningen:

[com.groupon.dse/baryon "1.0"]

Dependencies

compile (14)

Library identifier                                        Type  Version
org.apache.kafka : kafka_2.10                             jar   0.8.1.1
com.101tec : zkclient                                     jar   0.7
com.groupon.dse : spark-metrics                           jar   1.0
org.apache.zookeeper : zookeeper                          jar   3.4.6
org.json4s : json4s-core_2.10                             jar   3.2.10
org.json4s : json4s-jackson_2.10                          jar   3.2.10
org.scala-lang : scala-library                            jar   2.10.4
org.slf4j : slf4j-api                                     jar   1.7.10
com.typesafe.play : play-ws_2.10                          jar   2.3.10
com.typesafe.play : play-json_2.10                        jar   2.3.10
com.fasterxml.jackson.core : jackson-databind             jar   2.4.4
com.fasterxml.jackson.module : jackson-module-scala_2.10  jar   2.4.4
com.fasterxml.jackson.core : jackson-core                 jar   2.4.4
com.ning : async-http-client                              jar   1.9.21

provided (4)

Library identifier                       Type  Version
org.apache.spark : spark-core_2.10       jar   1.5.2
org.apache.spark : spark-streaming_2.10  jar   1.5.2
log4j : log4j                            jar   1.2.17
org.apache.hadoop : hadoop-common        jar   2.2.0

test (2)

Library identifier              Type  Version
org.mockito : mockito-all       jar   1.10.8
org.scalatest : scalatest_2.10  jar   2.2.4

Project Modules

This project has no modules.

Baryon

Baryon is a library for building Spark streaming applications that consume data from Kafka.

Baryon abstracts away all the bookkeeping involved in reliably connecting to a Kafka cluster and fetching data from it, so that users only need to focus on the logic to process this data.

For a detailed guide on getting started with Baryon, take a look at the wiki.

Why Baryon?

Spark also ships its own libraries for interacting with Kafka, as documented in its Kafka integration guide. These libraries are well developed, but they have certain limitations that Baryon intends to address:

  • Code-independent checkpointing

    Baryon's Kafka state management system allows Kafka consumption state to be stored across multiple runs of an application, even when there are code changes. Spark's checkpointing system does not support maintaining state across changes in code, so users of Spark's Kafka libraries must implement the offset management logic themselves.

  • Improved error handling

    Baryon handles errors related to Kafka much more thoroughly than Spark's Kafka libraries, so users don't need to worry about handling Kafka problems in their code.
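
As a rough illustration of the first point, the sketch below shows why consumption state kept as plain data, outside Spark's checkpoint, can survive a code change: the saved state mentions only topics, partitions, and offsets, never application classes. This is not Baryon's API. The class `OffsetStoreSketch`, its methods, the `topic/partition=offset` text format, and the topic name `clicks` are all hypothetical, and an in-memory map stands in for whatever external store a real application would use.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of code-independent offset storage; not Baryon's API.
public class OffsetStoreSketch {
    // Stand-in for an external store; keys look like "topic/partition".
    private final Map<String, Long> store = new TreeMap<>();

    // The key depends only on topic and partition, never on application classes.
    static String key(String topic, int partition) {
        return topic + "/" + partition;
    }

    // Record the next offset to read once a batch has been processed.
    public void commit(String topic, int partition, long nextOffset) {
        store.put(key(topic, partition), nextOffset);
    }

    // Return the saved offset, or the fallback when nothing is stored yet.
    public long resumeFrom(String topic, int partition, long fallback) {
        return store.getOrDefault(key(topic, partition), fallback);
    }

    // Serialize the state as plain "topic/partition=offset" lines.
    public String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Long> e : store.entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    // Rebuild the state from text written by dump(), e.g. in a later run.
    public static OffsetStoreSketch load(String text) {
        OffsetStoreSketch restored = new OffsetStoreSketch();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            int eq = line.lastIndexOf('=');
            restored.store.put(line.substring(0, eq),
                               Long.parseLong(line.substring(eq + 1)));
        }
        return restored;
    }

    public static void main(String[] args) {
        OffsetStoreSketch firstRun = new OffsetStoreSketch();
        firstRun.commit("clicks", 0, 1500L);
        String saved = firstRun.dump();

        // A later run, possibly built from different code, resumes from the text.
        OffsetStoreSketch secondRun = OffsetStoreSketch.load(saved);
        System.out.println(secondRun.resumeFrom("clicks", 0, 0L));
    }
}
```

Because the saved text contains no serialized closures or class names, a redeployed application with different processing logic can still read it, which is the property Spark's own checkpoints lack.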

Beyond these, Baryon offers a handful of features of its own:

  • Multiple consumption modes

    Baryon has two modes of consumption, blocking and non-blocking, and an application can switch between them without modifying its code. The blocking mode roughly corresponds to the consumption behavior of Spark's "direct" approach, while the non-blocking mode behaves similarly to the receiver-based approach.

  • Dynamically configured topics

    Baryon supports changes to the set of Kafka topics that are consumed while the application is running. Alongside this, configurations can be set at a per-topic level, which makes it easier to build a single application to process multiple, heterogeneous data streams.

  • Aggregated metrics

    Baryon uses the spark-metrics library to collect and aggregate useful metrics across the driver and executors. These include offset lag, throughput, and error rates, as well as augmented versions of metrics that Spark already provides. The metrics are integrated with Spark's metrics system, so they are compatible with the reporting system that comes with Spark.
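
The text does not define offset lag, so the following sketch assumes the usual meaning: the number of messages written to a partition that the consumer has not yet read, summed over partitions. The class `OffsetLagSketch` and its methods are hypothetical and do not reflect how Baryon or spark-metrics compute or report this value. Treating a partition with no consumed offset as starting from 0 is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an aggregated offset-lag metric; not Baryon's API.
public class OffsetLagSketch {
    // Lag for one partition: messages written but not yet consumed.
    static long lag(long latestOffset, long consumedOffset) {
        return Math.max(0L, latestOffset - consumedOffset);
    }

    // Sum per-partition lag. Partitions missing from `consumed` are
    // counted from offset 0 (a simplifying assumption).
    static long totalLag(Map<Integer, Long> latest, Map<Integer, Long> consumed) {
        long total = 0L;
        for (Map.Entry<Integer, Long> e : latest.entrySet()) {
            long done = consumed.getOrDefault(e.getKey(), 0L);
            total += lag(e.getValue(), done);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> latest = new HashMap<>();
        latest.put(0, 1000L);
        latest.put(1, 400L);

        Map<Integer, Long> consumed = new HashMap<>();
        consumed.put(0, 900L);
        consumed.put(1, 400L);

        // Partition 0 is 100 messages behind; partition 1 is caught up.
        System.out.println(totalLag(latest, consumed));
    }
}
```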

Quick Start

Add Baryon as a dependency:

<dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>baryon</artifactId>
    <version>1.0</version>
</dependency>

If you want to add custom metrics that are integrated with Spark, use the spark-metrics library, which Baryon itself uses:

<dependency>
    <groupId>com.groupon.dse</groupId>
    <artifactId>spark-metrics</artifactId>
    <version>1.0</version>
</dependency>

Take a look at the examples to see how to write the driver and a ReceiverPlugin.

com.groupon.dse

Groupon

Library versions

Version
1.0