Correlated Iterators Processor

Sequentially process correlated data from sorted iterators.

License: (not listed)
Group: com.teketik
Artifact ID: cip
Latest version: 1.0
Date: (not listed)
Type: jar
Description: Correlated Iterators Processor. Sequentially process correlated data from sorted iterators.
Website: https://github.com/antoinemeyer/correlated-iterators-processor
Version control: https://github.com/antoinemeyer/correlated-iterators-processor

Download cip

File Name Size
cip-1.0.pom
cip-1.0.jar 14 KB
cip-1.0-sources.jar 6 KB
cip-1.0-javadoc.jar 59 KB
Overview

How to add the latest version

Maven:

<!-- https://jarcasting.com/artifacts/com.teketik/cip/ -->
<dependency>
    <groupId>com.teketik</groupId>
    <artifactId>cip</artifactId>
    <version>1.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.teketik/cip/
implementation 'com.teketik:cip:1.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.teketik/cip/
implementation ("com.teketik:cip:1.0")

Buildr:

'com.teketik:cip:jar:1.0'

Ivy:

<dependency org="com.teketik" name="cip" rev="1.0">
  <artifact name="cip" type="jar" />
</dependency>

Groovy Grape:

@Grapes(
@Grab(group='com.teketik', module='cip', version='1.0')
)

SBT:

libraryDependencies += "com.teketik" % "cip" % "1.0"

Leiningen:

[com.teketik/cip "1.0"]

Dependencies

test (1)

Library Identifier Type Version
org.junit.jupiter : junit-jupiter-api jar 5.7.1

Project Modules

This project has no modules.

Correlated Iterators Processor

The goal of this module is to offer a convenient and efficient way to iterate over correlated data contained within multiple sorted iterators.

Each sequential iteration of the resulting sorted stream allows the processing of the correlated data as a single unit of work.

Context

Banking institutions frequently provide flat CSV files containing account data such as positions and transactions. Those files are usually sorted by account number and can be too large to load entirely into memory.

Using Correlated Iterators Processor, it is possible to open a streamed iterator on those different files and process all the data related to an account as a chunk.
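As an illustration of what a "streamed iterator" over such a file might look like, here is a minimal sketch that reads one CSV line at a time. It is not part of CIP: the `PositionRow` and `CsvRowIterator` names are hypothetical, and the sketch assumes a headerless file whose first column is the account number and whose fields contain no quoted commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical row type; the field names are illustrative only.
final class PositionRow {
    final String accountNumber;
    final String payload;

    PositionRow(String accountNumber, String payload) {
        this.accountNumber = accountNumber;
        this.payload = payload;
    }
}

// Reads the file lazily, holding only the next unread line in memory.
final class CsvRowIterator implements Iterator<PositionRow>, AutoCloseable {
    private final BufferedReader reader;
    private String nextLine;

    CsvRowIterator(Path file) throws IOException {
        this.reader = Files.newBufferedReader(file);
        this.nextLine = reader.readLine();
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public PositionRow next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        // Split into the key column and the rest of the line.
        String[] parts = nextLine.split(",", 2);
        try {
            nextLine = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new PositionRow(parts[0], parts.length > 1 ? parts[1] : "");
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Opening the iterator in a try-with-resources block closes the underlying reader once processing is finished. One such iterator per file could then be handed to the correlation step.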

Example

Consider the two following data sets:

  • Data Set 1
Key Value
B value11
C value12
C value13
D value14
D value15
E value16
  • Data Set 2
Key Value
A value21
A value22
C value23
C value24
C value25
D value26
G value27

Opening an iterator on those two streams and running them through CIP using Key as the CorrelationKey would allow the following processing:

Key  Data Set 1 Values  Data Set 2 Values
A    (none)             value21, value22
B    value11            (none)
C    value12, value13   value23, value24, value25
D    value14, value15   value26
E    value16            (none)
G    (none)             value27

The corresponding Java code would be:

CorrelatedIterables.correlate(
    dataSet1.iterator(), EntryA.class,
    dataSet2.iterator(), EntryB.class,
    new CorrelationDoubleStreamConsumer<String, EntryA, EntryB>() {
        @Override
        public void consume(String key, List<EntryA> aElements, List<EntryB> bElements) {
            //process the chunk
        }
    }
);
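
To make the grouping behaviour concrete, the following is a minimal, library-independent sketch of correlating two key-sorted iterators. It is not CIP's implementation: `CorrelationSketch`, `ChunkConsumer` and `Peeking` are hypothetical names, entries are simplified to string key/value pairs, and both inputs are assumed to be sorted by key in natural string order.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class CorrelationSketch {

    // Hypothetical callback; this is not the library's consumer interface.
    interface ChunkConsumer {
        void consume(String key, List<String> left, List<String> right);
    }

    // One-element lookahead over a sorted iterator of (key, value) entries.
    private static final class Peeking {
        private final Iterator<Map.Entry<String, String>> source;
        private Map.Entry<String, String> head;

        Peeking(Iterator<Map.Entry<String, String>> source) {
            this.source = source;
            advance();
        }

        void advance() {
            head = source.hasNext() ? source.next() : null;
        }

        // Removes and returns all consecutive values whose key equals the given key.
        List<String> drain(String key) {
            List<String> values = new ArrayList<>();
            while (head != null && head.getKey().equals(key)) {
                values.add(head.getValue());
                advance();
            }
            return values;
        }
    }

    static void correlate(Iterator<Map.Entry<String, String>> a,
                          Iterator<Map.Entry<String, String>> b,
                          ChunkConsumer consumer) {
        Peeking left = new Peeking(a);
        Peeking right = new Peeking(b);
        while (left.head != null || right.head != null) {
            // The next chunk belongs to the smallest key still pending on either side.
            String key;
            if (left.head == null) {
                key = right.head.getKey();
            } else if (right.head == null) {
                key = left.head.getKey();
            } else {
                String l = left.head.getKey();
                String r = right.head.getKey();
                key = l.compareTo(r) <= 0 ? l : r;
            }
            consumer.consume(key, left.drain(key), right.drain(key));
        }
    }

    static Map.Entry<String, String> e(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> dataSet1 = Arrays.asList(
            e("B", "value11"), e("C", "value12"), e("C", "value13"),
            e("D", "value14"), e("D", "value15"), e("E", "value16"));
        List<Map.Entry<String, String>> dataSet2 = Arrays.asList(
            e("A", "value21"), e("A", "value22"), e("C", "value23"),
            e("C", "value24"), e("C", "value25"), e("D", "value26"), e("G", "value27"));
        // Prints one line per key, in the order A, B, C, D, E, G.
        correlate(dataSet1.iterator(), dataSet2.iterator(),
            (key, left, right) -> System.out.println(key + " " + left + " " + right));
    }
}
```

Because each side is consumed strictly forward and only the values of the current key are buffered, memory use is bounded by the largest single chunk rather than by the size of the inputs.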

Usage

Maven dependency:

<dependency>
  <groupId>com.teketik</groupId>
  <artifactId>cip</artifactId>
  <version>1.0</version>
</dependency>

Main classes:

CorrelatedIterables contains a collection of convenient iterators to process multiple correlated iterators. If this does not contain what you need, have a look at CorrelatedIterable.

The Java classes being iterated must contain a field annotated with @CorrelationKey; the value of that field is used to find the correlations across all the iterators.
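
The sketch below shows roughly what an entry class with an annotated key field could look like, and how such a key might be read reflectively. To stay self-contained it declares its own stand-in annotation, `SketchKey`, instead of importing the library's @CorrelationKey (whose package is not given here). `AccountPosition` and `KeyReader` are likewise hypothetical, and the reflective lookup is an assumption about the general approach, not a description of CIP's internals.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Local stand-in for the library's @CorrelationKey, declared only so this sketch compiles alone.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchKey {
}

// Hypothetical entry type: the annotated field identifies the account the row belongs to.
final class AccountPosition {
    @SketchKey
    private final String accountNumber;
    private final String instrument;

    AccountPosition(String accountNumber, String instrument) {
        this.accountNumber = accountNumber;
        this.instrument = instrument;
    }
}

final class KeyReader {
    // Returns the value of the first declared field carrying the stand-in annotation.
    static Object readKey(Object element) {
        for (Field field : element.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(SketchKey.class)) {
                field.setAccessible(true);
                try {
                    return field.get(element);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException(
            "No annotated key field on " + element.getClass().getName());
    }
}
```

With the real library, the entry class would carry @CorrelationKey on the equivalent field and the key extraction would be handled by CIP itself.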

Library versions

Version
1.0