Duke

Duke is a configurable record linkage engine.

License	The Apache Software License, Version 2.0
Group	no.priv.garshol.duke
Artifact	duke
Latest version	1.2
Date	Feb 15, 2014
Type	jar
Description	Duke is a configurable record linkage engine.
Website	https://github.com/larsga/Duke
Source control	https://github.com/larsga/Duke

Download duke

File name	Size
duke-1.2.pom
duke-1.2.jar	203 KB
duke-1.2-sources.jar	120 KB
duke-1.2-javadoc.jar	619 KB

How to add the latest version

Apache Maven

<!-- https://jarcasting.com/artifacts/no.priv.garshol.duke/duke/ -->
<dependency>
    <groupId>no.priv.garshol.duke</groupId>
    <artifactId>duke</artifactId>
    <version>1.2</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/no.priv.garshol.duke/duke/
implementation 'no.priv.garshol.duke:duke:1.2'

Gradle Kotlin

// https://jarcasting.com/artifacts/no.priv.garshol.duke/duke/
implementation ("no.priv.garshol.duke:duke:1.2")

Apache Buildr

'no.priv.garshol.duke:duke:jar:1.2'

Apache Ivy

<dependency org="no.priv.garshol.duke" name="duke" rev="1.2">
  <artifact name="duke" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='no.priv.garshol.duke', module='duke', version='1.2')
)

Scala SBT

libraryDependencies += "no.priv.garshol.duke" % "duke" % "1.2"

Leiningen

[no.priv.garshol.duke/duke "1.2"]

Dependencies

compile (4)

Library	Type	Version
org.apache.lucene : lucene-core	jar	4.0.0
org.apache.lucene : lucene-analyzers-common	jar	4.0.0
org.apache.lucene : lucene-spatial	jar	4.0.0
org.mapdb : mapdb	jar	0.9.9

provided (2)

Library	Type	Version
javax.servlet : servlet-api	jar	2.4
org.codehaus.fabric3.api : commonj	jar	1.1.1

test (2)

Library	Type	Version
junit : junit	jar	4.11
com.h2database : h2	jar	1.3.154

Project modules

This project has no modules.

Duke

Duke is a fast and flexible deduplication (or entity resolution, or record linkage) engine written in Java on top of Lucene. The latest version is 1.2 (see ReleaseNotes).

Duke can find duplicate customer records, or duplicates of any other kind of record, in your database. You can also use it to link records in one data set with records in another data set that represent the same thing. Duke has sophisticated comparators that can handle spelling differences, numbers, geopositions, and more. Using a probabilistic model, Duke can handle noisy data with good accuracy.
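
As a rough illustration of how a probabilistic model can weigh noisy evidence, the sketch below folds per-property match probabilities together with a naive Bayes update. It is self-contained and does not call Duke's API; the class name BayesSketch, its methods, and the sample probabilities are hypothetical, and Duke's own implementation may differ in detail.

```java
public class BayesSketch {
    // Combine two match probabilities, treating them as independent evidence.
    // A value above 0.5 pushes the result up, a value below 0.5 pulls it down.
    // Contradictory certainties (1.0 and 0.0) are not handled and yield NaN.
    public static double combine(double p1, double p2) {
        double match = p1 * p2;
        double nonMatch = (1.0 - p1) * (1.0 - p2);
        return match / (match + nonMatch);
    }

    // Fold any number of per-property probabilities, starting from
    // the neutral prior 0.5 (no evidence either way).
    public static double combineAll(double... probabilities) {
        double result = 0.5;
        for (double p : probabilities) {
            result = combine(result, p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical evidence: the names agree strongly (0.9),
        // the addresses disagree mildly (0.4).
        System.out.println(combineAll(0.9, 0.4));
    }
}
```

Because 0.5 is neutral, a property that carries no information leaves the running probability unchanged, so a single noisy field cannot decide the outcome on its own.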

Features

High performance.
Highly configurable.
Support for CSV, JDBC, SPARQL, NTriples, and JSON.
Many built-in comparators.
Plug in your own data sources, comparators, and cleaners.
Genetic algorithm for automatically tuning configurations.
Command-line client for getting started.
API for embedding into any kind of application.
Support for batch processing and continuous processing.
Can maintain database of links found via JNDI/JDBC.
Can run in multiple threads.

The GettingStarted page explains how to get started and has links to further documentation. The examples of use page lists real examples of using Duke, complete with data and configurations. This presentation has more of the big picture and background.

Contributions, whether issue reports or patches, are very much welcome. Please fork the repository and make pull requests.

Supports Java 1.7 and 1.8.

If you have questions or problems, please register an issue in the issue tracker, or post to the mailing list. If you don't want to join the list, you can always write to me at larsga [a] garshol.priv.no instead.

Using Duke with Maven

Duke is hosted in Maven Central, so if you want to use Duke it's as easy as including the following in your pom file:

<dependency>
  <groupId>no.priv.garshol.duke</groupId>
  <artifactId>duke</artifactId>
  <version>1.2</version>
</dependency>

Building the source

If you have Maven installed, this is as easy as giving the command mvn package in the root directory. This will produce a .jar file in the target/ subdirectory of each module.

Older documentation

This blog post describes the basic approach taken to match records. It does not deal with the Lucene-based lookup, but describes an early, slow O(n^2) prototype. This early presentation describes the ideas behind the engine and the intended architecture.
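
The quadratic cost comes from comparing every record with every other record. The sketch below shows that shape only; it is not the prototype's code, and the class name PairwiseSketch, the use of plain strings as records, and the equality-based matcher are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairwiseSketch {
    // Compare each record with every later record: n * (n - 1) / 2 comparisons.
    // Returns the index pairs {i, j} (i < j) that the matcher accepts.
    public static List<int[]> findDuplicates(List<String> records) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            for (int j = i + 1; j < records.size(); j++) {
                if (isMatch(records.get(i), records.get(j))) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    // Placeholder matcher: case-insensitive equality after trimming.
    // A real engine would use fuzzy comparators here instead.
    public static boolean isMatch(String a, String b) {
        return a.trim().equalsIgnoreCase(b.trim());
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("Acme AS", "acme as ", "Other Ltd");
        for (int[] pair : findDuplicates(records)) {
            System.out.println(pair[0] + " <-> " + pair[1]);
        }
    }
}
```

Doubling the number of records roughly quadruples the number of comparisons, which is why an index-based lookup that narrows the candidates first scales better.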

Library versions

Version
1.2 Feb 15, 2014
1.1 Oct 19, 2013
1.0 Mar 2, 2013
0.6 Sep 15, 2012
