com.addthis:ahocorasick

Java implementation of Aho-Corasick dictionary matching algorithm

License	Apache License, Version 2.0
Group	com.addthis
Artifact ID	ahocorasick
Latest version	1.5.4
Date	Dec 19, 2015
Type	jar
Description	Java implementation of Aho-Corasick dictionary matching algorithm
Website	https://github.com/addthis/aho-corasick
Developer organization	AddThis
Version control	https://github.com/addthis/aho-corasick

Download ahocorasick

File name	Size
ahocorasick-1.5.4.pom
ahocorasick-1.5.4.jar	17 KB
ahocorasick-1.5.4-sources.jar	14 KB
ahocorasick-1.5.4-javadoc.jar	50 KB
Overview

How to add the latest version

Apache Maven

<!-- https://jarcasting.com/artifacts/com.addthis/ahocorasick/ -->
<dependency>
    <groupId>com.addthis</groupId>
    <artifactId>ahocorasick</artifactId>
    <version>1.5.4</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.addthis/ahocorasick/
implementation 'com.addthis:ahocorasick:1.5.4'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.addthis/ahocorasick/
implementation ("com.addthis:ahocorasick:1.5.4")

Apache Buildr

'com.addthis:ahocorasick:jar:1.5.4'

Apache Ivy

<dependency org="com.addthis" name="ahocorasick" rev="1.5.4">
  <artifact name="ahocorasick" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.addthis', module='ahocorasick', version='1.5.4')
)

Scala SBT

libraryDependencies += "com.addthis" % "ahocorasick" % "1.5.4"

Leiningen

[com.addthis/ahocorasick "1.5.4"]

Dependencies

compile (3)

Library identifier	Type	Version
com.goldmansachs : gs-collections	jar	6.2.0
com.google.guava : guava	jar	17.0
org.slf4j : slf4j-api	jar	1.7.7

test (3)

Library identifier	Type	Version
junit : junit	jar	4.11
org.apache.commons : commons-lang3	jar	3.4
org.slf4j : slf4j-simple	jar	1.7.7

Project Modules

This project has no modules.

Ahocorasick

Introduction

This is the source code distribution for a Java implementation of the Aho-Corasick automaton. It implements a simplified form of the path compression technique described in [Tuck et al. 2004](http://dx.doi.org/10.1109/INFCOM.2004.1354682).
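For readers new to the algorithm, the following is a minimal sketch of a plain Aho-Corasick automaton (a keyword trie plus failure links). It is an illustration only and not this library's code: the class name AhoCorasickSketch, the "keyword@offset" result format, and the HashMap-based nodes are assumptions made for the example, and no path compression is attempted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch only; NOT the com.addthis implementation (no path compression).
final class AhoCorasickSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail; // node for the longest proper suffix that is also a trie prefix
        final List<String> outputs = new ArrayList<>();
    }

    private final Node root = new Node();

    // Insert a keyword into the trie.
    void add(String keyword) {
        Node node = root;
        for (char c : keyword.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.outputs.add(keyword);
    }

    // Breadth-first pass that sets failure links and merges inherited outputs.
    void prepare() {
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                Node child = e.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(e.getKey())) {
                    f = f.fail;
                }
                Node target = f.next.get(e.getKey());
                child.fail = (target != null) ? target : root;
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Scan the text once and report every hit as "keyword@startOffset".
    List<String> search(String text) {
        List<String> hits = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            node = node.next.getOrDefault(c, root);
            for (String kw : node.outputs) {
                hits.add(kw + "@" + (i - kw.length() + 1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        AhoCorasickSketch tree = new AhoCorasickSketch();
        for (String kw : new String[] {"he", "she", "his", "hers"}) {
            tree.add(kw);
        }
        tree.prepare();
        System.out.println(tree.search("ushers")); // prints [she@1, he@2, hers@2]
    }
}
```

The add and prepare names mirror the calls used in the examples further down, but the signatures and return types here belong to the sketch alone.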

This library is released under the Apache License Version 2.0. For license information please see LICENSE. This is a modified version of https://bitbucket.org/jlanchas/aho-corasick/. The jlanchas implementation was released under the BSD 3-clause license and is itself a modified version of the original code written by Danny Yoo, located at https://hkn.eecs.berkeley.edu/~dyoo/java/index.html.

Building the jar

To compile the jar, run mvn package.

Use

<dependency>
  <groupId>com.addthis</groupId>
  <artifactId>ahocorasick</artifactId>
  <version>latest-and-greatest</version>
</dependency>

Replace latest-and-greatest with a released version (1.5.4 at the time of this listing). You can either install the jar locally or pull a release from Maven Central.

Helper methods in the AhoCorasick class

To add strings to a tree, you can now use the method #add(String) instead of #add(byte[] bytes, Object output).

To search strings, you now have two options:

* A progressive search, like in the previous version. The progressiveSearch call performs the first search, and the next method advances the search, providing the successive results. See example 1.
* A complete search, in one call. The flags in the completeSearch method indicate:
** whether overlapped results are allowed (true) or not (false). See example 2.
** whether the method should return only outputs formed with valid tokens (using the StandardTokenizer provided by Lucene). See example 3.
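To make the overlap flag more concrete, here is a small self-contained sketch of one way overlapped results could be removed: keep the leftmost match and, among matches starting at the same position, the longest. This leftmost-longest policy is an assumption chosen because it reproduces the outcome described for example 2; the library's actual policy may differ. The Match class, findAll (a naive indexOf scan standing in for the automaton), and removeOverlaps are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; not the library's completeSearch implementation.
final class OverlapFilterSketch {
    // Hypothetical match record covering the half-open range [start, end).
    static final class Match {
        final int start;
        final int end;
        final String keyword;

        Match(int start, int end, String keyword) {
            this.start = start;
            this.end = end;
            this.keyword = keyword;
        }
    }

    // Naive stand-in for the automaton: every occurrence of every keyword.
    static List<Match> findAll(String text, List<String> keywords) {
        List<Match> out = new ArrayList<>();
        for (String kw : keywords) {
            if (kw.isEmpty()) {
                continue; // empty keywords would match everywhere
            }
            for (int i = text.indexOf(kw); i >= 0; i = text.indexOf(kw, i + 1)) {
                out.add(new Match(i, i + kw.length(), kw));
            }
        }
        return out;
    }

    // Assumed policy: earliest start first, longer match first, skip anything overlapping a kept match.
    static List<Match> removeOverlaps(List<Match> matches) {
        List<Match> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt((Match m) -> m.start)
                .thenComparingInt(m -> -(m.end - m.start)));
        List<Match> kept = new ArrayList<>();
        int lastEnd = 0;
        for (Match m : sorted) {
            if (m.start >= lastEnd) {
                kept.add(m);
                lastEnd = m.end;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("Input", "In", "put", "Input text");
        List<Match> all = findAll("Input text", keywords);
        for (Match m : removeOverlaps(all)) {
            System.out.println(m.keyword); // prints: Input text
        }
    }
}
```

With overlaps allowed, the same input yields four matches (Input, In, put, Input text); the filter reduces them to the single longest one.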

Considering only tokens to create valid outputs

Optionally, you can indicate in the completeSearch methods that only tokens in the input text should be considered when locating substrings. In the basic use, if you add the string al Ma to your tree and search for it in the input text Real Madrid, you will get one result. If you force the algorithm to consider only tokens (see example 3), you will get no results, because neither al nor Ma is a token.

The tokenizer used is the StandardTokenizer provided by Lucene.
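The token restriction can be approximated without Lucene to show the idea. The sketch below accepts a match only when both of its ends fall on token boundaries, where a token is assumed to be a run of letters or digits. That boundary rule is a deliberate simplification and not the StandardTokenizer's behavior; TokenBoundarySketch, isBoundary, and tokenAlignedMatches are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; a crude stand-in for Lucene's StandardTokenizer.
final class TokenBoundarySketch {
    // A position is a boundary unless it sits between two letter/digit characters.
    static boolean isBoundary(String text, int pos) {
        if (pos <= 0 || pos >= text.length()) {
            return true; // the edges of the text always count as boundaries
        }
        return !(Character.isLetterOrDigit(text.charAt(pos - 1))
                && Character.isLetterOrDigit(text.charAt(pos)));
    }

    // Naive scan: keep only occurrences that start and end on token boundaries.
    static List<String> tokenAlignedMatches(String text, List<String> keywords) {
        List<String> hits = new ArrayList<>();
        for (String kw : keywords) {
            if (kw.isEmpty()) {
                continue;
            }
            for (int i = text.indexOf(kw); i >= 0; i = text.indexOf(kw, i + 1)) {
                if (isBoundary(text, i) && isBoundary(text, i + kw.length())) {
                    hits.add(kw);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "al Ma" starts inside "Real" and ends inside "Madrid", so it is rejected.
        System.out.println(tokenAlignedMatches("Real Madrid", Arrays.asList("al Ma", "Madrid"))); // prints [Madrid]
        System.out.println(tokenAlignedMatches("Input text", Arrays.asList("Input", "ut text", "text"))); // prints [Input, text]
    }
}
```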

Examples

Example 1

A progressive search, like in the previous version.

:::java
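	// Needs java.util.HashSet, java.util.Iterator and java.util.Set.
	AhoCorasick tree = AhoCorasick.builder().build();
	tree.add("Input");
	tree.prepare(); // finish building the tree before searching
	String inputText = "Input text";
	Set<Object> termsThatHit = new HashSet<>(); // collects the outputs of every match
	for (Iterator<SearchResult> iter = tree.progressiveSearch(inputText); iter.hasNext();) {
		SearchResult result = iter.next();
		termsThatHit.addAll(result.getOutputs());
	}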

Example 2

A complete search in one call, removing the overlapped results.

:::java
	AhoCorasick tree = AhoCorasick.builder().build();
	tree.add("Input");
	tree.add("In");
	tree.add("put");
	tree.add("Input text");
	tree.prepare();
	String inputText = "Input text";
	List<OutputResult> results = tree.completeSearch(inputText, false, false); // One result: 'Input text'

Example 3

Considering only tokens to create valid outputs.

:::java
	AhoCorasick tree = AhoCorasick.builder().build();
	tree.add("Input");
	tree.add("ut text");
	tree.add("text");
	tree.prepare();
	String inputText = "Input text";
	List<OutputResult> results = tree.completeSearch(inputText, true, true); // Two results: 'Input' and 'text'

AddThis

Library versions

Version
1.5.4 Dec 19, 2015
1.5.3 Dec 18, 2015
1.5.2 Dec 18, 2015
1.5.1 Dec 18, 2015
