Genetic Programming on Grok Regex Discovery for Spark

Spark algorithm that uses Genetic Programming to discover the Grok regex for a target set of text messages

License: MIT
Group: com.github.chen0040
Artifact ID: spark-ml-regex-generator
Latest version: 1.0.1
Type: jar
Website: https://github.com/chen0040/spark-ml-regex-generator
Version control: https://github.com/chen0040/spark-ml-regex-generator


How to include the latest version

Maven:
<!-- https://jarcasting.com/artifacts/com.github.chen0040/spark-ml-regex-generator/ -->
<dependency>
    <groupId>com.github.chen0040</groupId>
    <artifactId>spark-ml-regex-generator</artifactId>
    <version>1.0.1</version>
</dependency>

Gradle (Groovy DSL):
// https://jarcasting.com/artifacts/com.github.chen0040/spark-ml-regex-generator/
implementation 'com.github.chen0040:spark-ml-regex-generator:1.0.1'

Gradle (Kotlin DSL):
// https://jarcasting.com/artifacts/com.github.chen0040/spark-ml-regex-generator/
implementation("com.github.chen0040:spark-ml-regex-generator:1.0.1")

Buildr:
'com.github.chen0040:spark-ml-regex-generator:jar:1.0.1'

Ivy:
<dependency org="com.github.chen0040" name="spark-ml-regex-generator" rev="1.0.1">
  <artifact name="spark-ml-regex-generator" type="jar" />
</dependency>

Grape:
@Grapes(
  @Grab(group='com.github.chen0040', module='spark-ml-regex-generator', version='1.0.1')
)

SBT:
libraryDependencies += "com.github.chen0040" % "spark-ml-regex-generator" % "1.0.1"

Leiningen:
[com.github.chen0040/spark-ml-regex-generator "1.0.1"]

Dependencies

compile (7)

Library ID Type Version
org.slf4j : slf4j-api jar 1.7.20
org.slf4j : slf4j-log4j12 jar 1.7.20
com.github.chen0040 : java-data-frame jar 1.0.11
com.github.chen0040 : spark-ml-commons jar 1.0.1
org.apache.spark : spark-core_2.10 jar 1.6.0
com.github.chen0040 : spark-ml-genetic-programming jar 1.0.5
io.thekraken : grok jar 0.1.4

provided (1)

Library ID Type Version
org.projectlombok : lombok jar 1.16.6

test (10)

Library ID Type Version
org.testng : testng jar 6.9.10
org.hamcrest : hamcrest-core jar 1.3
org.hamcrest : hamcrest-library jar 1.3
org.assertj : assertj-core jar 3.5.2
org.powermock : powermock-core jar 1.6.5
org.powermock : powermock-api-mockito jar 1.6.5
org.powermock : powermock-module-junit4 jar 1.6.5
org.powermock : powermock-module-testng jar 1.6.5
org.mockito : mockito-core jar 2.0.2-beta
org.mockito : mockito-all jar 2.0.2-beta

Project Modules

This project has no modules.

spark-ml-regex-generator

Spark implementation that takes a set of texts and uses genetic programming to discover a Grok regex that will also match other, similar texts
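The idea can be made concrete with a small, self-contained Java sketch (Java 9+). It is not this library's implementation: the class GrokGpSketch, its three-entry pattern dictionary, the token-by-token cost function, and the operators (one-point crossover, point mutation, truncation selection with elitism) are all assumptions chosen for illustration. A candidate is a sequence of pattern names rendered as a Grok expression such as %{WORD} %{IPV4}, and the search minimizes a cost in [0, 1], where 0 means every space-separated token of every training message is matched by the pattern in the same position.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: evolve a sequence of Grok-style pattern names that fits a set of messages.
// Not the spark-ml-regex-generator implementation; dictionary, cost and operators are assumptions.
final class GrokGpSketch {
    // Made-up mini dictionary; real Grok pattern definitions are far richer.
    static final List<String> NAMES = List.of("WORD", "INT", "IPV4");
    static final Map<String, Pattern> PATTERNS = Map.of(
            "WORD", Pattern.compile("[A-Za-z]+"),
            "INT", Pattern.compile("\\d+"),
            "IPV4", Pattern.compile("\\d{1,3}(?:\\.\\d{1,3}){3}"));

    // Cost in [0, 1]: 0 means every space-separated token of every message
    // is fully matched by the pattern at the same position in the candidate.
    static double cost(List<String> candidate, List<String> messages) {
        double score = 0;
        for (String message : messages) {
            String[] tokens = message.split(" ");
            int hits = 0;
            for (int i = 0; i < Math.min(tokens.length, candidate.size()); i++) {
                if (PATTERNS.get(candidate.get(i)).matcher(tokens[i]).matches()) hits++;
            }
            // Dividing by the longer length penalizes missing and surplus patterns alike.
            score += (double) hits / Math.max(tokens.length, candidate.size());
        }
        return 1.0 - score / messages.size();
    }

    static List<String> randomCandidate(Random rnd) {
        List<String> c = new ArrayList<>();
        int length = 1 + rnd.nextInt(8);
        for (int i = 0; i < length; i++) c.add(NAMES.get(rnd.nextInt(NAMES.size())));
        return c;
    }

    // One-point crossover: a non-empty prefix of one parent joined to a non-empty suffix of the other.
    static List<String> crossover(List<String> a, List<String> b, Random rnd) {
        List<String> child = new ArrayList<>(a.subList(0, 1 + rnd.nextInt(a.size())));
        child.addAll(b.subList(rnd.nextInt(b.size()), b.size()));
        return child;
    }

    // Point mutation: replace, insert or delete one gene, never emptying the candidate.
    static List<String> mutate(List<String> parent, Random rnd) {
        List<String> c = new ArrayList<>(parent);
        int pos = rnd.nextInt(c.size());
        String gene = NAMES.get(rnd.nextInt(NAMES.size()));
        int op = rnd.nextInt(3);
        if (op == 0) c.set(pos, gene);
        else if (op == 1) c.add(pos, gene);
        else if (c.size() > 1) c.remove(pos);
        return c;
    }

    // Returns the lowest-cost candidate found; populationSize must be at least 2.
    static List<String> evolve(List<String> messages, int populationSize, int generations, long seed) {
        Random rnd = new Random(seed);
        Comparator<List<String>> byCost = Comparator.comparingDouble(c -> cost(c, messages));
        List<List<String>> population = new ArrayList<>();
        for (int i = 0; i < populationSize; i++) population.add(randomCandidate(rnd));
        int elite = populationSize / 2;
        for (int g = 0; g < generations; g++) {
            population.sort(byCost);
            // Truncation selection with elitism: keep the better half, refill with offspring.
            List<List<String>> next = new ArrayList<>(population.subList(0, elite));
            while (next.size() < populationSize) {
                List<String> a = next.get(rnd.nextInt(elite));
                List<String> b = next.get(rnd.nextInt(elite));
                next.add(mutate(crossover(a, b, rnd), rnd));
            }
            population = next;
        }
        population.sort(byCost);
        return population.get(0);
    }

    static String toGrok(List<String> candidate) {
        return candidate.stream().map(n -> "%{" + n + "}").collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Hypothetical training messages shaped like the one in the Usage section.
        List<String> messages = List.of("user root login at 127.0.0.1", "user admin login at 10.0.0.7");
        List<String> best = evolve(messages, 200, 30, 42L);
        System.out.println(toGrok(best) + "  cost=" + cost(best, messages));
    }
}
```

Because the better half of each generation is carried over unchanged, the best cost in this sketch never increases from one generation to the next. The population size and generation count play the same role as the values passed to setPopulationSize and setMaxGenerations in the Usage section.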

Install

Add the following dependency to your POM file:

<dependency>
  <groupId>com.github.chen0040</groupId>
  <artifactId>spark-ml-regex-generator</artifactId>
  <version>1.0.1</version>
</dependency>

Usage

The sample code below shows how the GP regex cultivator (GpCultivator) discovers the regex for the message "user root login at 127.0.0.1":

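import io.thekraken.grok.api.Grok;
import io.thekraken.grok.api.Match;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.ArrayList;
import java.util.List;
// GpCultivator and SparkContextFactory come from spark-ml-regex-generator and its
// com.github.chen0040 dependencies; import them from the packages used in your version.

// Configure the genetic programming search
GpCultivator generator = new GpCultivator();
generator.setDisplayEvery(2);        // progress display interval, in generations
generator.setPopulationSize(1000);   // number of candidates per generation
generator.setMaxGenerations(50);     // upper bound on the number of generations

// Training messages that the discovered Grok expression should match
List<String> trainingData = new ArrayList<>();
trainingData.add("user root login at 127.0.0.1");

JavaSparkContext context = SparkContextFactory.createSparkContext("testing-1");
Grok generated_grok = generator.fit(context.parallelize(trainingData));

System.out.println("user root login at 127.0.0.1");
System.out.println(generator.getRegex()); // this is the regex generated

// Apply the discovered Grok expression and print the captured fields as JSON
Match matched = generated_grok.match("user root login at 127.0.0.1");
matched.captures();
System.out.println(matched.toJson());

context.stop();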

Below is the output printed by the sample code above:

...
Generation: 4 (Pop: 1000), elapsed: 3 seconds
Global Cost: 0.2	Current Cost: 0.2
...
Global Cost: 0.14285714285714285	Current Cost: 0.16666666666666666
user root login at 127.0.0.1
%{LOGLEVEL} %{USER} %{URIPROTO} %{URIHOST} %{IPV4}
{"IPORHOST":"at","IPV4":"127.0.0.1","LOGLEVEL":"er","URIHOST":"at","URIPROTO":"login","USER":"root"}
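The last line of the output maps each Grok pattern name to the text it captured, as produced by matched.captures() and matched.toJson(). To illustrate that step without Spark or the Grok library, the Java sketch below (Java 9+) expands a Grok expression into a java.util.regex.Pattern with named groups and collects the captured text. The class GrokExpansionSketch, its simplified pattern definitions, and its handling of repeated names (only the first occurrence is captured) are hypothetical; the io.thekraken grok library uses its own pattern files and also reports nested sub-patterns, such as the IPORHOST entry above, which comes from a pattern nested inside URIHOST.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for Grok matching: expand %{NAME} references into named regex groups.
final class GrokExpansionSketch {
    // Made-up, simplified definitions; the real Grok pattern files differ.
    static final Map<String, String> DEFINITIONS = Map.of(
            "WORD", "[A-Za-z]+",
            "USER", "[a-zA-Z0-9._-]+",
            "IPV4", "\\d{1,3}(?:\\.\\d{1,3}){3}");
    static final Pattern REFERENCE = Pattern.compile("%\\{(\\w+)\\}");

    // Literal text is quoted; the first %{NAME} becomes (?<NAME>...), repeats become (?:...)
    // because Java does not allow two groups with the same name.
    static Pattern expand(String grokExpression) {
        Matcher m = REFERENCE.matcher(grokExpression);
        StringBuilder regex = new StringBuilder();
        Set<String> seen = new HashSet<>();
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(grokExpression.substring(last, m.start())));
            String name = m.group(1);
            String body = DEFINITIONS.get(name);
            if (body == null) throw new IllegalArgumentException("Unknown pattern: " + name);
            regex.append(seen.add(name) ? "(?<" + name + ">" : "(?:").append(body).append(')');
            last = m.end();
        }
        regex.append(Pattern.quote(grokExpression.substring(last)));
        return Pattern.compile(regex.toString());
    }

    // Pattern name -> captured text, or an empty map when the whole message does not match.
    static Map<String, String> captures(String grokExpression, String message) {
        Matcher m = expand(grokExpression).matcher(message);
        Map<String, String> result = new LinkedHashMap<>();
        if (!m.matches()) return result;
        Matcher refs = REFERENCE.matcher(grokExpression);
        while (refs.find()) result.putIfAbsent(refs.group(1), m.group(refs.group(1)));
        return result;
    }

    public static void main(String[] args) {
        String expression = "%{WORD} %{USER} %{WORD} %{WORD} %{IPV4}";
        // prints {WORD=user, USER=root, IPV4=127.0.0.1}
        System.out.println(captures(expression, "user root login at 127.0.0.1"));
    }
}
```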

Library Versions

Version
1.0.1