Genetic Programming Algorithms for Spark

Genetic Programming algorithms implemented in Java and for Apache Spark

License

MIT

Categories

Network

Group

com.github.chen0040

Artifact ID

spark-ml-genetic-programming

Latest version

1.0.5

Date

Type

jar

Description

Genetic Programming Algorithms for Spark
Genetic Programming algorithms implemented in Java and for Apache Spark

Homepage

https://github.com/chen0040/spark-ml-genetic-programming

Source control

https://github.com/chen0040/spark-ml-genetic-programming

Download spark-ml-genetic-programming

How to add the latest version

Maven:

<!-- https://jarcasting.com/artifacts/com.github.chen0040/spark-ml-genetic-programming/ -->
<dependency>
    <groupId>com.github.chen0040</groupId>
    <artifactId>spark-ml-genetic-programming</artifactId>
    <version>1.0.5</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.github.chen0040/spark-ml-genetic-programming/
implementation 'com.github.chen0040:spark-ml-genetic-programming:1.0.5'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.github.chen0040/spark-ml-genetic-programming/
implementation("com.github.chen0040:spark-ml-genetic-programming:1.0.5")

Buildr:

'com.github.chen0040:spark-ml-genetic-programming:jar:1.0.5'

Ivy:

<dependency org="com.github.chen0040" name="spark-ml-genetic-programming" rev="1.0.5">
  <artifact name="spark-ml-genetic-programming" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.github.chen0040', module='spark-ml-genetic-programming', version='1.0.5')
)

SBT:

libraryDependencies += "com.github.chen0040" % "spark-ml-genetic-programming" % "1.0.5"

Leiningen:

[com.github.chen0040/spark-ml-genetic-programming "1.0.5"]

Dependencies

compile (6)

Library Type Version
org.slf4j : slf4j-api jar 1.7.20
org.slf4j : slf4j-log4j12 jar 1.7.20
com.github.chen0040 : java-data-frame jar 1.0.11
com.github.chen0040 : spark-ml-commons jar 1.0.1
org.apache.spark : spark-core_2.10 jar 1.6.0
com.github.chen0040 : java-genetic-programming jar 1.0.14

provided (1)

Library Type Version
org.projectlombok : lombok jar 1.16.6

test (10)

Library Type Version
org.testng : testng jar 6.9.10
org.hamcrest : hamcrest-core jar 1.3
org.hamcrest : hamcrest-library jar 1.3
org.assertj : assertj-core jar 3.5.2
org.powermock : powermock-core jar 1.6.5
org.powermock : powermock-api-mockito jar 1.6.5
org.powermock : powermock-module-junit4 jar 1.6.5
org.powermock : powermock-module-testng jar 1.6.5
org.mockito : mockito-core jar 2.0.2-beta
org.mockito : mockito-all jar 2.0.2-beta

Project Modules

This project has no modules.

spark-ml-genetic-programming

This package provides a Java implementation of big-data genetic programming for Apache Spark.

Install

Add the following dependency to your POM file:

<dependency>
  <groupId>com.github.chen0040</groupId>
  <artifactId>spark-ml-genetic-programming</artifactId>
  <version>1.0.5</version>
</dependency>

Features

  • Linear Genetic Programming

    • Initialization

      • Full Register Array
      • Fixed-length Register Array
    • Crossover

      • Linear
      • One-Point
      • One-Segment
    • Mutation

      • Micro-Mutation
      • Effective-Macro-Mutation
      • Macro-Mutation
    • Replacement

      • Tournament
      • Direct-Compete
    • Default-Operators

      • Most of the math operators
      • if-less, if-greater
      • Support operator extension
  • Tree Genetic Programming

    • Initialization

      • Full
      • Grow
      • PTC 1
      • Random Branch
      • Ramped Full
      • Ramped Grow
      • Ramped Half-Half
    • Crossover

      • Subtree Bias
      • Subtree No Bias
    • Mutation

      • Subtree
      • Subtree Kinnear
      • Hoist
      • Shrink
    • Evolution Strategy

      • (mu + lambda)
      • TinyGP
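
Tournament replacement, listed above for LGP, can be illustrated with a standalone sketch in plain Java. The class and method names here are illustrative, not the library's API: the idea is simply to sample two candidates at random and keep the one with the lower cost.

```java
import java.util.Random;

public class TournamentDemo {
    // Sample two random candidates and return the index of the fitter (lower-cost) one.
    static int tournament(double[] costs, Random rng) {
        int a = rng.nextInt(costs.length);
        int b = rng.nextInt(costs.length);
        return costs[a] <= costs[b] ? a : b;
    }

    public static void main(String[] args) {
        double[] costs = {5.0, 1.0, 3.0};
        int winner = tournament(costs, new Random(42));
        // The winner's cost is never higher than both sampled candidates' costs.
        System.out.println("winner index: " + winner);
    }
}
```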

Future Works

  • Grammar-based Genetic Programming
  • Gene Expression Programming

Usage of Linear Genetic Programming

Create training data

The sample code below shows how to generate data from the "Mexican Hat" regression problem and then split the generated data into training and testing sets:

import com.github.chen0040.gp.utils.CollectionUtils;

import java.util.List;
import java.util.stream.Collectors;

List<BasicObservation> data = Tutorials.mexican_hat().stream().map(s -> (BasicObservation) s).collect(Collectors.toList());
CollectionUtils.shuffle(data);
TupleTwo<List<BasicObservation>, List<BasicObservation>> split_data = CollectionUtils.split(data, 0.9);
List<BasicObservation> trainingData = split_data._1();
List<BasicObservation> testingData = split_data._2();
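
The shuffle-and-split step above can be sketched as a standalone snippet: shuffle a copy of the data, then partition it at the given ratio. This is an illustrative helper assuming ratio-based partitioning, not the CollectionUtils source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitDemo {
    // Shuffle a copy of the data, then split it into two parts at the given ratio.
    static <T> List<List<T>> shuffleAndSplit(List<T> data, double ratio, Random rng) {
        List<T> copy = new ArrayList<>(data);
        Collections.shuffle(copy, rng);
        int cut = (int) (copy.size() * ratio);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(copy.subList(0, cut)));       // training portion
        parts.add(new ArrayList<>(copy.subList(cut, copy.size()))); // testing portion
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) data.add(i);
        List<List<Integer>> parts = shuffleAndSplit(data, 0.9, new Random(1));
        System.out.println(parts.get(0).size() + " training, " + parts.get(1).size() + " testing");
        // prints "90 training, 10 testing"
    }
}
```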

Create and train the LGP

The sample code below shows how the SparkLGP can be created and trained:

import com.github.chen0040.gp.lgp.LGP;
import com.github.chen0040.gp.commons.BasicObservation;
import com.github.chen0040.gp.commons.Observation;
import com.github.chen0040.gp.lgp.gp.Population;
import com.github.chen0040.gp.lgp.program.operators.*;

SparkLGP lgp = new SparkLGP();
lgp.getOperatorSet().addAll(new Plus(), new Minus(), new Divide(), new Multiply(), new Power());
lgp.getOperatorSet().addIfLessThanOperator();
lgp.addConstants(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0);
lgp.setRegisterCount(6); // the register count is typically the input dimension of the training data times 3
lgp.setPerObservationCostEvaluator((Function<Tuple2<Program, BasicObservation>, Double>) tuple2 -> {
 Program program = tuple2._1();
 BasicObservation observation = tuple2._2();
 program.execute(observation);
 return Math.pow(observation.getOutput(0) - observation.getPredictedOutput(0), 2.0);
});
lgp.setDisplayEvery(2); // display iteration result every 2 iterations


JavaSparkContext context = SparkContextFactory.createSparkContext("testing-1");
Program program = lgp.fit(context.parallelize(trainingData)); 
System.out.println(program);

The number of registers of a linear program is set by calling LGP.setRegisterCount(...). The register count is usually a multiple of the input dimension of a training data instance. For example, if the training data has a 2-dimensional input (x, y), the number of registers may be set to 6 = 2 * 3.

The cost per observation evaluator computes the training cost of a 'program' on a particular 'observation' (which is an instance of trainingData).

The last line prints the linear program found by the LGP evolution, a sample of which is shown below:

instruction[1]: <If<	r[4]	c[0]	r[4]>
instruction[2]: <If<	r[3]	c[3]	r[0]>
instruction[3]: <-	r[2]	r[3]	r[2]>
instruction[4]: <*	c[7]	r[2]	r[2]>
instruction[5]: <If<	c[2]	r[3]	r[1]>
instruction[6]: </	r[1]	c[4]	r[2]>
instruction[7]: <If<	r[3]	c[7]	r[1]>
instruction[8]: <-	c[0]	r[0]	r[0]>
instruction[9]: <If<	c[7]	r[3]	r[4]>
...
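
Each instruction above applies an operator to two operands (registers r[...] or constants c[...]) and writes the result into a destination register. The register-machine style of execution can be sketched standalone as follows; the types are illustrative (constants and the if-operators are omitted for brevity), not the library's internals.

```java
public class LgpInterpreterDemo {
    // One instruction: r[dest] = op(r[a], r[b]), mirroring lines like "<- r[3] r[2] r[2]>".
    record Instr(char op, int a, int b, int dest) {}

    // Execute the instruction list sequentially over the register array.
    static void run(Instr[] program, double[] r) {
        for (Instr in : program) {
            double x = r[in.a], y = r[in.b];
            switch (in.op) {
                case '+' -> r[in.dest] = x + y;
                case '-' -> r[in.dest] = x - y;
                case '*' -> r[in.dest] = x * y;
                case '/' -> r[in.dest] = y == 0 ? x : x / y; // protected division
            }
        }
    }

    public static void main(String[] args) {
        double[] r = {1.0, 2.0, 3.0, 4.0};
        Instr[] program = {
            new Instr('-', 3, 2, 2), // r[2] = r[3] - r[2] -> 1.0
            new Instr('*', 1, 2, 0), // r[0] = r[1] * r[2] -> 2.0
        };
        run(program, r);
        System.out.println("r[0] = " + r[0]); // prints "r[0] = 2.0"
    }
}
```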

Test the program obtained from the LGP evolution

The best program in the LGP population obtained from the training in the above step can then be used for prediction, as shown by the sample code below:

for(Observation observation : testingData) {
 program.execute(observation);
 double predicted = observation.getPredictedOutput(0);
 double actual = observation.getOutput(0);

 logger.info("predicted: {}\tactual: {}", predicted, actual);
}
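
Beyond logging individual predictions, the squared errors used as the per-observation cost can be averaged into a single mean squared error over the test set. A standalone sketch over plain arrays (the loop over observations would mirror the loop above):

```java
public class MseDemo {
    // Mean squared error between predicted and actual outputs.
    static double mse(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double d = predicted[i] - actual[i];
            sum += d * d;
        }
        return sum / predicted.length;
    }

    public static void main(String[] args) {
        double[] predicted = {1.0, 2.0, 3.0};
        double[] actual = {1.0, 2.5, 2.0};
        System.out.println("MSE = " + mse(predicted, actual)); // 1.25 / 3 ~ 0.4167
    }
}
```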

Usage of Tree Genetic Programming

Here we will use the "Mexican Hat" symbolic regression problem introduced earlier.

Create and train the TreeGP

The sample code below shows how the TreeGP can be created and trained:

import com.github.chen0040.gp.treegp.TreeGP;
import com.github.chen0040.gp.commons.BasicObservation;
import com.github.chen0040.gp.commons.Observation;
import com.github.chen0040.gp.treegp.gp.Population;
import com.github.chen0040.gp.treegp.program.operators.*;

SparkTreeGP tgp = new SparkTreeGP();
tgp.getOperatorSet().addAll(new Plus(), new Minus(), new Divide(), new Multiply(), new Power());
tgp.getOperatorSet().addIfLessThanOperator();
tgp.addConstants(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0);
tgp.setVariableCount(2); // equal to the input dimension of the training data
tgp.setPerObservationCostEvaluator(tuple2 -> {
 Solution program = tuple2._1();
 BasicObservation observation = tuple2._2();
 program.execute(observation);
 return Math.pow(observation.getOutput(0) - observation.getPredictedOutput(0), 2.0);
});
tgp.setDisplayEvery(2); // display iteration result every 2 iterations

JavaSparkContext context = SparkContextFactory.createSparkContext("testing-1");
Solution program = tgp.fit(context.parallelize(trainingData));  

The cost per observation evaluator computes the training cost of a 'program' on a particular 'observation' (which is an instance of trainingData).

Calling program.mathExpress() prints the program found by the TreeGP evolution, a sample of which is shown below:

Trees[0]: 1.0 - (if(1.0 < if(1.0 < 1.0, if(1.0 < v0, 1.0, 1.0), if(1.0 < (v1 * v0) + (1.0 / 1.0), 1.0 + 1.0, 1.0)), 1.0, v0 ^ 1.0))
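
The expression above is a tree of operators over variables (v0, v1) and constants. Evaluating such a tree can be sketched standalone as below; the node types are illustrative, not the library's classes. The sample builds the (v1 * v0) + (1.0 / 1.0) subtree from the output above.

```java
import java.util.function.DoubleBinaryOperator;

public class TreeEvalDemo {
    // A tree node evaluates to a double given the variable values.
    interface Node { double eval(double[] vars); }

    static Node constant(double c) { return vars -> c; }
    static Node variable(int i) { return vars -> vars[i]; }
    static Node op(DoubleBinaryOperator f, Node l, Node r) {
        return vars -> f.applyAsDouble(l.eval(vars), r.eval(vars));
    }

    // (v1 * v0) + (1.0 / 1.0), a subtree of the sample output above.
    static Node sampleTree() {
        return op((a, b) -> a + b,
                op((a, b) -> a * b, variable(1), variable(0)),
                op((a, b) -> b == 0 ? a : a / b, constant(1.0), constant(1.0))); // protected division
    }

    public static void main(String[] args) {
        double[] vars = {2.0, 3.0}; // v0 = 2.0, v1 = 3.0
        System.out.println(sampleTree().eval(vars)); // prints "7.0"
    }
}
```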

Test the program obtained from the TreeGP evolution

The best program in the TreeGP population obtained from the training in the above step can then be used for prediction, as shown by the sample code below:

for(Observation observation : testingData) {
 program.execute(observation);
 double predicted = observation.getPredictedOutput(0);
 double actual = observation.getOutput(0);

 logger.info("predicted: {}\tactual: {}", predicted, actual);
}

Library Versions

Version
1.0.5
1.0.4
1.0.3
1.0.2
1.0.1