ok-ml-pipelines


License: (not listed)
Group: ru.odnoklassniki
Artifact ID: ok-ml-pipelines_2.10
Latest version: 0.2-spark1.6
Date: (not listed)
Type: jar
Description: ok-ml-pipelines
Website: https://github.com/odnoklassniki/ok-ml-pipelines
Developer organization: ru.odnoklassniki
Version control: https://github.com/odnoklassniki/ok-ml-pipelines

Download ok-ml-pipelines_2.10

How to add the latest version

Maven:
<!-- https://jarcasting.com/artifacts/ru.odnoklassniki/ok-ml-pipelines_2.10/ -->
<dependency>
    <groupId>ru.odnoklassniki</groupId>
    <artifactId>ok-ml-pipelines_2.10</artifactId>
    <version>0.2-spark1.6</version>
</dependency>

Gradle (Groovy DSL):
// https://jarcasting.com/artifacts/ru.odnoklassniki/ok-ml-pipelines_2.10/
implementation 'ru.odnoklassniki:ok-ml-pipelines_2.10:0.2-spark1.6'

Gradle (Kotlin DSL):
// https://jarcasting.com/artifacts/ru.odnoklassniki/ok-ml-pipelines_2.10/
implementation ("ru.odnoklassniki:ok-ml-pipelines_2.10:0.2-spark1.6")

Buildr:
'ru.odnoklassniki:ok-ml-pipelines_2.10:jar:0.2-spark1.6'

Ivy:
<dependency org="ru.odnoklassniki" name="ok-ml-pipelines_2.10" rev="0.2-spark1.6">
  <artifact name="ok-ml-pipelines_2.10" type="jar" />
</dependency>

Grape:
@Grapes(
@Grab(group='ru.odnoklassniki', module='ok-ml-pipelines_2.10', version='0.2-spark1.6')
)

sbt:
libraryDependencies += "ru.odnoklassniki" % "ok-ml-pipelines_2.10" % "0.2-spark1.6"

Leiningen:
[ru.odnoklassniki/ok-ml-pipelines_2.10 "0.2-spark1.6"]
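
For sbt builds, the Scala binary suffix (_2.10) does not have to be written by hand: the %% operator appends it from the project's scalaVersion. The fragment below is a minimal sketch that assumes the project is built with Scala 2.10.7, the scala-library version this artifact is listed as depending on; it is not taken from the project's own build.

```scala
// build.sbt (sketch; assumes a Scala 2.10.x project)
scalaVersion := "2.10.7"

// %% resolves to the artifact ok-ml-pipelines_2.10 for a 2.10.x scalaVersion
libraryDependencies += "ru.odnoklassniki" %% "ok-ml-pipelines" % "0.2-spark1.6"
```

With a different Scala binary version, %% would look for an artifact with a different suffix, which may not be published for this release.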

Dependencies

compile (10)

Library identifier                                  Type  Version
org.scala-lang : scala-library                      jar   2.10.7
org.apache.spark : spark-core_2.10                  jar   1.6.3
org.apache.spark : spark-mllib_2.10                 jar   1.6.3
org.apache.spark : spark-sql_2.10                   jar   1.6.3
org.apache.spark : spark-streaming_2.10             jar   1.6.3
com.esotericsoftware : kryo                         jar   4.0.1
org.apache.lucene : lucene-core                     jar   5.4.1
org.apache.lucene : lucene-analyzers-common         jar   5.4.1
com.optimaize.languagedetector : language-detector  jar   0.6
com.tdunning : t-digest                             jar   3.2

test (2)

Library identifier              Type  Version
org.scalatest : scalatest_2.10  jar   3.0.4
org.mockito : mockito-core      jar   2.13.0

Project modules

This project has no modules.

PravdaML

This project defines machine learning pipelines on top of Spark and was formerly known as ok-ml-pipelines. It is an extension of the Spark ML package, not a replacement, and focuses on the structural aspects of distributed machine learning deployments. The core features added by the project are:

  • Ability to add "transparent" technical stages to an ML pipeline (e.g. caching, sampling, repartitioning). These stages are included in the learning pipeline but automatically excluded from the resulting model, so they do not affect inference performance.
  • Ability to execute certain pipeline stages in parallel to achieve better cluster utilization. This provides an order-of-magnitude improvement for cross-validation, model segmentation, grid search and other ML stages with external parallelism.
  • Ability to collect extra information about the model (learning curve history, weight statistics, etc.) in the form of a DataFrame, which greatly simplifies analysis of the learning process and helps identify potential improvements.
  • Improved model evaluation capabilities, allowing for extra metrics, including non-scalar ones (e.g. the full ROC curve), and statistical analysis of the metrics.
  • Bayesian hyperparameter optimization (based on Photon-ML https://github.com/linkedin/photon-ml)
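
The idea of "transparent" technical stages from the list above can be illustrated with a small sketch in plain Scala. It does not use Spark or the PravdaML API: Stage, Step, Sample, Center, fitPipeline and predict are hypothetical names invented for this illustration. A sampling stage takes part in training but contributes nothing to the fitted model, so inference only runs the learned steps.

```scala
// Illustration only: all names here are hypothetical, not the PravdaML API.
object TransparentStages {
  type Data = Vector[Double]

  // A fitted step that is applied at inference time.
  trait Step { def apply(x: Double): Double }

  // A training-time stage: transforms the training data and optionally
  // contributes a fitted step to the resulting model.
  trait Stage { def fit(data: Data): (Data, Option[Step]) }

  // Technical stage: keeps every n-th row during training and
  // contributes nothing to the model.
  case class Sample(n: Int) extends Stage {
    def fit(data: Data): (Data, Option[Step]) =
      (data.zipWithIndex.collect { case (x, i) if i % n == 0 => x }, None)
  }

  // Learning stage: estimates the mean and subtracts it at inference time.
  case object Center extends Stage {
    def fit(data: Data): (Data, Option[Step]) = {
      val mean = data.sum / data.size
      val step = new Step { def apply(x: Double): Double = x - mean }
      (data.map(x => step(x)), Some(step))
    }
  }

  // Fits all stages in order, keeping only the steps needed for inference.
  def fitPipeline(stages: Seq[Stage], data: Data): Vector[Step] = {
    val (_, steps) = stages.foldLeft((data, Vector.empty[Step])) {
      case ((current, acc), stage) =>
        val (next, step) = stage.fit(current)
        (next, acc ++ step.toList)
    }
    steps
  }

  // Applies the fitted steps in order to a single value.
  def predict(model: Seq[Step], x: Double): Double =
    model.foldLeft(x)((value, step) => step(value))

  def main(args: Array[String]): Unit = {
    val model = fitPipeline(Seq(Sample(2), Center), Vector(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
    println(model.size)           // 1: only Center contributes a step
    println(predict(model, 10.0)) // 7.0: the mean of the sampled rows (1, 3, 5) is 3
  }
}
```

The design point is that the decision to drop a stage is made once, at fit time, so the fitted model carries no trace of sampling or caching.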

In addition to the structural improvements, a few ML algorithms are included:

  • Language detection and preprocessing with a focus on ex-USSR languages.
  • LSH-based deduplication for texts.
  • Improved distributed implementation of variance-reduced SGD.
  • Multi-label version of L-BFGS with a matrix gradient.
  • Feature selection based on the stability of feature importance in cross-validation.
  • Improved XGBoost integration (based on DMLC XGBoost for Spark https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html)
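
As a rough illustration of LSH-based text deduplication, the sketch below uses MinHash signatures over word shingles with banding. This is an assumption about the general technique, not a description of the project's implementation: the source does not say which LSH family it uses, and MinHashDedup and all its parameters are hypothetical names. It runs on local collections and uses only the Scala standard library.

```scala
import scala.util.hashing.MurmurHash3

// Illustration only: a generic MinHash + banding scheme, not the PravdaML implementation.
object MinHashDedup {
  // Word-level shingles of length k; short texts fall back to a single shingle.
  def shingles(text: String, k: Int = 2): Set[String] = {
    val words = text.toLowerCase.split("\\s+").filter(_.nonEmpty).toVector
    if (words.size <= k) Set(words.mkString(" "))
    else words.sliding(k).map(_.mkString(" ")).toSet
  }

  // MinHash signature: for each seed, the minimum hash over all shingles.
  def signature(sh: Set[String], numHashes: Int): Vector[Int] =
    Vector.tabulate(numHashes)(seed => sh.map(s => MurmurHash3.stringHash(s, seed)).min)

  // Splits the signature into bands of `rows` values, tagged with the band index.
  def bandKeys(sig: Vector[Int], rows: Int): Vector[(Int, Vector[Int])] =
    sig.grouped(rows).zipWithIndex.map { case (band, i) => (i, band) }.toVector

  // Returns groups of text indices that collide in at least one band.
  def candidateGroups(texts: Seq[String], numHashes: Int = 16, rows: Int = 4): Set[Set[Int]] = {
    val keyed = for {
      (text, idx) <- texts.zipWithIndex
      key <- bandKeys(signature(shingles(text), numHashes), rows)
    } yield (key, idx)
    keyed.groupBy(_._1).values.map(_.map(_._2).toSet).filter(_.size > 1).toSet
  }

  def main(args: Array[String]): Unit = {
    val texts = Seq("a b c d e", "a b c d e", "x y z")
    // The two identical texts share every band, so they always land in one group.
    println(candidateGroups(texts))
  }
}
```

Identical texts always collide. Near-duplicates collide with a probability that grows with their shingle overlap and depends on the number of hashes and rows per band, so the groups are candidates to verify rather than confirmed duplicates.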

Slides from the JBreak 2018 demo are available at: https://cloud.mail.ru/public/77xY/GKAfB3mjn

A set of usage examples is available on Zepl.

ru.odnoklassniki

OK.ru

Most famous Russian social network

Library versions

Version
0.2-spark1.6
0.1-spark2.2
0.1-spark1.6