pravda-ml


License: (not listed)
Group: ru.odnoklassniki
Artifact ID: pravda-ml_2.11
Latest version: 0.6.2
Date: (not listed)
Type: jar
Description: pravda-ml
Project URL: https://github.com/odnoklassniki/pravda-ml
Developer organization: ru.odnoklassniki
Source control: https://github.com/odnoklassniki/pravda-ml

Download pravda-ml_2.11

How to add the latest version

Maven:

<!-- https://jarcasting.com/artifacts/ru.odnoklassniki/pravda-ml_2.11/ -->
<dependency>
    <groupId>ru.odnoklassniki</groupId>
    <artifactId>pravda-ml_2.11</artifactId>
    <version>0.6.2</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/ru.odnoklassniki/pravda-ml_2.11/
implementation 'ru.odnoklassniki:pravda-ml_2.11:0.6.2'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/ru.odnoklassniki/pravda-ml_2.11/
implementation("ru.odnoklassniki:pravda-ml_2.11:0.6.2")

Buildr:

'ru.odnoklassniki:pravda-ml_2.11:jar:0.6.2'

Ivy:

<dependency org="ru.odnoklassniki" name="pravda-ml_2.11" rev="0.6.2">
  <artifact name="pravda-ml_2.11" type="jar" />
</dependency>

Groovy Grape:

@Grapes(
    @Grab(group='ru.odnoklassniki', module='pravda-ml_2.11', version='0.6.2')
)

SBT:

libraryDependencies += "ru.odnoklassniki" % "pravda-ml_2.11" % "0.6.2"

Leiningen:

[ru.odnoklassniki/pravda-ml_2.11 "0.6.2"]

Dependencies

compile (14)

Library identifier                                  Type  Version
org.scala-lang : scala-library                      jar   2.11.8
com.google.guava : guava                            jar   16.0.1
org.apache.spark : spark-core_2.11                  jar   2.4.4
org.apache.spark : spark-mllib_2.11                 jar   2.4.4
org.apache.spark : spark-sql_2.11                   jar   2.4.4
org.apache.spark : spark-streaming_2.11             jar   2.4.4
com.esotericsoftware : kryo                         jar   4.0.1
org.apache.lucene : lucene-core                     jar   5.4.1
org.apache.lucene : lucene-analyzers-common         jar   5.4.1
com.optimaize.languagedetector : language-detector  jar   0.6
com.tdunning : t-digest                             jar   3.2
ml.dmlc : xgboost4j_2.11                            jar   1.1.1
ml.dmlc : xgboost4j-spark_2.11                      jar   1.1.1
org.mlflow : mlflow-client                          jar   1.2.0

test (2)

Library identifier              Type  Version
org.scalatest : scalatest_2.11  jar   3.0.4
org.mockito : mockito-core      jar   2.13.0

Project Modules

This project has no modules.

PravdaML

This project is used to define machine learning pipelines on top of Spark and was formerly known as ok-ml-pipelines. It is an extension of the Spark ML package, not a replacement, and focuses on the structural aspects of distributed machine learning deployments. The core features added by the project are:

  • Ability to add "transparent" technical stages (e.g. caching, sampling, repartitioning) to an ML pipeline. These stages are included in the learning pipeline but automatically excluded from the resulting model, so they do not affect inference performance.
  • Ability to execute certain pipeline stages in parallel to achieve better cluster utilization. This provides an order-of-magnitude improvement for cross-validation, model segmentation, grid search, and other ML stages with external parallelism.
  • Ability to collect extra information about the model (learning curve history, weight statistics, etc.) in the form of a DataFrame, which greatly simplifies analysis of the learning process and helps identify potential improvements.
  • Improved model evaluation capabilities that allow for extra metrics, including non-scalar ones (e.g. the full ROC curve), and statistical analysis of the metrics.
  • Bayesian hyperparameter optimization (based on Photon-ML https://github.com/linkedin/photon-ml)
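
The idea of "transparent" technical stages from the list above can be illustrated with a minimal sketch in plain Python. It uses neither PravdaML nor Spark: `Stage`, `fit_pipeline`, `predict`, and the `technical` flag are hypothetical names chosen for illustration, and "fitting" here only runs each stage once over the training rows.

```python
from dataclasses import dataclass
from typing import Callable, List

Rows = List[dict]


@dataclass
class Stage:
    """A hypothetical pipeline stage: a named row transformation."""
    name: str
    apply: Callable[[Rows], Rows]
    technical: bool = False  # True for caching, sampling, repartitioning, ...


def fit_pipeline(stages: List[Stage], training_rows: Rows) -> List[Stage]:
    """Run every stage during training, then keep only the stages
    that the resulting model needs at inference time."""
    rows = training_rows
    for stage in stages:
        rows = stage.apply(rows)  # technical stages do run while learning
    return [stage for stage in stages if not stage.technical]


def predict(model: List[Stage], rows: Rows) -> Rows:
    """Apply the fitted model; no technical stage is left in it."""
    for stage in model:
        rows = stage.apply(rows)
    return rows


stages = [
    # Sampling is useful while learning but must not drop rows at inference.
    Stage("sample_half", lambda rows: rows[::2], technical=True),
    Stage("double_x", lambda rows: [{**r, "x": r["x"] * 2} for r in rows]),
]

model = fit_pipeline(stages, [{"x": i} for i in range(10)])
model_stage_names = [stage.name for stage in model]
predictions = predict(model, [{"x": 1}, {"x": 2}, {"x": 3}])
```

In this sketch the sampling stage takes part in training, while the returned model contains only `double_x`, so all three input rows are scored at inference time.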

In addition to the structural improvements, a few ML algorithms are incorporated:

  • Language detection and preprocessing with a focus on ex-USSR languages.
  • LSH-based deduplication for texts.
  • Improved distributed implementation of variance-reduced SGD.
  • Multi-label version of L-BFGS with a matrix gradient.
  • Feature selection based on the stability of feature importance in cross-validation.
  • Improved XGBoost integration (based on DMLC XGBoost for Spark https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html)
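
As a rough illustration of what LSH-based text deduplication involves, the sketch below uses MinHash signatures with banding, a common LSH scheme. Whether PravdaML uses this particular scheme is an assumption; the function names, the word 3-gram shingling, and the band and row counts are hypothetical choices, and the code is single-machine Python rather than Spark.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[str]:
    """Split a text into overlapping word k-grams."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(items: Set[str], num_hashes: int) -> Tuple[int, ...]:
    """Keep the minimum of each seeded hash; equal minima are more likely
    for sets with a high Jaccard similarity."""
    return tuple(
        min(
            int.from_bytes(
                hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()[:8], "big"
            )
            for item in items
        )
        for seed in range(num_hashes)
    )


def duplicate_candidates(
    texts: List[str], bands: int = 8, rows: int = 2
) -> Set[Tuple[int, int]]:
    """Return index pairs whose signatures collide in at least one band."""
    buckets: Dict[Tuple[int, Tuple[int, ...]], List[int]] = defaultdict(list)
    for index, text in enumerate(texts):
        signature = minhash(shingles(text), bands * rows)
        for band in range(bands):
            key = (band, signature[band * rows:(band + 1) * rows])
            buckets[key].append(index)
    pairs: Set[Tuple[int, int]] = set()
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add((members[i], members[j]))
    return pairs


texts = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy dog",
    "completely unrelated sentence about distributed machine learning",
]
pairs = duplicate_candidates(texts)
```

Candidate pairs would normally be verified with an exact similarity measure before a text is dropped, since banding trades precision for speed.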

Slides from the JBreak 2018 demo are available at: https://cloud.mail.ru/public/77xY/GKAfB3mjn

A set of usage examples is available on Zepl.

ru.odnoklassniki

OK.ru

Most famous Russian social network

Library versions

Version
0.6.2
0.6.1-spark2.3
0.6.1
0.6.0-spark2.3
0.6.0
0.5.6
0.5.5
0.5.4
0.5.3
0.5.2