ClearWSD CLI

Command line interfaces for non-programmatic training and experimentation.

License	The GNU General Public License, Version 3
Categories	CLI, User Interaction
Group	io.github.clearwsd
Artifact ID	clearwsd-cli
Latest version	0.12.1
Date	Sep 14, 2020
Type	jar
Description	ClearWSD CLI: command line interfaces for non-programmatic training and experimentation.

Download clearwsd-cli

File name	Size
clearwsd-cli-0.12.1.pom
clearwsd-cli-0.12.1.jar	47 KB
clearwsd-cli-0.12.1-sources.jar	21 KB
clearwsd-cli-0.12.1-javadoc.jar	51 KB
Overview

How to add the latest version

Apache Maven

<!-- https://jarcasting.com/artifacts/io.github.clearwsd/clearwsd-cli/ -->
<dependency>
    <groupId>io.github.clearwsd</groupId>
    <artifactId>clearwsd-cli</artifactId>
    <version>0.12.1</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/io.github.clearwsd/clearwsd-cli/
implementation 'io.github.clearwsd:clearwsd-cli:0.12.1'

Gradle Kotlin

// https://jarcasting.com/artifacts/io.github.clearwsd/clearwsd-cli/
implementation ("io.github.clearwsd:clearwsd-cli:0.12.1")

Apache Buildr

'io.github.clearwsd:clearwsd-cli:jar:0.12.1'

Apache Ivy

<dependency org="io.github.clearwsd" name="clearwsd-cli" rev="0.12.1">
  <artifact name="clearwsd-cli" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='io.github.clearwsd', module='clearwsd-cli', version='0.12.1')
)

Scala SBT

libraryDependencies += "io.github.clearwsd" % "clearwsd-cli" % "0.12.1"

Leiningen

[io.github.clearwsd/clearwsd-cli "0.12.1"]

Dependencies

compile (5)

Library identifier	Type	Version
io.github.clearwsd : clearwsd-stanford	jar	0.12.1
org.mapdb : mapdb	jar	3.0.7
com.google.guava : guava	jar	27.0-jre
org.slf4j : slf4j-api	jar	1.7.25
ch.qos.logback : logback-classic	jar	1.2.3

provided (1)

Library identifier	Type	Version
org.projectlombok : lombok	jar	1.18.4

test (1)

Library identifier	Type	Version
junit : junit	jar	4.12

Project modules

This project has no modules.

ClearWSD

ClearWSD is a word sense disambiguation tool for the JVM, with core modules available under an Apache 2.0 license. It provides simple APIs for integration with other libraries, as well as a command-line interface (CLI) for non-programmatic use. It is modular, allowing for alternative implementations of sub-components such as parsers or resources used for feature extraction.

It is meant for use in both research and production settings. Main features include:

State-of-the-art results in verb sense disambiguation over VerbNet classes
Automatic optimization of feature subsets and hyperparameters
Production-ready pre-trained models
Easy training of new models using CLI
1000+ sense predictions per second on a 2014 MacBook Pro

API

The easiest way to use ClearWSD in your project is through Maven: add the corresponding ClearWSD dependencies to your project's pom.xml.

Releases are distributed through Maven Central.

To try out ClearWSD in your project, you will need to include three modules, the first being clearwsd-core:

<dependency>
  <groupId>io.github.clearwsd</groupId>
  <artifactId>clearwsd-core</artifactId>
  <version>0.12.1</version>
</dependency>

and the second being a parser module, used for pre-processing and feature extraction. A wrapper for the NLP4J dependency parser is provided:

<dependency>
  <groupId>io.github.clearwsd</groupId>
  <artifactId>clearwsd-nlp4j</artifactId>
  <version>0.12.1</version>
</dependency>

Finally, to use pre-trained word sense disambiguation models (compatible with NLP4J), just add the following:

<dependency>
  <groupId>io.github.clearwsd</groupId>
  <artifactId>clearwsd-models</artifactId>
  <version>0.12.1</version>
</dependency>

You can then try out a pre-trained model (trained on OntoNotes) with the following:

import java.util.List;

import io.github.clearwsd.DefaultSensePredictor;
import io.github.clearwsd.SensePrediction;
import io.github.clearwsd.corpus.ontonotes.OntoNotesSense;
import io.github.clearwsd.parser.Nlp4jDependencyParser;

public class Test {
    public static void main(String[] args) {
        Nlp4jDependencyParser parser = new Nlp4jDependencyParser(); // load dependency parser
        DefaultSensePredictor<OntoNotesSense> wsd = DefaultSensePredictor.loadFromResource(
                "models/nlp4j-ontonotes.bin", parser); // load WSD model

        String sentence = "Mary took the bus to school (which " // 8 --> travel by means of
                + "took about 30 minutes), and studiously "     // 3 --> require or necessitate
                + "took notes about the Bolsheviks "            // 2 --> light verb usage
                + "taking over the Winter Palace";              // 9 --> claim or conquer, become in control of

        List<String> tokens = parser.tokenize(sentence); // split sentence into tokens

        // display sense predictions and their definitions
        for (SensePrediction<OntoNotesSense> prediction : wsd.predict(tokens)) {
            System.out.println(prediction.sense().getNumber() + " --> " + prediction.sense().getName());
        }
    }
}

Command Line Interface

ClearWSD provides a command-line interface for training, evaluation, and application of word sense disambiguation models.

To build ClearWSD, you will need Java 8 or above and Apache Maven.

On OS X/Linux, you can then build the project for CLI use:

git clone https://github.com/clearwsd/clearwsd.git
cd clearwsd
mvn package -DskipTests -P build-nlp4j-cli

To use the Stanford Parser wrapper module (GPL licensed) instead, build with the build-stanford-cli profile:

mvn package -DskipTests -P build-stanford-cli

You can see a help message and available options with the following command (assuming you have already followed the CLI setup instructions):

java -jar clearwsd-cli-*.jar --help

Usage: WordSenseCLI [options]
  Options:
    -model, -m
      Path to classifier model (for loading or saving)
    -input, -i
      Path to unlabeled input file for new predictions
    -train, -t
      Path to training data (required for training)
    -valid, -dev, -v
      Path to validation data
    -cv, -folds
      Number of cross-validation folds
      Default: 0
    -test
      Path to test data
    --itl, --interactive, --loop
      Start an interactive test session on provided model (after training 
      and/or testing)
      Default: false
    --om
      Output misses on evaluation data in separate files
      Default: false
    --reparse
      Reparse, even if a parsed file of the same name already exists
      Default: false
    --help, --usage
      Display usage
    -corpus
      Training/evaluation corpus type
      Default: Semlink
      Possible Values: [Semeval, Semlink]
    -dataExt
      Extension for training data file (only needed for Semeval XML corpora)
      Default: .data.xml
    -ext
      Parse file extension, appended to input file names to save parses
      Default: .dep
    -inventory, -inv
      Sense inventory
      Possible Values: [VerbNet, WordNet, OntoNotes, Counting]
    -inventoryPath
      Sense inventory path (optional)
    -keyExt
      Extension for sense key file (only needed for Semeval XML corpora)
      Default: .gold.key.txt
    -output, -o
      Path to output file where predictions on the input file are stored

Training

To train a new model, you must specify the path to a training data file with -train, as well as a path for the resulting saved model, using -model:

java -jar clearwsd-cli-*.jar -train path/to/training/file.txt -model path/to/save/model.bin

The default corpus (Semlink) expects files with one instance per line in the following format:

document_id <space> sentence_id <space> token# <space> lemma <space> sense_label <tab> sentence_text

sentence_text should be a single sentence containing the instance, with tokens separated by spaces. For example:

example.txt 25 3 get comprehend-87.2-1	Oh , I get it .
example.txt 57 2 get get-13.5.1-1	Did you get that part ?
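If you generate or check training files programmatically, the format above can be handled with a small helper. The class below is a hypothetical sketch, not part of ClearWSD: it splits a line on the single tab, reads the five space-separated metadata fields, and tokenizes the sentence on spaces. It assumes token# is a zero-based index into sentence_text, which is what both examples suggest ("get" is token 3 of "Oh , I get it ." and token 2 of "Did you get that part ?").

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of ClearWSD) for one line of the Semlink-style format:
 * document_id <space> sentence_id <space> token# <space> lemma <space> sense_label <tab> sentence_text
 */
public class SemlinkLine {
    final String documentId;
    final String sentenceId;
    final int tokenIndex;      // assumed zero-based, judging from the README examples
    final String lemma;
    final String sense;
    final List<String> tokens; // sentence_text split on single spaces

    SemlinkLine(String documentId, String sentenceId, int tokenIndex,
                String lemma, String sense, List<String> tokens) {
        this.documentId = documentId;
        this.sentenceId = sentenceId;
        this.tokenIndex = tokenIndex;
        this.lemma = lemma;
        this.sense = sense;
        this.tokens = tokens;
    }

    /** Parses one line, rejecting lines without a tab or with the wrong number of fields. */
    static SemlinkLine parse(String line) {
        String[] halves = line.split("\t", 2);
        if (halves.length != 2) {
            throw new IllegalArgumentException("missing tab before sentence_text: " + line);
        }
        String[] fields = halves[0].split(" ");
        if (fields.length != 5) {
            throw new IllegalArgumentException("expected 5 space-separated fields: " + halves[0]);
        }
        List<String> tokens = Arrays.asList(halves[1].split(" "));
        int index = Integer.parseInt(fields[2]);
        if (index < 0 || index >= tokens.size()) {
            throw new IllegalArgumentException("token# out of range: " + index);
        }
        return new SemlinkLine(fields[0], fields[1], index, fields[3], fields[4], tokens);
    }

    /** The token that the sense label applies to. */
    String targetToken() {
        return tokens.get(tokenIndex);
    }

    /** Writes the instance back out in the same format. */
    String toLine() {
        return String.join(" ", documentId, sentenceId, Integer.toString(tokenIndex), lemma, sense)
                + "\t" + String.join(" ", tokens);
    }

    public static void main(String[] args) {
        SemlinkLine line = parse("example.txt 25 3 get comprehend-87.2-1\tOh , I get it .");
        System.out.println(line.sense + " applies to token '" + line.targetToken() + "'");
    }
}
```

Checking that the target token exists before training is a cheap way to catch off-by-one errors in token#.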

Evaluation

The CLI provides several modes of evaluation/application. You can perform cross-validation, test on a specific dataset, apply a trained model to raw text, or try out a model interactively by typing in test sentences.

Cross Validation

Specify the number of folds with -cv. For example, -cv 5 runs 5-fold cross-validation:

java -jar clearwsd-cli-*.jar -train path/to/training/file.txt -cv 5
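In k-fold cross-validation, the training data is divided into k folds; each fold is held out once for evaluation while a model is trained on the remaining k-1 folds, and the k scores are then combined. The sketch below shows a generic round-robin split to illustrate the idea. It is not ClearWSD's implementation, and the CLI may order, shuffle, or group instances differently; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Generic k-fold split sketch; not ClearWSD's actual cross-validation code. */
public class KFoldSketch {

    /** Returns, for each fold, the indices of the instances held out for testing. */
    static List<List<Integer>> testFolds(int numInstances, int k) {
        if (k < 2 || k > numInstances) {
            throw new IllegalArgumentException("need 2 <= k <= numInstances");
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < numInstances; i++) {
            folds.get(i % k).add(i); // round-robin: instance i goes to fold i mod k
        }
        return folds;
    }

    /** Training indices for one fold: every instance not in that fold's test set. */
    static List<Integer> trainIndices(int numInstances, List<Integer> testFold) {
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            if (!testFold.contains(i)) {
                train.add(i);
            }
        }
        return train;
    }

    public static void main(String[] args) {
        int n = 10;
        int k = 5;
        List<List<Integer>> folds = testFolds(n, k);
        for (int f = 0; f < k; f++) {
            System.out.println("fold " + f + ": test=" + folds.get(f)
                    + " train=" + trainIndices(n, folds.get(f)));
        }
    }
}
```

Every instance appears in exactly one test fold, so each labeled example is evaluated once by a model that never saw it during training.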

Test Dataset

Specify a test file with -test:

java -jar clearwsd-cli-*.jar -test path/to/test/file.txt -model path/to/trained/model.bin

Application

To apply a trained model to new (raw) data, specify a path with -input. Optionally specify an output path with -output:

java -jar clearwsd-cli-*.jar -input path/to/raw/data.txt -output path/to/predictions.txt \
-model clearwsd-models/src/main/resources/models/nlp4j-ontonotes.bin
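If you want to drive application mode from a Java program instead of a shell, one option is to launch the CLI as a subprocess. The wrapper below is a hypothetical sketch that uses only the -input, -output, and -model flags listed in the --help output. Because ProcessBuilder does not expand shell globs such as clearwsd-cli-*.jar, it assumes you pass the concrete path of the jar produced by your build.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical wrapper (not part of ClearWSD) that runs the CLI's application mode.
 * Flag names are taken from the CLI's --help output.
 */
public class ClearWsdCliRunner {

    /** Builds the argument list for "-input ... -model ... [-output ...]". */
    static List<String> applyCommand(String jarPath, String modelPath,
                                     String inputPath, String outputPath) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                "java", "-jar", jarPath, "-input", inputPath, "-model", modelPath));
        if (outputPath != null) { // -output is optional
            cmd.add("-output");
            cmd.add(outputPath);
        }
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 3) {
            System.err.println("usage: ClearWsdCliRunner <cli-jar> <model.bin> <input.txt> [output.txt]");
            return;
        }
        String output = args.length > 3 ? args[3] : null;
        // args[0] must be a concrete jar path: ProcessBuilder does no glob expansion
        Process process = new ProcessBuilder(applyCommand(args[0], args[1], args[2], output))
                .inheritIO() // forward the CLI's console output to this process
                .start();
        System.exit(process.waitFor()); // propagate the CLI's exit code
    }
}
```

Building the argument list separately from starting the process keeps the command easy to log or test without actually running the CLI.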

Interactive Testing

Use --loop (or its aliases --itl and --interactive) to start an interactive command-line test loop, where you can input sentences and see predictions:

java -jar clearwsd-cli-*.jar --loop -model clearwsd-models/src/main/resources/models/nlp4j-verbnet-3.3.bin

After the parser and model finish loading, you should then be able to enter test sentences and see predicted senses:

Enter test input ("EXIT" to quit).
> please take notes

Please
take[25.2]
notes

> Take the train home.

Take[51.4.3]
the
train
home

> Take on the government

Take[98]
on
the
government

> Take the money out of the vault

Take[13.5.1]
the
money
out
of
the
vault
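In the interactive display above, each token is printed on its own line, and tokens that received a prediction carry the predicted label in square brackets (with the VerbNet model loaded here, labels such as 25.2 appear to be VerbNet class identifiers). The hypothetical parser below splits such a line into token and label; it covers only this interactive display, since the format of files written with -output is not described here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser (not part of ClearWSD) for lines like "take[25.2]" in the test loop. */
public class LoopOutputParser {
    // a token without brackets or spaces, then a bracketed label at the end of the line
    private static final Pattern TAGGED = Pattern.compile("^([^\\[\\s]+)\\[([^\\]]+)\\]$");

    /** Returns {token, label} for a tagged line, or null for a token without a prediction. */
    static String[] parse(String line) {
        Matcher m = TAGGED.matcher(line.trim());
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] lines = {"Take[51.4.3]", "the", "train", "home"};
        for (String line : lines) {
            String[] tagged = parse(line);
            if (tagged != null) {
                System.out.println(tagged[0] + " -> " + tagged[1]);
            }
        }
    }
}
```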

License

Please refer to the LICENSE.txt file in each individual module.


Library versions

Version	Date
0.12.1	Sep 14, 2020
0.12.0	Jun 30, 2019
0.10.0	May 7, 2019
