Document Normalizer

Tools for normalizing documents before processing

Categories: ORM Data
Group: uk.ac.gate.plugins
Identifier: document-normalizer
Latest version: 8.5
Type: jar
Developer organization: GATE
Version control: https://github.com/GateNLP/gateplugin-DocumentNormalizer

Download document-normalizer

How to add the latest version

Maven:
<!-- https://jarcasting.com/artifacts/uk.ac.gate.plugins/document-normalizer/ -->
<dependency>
    <groupId>uk.ac.gate.plugins</groupId>
    <artifactId>document-normalizer</artifactId>
    <version>8.5</version>
</dependency>

Gradle (Groovy DSL):
// https://jarcasting.com/artifacts/uk.ac.gate.plugins/document-normalizer/
implementation 'uk.ac.gate.plugins:document-normalizer:8.5'

Gradle (Kotlin DSL):
// https://jarcasting.com/artifacts/uk.ac.gate.plugins/document-normalizer/
implementation("uk.ac.gate.plugins:document-normalizer:8.5")

Buildr:
'uk.ac.gate.plugins:document-normalizer:jar:8.5'

Ivy:
<dependency org="uk.ac.gate.plugins" name="document-normalizer" rev="8.5">
  <artifact name="document-normalizer" type="jar" />
</dependency>

Grape:
@Grapes(
    @Grab(group='uk.ac.gate.plugins', module='document-normalizer', version='8.5')
)

SBT:
libraryDependencies += "uk.ac.gate.plugins" % "document-normalizer" % "8.5"

Leiningen:
[uk.ac.gate.plugins/document-normalizer "8.5"]

Dependencies

provided (1)

Library identifier                     Type   Version
uk.ac.gate : gate-core                 jar    8.5

test (1)

Library identifier                     Type   Version
uk.ac.gate : gate-plugin-test-utils    jar    8.5

Project modules

This project has no modules.

A simple processing resource (PR) for basic document normalization. It should usually be run as the first PR in a pipeline, after Document Reset. Because the PR edits the document content, running it again over a document that has already been normalized changes nothing, although each run still costs processing time.

The PR works from a file of replacements, which consists of pairs of lines. The first line of each pair specifies the text to replace, and the second line gives the text that will be substituted in its place. The first line can be a regular expression, but back references cannot be used in the second line.
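The pair-of-lines format can be illustrated with a minimal Java sketch. This is not the plugin's source code: the class `ReplacementPairsSketch`, its `Rule` type and both methods are hypothetical names chosen for illustration, and the sketch assumes a file with no comments or blank separator lines. It models the back-reference restriction by treating the second line of each pair as literal text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; NOT the plugin's implementation.
// All names in this class are hypothetical.
class ReplacementPairsSketch {

    // One rule: a compiled pattern plus the literal text that replaces each match.
    static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(Pattern pattern, String replacement) {
            this.pattern = pattern;
            this.replacement = replacement;
        }
    }

    // Builds rules from lines arranged in pairs: pattern line, then replacement line.
    // Assumption: every line belongs to a pair, so an odd line count is rejected.
    static List<Rule> parseRules(List<String> lines) {
        if (lines.size() % 2 != 0) {
            throw new IllegalArgumentException("Expected pairs of lines, got " + lines.size());
        }
        List<Rule> rules = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += 2) {
            rules.add(new Rule(Pattern.compile(lines.get(i)), lines.get(i + 1)));
        }
        return rules;
    }

    // Applies every rule in order to the text.
    static String normalize(String text, List<Rule> rules) {
        String result = text;
        for (Rule rule : rules) {
            // quoteReplacement makes '$' and '\' literal, so the replacement
            // line cannot refer back to groups matched by the pattern.
            result = rule.pattern.matcher(result)
                    .replaceAll(Matcher.quoteReplacement(rule.replacement));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two pairs: collapse runs of whitespace, then unify a spelling variant.
        List<String> lines = List.of("\\s{2,}", " ", "colou?r", "color");
        System.out.println(normalize("a   colour  swatch", parseRules(lines)));
    }
}
```

In a real pipeline the lines would come from the replacements file rather than from an in-memory list; the list is used here only to keep the sketch self-contained.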

The most common use for this PR is to normalize punctuation symbols, because WYSIWYG editors often automatically replace the standard apostrophe and hyphen symbols with fancier versions. This makes text processing difficult, since gazetteer lists, JAPE grammars and other resources usually assume the standard symbols, i.e. the ones on the keyboard. The default config file is aimed at normalizing such cases.
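As a rough illustration of this use case, the Java sketch below maps a few typographic apostrophe and hyphen characters to their keyboard equivalents. The mapping is an assumption made for the example and is not the content of the plugin's default config file; the class name `PunctuationSketch` is likewise hypothetical. It also shows why a second run has no effect: after the first pass, none of the patterns match any more.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; the mapping below is hypothetical and is
// NOT taken from the plugin's default config file.
class PunctuationSketch {

    // Pattern -> replacement, applied in insertion order.
    static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();

    static {
        // Left and right single quotation marks become a plain apostrophe.
        REPLACEMENTS.put("[\u2018\u2019]", "'");
        // Unicode hyphen and en dash become a plain hyphen-minus.
        REPLACEMENTS.put("[\u2010\u2013]", "-");
    }

    // Replaces every typographic symbol covered by the mapping.
    static String normalize(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            result = result.replaceAll(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String once = normalize("it\u2019s a well\u2013known issue");
        String twice = normalize(once);
        System.out.println(once);
        // The second pass finds nothing left to replace.
        System.out.println(once.equals(twice));
    }
}
```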

uk.ac.gate.plugins

GateNLP

GATE - General Architecture for Text Engineering

Library versions

Version
8.5
8.5-alpha1