Project Group: de.cit-ec.scie

NER Core

de.cit-ec.scie : ner-core

This module forms the main component of the ontology-based named entity recognition (NER). It can store arbitrary directed ontology graphs and supports multiple labels (ontological surface forms) per ontology graph node. It implements an easy-to-use class, NamedEntityRecognition, which can be used to (fuzzily) find ontology instances in a text. This module has no dependencies.
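The idea described above (a directed graph whose nodes each carry several surface forms, queried with fuzzy matching) can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: the class FuzzyOntologySketch, its methods, and the use of edit distance on single tokens are assumptions made for illustration and do not reflect the actual NamedEntityRecognition API or its matching algorithm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; this is NOT the ner-core API. */
public class FuzzyOntologySketch {
    // Node id -> surface forms (labels) attached to that node.
    private final Map<String, List<String>> labels = new HashMap<>();
    // Directed edges: node id -> ids of the nodes it points to.
    private final Map<String, List<String>> edges = new HashMap<>();

    /** Registers a node with one or more lower-cased surface forms. */
    public void addNode(String id, String... surfaceForms) {
        List<String> forms = labels.computeIfAbsent(id, k -> new ArrayList<>());
        for (String form : surfaceForms) {
            forms.add(form.toLowerCase(Locale.ROOT));
        }
    }

    /** Adds a directed edge between two node ids. */
    public void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Returns the direct successors of a node (empty if none). */
    public List<String> children(String id) {
        return edges.getOrDefault(id, Collections.emptyList());
    }

    /** Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the ids of all nodes with a single-token surface form
     * within maxDistance edits of some token in the text.
     */
    public List<String> find(String text, int maxDistance) {
        List<String> hits = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            for (Map.Entry<String, List<String>> entry : labels.entrySet()) {
                for (String form : entry.getValue()) {
                    if (editDistance(token, form) <= maxDistance
                            && !hits.contains(entry.getKey())) {
                        hits.add(entry.getKey());
                    }
                }
            }
        }
        return hits;
    }
}
```

In this sketch, a node registered with the surface forms "mouse" and "mice" would be found in a text containing the misspelling "mise" when one edit is tolerated. A real implementation would also need to handle multi-word surface forms and would use an index rather than a linear scan.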

Last Version: 2.0.1

Release Date:

SCIE PDF Text Extractor

de.cit-ec.scie : pdf-extractor

This is an optimized version of Apache PDFBox. It extracts the rough structure of a document (pages, blocks of text, and paragraphs, as well as formatting information) and is intended to improve text extraction results for scientific papers. The output can easily be transformed to plain text (toString) or to an XML format (toXML).

Last Version: 2.0.1

Release Date:

NER MapDB

de.cit-ec.scie : ner-mapdb

Provides a binding between the NER subsystem and the MapDB database for storing large ontologies, capable of managing hundreds of thousands of individual surface forms and tens of thousands of ontology graph nodes.

Last Version: 2.0.1

Release Date:

SCIE Type System

de.cit-ec.scie : scie-typesystem

This is an internally used library containing the UIMA type system descriptors and the annotator templates for the SCIE project.

Last Version: 2.0.1

Release Date:

Webservice

de.cit-ec.scie : webservice

Module providing the webservice interface based on the embedded Jetty web server and the FreeMarker template engine. Defines a simple format for providing textual annotations and produces output in HTML or JSON. This module has no dependencies on the other SCIE modules (except for the PDF text extractor) or on the UIMA framework, and thus can be used in any context where text is annotated by an algorithm and should be presented to an end user.

Last Version: 2.0.1

Release Date:

SCIE Classifiers

de.cit-ec.scie : scie-classifiers

Library based on liblinear that aggregates multiple UIMA annotations into compound UIMA annotations, higher-order concepts, and relations by employing machine learning techniques.

Last Version: 2.0.1

Release Date:

SCIE Core

de.cit-ec.scie : scie-core

Contains the SCIE main application and the command-line interface (CLI). This project integrates the named entity recognition (NER), the PDF import, and the classification, and interfaces with the UIMA framework. The command-line interface can be used to produce a set of UIMA XCAS files.

Last Version: 2.0.1

Release Date:

SCIE PDF Text Extractor GUI

de.cit-ec.scie : pdf-extractor-gui

Provides a simple graphical user interface for the SCIE pdf-extractor module.

Last Version: 2.0.1

Release Date:

NER Import

de.cit-ec.scie : ner-import

Tool used to import ontologies from various file formats (native simple XML used for the small ontologies, NCBI MeSH, NCBI Taxonomy) into the internal NER ontology database.

Last Version: 2.0.1

Release Date:

NER Webservice

de.cit-ec.scie : webservice-ner

Contains a debugging version of the SCIE Webservice that performs only ontology-based named entity recognition. This webservice can therefore be used to list all the ontological named entities found in the input text.

Last Version: 2.0.1

Release Date:

SCIE Webservice

de.cit-ec.scie : webservice-scie

Contains the SCIE Webservice. This application spawns multiple instances of the scie-core application in a process pool, relays requests from the web frontend to the analysis processes, and parses the resulting XCAS into interactive HTML output or JSON.

Last Version: 2.0.1

Release Date:
