CSVInputFormat

Hadoop2 InputFormat for reading multiline CSV files

Categories: Ant, Build and Compilation, CSV, Data, Data Formats
Group: in.ashwanthkumar
Artifact ID: hadoop2-csv
Latest version: 2.0
Type: jar
Description: CSVInputFormat. Hadoop2 InputFormat for reading multiline CSV files
Website: https://github.com/ashwanthkumar/hadoop2-csv
Version control: https://github.com/ashwanthkumar/hadoop2-csv

Download hadoop2-csv

How to add the latest version

Maven:

<!-- https://jarcasting.com/artifacts/in.ashwanthkumar/hadoop2-csv/ -->
<dependency>
    <groupId>in.ashwanthkumar</groupId>
    <artifactId>hadoop2-csv</artifactId>
    <version>2.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/in.ashwanthkumar/hadoop2-csv/
implementation 'in.ashwanthkumar:hadoop2-csv:2.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/in.ashwanthkumar/hadoop2-csv/
implementation ("in.ashwanthkumar:hadoop2-csv:2.0")

Buildr:

'in.ashwanthkumar:hadoop2-csv:jar:2.0'

Ivy:

<dependency org="in.ashwanthkumar" name="hadoop2-csv" rev="2.0">
  <artifact name="hadoop2-csv" type="jar" />
</dependency>

Grape:

@Grapes(
    @Grab(group='in.ashwanthkumar', module='hadoop2-csv', version='2.0')
)

SBT:

libraryDependencies += "in.ashwanthkumar" % "hadoop2-csv" % "2.0"

Leiningen:

[in.ashwanthkumar/hadoop2-csv "2.0"]

Dependencies

compile (2)

Library identifier                 Type  Version
log4j : log4j                      jar   1.2.14
org.apache.hadoop : hadoop-client  jar   2.2.0

test (1)

Library identifier                 Type  Version
junit : junit                      jar   4.10

Project Modules

This project has no modules.

hadoop2-csv

A Hadoop input format that can read multiline CSV files.

Run BasicTest.java to see it working. See src/test/resource/test.csv for a sample multiline file.

The key returned is the file position where the record starts, and the value is a List of the column values.

Zip files are supported.
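
To give a rough idea of what reading CSV data out of a ZIP archive involves, here is a minimal sketch using only the JDK's java.util.zip classes. It is not this library's code: the class ZipCsvSketch and both of its methods are hypothetical names, and reading only the first entry as UTF-8 text is an assumption made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Illustrative only: hypothetical helper, not part of hadoop2-csv. */
final class ZipCsvSketch {

    /** Returns the first entry of a ZIP stream decoded as UTF-8, or null if there is no entry. */
    static String readFirstEntry(InputStream in) {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry = zip.getNextEntry();
            if (entry == null) {
                return null;
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            // read() returns -1 at the end of the current entry
            while ((n = zip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a one-entry ZIP archive in memory so the example needs no files on disk. */
    static byte[] zipOf(String entryName, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] archive = zipOf("test.csv", "Jim Sample,jim@sample.com\n");
        System.out.print(readFirstEntry(new ByteArrayInputStream(archive)));
    }
}
```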

More ideas to improve this are welcome.

Example:

If we read this CSV (note that the first and third records span multiple lines):

Joe Demo,"2 Demo Street,
Demoville,
Australia. 2615",joe@someaddress.com
Jim Sample,"3 Sample Street, Sampleville, Australia. 2615",jim@sample.com
Jack Example,"1 Example Street, Exampleville, Australia.
2615",jack@example.com

The output is as follows:

==> TestMapper
==> key=0
==> val[0] = Joe Demo
==> val[1] = 2 Demo Street, 
Demoville, 
Australia. 2615
==> val[2] = joe@someaddress.com

==> TestMapper
==> key=73
==> val[0] = Jim Sample
==> val[1] = 
==> val[2] = jim@sample.com

==> TestMapper
==> key=10
==> val[0] = Jack Example
==> val[1] = 1 Example Street, Exampleville, Australia. 2615
==> val[2] = jack@example.com
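
The key/value shape described above (record start position as key, column list as value) can be illustrated with a small standalone parser. This is only a sketch under stated assumptions, not the library's record reader: MultilineCsvSketch and parse are hypothetical names, only LF line endings and double-quote quoting are handled, and offsets are character positions in the string, which match byte positions only for ASCII input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical parser, not the CSVInputFormat implementation. */
final class MultilineCsvSketch {

    /** Maps each record's start offset to its column values, in file order. */
    static Map<Long, List<String>> parse(String text) {
        Map<Long, List<String>> records = new LinkedHashMap<>();
        List<String> columns = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        long recordStart = 0;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas and newlines stay in the field
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                columns.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                // an unquoted newline ends the record
                columns.add(field.toString());
                field.setLength(0);
                records.put(recordStart, columns);
                columns = new ArrayList<>();
                recordStart = i + 1;
            } else {
                field.append(c);
            }
        }
        // flush a final record that has no trailing newline
        if (recordStart < text.length()) {
            columns.add(field.toString());
            records.put(recordStart, columns);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "Joe Demo,\"2 Demo Street,\nDemoville,\nAustralia. 2615\",joe@someaddress.com\n"
                + "Jim Sample,\"3 Sample Street, Sampleville, Australia. 2615\",jim@sample.com\n";
        parse(csv).forEach((key, val) -> {
            System.out.println("key=" + key);
            for (int i = 0; i < val.size(); i++) {
                System.out.println("val[" + i + "] = " + val.get(i));
            }
        });
    }
}
```

Tracking a single inQuotes flag is what lets a newline inside a quoted field remain part of the value instead of ending the record.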

License

https://www.apache.org/licenses/LICENSE-2.0.html

Credits

This is a personal fork of CSVInputFormat, built against Hadoop 2. Please report issues to the original project.

Library versions

Version
2.0