com.simiacryptus:hadoop-jgit-fs

Hadoop Filesystem Driver for Git

Categories: Git Development Tools Version Controls JGit General Purpose Libraries Utility
GroupId: com.simiacryptus
ArtifactId: hadoop-jgit-fs
Last Version: 2.1.0
Type: jar
Description: Hadoop Filesystem Driver for Git
Project Organization: SimiaCryptus Software

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.simiacryptus/hadoop-jgit-fs/ -->
<dependency>
    <groupId>com.simiacryptus</groupId>
    <artifactId>hadoop-jgit-fs</artifactId>
    <version>2.1.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.simiacryptus/hadoop-jgit-fs/
implementation 'com.simiacryptus:hadoop-jgit-fs:2.1.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.simiacryptus/hadoop-jgit-fs/
implementation("com.simiacryptus:hadoop-jgit-fs:2.1.0")

Buildr:

'com.simiacryptus:hadoop-jgit-fs:jar:2.1.0'

Ivy:

<dependency org="com.simiacryptus" name="hadoop-jgit-fs" rev="2.1.0">
  <artifact name="hadoop-jgit-fs" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.simiacryptus', module='hadoop-jgit-fs', version='2.1.0')
)

SBT:

libraryDependencies += "com.simiacryptus" % "hadoop-jgit-fs" % "2.1.0"

Leiningen:

[com.simiacryptus/hadoop-jgit-fs "2.1.0"]

Dependencies

compile (3)

Group / Artifact Type Version
org.eclipse.jgit : org.eclipse.jgit jar
org.apache.hadoop : hadoop-common jar
com.amazonaws : aws-java-sdk-codecommit jar

provided (1)

Group / Artifact Type Version
org.slf4j : slf4j-api jar

test (4)

Group / Artifact Type Version
ch.qos.logback : logback-classic jar
org.slf4j : log4j-over-slf4j jar
org.junit.jupiter : junit-jupiter jar
org.apache.hadoop : hadoop-common test-jar

Project Modules

There are no modules declared in this project.

A JGit SDK-backed FileSystem driver for Hadoop

This is an experimental Hadoop FileSystem backed by the JGit SDK. It has not been heavily tested yet; use at your own risk.

Features:

  • Clones each given repo+branch once and uses a background thread to fetch updates
  • Proxies through to a read-only local filesystem driver for high speed
  • Default packaging is an uber-jar for easy deployment
  • Prebuilt jar is available from Maven Central
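
The "clone once, fetch in the background" feature can be pictured with a small scheduler. The following is only a sketch of the general pattern, not the driver's actual code: the class name BackgroundFetcher and its constructor arguments are hypothetical, and the two Runnable arguments stand in for the JGit clone and fetch operations.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: clone once up front, then refresh on a background thread. */
class BackgroundFetcher implements AutoCloseable {
    // Single daemon thread so pending fetches never keep the JVM alive.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "git-background-fetch");
            t.setDaemon(true);
            return t;
        });
    private final AtomicInteger completedFetches = new AtomicInteger();

    /**
     * @param cloneOnce stand-in for the initial clone (assumption, not the real API)
     * @param fetch     stand-in for the periodic fetch (assumption, not the real API)
     */
    BackgroundFetcher(Runnable cloneOnce, Runnable fetch, long period, TimeUnit unit) {
        cloneOnce.run(); // runs exactly once, on the caller's thread
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                fetch.run();
                completedFetches.incrementAndGet();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("background fetch failed: " + e.getMessage());
            }
        }, period, period, unit);
    }

    /** Number of fetches that finished without throwing. */
    int completedFetches() {
        return completedFetches.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A fixed delay (rather than a fixed rate) is used here so that a slow fetch cannot cause runs to pile up; whether the real driver makes the same choice is not stated in this README.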

Import from Maven Central

<dependency>
    <groupId>com.simiacryptus</groupId>
    <artifactId>hadoop-jgit-fs</artifactId>
    <version>0.1</version>
</dependency>

Build Instructions

Build using Maven:

$ mvn package

Copy the jar and any required dependencies to your Hadoop lib directory (run 'hadoop classpath' to find the appropriate directory):

$ cp target/hadoop-jgit-fs-0.1.jar /usr/lib/hadoop/lib/

Add the following keys to your core-site.xml file:

<!-- necessary for Hadoop to load our filesystem driver -->
<property>
  <name>fs.git.impl</name>
  <value>com.simiacryptus.hadoop_jgit.GitFileSystem</value>
</property>

You should now be able to run commands:

$ hadoop fs -ls git://github.com/SimiaCryptus/hadoop-jgit-fs.git/master/
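
The example URL appears to pack three things into one path: the repository URL, a branch, and a path inside that branch. The sketch below shows one way such a URL could be split. It assumes the repository part always ends in ".git" and that the next path segment is the branch; the class GitPathParts is hypothetical and is not part of the driver, which may parse URLs differently.

```java
import java.net.URI;

/** Hypothetical helper: splits git://host/repo.git/branch/path into its parts. */
final class GitPathParts {
    final String repository; // e.g. git://github.com/SimiaCryptus/hadoop-jgit-fs.git
    final String branch;     // e.g. master
    final String path;       // path inside the branch, "" for the root

    private GitPathParts(String repository, String branch, String path) {
        this.repository = repository;
        this.branch = branch;
        this.path = path;
    }

    static GitPathParts parse(String uri) {
        URI u = URI.create(uri);
        String full = u.getPath();
        // Assumption: the repository portion ends at the first ".git/" in the path.
        int marker = full == null ? -1 : full.indexOf(".git/");
        if (marker < 0) {
            throw new IllegalArgumentException("expected <repo>.git/<branch>/...: " + uri);
        }
        int repoEnd = marker + ".git".length();
        String repository = u.getScheme() + "://" + u.getAuthority() + full.substring(0, repoEnd);
        String rest = full.substring(repoEnd + 1); // text after "<repo>.git/"
        int slash = rest.indexOf('/');
        // Assumption: the branch is a single segment, so names like "feature/x" are not handled.
        String branch = slash < 0 ? rest : rest.substring(0, slash);
        String path = slash < 0 ? "" : rest.substring(slash + 1);
        if (branch.isEmpty()) {
            throw new IllegalArgumentException("missing branch segment: " + uri);
        }
        return new GitPathParts(repository, branch, path);
    }

    public static void main(String[] args) {
        GitPathParts p = parse("git://github.com/SimiaCryptus/hadoop-jgit-fs.git/master/");
        System.out.println(p.repository + " | " + p.branch + " | " + p.path);
    }
}
```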

Tunable parameters

These may or may not improve performance. The defaults were set without much testing.

  • fs.jgit.pull.lazy - Frequency (in seconds) of foreground fetches
  • fs.jgit.pull.eager - Frequency (in seconds) of background fetches
  • fs.jgit.dismount.seconds - Idle time (in seconds) before the repo driver is dismounted
  • fs.jgit.dismount.delete - If true, files will be removed when repo driver dismounts
  • fs.jgit.datadir - Data directory to use for local storage
  • fs.jgit.auth.user - Username for authentication (Optional)
  • fs.jgit.auth.pass - Password for authentication (Optional)
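
Assuming these keys are read from the Hadoop configuration in the same way as fs.git.impl, they would be set as property entries in core-site.xml. The sketch below renders such entries from a map. The class CoreSiteSnippet is a hypothetical helper, and the values shown are illustrative only; they are not the driver's defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: renders core-site.xml <property> entries from key/value pairs. */
class CoreSiteSnippet {
    static String render(Map<String, String> settings) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(escape(e.getKey())).append("</name>\n")
              .append("  <value>").append(escape(e.getValue())).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    // Minimal XML text escaping; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Illustrative values only (assumptions, not documented defaults).
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("fs.jgit.pull.lazy", "60");
        settings.put("fs.jgit.pull.eager", "300");
        settings.put("fs.jgit.dismount.seconds", "600");
        settings.put("fs.jgit.dismount.delete", "false");
        settings.put("fs.jgit.datadir", "/tmp/jgit-data");
        System.out.print(render(settings));
    }
}
```

The printed entries would be pasted inside the configuration element of core-site.xml, next to the fs.git.impl entry.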

Caveats

This is currently implemented as a FileSystem and not an AbstractFileSystem.

Changes

0.1

  • Created
com.simiacryptus

Simia Cryptus

Big Data Science and Artificial Intelligence

Versions

Version
2.1.0
2.0.0
1.8.0
1.7.2
1.7.1
1.7.0
1.6.0
1.5.1
1.5.0
1.4.23
1.4.20
1.4.17
0.1