File duplicate finder

A file duplicate finder written in Java 8 with native MD5 check support.

Categories: Java, Programming languages
Group: com.github.cbismuth
Artifact ID: fdupes-java
Latest version: 1.3.1
Type: jar
Description: A file duplicate finder written in Java 8 with native MD5 check support.
Website: https://github.com/cbismuth/fdupes-java
Developer organization: Pivotal Software, Inc.
Version control: https://github.com/cbismuth/fdupes-java

Download fdupes-java

How to add the latest version to a build

Maven:

<dependency>
    <groupId>com.github.cbismuth</groupId>
    <artifactId>fdupes-java</artifactId>
    <version>1.3.1</version>
</dependency>

Gradle (Groovy DSL):

implementation 'com.github.cbismuth:fdupes-java:1.3.1'

Gradle (Kotlin DSL):

implementation("com.github.cbismuth:fdupes-java:1.3.1")

Buildr:

'com.github.cbismuth:fdupes-java:jar:1.3.1'

Ivy:

<dependency org="com.github.cbismuth" name="fdupes-java" rev="1.3.1">
  <artifact name="fdupes-java" type="jar" />
</dependency>

Grape:

@Grapes(
    @Grab(group='com.github.cbismuth', module='fdupes-java', version='1.3.1')
)

SBT:

libraryDependencies += "com.github.cbismuth" % "fdupes-java" % "1.3.1"

Leiningen:

[com.github.cbismuth/fdupes-java "1.3.1"]

Dependencies

compile (7)

Library ID                                              Type  Version
org.springframework.boot : spring-boot-starter-web      jar   1.4.1.RELEASE
org.springframework.boot : spring-boot-starter-logging  jar   1.4.1.RELEASE
com.google.guava : guava                                jar   19.0
org.apache.spark : spark-network-common_2.11            jar   1.6.2
org.zeroturnaround : zt-exec                            jar   1.9
com.opencsv : opencsv                                   jar   3.8
io.dropwizard.metrics : metrics-servlets                jar   3.1.2

test (1)

Library ID                                              Type  Version
org.springframework.boot : spring-boot-starter-test     jar   1.4.1.RELEASE

Project modules

This project has no modules.

fdupes-java


Description

A command-line duplicate file finder written in Java 8 that finds all duplicated files in the input paths and their subdirectories.

Usage

Executable files are available on the release page. Download the latest one and run the command line below.

java -jar fdupes-1.3.0.jar <PATH1> [<PATH2>]...

Output

Paths of duplicated files are written to a duplicates.log file in the current working directory.

Note: reported paths are double-quoted and whitespace-escaped to be *nix-compliant.

Options

Here are optional command line switches:

-Dlogging.level.fdupes=<LEVEL>    the logging level of fdupes-java        (default is INFO)
-Dlogging.level.root=<LEVEL>      the logging level of embedded libraries (default is WARN)

-Xmx<SIZE><UNIT>                  the max amount of memory to allocate (e.g. 512m)

-Dfdupes.parallelism=<NUMBER>     the number of threads to parallelize execution   (default is 1)
-Dfdupes.buffer.size=<SIZE><UNIT> the buffer size used for byte-by-byte comparison (default is 64k)

Note: logging levels must be one of: ALL, TRACE, DEBUG, INFO, WARN, ERROR, OFF.
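To illustrate how these system properties might be consumed, here is a minimal Java sketch that reads fdupes.parallelism and fdupes.buffer.size with the defaults listed above. The class FdupesOptions and its parseSize helper are hypothetical and not part of fdupes-java, and the size syntax (an optional k, m or g suffix, 1024-based like -Xmx) is an assumption.

```java
import java.util.Locale;

// Hypothetical sketch: reads the fdupes.* system properties listed above.
// Property names and defaults come from the option list; the size syntax
// (optional k/m/g suffix, 1024-based, as with -Xmx) is an assumption.
final class FdupesOptions {

    // Parses sizes such as "512", "64k", "3m" or "1g" into a byte count.
    static long parseSize(String value) {
        String s = value.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024L; break;
            case 'g': multiplier = 1024L * 1024L * 1024L; break;
            default:  multiplier = 1L; break; // no suffix: plain byte count
        }
        String digits = multiplier == 1L ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    // Integer.getInteger falls back to the default when the property is
    // missing or is not a valid integer.
    static int parallelism() {
        return Integer.getInteger("fdupes.parallelism", 1);
    }

    static long bufferSize() {
        return parseSize(System.getProperty("fdupes.buffer.size", "64k"));
    }

    public static void main(String[] args) {
        System.out.println("parallelism = " + parallelism());
        System.out.println("buffer size = " + bufferSize() + " bytes");
    }
}
```

Under these assumptions, running with -Dfdupes.buffer.size=3m makes bufferSize() return 3145728, and omitting -Dfdupes.parallelism yields 1.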

Examples

Find duplicated files in a single directory and its subdirectories with default options:

java -jar fdupes-1.3.0.jar ~/pictures

Find duplicated files in two directories plus a single file with custom options:

java -Xmx1g                       \
     -Dfdupes.parallelism=8       \
     -Dfdupes.buffer.size=3m      \
     -Dlogging.level.fdupes=DEBUG \
     -Dlogging.level.root=DEBUG   \
     -jar fdupes-1.3.0.jar        \
     ~/pictures                   \
     ~/downloads                  \
     ~/desktop/DSC00042.JPG

Note: <PATH1> [<PATH2>]... can be regular files, directories, or a mix of both.

Benchmark

Hardware

Processor  Intel® Core™ i7-5500U CPU @ 2.40GHz × 4
Memory     15.4 GB
Disk       SSD Samsung MZ7LN256 rev. 3L6Q

Software

OS         Ubuntu 16.04 LTS 64-bit
Java       JRE 1.8.0_92-b14 64-bit

Command line

java -Xmx8g                       \
     -Dfdupes.parallelism=8       \
     -Dfdupes.buffer.size=512k    \
     -Dlogging.level.fdupes=INFO  \
     -Dlogging.level.root=ERROR   \
     -jar fdupes-1.3.0.jar        \
     ~/Pictures/tmp
Results

Total file count       69406
Total file size        148 GB
Total duplicate count  8196
Total duplicate size   49,597.715 MB
Execution time         3m1.164s

Requirements

A Java 8 Runtime Environment is the only requirement; it can be downloaded here.

Motivation

The original fdupes application has two major caveats, which fdupes-java works around.

When fdupes is used together with the -s or --symlink option, a user could accidentally preserve a symlink while deleting the file it points to.

Symlinks are ignored in fdupes-java.

Furthermore, when specifying a particular directory more than once, all files within that directory will be listed as their own duplicates, leading to data loss should a user preserve a file without its "duplicate" (the file itself!).

Duplicated input directories and files are filtered out by fdupes-java.
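Both behaviours can be sketched in a few lines of Java, assuming every input is canonicalized with Path.toRealPath() and the resulting files are collected into a set, so that repeated or nested inputs collapse to one entry per file and symlinks are never followed. InputCollector and collectFiles are hypothetical names; this is not fdupes-java's actual code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper, not fdupes-java's API.
final class InputCollector {

    static Set<Path> collectFiles(List<String> inputs) throws IOException {
        // A set of canonical paths: repeated or nested inputs yield each file once.
        Set<Path> files = new LinkedHashSet<>();
        for (String input : inputs) {
            Path path = Paths.get(input);
            // Ignore missing inputs and inputs that are themselves symlinks.
            if (!Files.exists(path, LinkOption.NOFOLLOW_LINKS) || Files.isSymbolicLink(path)) {
                continue;
            }
            // toRealPath() resolves ".", ".." and symlinked parent directories.
            Path root = path.toRealPath();
            // Files.walk does not follow symlinks unless FileVisitOption.FOLLOW_LINKS is passed,
            // and the NOFOLLOW_LINKS check drops symlink entries found during the walk.
            try (Stream<Path> walk = Files.walk(root)) {
                walk.filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                    .forEach(files::add);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        for (Path file : collectFiles(Arrays.asList(args))) {
            System.out.println(file);
        }
    }
}
```

With this approach, passing ~/pictures twice, or ~/pictures together with ~/pictures/2016, still lists each file only once, so no file can be reported as its own duplicate.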

Algorithms

  • Files are first compared by size, then by MD5 signature, and finally by a buffered byte-by-byte comparison.
  • The original file is detected by comparing creation, last access and last modification times.
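The following Java sketch illustrates the three-stage comparison and the selection of an original file. It is not fdupes-java's implementation: DuplicateFinder, findDuplicates, md5, sameContent and original are hypothetical names, and the ordering used to pick the original (earliest creation time, then last access, then last modification) is an assumption about how the three timestamps are combined.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a size -> MD5 -> byte-by-byte duplicate search.
final class DuplicateFinder {

    // Assumed ordering for picking the "original": oldest creation, then access, then modification.
    static final Comparator<BasicFileAttributes> OLDEST_FIRST =
            Comparator.comparing(BasicFileAttributes::creationTime)
                      .thenComparing(BasicFileAttributes::lastAccessTime)
                      .thenComparing(BasicFileAttributes::lastModifiedTime);

    static List<List<Path>> findDuplicates(List<Path> files, int bufferSize)
            throws IOException, NoSuchAlgorithmException {
        // Stage 1: files of different sizes can never be duplicates.
        Map<Long, List<Path>> bySize = new HashMap<>();
        for (Path f : files) bySize.computeIfAbsent(Files.size(f), k -> new ArrayList<>()).add(f);
        List<List<Path>> groups = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) continue;
            // Stage 2: split each size bucket by MD5 digest.
            Map<String, List<Path>> byMd5 = new HashMap<>();
            for (Path f : sameSize) byMd5.computeIfAbsent(md5(f), k -> new ArrayList<>()).add(f);
            for (List<Path> sameMd5 : byMd5.values()) {
                if (sameMd5.size() < 2) continue;
                // Stage 3: byte-by-byte check, so an MD5 collision cannot merge different files.
                List<List<Path>> classes = new ArrayList<>();
                for (Path f : sameMd5) {
                    List<Path> match = null;
                    for (List<Path> c : classes) {
                        if (sameContent(c.get(0), f, bufferSize)) { match = c; break; }
                    }
                    if (match == null) { match = new ArrayList<>(); classes.add(match); }
                    match.add(f);
                }
                for (List<Path> c : classes) if (c.size() > 1) groups.add(c);
            }
        }
        return groups;
    }

    static String md5(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) digest.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    static boolean sameContent(Path a, Path b, int bufferSize) throws IOException {
        try (InputStream in1 = new BufferedInputStream(Files.newInputStream(a), bufferSize);
             InputStream in2 = new BufferedInputStream(Files.newInputStream(b), bufferSize)) {
            int x;
            do {
                x = in1.read();
                if (x != in2.read()) return false; // differing byte or differing length
            } while (x != -1);
            return true;
        }
    }

    static Path original(List<Path> group) throws IOException {
        Path best = null;
        BasicFileAttributes bestAttrs = null;
        for (Path p : group) {
            BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
            if (best == null || OLDEST_FIRST.compare(attrs, bestAttrs) < 0) { best = p; bestAttrs = attrs; }
        }
        return best;
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String arg : args) files.add(Paths.get(arg));
        for (List<Path> group : findDuplicates(files, 64 * 1024)) {
            System.out.println("original: " + original(group) + ", group: " + group);
        }
    }
}
```

Grouping by size first costs only a metadata lookup and rules out most files before any content is read; hashing then narrows the candidates, and the final byte-by-byte pass guards against MD5 collisions. Note that when a file system does not record creation times, BasicFileAttributes.creationTime() returns an implementation-specific default, typically the last-modified time or the epoch.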

Issues

Here is how issues are triaged:

  • Bug: identifies an unexpected result or application behaviour.
  • Feature: adds a new end-user feature.
  • Enhancement: improves the way the application behaves but produces the same result.
  • Spike: improves implementation design but does not change application behaviour and produces the same result.

Credits

Written by Christophe Bismuth, licensed under The MIT License (MIT).

This project is finely profiled with the awesome JProfiler from ej-technologies!

https://www.ej-technologies.com/products/jprofiler/overview.html

Library versions

Version
1.3.1
1.3.0
1.2.0
1.2.0-RC4
1.2.0-RC3
1.2.0-RC2
1.2.0-RC1