GCHQ Synthetic Data Generator

A utility application used to generate Avro files of test data

Categories: Data
Group: uk.gov.gchq.data-gen
Identifier: synthetic-data-generator
Latest version: 0.0.4
Type: jar
Website: https://github.com/gchq/synthetic-data-generator
Version control: https://github.com/gchq/synthetic-data-generator

Download synthetic-data-generator

How to add the latest version

Maven:
<!-- https://jarcasting.com/artifacts/uk.gov.gchq.data-gen/synthetic-data-generator/ -->
<dependency>
    <groupId>uk.gov.gchq.data-gen</groupId>
    <artifactId>synthetic-data-generator</artifactId>
    <version>0.0.4</version>
</dependency>

Gradle (Groovy DSL):
// https://jarcasting.com/artifacts/uk.gov.gchq.data-gen/synthetic-data-generator/
implementation 'uk.gov.gchq.data-gen:synthetic-data-generator:0.0.4'

Gradle (Kotlin DSL):
// https://jarcasting.com/artifacts/uk.gov.gchq.data-gen/synthetic-data-generator/
implementation ("uk.gov.gchq.data-gen:synthetic-data-generator:0.0.4")

Buildr:
'uk.gov.gchq.data-gen:synthetic-data-generator:jar:0.0.4'

Ivy:
<dependency org="uk.gov.gchq.data-gen" name="synthetic-data-generator" rev="0.0.4">
  <artifact name="synthetic-data-generator" type="jar" />
</dependency>

Grape:
@Grapes(
@Grab(group='uk.gov.gchq.data-gen', module='synthetic-data-generator', version='0.0.4')
)

SBT:
libraryDependencies += "uk.gov.gchq.data-gen" % "synthetic-data-generator" % "0.0.4"

Leiningen:
[uk.gov.gchq.data-gen/synthetic-data-generator "0.0.4"]

Dependencies

compile (10)

Library identifier                                        Type  Version
com.github.javafaker : javafaker                          jar   1.0.1
com.fasterxml.jackson.core : jackson-core                 jar   2.10.0
com.fasterxml.jackson.core : jackson-annotations          jar   2.10.0
com.fasterxml.jackson.core : jackson-databind             jar   2.10.0
com.fasterxml.jackson.datatype : jackson-datatype-jdk8    jar   2.10.0
com.fasterxml.jackson.datatype : jackson-datatype-jsr310  jar   2.10.0
org.slf4j : slf4j-api                                     jar   1.7.28
org.slf4j : slf4j-simple                                  jar   1.7.28
commons-io : commons-io                                   jar   2.6
org.apache.avro : avro                                    jar   1.8.2

Project Modules

This project has no modules.

Synthetic Data Generator

Ever found yourself scrambling to find test data, only to discover that it isn't available in the quantity you need? Or that you can't generate it with multiple threads, so it takes too long to produce?

Look no further: this data generator fakes classic human resources data about employees. The data structure is designed to contain the kinds of complex structures that can make computation expensive, or that make it difficult to truly test your platform.

This repo provides the code to generate as many Employee records as you want, split over as many Avro files as you desire. You can also optionally define the number of parallel threads used to generate your data.

An Employee object contains the following fields:

class Employee {
    UserId uid;
    String name;
    String dateOfBirth;
    PhoneNumber[] contactNumbers;
    EmergencyContact[] emergencyContacts;
    Address address;
    BankDetails bankDetails;
    String taxCode;
    Nationality nationality;
    Manager[] manager;
    String hireDate;
    Grade grade;
    Department department;
    int salaryAmount;
    int salaryBonus;
    WorkLocation workLocation;
    Sex sex;
}

The manager field is an array of Manager objects, which can be nested several layers deep. In the generated example data, manager is nested 3-5 layers deep.
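
To make the nesting concrete, the following Java sketch measures how many layers deep a manager array goes. It is an illustration only: ManagerNode, ManagerDepth, depth and chain are hypothetical names invented here, not types or methods from the generator, and the sketch assumes each manager simply holds a list of its own managers.

```java
import java.util.List;

// Hypothetical helper, not part of the generator's code base.
final class ManagerDepth {
    // Counts the management layers reachable from this list of managers.
    // An empty list has depth 0; each level of nesting adds 1.
    static int depth(List<ManagerNode> managers) {
        int deepest = 0;
        for (ManagerNode m : managers) {
            deepest = Math.max(deepest, 1 + depth(m.managers));
        }
        return deepest;
    }

    // Builds a single chain of managers that is `layers` levels deep.
    static List<ManagerNode> chain(int layers) {
        if (layers == 0) {
            return List.of();
        }
        return List.of(new ManagerNode("manager-" + layers, chain(layers - 1)));
    }

    public static void main(String[] args) {
        System.out.println(depth(chain(3))); // prints 3
    }
}

// Hypothetical stand-in for the Manager type; only the nesting is modelled.
final class ManagerNode {
    final String name;
    final List<ManagerNode> managers; // each manager may have managers of its own

    ManagerNode(String name, List<ManagerNode> managers) {
        this.name = name;
        this.managers = managers;
    }
}
```

Recursive structures like this are what make the records costly to flatten or query, which is the property the generated data is meant to exercise.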

To use the generator you will need to have git, Maven and JDK 11 installed.

To get started, first clone this repo locally:

git clone https://github.com/gchq/synthetic-data-generator.git

Then cd into the synthetic-data-generator directory and build the codebase:

mvn clean install

Then, to start the generator:

./createHRData.sh PATH EMPLOYEES FILES [THREADS]

where:

  • PATH is the relative path where the files are generated
  • EMPLOYEES is the number of employee records to create
  • FILES is the number of files to spread them over
  • THREADS (optional) specifies the number of threads to use.

For example, to generate 1,000,000 employee records, spread over 15 files, using 4 threads, and write the output files to data/employee:

./createHRData.sh data/employee 1000000 15 4
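
As a rough illustration of how the EMPLOYEES, FILES and THREADS arguments relate, the Java sketch below splits a record count evenly across files and runs one task per file on a fixed-size thread pool. This is an assumption about one reasonable approach, not the generator's actual implementation: WorkSplitSketch, recordsPerFile and runAll are hypothetical names, and each task only returns its record count where a real generator would write that many records to an Avro file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the generator's own code.
final class WorkSplitSketch {
    // Spreads `employees` records over `files` files as evenly as possible:
    // the first (employees % files) files receive one extra record.
    static long[] recordsPerFile(long employees, int files) {
        if (employees < 0 || files <= 0) {
            throw new IllegalArgumentException("employees must be >= 0 and files > 0");
        }
        long[] counts = new long[files];
        long base = employees / files;
        long extra = employees % files;
        for (int i = 0; i < files; i++) {
            counts[i] = base + (i < extra ? 1 : 0);
        }
        return counts;
    }

    // Submits one task per file to a pool of `threads` workers and
    // returns the total number of records the tasks reported.
    static long runAll(long[] counts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long count : counts) {
                Callable<Long> task = () -> count; // placeholder for "write count records"
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("file task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] counts = recordsPerFile(1_000_000, 15);
        // prints "66667 66666 1000000"
        System.out.println(counts[0] + " " + counts[14] + " " + runAll(counts, 4));
    }
}
```

With the example arguments, 1,000,000 records do not divide evenly by 15, so under this scheme ten files would hold 66,667 records and five would hold 66,666.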

Library versions

Version
0.0.4
0.0.3