DynamoDB Key Diagnostics Library for Java

The DynamoDB Key Diagnostics Library is a wrapper around the DynamoDB Java SDK that can be used to log key usage information for DynamoDB into a Kinesis stream, which can then be used to track hot spots (keys).

License

License

Categories

Categories

AWS Container PaaS Providers KeY Data Data Formats Formal Verification
GroupId

GroupId

com.amazonaws
ArtifactId

ArtifactId

dynamodb-key-diagnostics-library
Last Version

Last Version

1.1
Release Date

Release Date

Type

Type

jar
Description

Description

DynamoDB Key Diagnostics Library for Java
The DynamoDB Key Diagnostics Library is a wrapper around the DynamoDB Java SDK that can be used to log key usage information for DynamoDB into a Kinesis stream, which can then be used to track hot spots (keys).
Project URL

Project URL

https://aws.amazon.com/dynamodb
Source Code Management

Source Code Management

https://github.com/awslabs/dynamodb-key-diagnostics-library.git

Download dynamodb-key-diagnostics-library

How to add to project

<!-- https://jarcasting.com/artifacts/com.amazonaws/dynamodb-key-diagnostics-library/ -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>dynamodb-key-diagnostics-library</artifactId>
    <version>1.1</version>
</dependency>
// https://jarcasting.com/artifacts/com.amazonaws/dynamodb-key-diagnostics-library/
implementation 'com.amazonaws:dynamodb-key-diagnostics-library:1.1'
// https://jarcasting.com/artifacts/com.amazonaws/dynamodb-key-diagnostics-library/
implementation ("com.amazonaws:dynamodb-key-diagnostics-library:1.1")
'com.amazonaws:dynamodb-key-diagnostics-library:jar:1.1'
<dependency org="com.amazonaws" name="dynamodb-key-diagnostics-library" rev="1.1">
  <artifact name="dynamodb-key-diagnostics-library" type="jar" />
</dependency>
@Grapes(
@Grab(group='com.amazonaws', module='dynamodb-key-diagnostics-library', version='1.1')
)
libraryDependencies += "com.amazonaws" % "dynamodb-key-diagnostics-library" % "1.1"
[com.amazonaws/dynamodb-key-diagnostics-library "1.1"]

Dependencies

compile (9)

Group / Artifact Type Version
com.amazonaws : aws-java-sdk-dynamodb jar 1.11.459
com.amazonaws : aws-java-sdk-kinesis jar 1.11.459
com.amazonaws : amazon-kinesis-client jar 1.9.2
org.apache.commons : commons-math3 jar 3.2
com.google.guava : guava jar 26.0-jre
com.google.collections : google-collections jar 1.0-rc2
com.google.code.gson : gson jar 2.8.5
org.slf4j : slf4j-log4j12 jar 1.7.25
commons-logging : commons-logging jar 1.2

provided (2)

Group / Artifact Type Version
org.projectlombok : lombok jar 1.16.16
com.google.code.findbugs : annotations jar 3.0.1

test (4)

Group / Artifact Type Version
org.apache.commons : commons-lang3 jar 3.3.2
org.mockito : mockito-core jar 2.19.0
junit : junit jar 4.12
org.hamcrest : hamcrest-all jar 1.3

Project Modules

There are no modules declared in this project.

DynamoDB Key Diagnostics Library 🌑️

DynamoDB Key Diagnostics Library is a Java DynamoDB client wrapper that automatically logs your key usage information to Kinesis as your application reads/writes data from/to DynamoDB. You can then use Kinesis Data Analytics to feed into CloudWatch to monitor and alarm if any single key gets too hot, and to feed into S3/Athena/QuickSight to report on your detailed key usage and have heatmaps to help diagnose your application.

.
β”œβ”€β”€ README.md                                                   <-- This instructions file
β”œβ”€β”€ LICENSE.txt                                                 <-- Apache Software License 2.0
β”œβ”€β”€ NOTICE.txt                                                  <-- Copyright notices
β”œβ”€β”€ checkstyle.xml                                              <-- Checkstyle for the Key Diagnostics client
β”œβ”€β”€ pom.xml                                                     <-- Java dependencies, Docker integration test orchestration
β”œβ”€β”€ resources
β”‚   β”œβ”€β”€ python                                                  <-- Contains the AWS Lambda function for emitting hot key metrics and logs
β”‚   β”‚   β”œβ”€β”€ src
β”‚   β”‚   β”‚   └── diagnostics
β”‚   β”‚   β”‚       └── hot_key_logger_lambda.py                    <-- The actual Lambda function
β”‚   β”‚   └── tests
β”‚   β”‚       └── diagnostics
β”‚   β”‚           └── test_hot_key_logger_lambda.py               <-- Unit test for the Lambda function
β”‚   └── DynamoDB_Key_Diagnostics_Library.yml                    <-- CloudFormation template, see "Packaging and Deployment" section below
β”œβ”€β”€ src
β”‚   β”œβ”€β”€ main
β”‚   β”‚   └── java
β”‚   β”‚       └── com.amazonaws.services.dynamodb.diagnostics     <-- Main package for Key Diagnostics Client
β”‚   β”‚           β”œβ”€β”€ DynamoDBKeyDiagnosticsClient.java           <-- Contains inject methods for handler entrypoints
β”‚   β”‚           β”œβ”€β”€ DynamoDBKeyDiagnosticsClientAsync.java      <-- Similar to DynamoDBKeyDiagnosticsClient, but supports async DynamoDB APIs
β”‚   β”‚           β”œβ”€β”€ DynamoDBKeyDiagnosticsClientBuilder.java    <-- Provides dependencies like the DynamoDB client for injection
β”‚   β”‚           └── KinesisStreamReporter                       <-- Reporter class that asynchronously sends key usage information to a Kinesis stream
β”‚   └── test                                                    <-- Unit and integration tests
β”‚       └── java
β”‚           └── com.amazonaws.services.dynamodb.diagnostics     <-- Contains integration tests and unit tests.
β”‚
└── samples                                                     <-- Contains the Movies demo application that uses the Key Diagnostics client
     └── movies
         β”œβ”€β”€ main
         β”‚   β”œβ”€β”€ java
         β”‚   β”‚   └── com.amazonaws.services.dynamodb.diagnostics.demo
         β”‚   β”‚       └── MoviesApplication.java                 <-- Simulates hot key scenario with certain "hit movies"
         β”‚   β”‚
         β”‚   └── resources
         β”‚       └── log4j.properties                           <-- Log4j configuration for the demo application
         β”œβ”€β”€ checkstyle.xml                                     <-- Checkstyle for the Movies demo application
         └── pom.xml                                            <-- Java and the Key Diagnostics client dependencies

Prerequisites

To use the DynamoDB Key Diagnostics Library or run the demo, you must have the following:

  • Java 1.8 or later
  • Maven 3 or later
  • an AWS account

Setup process

Note: At this time, the library aggregates the metrics for keys at minute and second granularity. Depending on your business requirements, you may choose to modify the client to aggregate data differently. Currently, you can set up this post’s CloudFormation template in the following AWS Regions: US East (N Virginia), US West (Oregon), EU (Ireland), and EU (Frankfurt).

With the following setup, the DynamoDB Key Diagnostics Library will log the values of your partition key, sort key or any attributes for the selected DynamoDB table. The key usage information will be stored on S3, and specific hot keys will be logged and displayed through Amazon CloudWatch and Amazon QuickSight.

Step 1: Install the Key Diagnostics Library

To install the Key Diagnostics Library, run the following command:

mvn install

Step 2: Configure your AWS credentials

Configure your AWS CLI credentials, if you haven't already. The following AWS resources will be synthesized under the configured account.

aws configure

Make sure you have Amazon S3, AWS Lambda, Amazon Kinesis, Amazon CloudWatch and CloudFormation permissions with the configured credentials.

Step 3: Create and deploy the required AWS resources by using the CloudFormation template

You will now deploy a Lambda function for reporting and monitoring metrics. To do this, first upload the provided Lambda function to Amazon S3. If you don't have an Amazon S3 bucket already, create one. (Throughout the following instructions, replace the placeholder names with your own names.)

export BUCKET_NAME=my_cool_new_bucket
aws s3 mb s3://$BUCKET_NAME

Then, package the provided Hot Key Lambda function the Amazon S3 bucket.

aws cloudformation package \
    --template-file resources/DynamoDB_Key_Diagnostics_Library.yaml \
    --s3-bucket $BUCKET_NAME \
    --output-template-file packaged.yaml

You can then create the rest of the necessary AWS resources (such as the Amazon Kinesis Data Streams stream, Amazon Kinesis Data Analytics application, and CloudWatch alarm). Also, provide a CloudFormation stack name.

STACK_NAME=KeyDiagnosticsStack

aws cloudformation deploy \
    --template-file packaged.yaml \
    --stack-name $STACK_NAME \
    --capabilities CAPABILITY_IAM

Customizing your Kinesis Data Stream according to your DynamoDB Table

Depending on the provisioned capacity of your DynamoDB table, you may change the shard count of the Kinesis Data Stream used to process your requests. The default and minimum is 4 shards. To override the shard count, you add the following override instead:

SHARD_COUNT=10

aws cloudformation deploy \
    --template-file packaged.yaml \
    --stack-name postreview \
    --capabilities CAPABILITY_IAM
    --parameter-overrides KinesisSourceStreamShardCount=$SHARD_COUNT

CloudFormation does not automatically start the Kinesis Data Analytics application, so to start the application, navigate to the Amazon Kinesis console or run the following commands.

# Find out the Kinesis Analtyics Application Name by going to the Kinesis console or `aws kinesisanalytics list-applications`
KINESIS_ANALYTICS_APP_NAME="Put your application name here"

# Then, find out the InputID
INPUT_ID=`aws kinesisanalytics describe-application \
    --application-name $KINESIS_ANALYTICS_APP_NAME \
    --query 'ApplicationDetail.InputDescriptions[0].InputId'`

# Start the Kinesis Data Analytics app
aws kinesisanalytics start-application \
    --application-name $KINESIS_ANALYTICS_APP_NAME \
    --input-configurations Id=$INPUT_ID,InputStartingPositionConfiguration={InputStartingPosition=NOW}

You now are ready to run the demo Movies example application in the repository (step 3.1) or change your code to use the Key Diagnostics Library (step 3.2).

Step 3.1: Running the Demo

This demo uses the IMDb meta-data dataset to create an application that rates movies by putting in items into DynamoDB. Certain movies will be "trending", thus creating an uneven load on certain hash keys.

After you have installed the Key Diagnostics Library dependencies and setup all the AWS resources, navigate to the samples/movies/ directory.

Then, execute the demo by running:

KINESIS_STREAM_NAME="Put your Kinesis Data Stream name here"
REGION="Put the region where the Kinesis Data Stream and DynamoDB table are set up"

mvn package exec:java@movies -Dexec.args="trend $KINESIS_STREAM_NAME $REGION"

Step 3.2: Integrating with your existing DynamoDB code

To use the Key Diagnostics client, you first need to create a Kinesis stream that it can log to, then you can use that stream name along with the Kinesis client and the DynamoDB client you're wrapping to create the Key Diagnostics client. You also need to specify which key attributes in which tables you need to monitor - the easiest way to do that is to use the factory method that just monitors all the key attributes for all the tables and global secondary indexes in your account:

DynamoDBKeyDiagnosticsClient ddbClient = DynamoDBKeyDiagnosticsClient.monitorAllPartitionKeys(
    dynamoDB,
    kinesisClient,
    kinesisStreamName
);

If you do need to specify your own attributes to monitor (e.g. if you are considering creating a new global secondary index on a new attribute and are wondering if it has hot values) then you can create it as follows:

DynamoDBKeyDiagnosticsClient ddbClient = new DynamoDBKeyDiagnosticsClient(
    dynamoDB,
    kinesisClient,
    kinesisStreamName,
    ImmutableMap.of("MyTable", ImmutableList.of("MyAttribute"))
);

After you created the diagnostics client, you can then use it everywhere you would've used the regular AmazonDynamoDB client (it implements the AmazonDynamoDB interface). The diagnostics client creates a thread pool to asynchronously log the key usage information to Kinesis, so when you're done with it you should close() it so that it can shut down those threads.

Step 4: Visualization through Amazon Athena and QuickSight

If you are interested in creating dashboards or querying the key usage information, or wish to understand what the access patterns of certain attributes, we highly recommend setting up Athena and QuickSight.

First, go to the Athena Console, and put in the following under New query 1, then click Run Query. This will create an Athena database for the key usage information stored on S3:

CREATE DATABASE IF NOT EXISTS dynamodbkeydiagnosticslibrary
COMMENT 'Athena database for DynamoDB Key Diagnostics Library';

Then, create the Athena table. Following the demo app, we will use movies as the table name. If you synthesized the AWS with the provided CloudFormation template in Step 1, the S3 Location should be something similar to: s3:///keydiagnosticsstack-aggregatedresultbucket-ejkhrnvyw8ku/keydiagnostics/

CREATE EXTERNAL TABLE `movies`(
    `second` timestamp COMMENT 'Second aggregated results',
    `tablename` string COMMENT 'DynamoDB table name',
    `hashkey` string COMMENT 'The partition key attribute name',
    `hashkeyvalue` string COMMENT 'The partition key attribute value',
    `operation` string COMMENT 'DynamoDB operation',
    `totalio` float COMMENT 'Total IO consumed')
ROW FORMAT SERDE
    'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
    's3:///keydiagnosticsstack-aggregatedresultbucket-ejkhrnvyw8ku/keydiagnostics/'

After setting the Athena table, you can use QuickSight to visualize the key usage pattern of your application:

  1. Go to the QuickSight console and click Manage data on the upper right.
  2. Click New data set, choose Athena and pick a data source name. Then, you should be able to select the Athena database and table created in the previous section.
  3. Choose Import to SPICE for quicker analytics, then click Visualize!
  4. Now you should be able to create graphs by filtering table names, time range, partition keys, operation etc. The following is a heat map that shows what movies are popular over a certain time range:

Sample heat map on Amazon Quicksight

Testing

Running unit tests

We use JUnit for testing our code. Unit tests mock out the AmazonDynamoDBClient and do not require connectivity to a DynamoDB endpoint. You can run unit tests with the following command:

mvn test

Running integration tests

Integration tests do not mock out the AmazonDynamoDBClient and require connectivity to a DynamoDB endpoint. As such, the POM starts DynamoDB Local from the Dockerhub image for integration tests.

mvn verify -P integration-tests
com.amazonaws

Amazon Web Services - Labs

AWS Labs

Versions

Version
1.1
1.0