Posts Tagged ‘DynamoDB’

Querying DynamoDB export data using Athena (with LIST attribute)

Written by mannem on . Posted in Athena, AWS BIG DATA

Presto supports a number of array and JSON functions that you can use to write queries returning the required results. There is no single way to define your CREATE TABLE and, later, your queries. You can have CREATE TABLE predefine the schema structure, and your queries can then refer to the elements they need in that schema. Alternatively, you can define the attribute as a plain string and use functions in your queries to parse that string into the required results.

In addition, Athena has examples covering most of the Presto functions:

Now, I was able to use the following CREATE TABLE syntax on DynamoDB items that contain a list of strings, and to flatten that list out using functions such as CAST. Please note that this is not the only way to define the table and query it; my query may be over-complicating what you are trying to get, and there may be a simpler way as well. So it is really important that you understand the data types you define and what each query returns, and that you use the correct functions, since each function takes one data type and returns another.
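Since the original CREATE TABLE and queries are not reproduced here, the following is a minimal sketch of the general approach, assuming an export where each item is stored as one JSON document per line and the list-of-strings attribute is called names (the table, S3 location, and attribute names are all hypothetical):

```sql
-- Sketch only: table, S3 location, and attribute names are hypothetical.
-- Each exported DynamoDB item is assumed to be one JSON string per line.
CREATE EXTERNAL TABLE ddb_export (item string)
LOCATION 's3://my-bucket/dynamodb-export/';

-- Extract the list-of-strings attribute from the JSON string and
-- flatten it, producing one row per list element.
SELECT name
FROM ddb_export
CROSS JOIN UNNEST(
  CAST(json_extract(item, '$.names') AS ARRAY(VARCHAR))
) AS t (name);
```

Note the chain of data types here: json_extract returns a JSON value, CAST turns it into ARRAY(VARCHAR), and UNNEST expands that array into rows — exactly the kind of type-by-type reasoning described above.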

Generate your own CSV/TSV data quickly with urandom + hexdump

Written by mannem on . Posted in Data-sets

In this article, we will use hexdump and /dev/urandom, which are included in most Linux distributions, to quickly generate random data. The values in the first column should be unique, so they can also be used as a hash or index key.

This can be useful if you want to upload this data to a NoSQL database like DynamoDB, with the first column's values as the primary key (and, optionally, the second column's values as the sort key).

Here are some one-liners to generate data:
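For example, a small sketch along these lines (the row format and byte counts are arbitrary choices; adjust them to your schema) emits tab-separated rows, with the row number embedded in the first column so it is guaranteed unique and safe to use as a hash key:

```shell
# gen_rows N: print N tab-separated rows of random hex data.
# col1 embeds the row number, so it is guaranteed unique (usable as a hash key).
# hexdump -v -e '/1 "%02x"' formats each byte of urandom output as two hex digits.
gen_rows() {
  n=$1
  for i in $(seq 1 "$n"); do
    printf '%s-%s\t%s\t%s\n' \
      "$i" \
      "$(head -c 4 /dev/urandom | hexdump -v -e '/1 "%02x"')" \
      "$(head -c 4 /dev/urandom | hexdump -v -e '/1 "%02x"')" \
      "$(head -c 4 /dev/urandom | hexdump -v -e '/1 "%02x"')"
  done
}

gen_rows 5
```

Redirect the output to a file (gen_rows 1000000 > data.tsv) and you have an import-ready TSV in seconds.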

Search DynamoDB tables using Elasticsearch/Kibana via Logstash plugin

Written by mannem on . Posted in Dynamo DB, Elasticsearch

The Logstash plugin for Amazon DynamoDB gives you a nearly real-time view of the data in your DynamoDB table. The plugin uses DynamoDB Streams to parse and output data as it is added to a DynamoDB table. After you install and activate the plugin, it scans the data in the specified table, then starts consuming your updates from the stream and outputs them to Elasticsearch, or to another Logstash output of your choice.

Logstash is a data pipeline service that processes and parses data, then outputs it to a selected location in a selected format. Elasticsearch is a distributed, full-text search server. For more information about Logstash and Elasticsearch, go to

Amazon Elasticsearch Service is a managed service that makes it easy to deploy, operate, and scale Elasticsearch in the AWS Cloud.

This article includes an installation guide, tested on an EC2 instance, in which all the prerequisites are installed and Logstash is configured to connect to Amazon Elasticsearch Service using the input/output plugins and start indexing records from DynamoDB. Click here to get all the instructions:

Logstash configuration:
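As an illustrative sketch (the option names follow the awslabs logstash-input-dynamodb plugin's README as I understand it; the table name, region endpoints, and Elasticsearch host below are placeholders you must replace), a minimal configuration pairs the dynamodb input with an elasticsearch output:

```
input {
  dynamodb {
    table_name       => "myDynamoDBTable"
    endpoint         => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type        => "new_and_old_images"
  }
}
output {
  elasticsearch {
    hosts => ["search-mydomain-xxxxxxxx.us-east-1.es.amazonaws.com:443"]
  }
}
```

You would then start Logstash pointing at this file, with something like bin/logstash -f logstash-dynamodb.conf.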

After running a similar command on the shell, Logstash should successfully start and begin indexing the records from your DynamoDB table.

Throughput considerations:



Similar plugins:

Using DynamoDB as a session provider with AWS SDK V3

Written by mannem on . Posted in Dynamo DB

The DynamoDB Session Handler is a custom session handler for PHP that allows developers to use Amazon DynamoDB as a session store. Using DynamoDB for session storage alleviates issues that occur with session handling in a distributed web application by moving sessions off of the local file system and into a shared location. DynamoDB is fast, scalable, easy to set up, and handles replication of your data automatically.

Setting up:

1. Make sure you have PHP >= 5.5.0
2. Install the AWS SDK for PHP (v3) from here
3. Configure the SDK to use any of the credential options mentioned here:
4. See more details about the DynamoDB-provided session handler here:
5. Create a DynamoDB table to store session info, with ‘id’ (String) as the hash key.
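The steps above can be sketched as follows. This is a minimal outline using the SDK v3 SessionHandler, not the full debug-enabled script from this post; the region and the table name "sessions" are assumptions to adjust for your setup:

```php
<?php
// Minimal sketch: region and table name are assumptions, adjust as needed.
require 'vendor/autoload.php';

use Aws\DynamoDb\DynamoDbClient;
use Aws\DynamoDb\SessionHandler;

$client = new DynamoDbClient([
    'region'  => 'us-east-1',
    'version' => 'latest',
]);

// Register DynamoDB as PHP's session store.
$sessionHandler = SessionHandler::fromClient($client, [
    'table_name' => 'sessions',
]);
$sessionHandler->register();

// From here on, ordinary PHP sessions are persisted to DynamoDB.
session_start();
$_SESSION['user'] = 'mannem';
session_write_close();
```

Credentials are resolved by the SDK's default provider chain (step 3 above), so nothing credential-related needs to be hard-coded here.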

End to End PHP code with debug turned on:

> php sessionProvider.php
successfully connected

Now, check the DynamoDB table to confirm that the session information was stored successfully.

Here is an example structure (in DynamoDB JSON format):
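For illustration only — the exact attribute names depend on the session handler configuration, and the values below are made up — a stored session item typically looks something like:

```
{
  "id":      { "S": "myapp_5ku1min4fjcs7qvm4sv6e9vlu1" },
  "expires": { "N": "1467335316" },
  "data":    { "S": "user|s:6:\"mannem\";" }
}
```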

References :



Written by mannem on . Posted in Architectures


When data arrives as a succession of regular measurements, it is known as time series information. Processing of time series information poses systems scaling challenges that the elasticity of AWS services is uniquely positioned to address.
This elasticity is achieved by using Auto Scaling groups for ingest processing, AWS Data Pipeline for scheduled Amazon Elastic MapReduce jobs, AWS Data Pipeline for intersystem data orchestration, and Amazon Redshift for potentially massive-scale analysis. Key architectural throttle points involving Amazon SQS for sensor message buffering and less frequent AWS Data Pipeline scheduling keep the overall solution costs predictable and controlled.

Online Game

Written by mannem on . Posted in Architectures


A cost-effective online game architecture featuring automatic capacity adjustment, a highly available and high-speed database, and a data processing cluster for player behavior analysis.