Posts Tagged ‘search’

Search DynamoDB tables using Elasticsearch/Kibana via Logstash plugin

Written by mannem. Posted in Dynamo DB, Elasticsearch

The Logstash plugin for Amazon DynamoDB gives you a near real-time view of the data in your DynamoDB table. It uses DynamoDB Streams to parse and output data as it is added to the table. After you install and activate the plugin, it first scans the existing data in the specified table, then starts consuming your updates through Streams and outputs them to Elasticsearch or to another Logstash output of your choice.
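To make the "consume updates from Streams, then output them" step more concrete, here is a minimal Python sketch of how a single stream record could be turned into an index or delete action. This is not the plugin's own code. The function names are hypothetical, only a subset of the DynamoDB attribute types is handled, and the record layout (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage`) assumes the stream is configured to include new images.

```python
def from_dynamodb_value(value):
    """Convert one typed DynamoDB attribute, e.g. {"S": "x"}, to a plain value."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        # Numbers arrive as strings; keep whole numbers as int, the rest as float.
        return int(raw) if raw.lstrip("-").isdigit() else float(raw)
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb_value(item) for item in raw]
    if type_tag == "M":
        return {key: from_dynamodb_value(item) for key, item in raw.items()}
    raise ValueError("attribute type not handled in this sketch: " + type_tag)


def stream_record_to_action(record, key_name):
    """Return (action, doc_id, document) for one DynamoDB Streams record.

    key_name is the table's hash key attribute (an assumption of this sketch).
    """
    doc_id = str(from_dynamodb_value(record["dynamodb"]["Keys"][key_name]))
    if record["eventName"] == "REMOVE":
        # The item was deleted, so the matching search document should go too.
        return ("delete", doc_id, None)
    # INSERT and MODIFY both carry the full new item when new images are enabled.
    image = record["dynamodb"]["NewImage"]
    document = {key: from_dynamodb_value(item) for key, item in image.items()}
    return ("index", doc_id, document)
```

Each resulting action could then be handed to whichever output you choose, for example an Elasticsearch index or delete request.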

Logstash is a data pipeline service that parses and processes data, then outputs it to a selected location in a selected format. Elasticsearch is a distributed, full-text search server. For more information about Logstash and Elasticsearch, go to

Amazon Elasticsearch Service is a managed service that makes it easy to deploy, operate, and scale Elasticsearch in the AWS Cloud.

This article includes an installation guide that was tested on an EC2 instance where all the prerequisites are installed and Logstash is configured to connect to Amazon Elasticsearch Service using the input/output plugins, so that it starts indexing records from DynamoDB. Click here to get all the instructions:

Logstash configuration:

After running a similar command on the shell, Logstash should successfully start and begin indexing the records from your DynamoDB table.

Throughput considerations:



Similar plugins:

UK police data

Written by mannem. Posted in Data-sets

This data set provides a complete snapshot of crime, outcome, and stop and search data, as held by the Home Office at a particular point in history.

The actual data is located on S3 in the bucket policeuk-data and can be accessed with a URL similar to , (where yy and mm are the year and month, to be replaced accordingly).

The Structure:

All files are organized by YEAR and MONTH.

Each month has a ZIP file with CSV files inside the zip file.

The January 2015 file contains data for all months from 2010-12 through 2015-01.
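As a rough illustration of working with this layout, the following Python sketch walks a monthly ZIP archive that has already been downloaded and yields the text of each CSV file inside it. The function name is hypothetical, and the UTF-8 encoding is an assumption, not something the data set documents here.

```python
import io
import zipfile


def iter_csv_members(zip_bytes, encoding="utf-8"):
    """Yield (member_name, csv_text) for every CSV file inside a ZIP archive.

    zip_bytes is the raw content of one monthly ZIP file; the encoding is
    an assumption of this sketch.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip anything that is not a CSV file, such as folders or notes.
            if name.lower().endswith(".csv"):
                yield name, archive.read(name).decode(encoding)
```

Reading the archive from memory avoids unpacking everything to disk, which may help when a single month's file bundles several years of data.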

Contents of a sample file:

The columns in the CSV files are as follows:

Reported by: The force that provided the data about the crime.
Falls within: At present, also the force that provided the data about the crime. This is currently being looked into and is likely to change in the near future.
Longitude and Latitude: The anonymised coordinates of the crime. See Location Anonymisation for more information.
LSOA code and LSOA name: References to the Lower Layer Super Output Area that the anonymised point falls into, according to the LSOA boundaries provided by the Office for National Statistics.
Crime type: One of the crime types listed in the Police.UK FAQ.
Last outcome category: A reference to whichever of the outcomes associated with the crime occurred most recently. For example, this crime's 'Last outcome category' would be 'Offender fined'.
Context: A field provided for forces to provide additional human-readable data about individual crimes. Currently, for newly added CSVs, this is always empty.

The Challenge:

  • The given data contains some errors in the Easting, Northing, and Crime_type fields.
  • The data is in CSV format, and some field values themselves contain commas.
  • The CSV files contain column headers, i.e. the first record in each CSV file is a header record containing the column (field) names.
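One possible way to cope with these three points is sketched below in Python. The standard csv module reads the header record and handles commas inside quoted fields, and rows whose coordinate fields are not numeric are set aside instead of being loaded. The function name is hypothetical, the default field names come from the column list above, and the sketch assumes that values containing commas are quoted in the files.

```python
import csv
import io


def parse_crime_csv(text, numeric_fields=("Longitude", "Latitude")):
    """Split CSV text into (good_rows, bad_rows) using the header record.

    A row is "bad" when any of numeric_fields is missing or not a number.
    """
    good, bad = [], []
    # DictReader takes the field names from the first (header) record.
    for row in csv.DictReader(io.StringIO(text, newline="")):
        try:
            numbers = {field: float(row[field]) for field in numeric_fields}
        except (KeyError, TypeError, ValueError):
            bad.append(row)
            continue
        parsed = dict(row)
        parsed.update(numbers)
        good.append(parsed)
    return good, bad
```

Keeping the rejected rows rather than dropping them silently makes it easier to inspect what kind of errors the data actually contains.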

What is unique?

  • The same data can be accessed over an API. The API is implemented as a standard JSON web service using HTTP GET and POST requests. Full request and response examples are provided in the documentation.
  • The response contains the ID of the crime, which may be unique and can be used as a HashKey when storing and querying the data in a NoSQL store.
  • The JSON response can also be used as an index document for Elasticsearch.
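To illustrate the last two points, here is a small Python sketch that uses the crime ID as the document ID and builds a newline-delimited body in the shape the Elasticsearch bulk API expects. The function name, the index name `crimes`, and the field name `id` are assumptions for illustration, and the metadata line omits `_type`, which older Elasticsearch versions require. Because the ID only "may be" unique, repeated IDs are skipped.

```python
import json


def build_bulk_body(crimes, index_name="crimes", id_field="id"):
    """Build an NDJSON bulk body from a list of crime records (dicts).

    index_name and id_field are illustrative defaults, not values taken
    from the API documentation.
    """
    lines = []
    seen = set()
    for crime in crimes:
        doc_id = str(crime[id_field])
        if doc_id in seen:
            continue  # keep only the first record for each ID
        seen.add(doc_id)
        # One metadata line, then the document itself on the next line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(crime, sort_keys=True))
    # The bulk format needs every line, including the last, to end in a newline.
    return "".join(line + "\n" for line in lines)
```

The same `doc_id` could serve as the HashKey value when writing the record to a NoSQL table, so both stores stay keyed on one identifier.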

Example API call via REST:

Example Response:

More details on API access can be found here: