Generate your own CSV/TSV data quickly with urandom + hexdump

Written by mannem. Posted in Data-sets

In this article, we will use hexdump and /dev/urandom, which are included in most Linux distributions, to quickly generate random data. The values in the first column should be unique and can also be used as a hash or index key.

This can be useful if you want to upload the data to NoSQL databases like DynamoDB, using the col1 values as the primary key (or the second column's values as the sort key).

Here are some one-liners to generate data:
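The same idea can also be sketched in Python instead of the shell. This is only an illustration, not the hexdump approach itself: os.urandom stands in for /dev/urandom, and the function names, column count and field width are assumptions chosen for the example.

```python
import os


def random_rows(n_rows, n_cols=3, bytes_per_field=8):
    """Yield n_rows rows, each a list of n_cols random hex strings."""
    for _ in range(n_rows):
        # os.urandom reads from the OS random source (like /dev/urandom);
        # .hex() renders the bytes as text, much as hexdump would.
        yield [os.urandom(bytes_per_field).hex() for _ in range(n_cols)]


def to_delimited(rows, delimiter=","):
    """Join rows into CSV (",") or TSV ("\t") text, one row per line."""
    return "\n".join(delimiter.join(row) for row in rows) + "\n"


csv_text = to_delimited(random_rows(5))
tsv_text = to_delimited(random_rows(5), delimiter="\t")
print(csv_text, end="")
```

With 8 random bytes per field, a repeated value in the first column is extremely unlikely, but it is not strictly guaranteed, so check for duplicates before relying on it as a key.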

UK police data

Written by mannem. Posted in Data-sets

data.police.uk provides a complete snapshot of crime, outcome, and stop and search data, as held by the Home Office at a particular point in history.

The actual data is stored on S3 in the bucket policeuk-data and can be accessed with a URL of the form
https://policeuk-data.s3.amazonaws.com/archive/20yy-mm.zip (where yy and mm are the two-digit year and month, to be replaced accordingly).
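As a small sketch, the URL pattern above can be filled in programmatically. The helper name archive_url is an assumption made for this example; only the URL pattern comes from the text above.

```python
def archive_url(year, month):
    """Build the archive URL for a given year and month (e.g. 2015, 1)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # Zero-pad the month so January becomes "01", matching 20yy-mm.zip.
    return (
        "https://policeuk-data.s3.amazonaws.com/archive/"
        f"{year:04d}-{month:02d}.zip"
    )


print(archive_url(2015, 1))
```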

The Structure:

All files are organized by YEAR and MONTH.

Each month has a ZIP file containing CSV files.

The January 2015 file 2015-01.zip contains data for all months starting from 2010-12 to 2015-01
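A minimal sketch of inspecting such an archive with Python's standard zipfile module follows. The member names used in the demo are hypothetical and do not reflect the real layout inside the ZIP; the tiny in-memory archive exists only so the example runs without a download.

```python
import io
import zipfile


def list_csv_members(archive):
    """Return the sorted names of all .csv entries in a ZIP (path or file object)."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".csv"))


# Build a small in-memory archive; the names below are made up for the demo.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("2015-01/sample-a.csv", "col1,col2\n1,2\n")
    zf.writestr("2014-12/sample-b.csv", "col1,col2\n3,4\n")
    zf.writestr("README.txt", "not a csv")
buf.seek(0)

members = list_csv_members(buf)
print(members)
```

For a downloaded file, the same function can be given the local path, for example list_csv_members("2015-01.zip").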

Contents of a sample file:


The columns in the CSV files are as follows:

Field: Meaning

  • Reported by: The force that provided the data about the crime.
  • Falls within: At present, also the force that provided the data about the crime. This is currently being looked into and is likely to change in the near future.
  • Longitude and Latitude: The anonymised coordinates of the crime. See Location Anonymisation for more information.
  • LSOA code and LSOA name: References to the Lower Layer Super Output Area that the anonymised point falls into, according to the LSOA boundaries provided by the Office for National Statistics.
  • Crime type: One of the crime types listed in the Police.UK FAQ.
  • Last outcome category: A reference to whichever of the outcomes associated with the crime occurred most recently. For example, this crime's 'Last outcome category' would be 'Offender fined'.
  • Context: A field provided for forces to provide additional human-readable data about individual crimes. Currently, for newly added CSVs, this is always empty.

The Challenge:

  • The given data contains some built-in errors in the Easting, Northing and Crime_type fields.
  • The data is in CSV format, with commas inside the data values themselves.
  • The CSV files contain column headers, i.e. the first record in each CSV file is a header record containing the column (field) names.
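The header row and the embedded commas can both be handled by a real CSV parser rather than a plain split on ",". The sketch below uses Python's csv.DictReader and assumes that values containing commas are quoted in the files; the sample row is made up for illustration and is not taken from the actual data.

```python
import csv
import io

# Made-up sample: a header record, then one row whose last value has a comma.
sample = (
    "Reported by,Crime type,Context\n"
    'Example Force,Burglary,"Example note, with a comma"\n'
)


def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header names."""
    # DictReader consumes the first record as the header, so it never
    # shows up as a data row; quoted commas stay inside their field.
    return list(csv.DictReader(io.StringIO(text)))


records = read_records(sample)
print(records[0]["Context"])

# A naive split breaks the quoted value into two pieces:
naive_fields = sample.splitlines()[1].split(",")
print(len(naive_fields))
```

If the commas in a file are not quoted, no parser can recover the columns reliably, and those rows need to be cleaned by hand or rejected.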

What is unique?

  • The same data can be accessed over an API. The API is implemented as a standard JSON web service using HTTP GET and POST requests. Full request and response examples are provided in the documentation.
  • The response contains the ID of the crime, which may be unique and can be used as a hash key when storing and querying in NoSQL databases.
  • The JSON file can also be used as an index document for Elasticsearch.
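Because the ID only "may be" unique, it is worth checking before using it as a hash key. The sketch below assumes the parsed response is a list of dicts and that the ID field is named "id"; both are assumptions for this example, and the records shown are made up.

```python
from collections import Counter


def key_by_id(records, id_field="id"):
    """Return {id: record}, raising ValueError if any ID occurs twice."""
    counts = Counter(r[id_field] for r in records)
    duplicates = sorted(k for k, c in counts.items() if c > 1)
    if duplicates:
        raise ValueError(f"duplicate ids: {duplicates}")
    return {r[id_field]: r for r in records}


# Made-up records standing in for a parsed JSON response.
crimes = [
    {"id": 101, "category": "example-a"},
    {"id": 102, "category": "example-b"},
]
by_id = key_by_id(crimes)
print(sorted(by_id))
```

If duplicates do turn up, a composite key (for example the ID plus another field) is one option to consider.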

Example API call via REST: https://data.police.uk/api/crimes-street/all-crime?lat=52.629729&lng=-1.131592&date=2013-01
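A request URL like the one above can be assembled from its parameters with the standard library. The function name street_crime_url is an assumption for this sketch; the endpoint and the lat, lng and date parameters are taken from the example call.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.police.uk/api/crimes-street/all-crime"


def street_crime_url(lat, lng, date):
    """Build the street-level crime query URL for a point and a YYYY-MM month."""
    # Values are passed as strings so their formatting is preserved exactly.
    return BASE_URL + "?" + urlencode({"lat": lat, "lng": lng, "date": date})


url = street_crime_url("52.629729", "-1.131592", "2013-01")
print(url)
```

The resulting URL can then be fetched with any HTTP client, for example urllib.request.urlopen, and the body parsed with json.loads.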

Example Response:

More details on API access can be found here: data.police.uk/docs/

