data.police.uk provides a complete snapshot of crime, outcome, and stop and search data, as held by the Home Office at a particular point in history.
The actual data is located on S3 under the bucket policeuk-data and can be accessed with a URL similar to https://policeuk-data.s3.amazonaws.com/archive/20yy-mm.zip (where yy and mm are the two-digit year and month, to be replaced accordingly).
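As a minimal sketch of the URL pattern above, the archive address for a given month could be built as follows. The helper names `archive_url` and `download_archive` are hypothetical and only illustrate the pattern; the download step assumes the archive for the requested month actually exists in the bucket.

```python
from urllib.request import urlretrieve

# Base URL taken from the pattern described above.
ARCHIVE_BASE = "https://policeuk-data.s3.amazonaws.com/archive"


def archive_url(year: int, month: int) -> str:
    """Build the archive URL for a year and month (hypothetical helper)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return f"{ARCHIVE_BASE}/{year:04d}-{month:02d}.zip"


def download_archive(year: int, month: int, dest_dir: str = ".") -> str:
    """Download the archive into dest_dir and return the local path.

    Assumes the archive exists; urlretrieve raises an error otherwise.
    """
    dest = f"{dest_dir}/{year:04d}-{month:02d}.zip"
    urlretrieve(archive_url(year, month), dest)
    return dest


print(archive_url(2015, 1))
```

Calling `download_archive(2015, 1)` would then fetch the same file as the wget command used in this post.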
The Structure:
All files are organized by YEAR and MONTH. Each month has a ZIP file with CSV files inside it. The January 2015 file, 2015-01.zip, contains data for all months from 2010-12 to 2015-01.
[hadoop@ip-172-31-24-128 mnt]$ wget https://policeuk-data.s3.amazonaws.com/archive/2015-01.zip
[hadoop@ip-172-31-24-128 mnt]$ unzip 2015-01.zip
[hadoop@ip-172-31-24-128 mnt]$ ls
2010-12 2011-02 2011-04 2011-06 2011-08 2011-10 2011-12 2012-02 2012-04 2012-06 2012-08 2012-10 2012-12 2013-02 2013-04 2013-06 2013-08 2013-10 2013-12 2014-02 2014-04 2014-06 2014-08 2014-10 2014-12 2015-01.zip
2011-01 2011-03 2011-05 2011-07 2011-09 2011-11 2012-01 2012-03 2012-05 2012-07 2012-09 2012-11 2013-01 2013-03 2013-05 2013-07 2013-09 2013-11 2014-01 2014-03 2014-05 2014-07 2014-09 2014-11 2015-01
[hadoop@ip-172-31-24-128 mnt]$ cd 2011-04

##Example file-name structure##
2011-04-avon-and-somerset-street.csv
2011-04-cumbria-street.csv
2011-04-gloucestershire-street.csv
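The file names listed above appear to follow a month-force-street.csv pattern. A rough sketch of splitting such a name into its parts is shown below; the function name `parse_street_filename` is hypothetical, and the pattern only covers the `-street.csv` files shown here, so any other file types in the archive are simply not matched.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the three example names above:
# <yyyy-mm>-<force-slug>-street.csv
STREET_FILE_RE = re.compile(
    r"^(?P<month>\d{4}-\d{2})-(?P<force>[a-z0-9-]+)-street\.csv$"
)


def parse_street_filename(name: str) -> Optional[Tuple[str, str]]:
    """Return (month, force_slug) for a street-level CSV name, else None."""
    match = STREET_FILE_RE.match(name)
    if match is None:
        return None
    return match.group("month"), match.group("force")


for name in [
    "2011-04-avon-and-somerset-street.csv",
    "2011-04-cumbria-street.csv",
    "2011-04-gloucestershire-street.csv",
]:
    print(name, "->", parse_street_filename(name))
```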
Contents of a sample file:
[hadoop@ip-172-31-24-128 2011-04]$ head -6 2011-04-avon-and-somerset-street.csv
Crime ID,Month,Reported by,Falls within,Longitude,Latitude,Location,LSOA code,LSOA name,Crime type,Last outcome category,Context
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,1.04177957823,52.0373951227,On or near The Street,E01029877,Babergh 005A,Other crime,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.49436551493,51.4181694243,On or near Keynsham Road,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.49436551493,51.4181694243,On or near Keynsham Road,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.50993031962,51.4108734058,On or near Ludlow Close,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
,2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,-2.5119272035,51.4094350194,On or near Harlech Close,E01014399,Bath and North East Somerset 001A,Anti-social behaviour,,
The columns in the CSV files are as follows:
Field | Meaning |
Reported by | The force that provided the data about the crime. |
Falls within | At present, also the force that provided the data about the crime. This is currently being looked into and is likely to change in the near future. |
Longitude and Latitude | The anonymised coordinates of the crime. See Location Anonymisation for more information. |
LSOA code and LSOA name | References to the Lower Layer Super Output Area that the anonymised point falls into, according to the LSOA boundaries provided by the Office for National Statistics. |
Crime type | One of the crime types listed in the Police.UK FAQ. |
Last outcome category | A reference to whichever of the outcomes associated with the crime occurred most recently. For example, this crime's 'Last outcome category' would be 'Offender fined'. |
Context | A field provided for forces to provide additional human-readable data about individual crimes. Currently, for newly added CSVs, this is always empty. |
The Challenge:
- The given data contains some inbuilt errors in the Easting, Northing, and Crime_type fields.
- The data is in CSV format, and some field values themselves contain commas.
- The CSV files contain column headers, i.e. the first record in each CSV file is a header record containing the column (field) names.
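The header record and the embedded commas listed above are two reasons not to split lines on "," by hand. The sketch below uses Python's standard `csv` module instead, on the assumption that fields containing commas are quoted in the files. The second data row is a made-up example added only to exercise that case, and `has_valid_coordinates` is a hypothetical sanity check, not a rule taken from the data set.

```python
import csv
import io

HEADER = ("Crime ID,Month,Reported by,Falls within,Longitude,Latitude,"
          "Location,LSOA code,LSOA name,Crime type,Last outcome category,Context")

SAMPLE = "\n".join([
    HEADER,
    # Row taken from the sample file shown earlier.
    ",2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,"
    "-2.49436551493,51.4181694243,On or near Keynsham Road,E01014399,"
    "Bath and North East Somerset 001A,Anti-social behaviour,,",
    # Hypothetical row: quoted location with a comma, and missing coordinates.
    ",2011-04,Avon and Somerset Constabulary,Avon and Somerset Constabulary,"
    ',,"On or near Example Road, Hypothetical Town",E01014399,'
    "Bath and North East Somerset 001A,Anti-social behaviour,,",
]) + "\n"


def read_crimes(fileobj):
    """Yield one dict per data row; DictReader consumes the header record."""
    yield from csv.DictReader(fileobj)


def has_valid_coordinates(row) -> bool:
    """Return True if Longitude and Latitude parse as in-range floats."""
    try:
        lon = float(row["Longitude"])
        lat = float(row["Latitude"])
    except (TypeError, ValueError):
        return False
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


for row in read_crimes(io.StringIO(SAMPLE)):
    print(row["Location"], "| valid coordinates:", has_valid_coordinates(row))
```

For a real file, `io.StringIO(SAMPLE)` would be replaced by `open(path, newline="")`, which is the form the `csv` module documentation recommends.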
What is unique?
- The same data can be accessed over an API. The API is implemented as a standard JSON web service using HTTP GET and POST requests. Full request and response examples are provided in the documentation.
- The response contains the ID of the crime, which may be unique and can be used as a HashKey when storing and querying in NoSQL.
- The JSON can also be used as an index document for Elasticsearch.
Example API call via REST: https://data.police.uk/api/crimes-street/all-crime?lat=52.629729&lng=-1.131592&date=2013-01
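The same call could be made from Python roughly as follows. This is a sketch that only mirrors the example URL above: the function names are hypothetical, and `fetch_street_crimes` assumes the endpoint returns a JSON list for the given point and month.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://data.police.uk/api"


def street_crimes_url(lat: str, lng: str, date: str) -> str:
    """Build the street-level all-crime URL for a point and a yyyy-mm month."""
    query = urlencode({"lat": lat, "lng": lng, "date": date})
    return f"{API_BASE}/crimes-street/all-crime?{query}"


def fetch_street_crimes(lat: str, lng: str, date: str):
    """Perform the GET request and decode the JSON body (needs network access)."""
    with urlopen(street_crimes_url(lat, lng, date)) as response:
        return json.load(response)


print(street_crimes_url("52.629729", "-1.131592", "2013-01"))
```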
Example Response:
[
  {
    "category": "anti-social-behaviour",
    "persistent_id": "",
    "location_type": "Force",
    "location_subtype": "",
    "id": 20599642,
    "location": {
      "latitude": "52.6269479",
      "longitude": "-1.1121716",
      "street": {
        "id": 882380,
        "name": "On or near Cedar Road"
      }
    },
    "context": "",
    "month": "2013-01",
    "outcome_status": null
  },
  {
    "category": "burglary",
    "persistent_id": "aebd220e869a235ba92cde43f7e0df29001573b3df1b094bb952820b2b8f44b0",
    "location_type": "Force",
    "location_subtype": "",
    "id": 20604632,
    "location": {
      "latitude": "52.6271606",
      "longitude": "-1.1485111",
      "street": {
        "id": 882208,
        "name": "On or near Norman Street"
      }
    },
    "context": "",
    "month": "2013-01",
    "outcome_status": {
      "category": "Under investigation",
      "date": "2013-01"
    }
  },
  ...
]
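To illustrate the two uses of the crime ID mentioned earlier, the sketch below keys records by `id` (as one might for a NoSQL hash key) and formats them as newline-delimited action and document pairs of the kind the Elasticsearch bulk API accepts. The records are trimmed copies of the two shown in the response above; the function names and the index name `crimes` are assumptions made for illustration.

```python
import json

# Trimmed copies of the two records from the example response.
CRIMES = [
    {"id": 20599642, "category": "anti-social-behaviour", "month": "2013-01",
     "outcome_status": None},
    {"id": 20604632, "category": "burglary", "month": "2013-01",
     "outcome_status": {"category": "Under investigation", "date": "2013-01"}},
]


def index_by_id(crimes):
    """Map each crime's id to its record; a later duplicate id overwrites."""
    return {crime["id"]: crime for crime in crimes}


def to_bulk_ndjson(crimes, index_name="crimes"):
    """Return a bulk-style body: one action line, then one document line."""
    lines = []
    for crime in crimes:
        action = {"index": {"_index": index_name, "_id": str(crime["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(crime))
    # A bulk body must end with a newline; return "" when there is nothing to send.
    return "\n".join(lines) + "\n" if lines else ""


by_id = index_by_id(CRIMES)
print(by_id[20604632]["category"])
print(to_bulk_ndjson(CRIMES), end="")
```

Using the crime `id` as the document `_id` means re-indexing the same month overwrites existing documents rather than duplicating them, which is one reason to prefer it over auto-generated IDs.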
More details on API access can be found here: data.police.uk/docs/