Methodology

Police UK's open data: what's in it, what's not.

AI Enhanced Solutions · 8 min read

Almost every UK crime statistic shown to the public in the mid-2020s — every postcode crime map, every neighbourhood report, every property-site safety panel — sits on top of a single data source: the open data published at data.police.uk. A public API and a set of monthly CSV downloads form the substrate that the consumer-facing layer is built on. Few of those consumer products explain where the numbers come from, what they include, or where the source is silent by design. This piece does. By the end, any figure derived from Police UK should be readable with the right level of confidence — neither dismissed for being approximate, nor trusted further than the source itself warrants.

What the API publishes

The release covers recorded crime from the 43 territorial forces of England and Wales. Each incident is tagged with one of 14 standard categories — anti-social behaviour, bicycle theft, burglary, criminal damage and arson, drugs, other theft, possession of weapons, public order, robbery, shoplifting, theft from the person, vehicle crime, violence and sexual offences, and a residual "other crime" bucket. Every record carries the month it was recorded and a street-level location moved to the nearest of a fixed set of anonymous map points.
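The sketch below makes the record shape concrete. It is a minimal Python example that maps the category slugs used in the API's `category` field to the display names above, then parses one street-level record. The slugs and the field layout (`month`, `location.latitude`, `location.longitude`, `location.street`) reflect our reading of the street-level crimes response. `CrimeRecord` and `parse_record` are hypothetical names for illustration. Check the field names against a live response before relying on them.

```python
from dataclasses import dataclass

# Category slugs as they appear in the "category" field of street-level
# crime records, mapped to the display names used in this article.
CATEGORY_NAMES = {
    "anti-social-behaviour": "Anti-social behaviour",
    "bicycle-theft": "Bicycle theft",
    "burglary": "Burglary",
    "criminal-damage-arson": "Criminal damage and arson",
    "drugs": "Drugs",
    "other-theft": "Other theft",
    "possession-of-weapons": "Possession of weapons",
    "public-order": "Public order",
    "robbery": "Robbery",
    "shoplifting": "Shoplifting",
    "theft-from-the-person": "Theft from the person",
    "vehicle-crime": "Vehicle crime",
    "violent-crime": "Violence and sexual offences",
    "other-crime": "Other crime",
}


@dataclass(frozen=True)
class CrimeRecord:
    category: str     # API slug, e.g. "burglary"
    month: str        # "YYYY-MM", the month the crime was recorded
    latitude: float   # coordinates of the anonymous map point, not the incident
    longitude: float
    street_id: int    # identifier of the street entry attached to the map point
    street_name: str  # e.g. "On or near High Street"


def parse_record(raw: dict) -> CrimeRecord:
    """Convert one raw JSON record (already decoded to a dict) into a CrimeRecord."""
    loc = raw["location"]
    return CrimeRecord(
        category=raw["category"],
        month=raw["month"],
        # Coordinates may arrive as strings; float() accepts either form.
        latitude=float(loc["latitude"]),
        longitude=float(loc["longitude"]),
        street_id=int(loc["street"]["id"]),
        street_name=loc["street"]["name"],
    )
```

Keeping the slug-to-name mapping in one place means every report uses the same labels.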

Alongside crime records, the API publishes outcome status for incidents where one is known. This shows, for example, whether a suspect was charged, the case was filed pending further evidence, or no further action was taken. It also publishes neighbourhood data describing the boundaries and contact details of each policing team, and force data describing the forces themselves. The same content is available as monthly CSV bundles and as a JSON API; both update on the same schedule and draw from the same underlying release.
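Outcome status is only present where one is known, so any tally needs an explicit bucket for records without one. The sketch below assumes, as we understand the street-level response, that each raw record carries an `outcome_status` that is either null or an object with a `category` string. `tally_outcomes` and the `(no outcome recorded)` label are our own illustrative choices.

```python
from collections import Counter
from typing import Iterable


def tally_outcomes(records: Iterable[dict]) -> Counter:
    """Count the latest outcome category across raw street-level records.

    A JSON null outcome arrives as None in Python. Those records are counted
    under a separate label rather than dropped, because "no outcome recorded"
    is itself informative.
    """
    counts: Counter = Counter()
    for rec in records:
        status = rec.get("outcome_status")
        if status is None:
            counts["(no outcome recorded)"] += 1
        else:
            counts[status["category"]] += 1
    return counts
```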

What it deliberately doesn't publish

Several categories of information are absent by design. Exact addresses are never published — anonymisation is enforced before any record reaches the open feed, and no version of the API exposes precise coordinates. Victim and suspect identities are absent for the same reason. The release contains no incident narratives and no officer commentary; there is nothing to read between the lines of, because there is no prose in the data at all.

The release is also not real-time. Every figure is archival, published weeks after the month it covers. Two whole categories of crime sit largely outside the API. Fraud is handled centrally by Action Fraud and published through a separate pipeline. Terrorism-related offences are held back at the discretion of the relevant force and rarely surface in the open release. A reader looking for either should not expect to find them in the API, and should not interpret their absence from a postcode report as a statement about local conditions.

How "snap to map point" works

The anonymisation technique deserves a section of its own, because it is the feature of the data most often misread at street level. Police UK does not publish the recorded coordinates of an incident. It publishes the location of the nearest entry from a fixed, public list of map points. The points are chosen to represent meaningful geography without identifying individual addresses: typically a road junction, the centre of a residential street, the centroid of a large commercial premises, or a notional point in the middle of a public space.
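A toy nearest-point snap illustrates the principle. This is not Police UK's procedure, which works from its own master list and selection rules. It only shows the geometry of replacing an incident's coordinates with those of the closest entry in a fixed list. Distances use the haversine formula on a spherical Earth. `haversine_m` and `snap` are hypothetical helper names, and any coordinates used with them here are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6_371_000.0  # mean Earth radius in metres
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def snap(incident: Point, map_points: Sequence[Point]) -> int:
    """Return the index of the map point nearest to the incident."""
    if not map_points:
        raise ValueError("map_points must not be empty")
    return min(range(len(map_points)), key=lambda i: haversine_m(incident, map_points[i]))
```

Two incidents on different streets that are both closest to the same junction point come back with the same index. That is the catchment effect at work.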

The consequence is that a single map point absorbs every crime in the area around it. A junction at the end of a residential road may be the assigned point for incidents from several adjoining streets. A high count attached to one map point therefore does not necessarily indicate that the listed location is itself a hotspot. It may simply mean that the location is the snap target for a wider catchment. Because the list of map points is public, which points absorb which surrounding streets can, in principle, be checked. The practical effect is that street-level interpretation needs to stay one zoom level wider than the dot suggests.
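One simple way to stay a zoom level wider is to aggregate map-point counts into coarser cells before reading them. The sketch below bins (latitude, longitude) pairs into fixed cells measured in degrees. The 0.005° default is an arbitrary assumption, not a recommended setting: it spans roughly 550 m north to south and about 340 m east to west around 52°N. `coarsen` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def coarsen(points: Iterable[Tuple[float, float]], cell_deg: float = 0.005) -> Counter:
    """Aggregate map-point locations into square latitude/longitude cells.

    Each key is the (row, column) index of the cell a point falls in, and
    each value is the number of records in that cell.
    """
    cells: Counter = Counter()
    for lat, lon in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key] += 1
    return cells
```

Cells of equal size in degrees are not equal in area, so this is a reading aid rather than a density measure.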

The release cadence

Police UK publishes new data around the second week of each month, covering the month that ended approximately six weeks earlier. A query run in mid-March will typically include data through the end of January, with February still pending. The lag is consistent enough to plan around, but it is not contractual. Individual forces occasionally miss a publication window and catch up the following month, leaving a temporary gap that fills retroactively. Reclassifications happen too. When a category definition changes nationally, the open data is sometimes restated for previous months, so a count for a given period read today may differ slightly from the same count read a year from now.
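The six-week rule of thumb can be turned into a rough estimate of the latest month a query should expect to see. The sketch below assumes publication on or after a fixed day of the month. That day is set by `publish_day`, which defaults to 14 as a stand-in for "the second week". Forces sometimes miss a window, so the API's own `crimes-last-updated` endpoint is the authoritative check. This hypothetical `latest_expected_month` function is only a planning aid.

```python
from datetime import date


def latest_expected_month(today: date, publish_day: int = 14) -> str:
    """Estimate the most recent month likely to be in the release on `today`.

    On or after `publish_day`, the current release is assumed to cover the
    month two calendar months back. Before that day, the previous release
    (three months back) is still the latest.
    """
    months_back = 2 if today.day >= publish_day else 3
    # Work in a single month index so year boundaries are handled by divmod.
    index = today.year * 12 + (today.month - 1) - months_back
    year, month0 = divmod(index, 12)
    return f"{year:04d}-{month0 + 1:02d}"
```

For a query run on 15 March 2024, `latest_expected_month(date(2024, 3, 15))` returns `"2024-01"`, which matches the mid-March example above.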

The archive runs deep. The API exposes data going back to late 2010, so more than a decade of monthly counts is queryable for any area. Long-range comparisons are possible but need to be made carefully — the methodology is not identical at both ends of the window, and a national reclassification can put a step in a series that looks, on the chart, like a real-world event.
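Before reading a step in a long series as a real-world event, it helps to quantify the step at the suspected break and then check the recording history for that month. A crude way to do this is to compare the mean of a few months after a candidate break with the mean of the same number of months before it. The `step_ratio` name and the six-month window are illustrative assumptions. The ratio cannot, by itself, distinguish a reclassification from a genuine change.

```python
from statistics import mean
from typing import Sequence


def step_ratio(series: Sequence[float], break_index: int, window: int = 6) -> float:
    """Ratio of the mean after a candidate break to the mean before it.

    series[break_index] is treated as the first month of the new regime.
    A ratio far from 1.0 says "look here", not "something happened here".
    """
    before = series[max(0, break_index - window):break_index]
    after = series[break_index:break_index + window]
    if not before or not after:
        raise ValueError("need data on both sides of the break")
    base = mean(before)
    if base == 0:
        raise ValueError("pre-break mean is zero; ratio undefined")
    return mean(after) / base
```

For example, `step_ratio([10] * 6 + [15] * 6, 6)` returns `1.5`.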

Coverage gaps worth keeping in mind

The phrase "UK crime data" is, strictly speaking, wrong when applied to Police UK. The release covers the 43 territorial forces of England and Wales and does not extend uniformly to the rest of the United Kingdom. British Transport Police is published as a separate feed within data.police.uk, covering crime on and around the rail network — distinct enough geographically that it is best read on its own rather than merged with territorial counts.
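Keeping BTP separate is straightforward when working from the monthly CSV downloads. This assumes each row carries a `Reported by` column naming the recording force. The sketch below splits rows on that column. The exact force-name string in `BTP_NAME` is an assumption worth confirming against a downloaded file. `read_rows` and `split_btp` are hypothetical helpers.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]

# Assumed value of the "Reported by" column for British Transport Police rows.
BTP_NAME = "British Transport Police"


def read_rows(csv_text: str) -> List[Row]:
    """Parse street-level CSV text (header row included) into dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def split_btp(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate BTP rows from territorial-force rows."""
    territorial: List[Row] = []
    btp: List[Row] = []
    for row in rows:
        if (row.get("Reported by") or "").strip() == BTP_NAME:
            btp.append(row)
        else:
            territorial.append(row)
    return territorial, btp
```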

Police Scotland publishes through its own channels and does not contribute to the Police UK API. Police Service of Northern Ireland is on a different system again, with its own release schedule and category schema. A "UK" headline derived from data.police.uk is, in practice, an England-and-Wales territorial-police headline, plus or minus the BTP feed. None of this makes the data wrong; it makes the label imprecise. We try to use "England and Wales" where we can.

What we add on top

UK Crime Insights queries the Police UK API within a given radius of a postcode centroid — half a mile by default — and aggregates the returned records by category and by month. The AI safety summary is generated from the resulting aggregates, and the comparison feature runs the same query for two postcodes side by side. Every figure in a Crime Insights report can be reproduced by querying the API directly for the same area and month range. What we add is the radius logic, the aggregation, the trend smoothing, and the prose layer that condenses 14 category counts into a paragraph. The numbers themselves are Police UK's.
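The radius logic and aggregation can be sketched as follows. This is not Crime Insights' production code. It assumes two parameters of the street-level endpoint: `poly`, with vertices as `lat,lng` pairs joined by colons, and `date`, in `YYYY-MM` form. A half-mile circle is approximated with a 16-sided polygon, which sits slightly inside the true circle. A polygon is used rather than a single point because, as we understand the API, a plain `lat`/`lng` query returns crimes within a fixed one-mile radius. The helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable
from urllib.parse import urlencode

API_BASE = "https://data.police.uk/api/crimes-street/all-crime"
MILE_M = 1609.344  # metres in a statute mile


def circle_poly(lat: float, lon: float, radius_m: float = MILE_M / 2, n: int = 16) -> str:
    """Approximate a circle as an n-gon in "lat,lng:lat,lng:..." form."""
    r_earth = 6_371_000.0
    pts = []
    for k in range(n):
        theta = 2 * math.pi * k / n
        # Small-distance approximation: convert metre offsets to angular offsets.
        dlat = radius_m * math.cos(theta) / r_earth
        dlon = radius_m * math.sin(theta) / (r_earth * math.cos(math.radians(lat)))
        pts.append(f"{lat + math.degrees(dlat):.6f},{lon + math.degrees(dlon):.6f}")
    return ":".join(pts)


def query_url(lat: float, lon: float, month: str, radius_m: float = MILE_M / 2) -> str:
    """Build a street-level crimes URL for one month within the given radius."""
    return API_BASE + "?" + urlencode({"poly": circle_poly(lat, lon, radius_m), "date": month})


def aggregate(records: Iterable[dict]) -> Counter:
    """Count raw records by (month, category)."""
    return Counter((r["month"], r["category"]) for r in records)
```

One request per month over the chosen range, fed through `aggregate`, yields the category-by-month table that a report is built from.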

Practical implications for the reader

Three things are worth carrying into any interpretation of a figure based on this source. The first is that the data measures recorded crime, not experienced crime. The dark figure — the gap between incidents that happen and incidents that reach police attention — is real, varies by category, and is largest for offences with social barriers to reporting. A postcode total is a count of what was reported and recorded, which is a strict subset of what occurred.

The second is that the snap-to-map-point anonymisation is imprecise at the level of a single address. A dot on a street-level map should be read as evidence of activity in the surrounding catchment, not as a verdict on the building it lands on. The third is that long comparisons across years need a methodological audit. If a series shows a sharp move between, say, 2018 and 2019, the first question to ask is whether something changed in recording practice or the category schema during that window. If it did, the move on the chart may be an artefact rather than a change on the ground. We work through the meaningful shifts in a separate article.