AI methodology

How our AI safety summaries are written.

AI Enhanced Solutions · 7 min read

Every postcode report in UK Crime Insights ends with a paragraph of generated prose: a short summary that translates the numbers above into a few readable sentences. This article explains how it is produced, what the model is given, what it is not, and where it should be read cautiously. We have written it because language-model output in consumer products tends to feel more authoritative than it is, and a confident summary of a small sample is one of the easier ways to mislead a reader without meaning to. The summary is meant to save some arithmetic, not to carry weight independently of the figures it is summarising.

What goes into the model

The input is a structured statistical bundle, not free text. It contains the headline total for the 12-month window, the 14 Police UK category counts and shares, the monthly counts behind the trend chart, the postcode itself, and a one-line description of the surrounding geography classified as urban, suburban or rural based on density. If a comparison is being run, the same bundle is supplied for the second postcode, with derived deltas so the model is not doing arithmetic in its head.
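As a rough illustration of the shape of such a bundle, here is a minimal Python sketch. The class, field and function names are assumptions made for this example, not the production schema, and the postcodes and figures in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass
class PostcodeBundle:
    """Illustrative container for the statistics passed to the model."""
    postcode: str
    geography: str                    # "urban", "suburban" or "rural"
    total: int                        # headline total for the 12-month window
    category_counts: dict[str, int]   # one entry per Police UK category
    monthly_counts: list[int]         # the values behind the trend chart

    def category_shares(self) -> dict[str, float]:
        """Share of the headline total per category (0.0 if the total is zero)."""
        if self.total == 0:
            return {name: 0.0 for name in self.category_counts}
        return {name: count / self.total
                for name, count in self.category_counts.items()}


def comparison_deltas(a: PostcodeBundle, b: PostcodeBundle) -> tuple[int, dict[str, int]]:
    """Precompute b minus a so the model never has to subtract.

    Returns the delta in the headline total and a per-category delta map.
    """
    names = sorted(set(a.category_counts) | set(b.category_counts))
    by_category = {
        name: b.category_counts.get(name, 0) - a.category_counts.get(name, 0)
        for name in names
    }
    return b.total - a.total, by_category
```

For two hypothetical bundles with totals of 10 and 24, `comparison_deltas` returns a total delta of 14 together with the per-category differences, which is the kind of derived figure the paragraph above describes.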

That is the entire input. The model is not given the underlying incident records, police outcome text, nearby news articles, or local commentary. It does not know the user's identity or query history. It has no national average, no regional benchmark, no external reference point that has not been passed in. Everything in the output should trace to a number the model can see.

What the model is asked to do

The instruction is narrow. The model is asked to produce a short executive summary of the input, in plain language, with every claim grounded in a number it has been given. It is told to use neutral framing, to avoid recommending a course of action, to avoid forecasting future crime, and to avoid "safe" and "unsafe" as standalone judgements. It is asked to mention a trend only when the monthly counts show a sustained slope rather than noise, and to describe categories in the order their counts warrant.

Most of the work is in the constraints rather than the model choice. A capable general-purpose model will, by default, produce a fluent paragraph that sounds confident and reads well, and that default is the failure mode we are trying to avoid. The prompt narrows the space of acceptable outputs until the model is essentially performing a careful re-reading of the figures. Temperature is low enough that two runs over the same input produce variations in wording, not different readings of the figures.

What the model is explicitly asked not to do

Several things are ruled out. The model is not to predict future crime; we discuss the reasons in a separate piece. It is not to recommend a course of action: no "you should consider", no "we suggest". It is not to imply causality, because the data contains counts, not mechanisms. It is not to speculate about who might be committing crimes, since the input contains no such information. It is not to compare to a national average or any other denominator that has not been passed in. And it is not to use language that suggests certainty in small samples: "decisively safer" and "clearly higher" are out, because the same phrasing would read just as confidently when the difference is not real.

The constraints exist because language models default to confident, complete-sounding prose. A confident paragraph about a postcode with twelve recorded incidents in a year is exactly what we do not want — the prose outruns the data and the reader cannot tell.
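A reader or reviewer could screen a summary for the phrasings quoted above with a few lines of Python. This is a hypothetical post-generation check written for illustration, not a description of a step in the pipeline. The phrase list contains only the examples named in this article, and the function name is an assumption.

```python
import re

# Phrases quoted in this article as ruled out. The list is illustrative,
# not an exhaustive or official blocklist.
RULED_OUT = [
    "you should consider",
    "we suggest",
    "decisively safer",
    "clearly higher",
]


def flag_ruled_out(summary: str) -> list[str]:
    """Return the ruled-out phrases found in the summary, case-insensitively."""
    lowered = summary.lower()
    return [
        phrase for phrase in RULED_OUT
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]
```

A literal phrase match like this only catches the exact wordings listed. A paraphrase that implies the same certainty would pass, so it complements a careful read and does not replace one.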

Where it can still mislead

Three failure modes survive even with strict prompting, and we mention them here so a careful reader can watch for them.

The first is confident framing of small samples. When the underlying counts are in single digits per category — a quiet rural postcode may post two or three incidents in several categories across the year — even careful phrasing can give the impression of a stable pattern. The model is not claiming the pattern is stable; it is describing what is in front of it. But confident prose reads as a stronger signal than a small table of counts.

The second is normalising language. The model tends to describe a postcode's category mix as "a typical mix" or "broadly in line with what one might expect" because that phrasing sits comfortably in its training distribution. "Typical" has not been benchmarked. The model has no national distribution to compare against, so a phrase that implies one is misleading. We work against this in the prompt, but the pull towards balanced phrasing is strong.

The third is sycophancy: a tendency to soften unfavourable readings so the paragraph feels balanced. A postcode with a high anti-social behaviour count and otherwise low totals may receive a summary that gives the high category roughly equal billing with the low ones, because balance reads as fairness. This is a property of how these models are tuned.
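The first of these failure modes is the easiest to check mechanically. The sketch below lists the categories whose counts are too small to support a pattern. The threshold of 10 is an assumption taken from the phrase "single digits" above, and the function name and example category names are illustrative.

```python
def small_sample_categories(category_counts: dict[str, int], threshold: int = 10) -> list[str]:
    """Return categories with a non-zero count below the threshold, sorted by name.

    Any prose describing these categories as a pattern deserves a second look.
    """
    return sorted(
        name for name, count in category_counts.items()
        if 0 < count < threshold
    )
```

For a hypothetical postcode with 3 burglaries, 2 vehicle crimes and 14 anti-social behaviour incidents, the function returns the first two categories and leaves the third alone.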

How we evaluate it

We spot-check the summary on a rotating sample of postcodes spanning urban, suburban and rural geography and high, low and mixed-count distributions. For each sampled report we read the generated summary alongside the structured input and ask three questions. Does every quantitative claim trace to a number in the input? Does the qualitative framing match the shape of the data, or has the prose softened or sharpened it? And would a careful reader of only the summary come away with the same impression as one who read only the numbers?
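The first of the three questions lends itself to a mechanical pre-check before a human reads the pair. The sketch below pulls numeric tokens out of a summary and reports any that match no value in the input. It is an illustration under stated assumptions, not the tooling used for the spot-checks: the function name is made up, and `allowed` is assumed to hold every figure the model was given, including shares in whatever form they are quoted.

```python
import re

# Matches numbers with thousands separators ("1,204") or plain ones ("41.7").
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def untraced_numbers(summary: str, allowed: set[float]) -> list[str]:
    """Return numeric tokens in the summary that match no value in the input."""
    untraced = []
    for token in NUMBER.findall(summary):
        value = float(token.replace(",", ""))
        if value not in allowed:
            untraced.append(token)
    return untraced
```

Numbers written as words ("twelve incidents") are not caught, and a figure can trace to the input while still being attached to the wrong category, so an empty result narrows the reading and does not replace it.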

Failures are logged and feed the next round of prompt refinement. The work is unglamorous and ongoing — closer to quality assurance than research. Most obvious failures are now rare. The remaining ones are subtler, which is why this article exists.

How a careful reader can run their own check

There are three things a reader can do.

First, read the summary, then the numbers, in that order. If the prose feels confident in a way the figures do not support — a single-digit count described as a pattern, a flat trend described as a movement — trust the figures. The summary is a translation; the figures are the original.

Second, check whether any trend the summary names is one the data actually shows. A category trend inferred from one or two months at the end of the window is noise, not a slope. If the summary mentions a rise or a fall, scan the monthly counts for that category before carrying the claim further.

Third, compare the summary for one postcode to the summary for another of similar profile. If the two read differently in substance, the difference is in the data, not the model: the prompt is the same and the input is the only thing that varies. This is a quiet way to confirm the model is responding to the figures rather than producing a generic paragraph.
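The second check above can be roughed out in code for readers who prefer it. The sketch fits a least-squares slope to the monthly counts and asks whether its direction survives dropping the last two months. This is a crude heuristic chosen for illustration, not a significance test and not the rule given to the model. The function names and the default `trim` of 2 are assumptions.

```python
def slope(counts: list[float]) -> float:
    """Ordinary least-squares slope of counts against month index 0..n-1."""
    n = len(counts)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(counts))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def trend_survives_trimming(monthly: list[float], trim: int = 2) -> bool:
    """True if the slope keeps its sign when the last `trim` months are dropped."""
    full = slope(monthly)
    trimmed = slope(monthly[:-trim])
    return full != 0 and trimmed != 0 and (full > 0) == (trimmed > 0)
```

A flat year with a jump in the final two months fails the check, because the apparent rise disappears once those months are removed. A steady month-on-month increase passes.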

Read it as a translation, not a verdict

The closing principle is the one we hold ourselves to when working on the prompt. The summary is a translation of structured data into prose. It saves the reader some arithmetic and makes the report more accessible to someone who would rather not stare at fourteen category counts. It does not add information, and it does not carry an authority of its own.

A careful reader treats it as they would any other translation — useful, mostly accurate, occasionally smoothing edge cases the original handles more honestly. Pair the summary with the numbers and the reader gets the value of both. Read it alone and the reader risks inheriting its small confidences, which is the one thing we have asked the model, and ourselves, to avoid.