DOMESDAY BOOK 2025
A personal project that overlaps with my cutting-edge MS by Research work on the geospatial reasoning capabilities of frontier large language models (LLMs). Here, an enriched set of LLM prompts (one for each settlement of over 100 people) draws on open data to generate the best possible auto-produced written summary of all known settlements in England and Wales.
This produces a final searchable PDF cultural artefact, as well as interesting findings on the geographic bias (towards larger places) in how accurately OpenAI language models recall geospatial knowledge. Scroll for interactive outputs of provisional results!
Searchable Text 'Domesday Book' Artefact
View the downloadable PDF by clicking on the document icon beside this text to find written text descriptions of all settlements in England and Wales!
But how far can we trust that the LLM knows where these places are? See below for some provisional results.
LLM Errors: 'Large' Places (n=87)
As seen below, with a few (often severe) exceptions, the LLM error rate is low for these places. The map below shows 'arced' differences (errors) between the LLM-stated lat/lng and the geometric centroid for places with the 'Large' size classification in the ONS built-up area (BUA) data for England and Wales.
LLM Errors: 'Small' Places (n=920)
Despite the larger sample, the LLM error rate increases meaningfully as place size decreases. The map below shows 'arced' differences (errors) between the LLM-stated lat/lng and the geometric centroid for places with the 'Small' BUA size classification in the ONS BUA data for England and Wales.
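For readers wondering how an error of this kind might be measured, here is a minimal sketch. It assumes the error is the great-circle (haversine) distance between the LLM-stated lat/lng and the BUA centroid, which may not be the exact method behind the maps. The function names, record fields and sample rows are all hypothetical and for illustration only; they are not project results.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def median_error_by_class(records):
    """Group per-place errors by size class; return the median error (km) per class."""
    errors = {}
    for r in records:
        d = haversine_km(r["llm_lat"], r["llm_lng"],
                         r["centroid_lat"], r["centroid_lng"])
        errors.setdefault(r["size_class"], []).append(d)
    return {cls: median(ds) for cls, ds in errors.items()}


# Hypothetical rows for illustration only; NOT real project data.
sample = [
    {"size_class": "Large", "llm_lat": 51.50, "llm_lng": -0.10,
     "centroid_lat": 51.50, "centroid_lng": -0.10},
    {"size_class": "Small", "llm_lat": 52.00, "llm_lng": -1.00,
     "centroid_lat": 52.00, "centroid_lng": -1.50},
]
print(median_error_by_class(sample))
```

The median is used here rather than the mean because a few severe outliers, like those noted for 'Large' places, would otherwise dominate the summary.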