West Yorkshire's public digital archives contain tens of thousands of duplicate images across municipal databases, a problem that archivists and IT managers at Leeds City Council have been quietly working to quantify since January 2026. The numbers are striking. Preliminary internal assessments, shared at a West Yorkshire Digital Infrastructure Forum session held at Leeds Civic Hall in March, indicated that duplicate or near-duplicate image files account for roughly 23 percent of total storage in some departmental repositories — a figure that translates directly into wasted expenditure on cloud hosting contracts renewed each April.
The issue matters right now because Leeds City Council is partway through a multi-year digitisation programme tied to its Smart Leeds initiative, which formally expanded its scope in late 2024 to include planning records, heritage photography and social care case files. As more analogue material gets scanned and uploaded, the duplication problem compounds. A file scanned twice by different departments, saved under different filenames, and uploaded to separate systems does not flag itself as a duplicate without dedicated deduplication software — and not every team within the council is running that software.
Where the Problem Is Most Visible in Leeds
Two organisations are at the centre of efforts to address this locally. Leodis, the photographic archive of Leeds maintained by Leeds Libraries and run out of the Central Library on Calverley Street, holds more than 100,000 digitised images of the city dating back to the mid-19th century. Cataloguers there have found that a meaningful proportion of uploads received from community contributors between 2020 and 2025 duplicated material already held in the collection: sometimes the identical photograph submitted by two different donors, sometimes slightly cropped versions of the same image catalogued as distinct records. The library service has not published a final figure, but internal working documents presented at a Digital Heritage Leeds workshop in February 2026 put the estimate at several thousand affected records.
Meanwhile, the West Yorkshire Archive Service, which operates a reading room at Chapeltown Road as well as sites across the region, is running a parallel review of its born-digital holdings. Staff there have been piloting perceptual hashing tools since autumn 2025. This software generates a compact fingerprint from the visual content of each image, so that visually similar images produce similar fingerprints, and compares it against the fingerprint of every other file in the system. The technology is not new, but applying it at scale to a collection that spans local government, church, and business records requires significant processor time and human review of flagged matches.
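For readers curious how such a fingerprint works, the following is a minimal sketch of one common perceptual hashing technique, the "average hash". It is an illustration only: the article's sources did not say which algorithm or tool the archive service is piloting, and the function names, the 8x8 grid size and the `max_distance` threshold are all assumptions chosen for the example. Images are represented here as plain grids of brightness values so the sketch runs without any imaging library.

```python
from itertools import combinations


def average_hash(pixels, hash_size=8):
    """Fingerprint a grayscale image given as rows of 0-255 brightness values.

    The image is reduced to a hash_size x hash_size grid of block averages;
    each grid cell becomes one bit: 1 if brighter than the grid mean, else 0.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        r0 = r * height // hash_size
        r1 = max((r + 1) * height // hash_size, r0 + 1)
        for c in range(hash_size):
            c0 = c * width // hash_size
            c1 = max((c + 1) * width // hash_size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two fingerprints."""
    return bin(hash_a ^ hash_b).count("1")


def find_near_duplicates(images, max_distance=5):
    """Compare every pair of images; return pairs within max_distance bits.

    `images` maps a file name to its pixel grid. The threshold is an
    illustrative assumption, not a recommended setting.
    """
    hashes = {name: average_hash(px) for name, px in images.items()}
    matches = []
    for (name_a, h_a), (name_b, h_b) in combinations(hashes.items(), 2):
        distance = hamming_distance(h_a, h_b)
        if distance <= max_distance:
            matches.append((name_a, name_b, distance))
    return matches


# Hypothetical sample data: a left-to-right gradient, a slightly brightened
# copy of it (as a second scan might be), and an unrelated top-to-bottom gradient.
scan_a = [[x * 255 // 31 for x in range(32)] for _ in range(32)]
scan_b = [[min(255, v + 3) for v in row] for row in scan_a]
other = [[y * 255 // 31 for _ in range(32)] for y in range(32)]

flagged = find_near_duplicates({"scan_a": scan_a, "scan_b": scan_b, "other": other})
```

In this sketch the two scans are flagged as a match while the unrelated image is not. Comparing every pair means the work grows with the square of the collection size, which is one reason the approach is processor-hungry at scale, and a small bit distance is only a candidate match that still needs the human review the pilot relies on.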
The Storage Costs Add Up Fast
Storage is not free, and the numbers underline why this has moved from a housekeeping issue to a budget one. Leeds City Council's ICT services division pays for cloud storage under contracts with rates that, across the public sector in England, typically run between £18 and £40 per terabyte per month depending on tier and provider, according to Crown Commercial Service framework guidance published in 2025. A heritage photography collection running to several terabytes, with 23 percent redundancy, means the council may be paying for hundreds of gigabytes of files it already has, month after month. At the lower end of that pricing range, even a modest one-terabyte reduction in duplicate storage saves roughly £216 a year (£18 a month over 12 months) per affected system. Multiply that across a dozen departmental image repositories and the annual saving becomes material.
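The arithmetic behind those figures can be laid out as a short calculation. This is a sketch using only the rates and the 23 percent share quoted above; the three-terabyte collection size and the function names are hypothetical, chosen purely to illustrate "several terabytes".

```python
MONTHS_PER_YEAR = 12


def duplicate_terabytes(collection_tb, duplicate_share):
    """Storage taken up by duplicates, in terabytes."""
    return collection_tb * duplicate_share


def annual_saving_gbp(duplicate_tb, rate_gbp_per_tb_month):
    """Yearly cost of holding duplicate_tb at a given monthly rate."""
    return duplicate_tb * rate_gbp_per_tb_month * MONTHS_PER_YEAR


# One terabyte of duplicates at the low and high ends of the quoted range.
low_per_system = annual_saving_gbp(1, 18)    # 216
high_per_system = annual_saving_gbp(1, 40)   # 480

# The same one-terabyte reduction across a dozen repositories.
low_across_dozen = 12 * low_per_system       # 2592
high_across_dozen = 12 * high_per_system     # 5760

# Hypothetical three-terabyte collection with 23 percent duplication.
wasted_tb = duplicate_terabytes(3, 0.23)     # about 0.69 TB, i.e. roughly 690 GB
```

On these assumptions, a one-terabyte reduction in each of a dozen repositories would be worth between about £2,600 and £5,800 a year, depending on where in the quoted price range each contract sits.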
The deduplication review is also about public usability, not just cost. Researchers using Leodis to find historical photographs of Headingley, Harehills, or the Victorian terraces around Burley Road currently encounter duplicate entries that create confusion about whether two listings represent genuinely different images or simply the same photograph catalogued under two names. That friction slows legitimate research.
Leeds Libraries has indicated it plans to publish updated guidance for community contributors submitting photographs to Leodis later this summer, with clearer submission protocols designed to reduce incoming duplicates at source. The West Yorkshire Archive Service's deduplication pilot is expected to produce a report by September 2026, which will inform decisions about whether to roll out the hashing tool across all five of its sites. Residents who use either service and believe they have submitted duplicate material are being encouraged to contact the relevant archive directly before the autumn review period closes.