A project funded by the European Union is making it possible to convert centuries-old handwriting into typed text. Transkribus is a software program that can scan, transcribe and search historical documents that are in English or Norwegian.
The program is supported by the READ project (Recognition and Enrichment of Archival Documents), an international collaboration involving 14 countries. The ability to scan an image into printed text has existed for decades, but converting older script and idiosyncratic writing styles remains a challenge. To build up the program's database of examples, Transkribus volunteers have been combing through old texts, uploading scans and adding transcriptions manually.
After uploading an image, volunteers center pink lines over each line of text and then provide transcription text to build what is called a Handwritten Text Recognition (HTR) model. Drawing on the many examples, the program builds a neural network by comparing all of the data, so its recognition improves as more comparison text is added.
After five years of data compilation, Transkribus has gathered enough content to recognize 92 percent of the words in a Norwegian diary from the 19th century. Eventually the program will also serve as a search engine for old texts.
While Transkribus is currently targeted at librarians, scholars, historians and archivists for research and training purposes, public users may eventually also have access to the program. This would make it possible to quickly transcribe an ancestor’s letters or genealogical records from Norway.