Handwritten Text Recognition (HTR)



Adam Matthew is delighted to be able to offer Handwritten Text Recognition (HTR) search technology for all the manuscript documents in Mass Observation Project. This delivers document-level search results highlighted in handwritten documents.
 

What is HTR?
How do I know that a document can be searched using HTR?
Can I see a transcript of the manuscript material?
Why can't I search the entire collection using HTR?
Can I use phrase-searching and Boolean operators like AND or OR?



What is HTR?

Handwritten Text Recognition is a technology that interprets handwriting in order to deliver search results within manuscript documents. Adam Matthew has harnessed leading software that employs artificial intelligence and probability, and does not rely on transcripts, to make these documents highly searchable.

[back to top]


 
How do I know that a document can be searched using HTR?

All handwritten documents in Mass Observation Project can be searched using HTR. All other documents, which are wholly or largely typed or printed, can be searched using standard Optical Character Recognition (OCR) technology.
Screenshot illustrating how to click on any manuscript document to perform an HTR search.

[back to top]


Can I see a transcript of the manuscript material?

The HTR technology we have used does not currently produce transcripts. Instead, it takes a different approach, using artificial intelligence and probability to identify search terms. The result is a marked improvement in the accuracy of highlighted results over transcript-reliant systems.

[back to top]

 
Why can't I search the entire collection using HTR?

Our HTR technology does not yet support searching the entire set of manuscript documents at once through a basic site-level search. Each manuscript document can be searched individually using HTR via the search box in the document's image-viewer.

A basic site-level search or a search from the Advanced Search page will, in the case of manuscript documents, search only the metadata that Adam Matthew's editorial team has assigned to each one. Results in document metadata can be isolated in the results list by deselecting the 'Full text' button under the 'Current search criteria' heading above the list. (By contrast, deselecting the 'Metadata' button isolates results in the searchable full text of typed/printed documents.)

If the term searched for has a hit in a manuscript document's metadata, and you select that document from the search-results list, the HTR software will automatically search for that term throughout the document.

Alternatively, any HTR-enabled document can be searched directly by typing your chosen term into the search box in the document's image-viewer.

Screenshot illustrating highlighted hits in a manuscript document after an in-document HTR search.

Any results will be highlighted in the main image, which opens at the first image that has results. They are also listed below as snippets from all the images with results.

Click on a snippet to see its image in full, with the result(s) highlighted.
Screenshot illustrating snippets showing HTR results throughout a document, and how clicking on a snippet shows the image in full with the hit(s) highlighted. 

[back to top]


Can I use phrase-searching and Boolean operators like AND or OR?

Phrase-searching and Boolean operators are not currently supported. The HTR software searches for each entered search term individually, so a search for war AND peace will search for instances of all three words ('war', 'AND' and 'peace'), whether they occur separately or together.
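
The behaviour described above can be pictured with a small Python sketch. It is only an illustration, not Adam Matthew's implementation: the function names are hypothetical, and the exact, case-insensitive word matching is an assumption standing in for the probabilistic matching that the HTR software actually performs.

```python
def split_search_terms(query: str) -> list[str]:
    """Split a query on whitespace. Every token, including 'AND' or 'OR',
    is kept as an ordinary search term (no operator handling)."""
    return query.split()


def matching_terms(query: str, page_words: list[str]) -> dict[str, bool]:
    """For each term, report whether it appears among the words on a page.

    Assumption: simple case-insensitive equality is used here purely for
    illustration; the real software scores handwriting probabilistically.
    """
    lowered = {word.lower() for word in page_words}
    return {term: term.lower() in lowered for term in split_search_terms(query)}


# 'war AND peace' is treated as three separate terms.
print(split_search_terms("war AND peace"))
print(matching_terms("war AND peace", ["Peace", "and", "quiet"]))
```

In this sketch a page containing only 'and' would still count as a hit for the query, which is why operators such as AND are best left out of an HTR search.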

[back to top]