The immediate goal is to give the OpenPlaques folk an automatic service that machine-reads English Heritage plaques (the blue plaques that are very common at historic sites in the UK) from their Flickr photos and then squirts out the English text. Currently, volunteers transcribe the text by hand.
Below you’ll see a quick demo. I’ve used the bottle.py microframework to run my web service: it takes the URL of an image, converts the image to TIFF, passes it into Tesseract and returns the recognised text as plain text.
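Here is a minimal sketch of that kind of service. It is not the code behind the demo. It assumes the `tesseract` command-line tool and ImageMagick’s `convert` are installed and on the PATH, and that ImageMagick handles the image-to-TIFF step. The `/ocr` route, the `url` query parameter and the helper function names are all hypothetical.

```python
import os
import subprocess
import tempfile
import urllib.request


def convert_command(src_path, tiff_path):
    # Assumption: ImageMagick's `convert` performs the image-to-TIFF conversion.
    return ["convert", src_path, tiff_path]


def tesseract_command(tiff_path, output_base):
    # The Tesseract CLI writes the recognised text to output_base + ".txt".
    return ["tesseract", tiff_path, output_base]


def ocr_from_url(url):
    """Download an image, convert it to TIFF, run Tesseract and return the text."""
    with tempfile.TemporaryDirectory() as workdir:
        src_path = os.path.join(workdir, "plaque_download")
        tiff_path = os.path.join(workdir, "plaque.tif")
        output_base = os.path.join(workdir, "plaque")
        urllib.request.urlretrieve(url, src_path)
        subprocess.run(convert_command(src_path, tiff_path), check=True)
        subprocess.run(tesseract_command(tiff_path, output_base), check=True)
        with open(output_base + ".txt", encoding="utf-8") as fh:
            return fh.read()


def make_app():
    # bottle is imported lazily so the helpers above work without it installed.
    import bottle

    app = bottle.Bottle()

    @app.route("/ocr")  # hypothetical route name
    def ocr():
        url = bottle.request.query.get("url")  # hypothetical parameter name
        if not url:
            bottle.abort(400, "Missing ?url= parameter")
        bottle.response.content_type = "text/plain; charset=utf-8"
        return ocr_from_url(url)

    return app
```

With bottle installed, `make_app().run(host="localhost", port=8080)` would serve this locally, and a request to `http://localhost:8080/ocr?url=<image-url>` would return the recognised text. Shelling out to the command-line tools keeps the sketch free of Python imaging bindings, at the cost of writing temporary files.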
This isn’t live on the web yet (it needs a bit more work), but it’ll be up for public use shortly.
Update: following this Tesseract image clean-up advice (isolate the text region, threshold it, convert it to black and white), I can extract very clean text. Contrast these results with what you see in the video:
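To illustrate the clean-up steps, here is a dependency-free sketch. It assumes the greyscale image is held as a list of rows of integers from 0 to 255. In practice, an imaging library would load the pixels. The sketch makes three further simplifications:

- It crops a rectangle rather than the inner circle I used.
- It picks the threshold with Otsu’s method. That is my choice for illustration, since the advice only says to threshold.
- It maps the bright plaque lettering to black on a white background, because Tesseract tends to do best with dark text on a light background.

All function names are hypothetical.

```python
def crop(pixels, left, top, right, bottom):
    """Isolate the text region (a rectangle here, not the plaque's inner circle)."""
    return [row[left:right] for row in pixels[top:bottom]]


def otsu_threshold(pixels):
    """Pick the grey level that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_bg = 0
    sum_bg = 0
    best_t, best_between = 0, -1.0
    for t in range(256):
        # Class "dark" holds levels <= t, class "bright" holds levels > t.
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def to_black_and_white(pixels, threshold):
    """Bright pixels (the white lettering) become black (0); the rest become white (255)."""
    return [[0 if value > threshold else 255 for value in row] for row in pixels]


def clean_up(pixels, box):
    """Crop to box = (left, top, right, bottom), then threshold to black and white."""
    region = crop(pixels, *box)
    return to_black_and_white(region, otsu_threshold(region))
```

For example, a tiny image with blue-ish background pixels at 40 and lettering at 220 gives a threshold of 40. `clean_up(img, (0, 0, 3, 2))` then turns `[[40, 40, 220], [40, 220, 220]]` into `[[255, 255, 0], [255, 0, 0]]`. The result could be saved as a TIFF and handed to Tesseract.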
1885 – 1963
PAINTER & DESIGNER
INN SIGNS (Note: I extracted only the inner circle, so Sussex isn’t shown)
WAS DONATED BY
JEAN & BRIAN CROSSLEY
OP BROCKHAM (Note: one OCR error here, OP should read OF)
JEAN’S 80th BIRTHDAY