Google’s Open Source OCR


OCR (optical character recognition) is a type of software designed to translate images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text. In doing so, it converts pictures of characters into a standard character encoding such as ASCII or Unicode.

Why am I blogging about OCR? Because Google has its finger in this open source pie.

OCR History

Tesseract is an OCR engine originally developed at HP Labs between 1985 and 1995. HP then abandoned OCR research, and the software's development stayed frozen for the next ten years. In 2005, HP released Tesseract as open source under the Apache License, and Google, together with a research institute, has continued developing it since.

Why is it important for Google to be involved in OCR?

OCR is useful for Google Book Search, and it could also be useful for Picasa or Image Search alongside an object recognition engine. And if Google improves the software, Tesseract could become a viable alternative to commercial OCR applications.

Watch this space…



Head of SEO and Inbound Marketing at UK's largest media planning and buying agency. Omar has over 10 years' experience in digital marketing, the last 6 of them in large media agency environments, developing and implementing cutting-edge digital campaigns for some of the world's best-known brands. For the latest digital marketing and industry news and updates, follow Omar's Twitter stream (@OmarKattan) or add him to one of your Google+ Circles. The content of this article represents the personal views of the author and does not constitute professional advice.


© 2012 Omar Kattan. All rights reserved.