OCRopus is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.
The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the U.S. Census Bureau, and novel high-performance layout analysis methods.
OCRopus is an open-source project run by an AI research group. Among other things, the group aims to develop advanced character-recognition technologies.