Product Release
2026-04-24

OCR microservice ships with dual engines and automatic fallback

The OCR microservice now supports Tesseract 5.8 LSTM and an alternate engine with automatic fallback. Accuracy becomes a property of the service rather than of any single engine; legacy OCR pipelines can be re-pointed without service interruption.

Optical character recognition is a quality-floor problem: no single engine handles every document type well. Printed financial statements suit one engine, handwritten signatures another, fixed-format invoices a third. Enterprises that standardise on a single OCR engine accept that engine's worst case as their floor; the platform inverts the relationship by treating engine choice as an internal implementation detail of the service.

Tesseract 5.8 LSTM as primary. The latest LSTM-based release of the canonical open-source OCR engine. Strong on printed text, broad language coverage, well-tuned defaults.
Alternate engine as fallback. A second engine kicks in when the primary returns a result below a confidence threshold or fails the layout-detection step. The dispatch is automatic; callers see one result, not an engine selector.
Integration-tested output parity. The CI suite confirms that both engines produce comparable output across a corpus of representative documents; the fallback path is exercised on every release.
Legacy migration. Enterprises running a legacy OCR pipeline (commercial or in-house) repoint to the microservice with the same input contract; accuracy at the boundary is at least the floor of the previous engine, often better.
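
The fallback dispatch described above can be sketched in a few lines. This is an illustration under stated assumptions, not the service's actual code: the names `OcrResult`, `recognize`, `stub_primary` and `stub_fallback` are hypothetical, the 0.80 threshold is a placeholder, and the two engines are stand-in stubs rather than real Tesseract or alternate-engine calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OcrResult:
    """Hypothetical result shape; the real service contract is not shown in this post."""
    text: str
    confidence: float  # assumed to be normalised to the range 0.0 to 1.0
    layout_ok: bool    # assumed flag: did the layout-detection step succeed?


# An engine is modelled as any callable that turns document bytes into a result.
Engine = Callable[[bytes], OcrResult]


def recognize(
    document: bytes,
    primary: Engine,
    fallback: Engine,
    min_confidence: float = 0.80,  # placeholder value, not the service's threshold
) -> OcrResult:
    """Run the primary engine; use the fallback if confidence or layout fails."""
    result = primary(document)
    if result.layout_ok and result.confidence >= min_confidence:
        return result
    # The caller still receives a single result and never picks an engine.
    return fallback(document)


def stub_primary(document: bytes) -> OcrResult:
    """Stand-in for the primary engine: confident only on 'printed' input."""
    if document.startswith(b"printed"):
        return OcrResult(text="INVOICE 1001", confidence=0.97, layout_ok=True)
    return OcrResult(text="lNV0lCE l00l", confidence=0.41, layout_ok=True)


def stub_fallback(document: bytes) -> OcrResult:
    """Stand-in for the alternate engine."""
    return OcrResult(text="INVOICE 1001", confidence=0.88, layout_ok=True)


if __name__ == "__main__":
    for doc in (b"printed page", b"handwritten page"):
        print(recognize(doc, stub_primary, stub_fallback))
```

Because the engines are passed in as plain callables, the same function can be driven by stubs in a test suite, which is one way a fallback path could be exercised on every release.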
