What is Optical Character Recognition (OCR) for Identity Verification?
Optical Character Recognition (OCR) technologies for identity verification extract text from images of government-issued IDs and translate it into machine-readable data.
This technology saves people the time and hassle of manually inputting data from printed or non-editable documents or images into a digital system, while also improving accuracy, enhancing fraud detection, ensuring global compliance, and helping businesses expand internationally.
Top-range OCR technologies, such as those that are purpose-built, not only extract and read data faster than humans but also make fewer mistakes.
OCR technology in the age of the telegraph
OCR technology can be traced back to the early 20th century. In 1914, physicist Emanuel Goldberg invented a machine that could read characters and convert them into telegraph code. It is considered one of the earliest examples of OCR technology.
Later, Goldberg developed what he called a “Statistical Machine”, an electromechanical machine for searching microfilm archives using an optical code recognition system. In 1931, he was granted U.S. patent number 1,838,389 for the invention. IBM promptly acquired rights to the patent.
What obstacles can basic OCR technologies face?
Thousands of document types
Across the world, thousands of different types of identity documents are in use, each with its own formats, fonts, and security features. For businesses to operate internationally, the OCR technology they use has to be able to successfully classify the wide range of document types that exist in the regions and countries in which they are active, and accurately extract and interpret the data that each one contains. To do this, their ML models need to be trained on large sets of IDs, which is not the case even with top-of-market general-purpose OCR technologies.
Text readability
Basic OCR technologies can struggle to recognize unusual fonts. Documents that use multiple fonts can be particularly challenging to read.
Language limitations & special characters
Documents containing data in different languages require OCR technologies to seamlessly switch between recognition models, which can prove challenging.
Non-Latin scripts, such as Arabic or Chinese, can be harder to recognize due to their larger character sets, intricate shapes, contextual variations —some letters change shape depending on their position in a word— and subtle differences between characters.
While Latin-based alphabets typically have 26 characters —plus additional diacritics—, some non-Latin languages have thousands of characters. Chinese has over 50,000 characters, with around 8,000 in common use. Japanese is a particularly complex language as it combines three scripts: Kanji (Chinese characters), Hiragana, and Katakana. Even when processing Latin-based languages, subtle diacritical marks can be misread by basic OCR technologies.
Some non-Latin scripts are written from right to left or vertically, which can confuse basic OCR technologies that have been trained to process information from left to right.
Tricky symbols
Special service symbols, such as the MICR characters that identify a US bank check, can be harder to read. Many general-purpose OCR technologies are not trained to read special symbols and therefore ignore them, meaning the information they contain is lost during the extraction process.
Confusing designs & similarities between different document types
Complex layouts with multiple columns, tables, or a mix of text and images can confuse basic OCR technologies. Insufficient contrast between text and background can also affect their performance. Obstructions such as overlapping objects can result in data being misinterpreted. Similar document types, such as a learner’s permit and a driver’s license, can likewise confuse basic OCR technologies.
Dependency on third-party developers
OCR technologies that depend on third-party developers can be slower to adapt to changes, which impacts performance. When issuing authorities add new elements to government documents, these OCR technologies may misinterpret data because their models are not familiar with, or capable of recognizing, the new elements.
Challenging environmental conditions
Poor lighting and shadows can impact accuracy.
Poor image quality
Low-quality, blurry images can cause OCR technologies to misinterpret data. Stains can also impact accuracy.
What risks can arise as a result of inaccurate OCR?
Identity fraud
Misinterpreted credentials or failure to detect counterfeit or tampered identity documents can result in unauthorized individuals gaining access to systems, areas, and services, putting people and organizations at risk of exposure to criminal activity or dangerous behavior.
Misinformation
Incorrect or incomplete data extraction can spread misinformation, propagate errors, and disrupt business operations. Decisions made in response to analysis of defective data can harm your organization.
Regulatory breaches
When extracting data for compliance purposes —such as KYC (Know Your Customer) requirements in banking—, inaccuracies can result in regulatory breaches, litigation, and criminal penalties.
Disrupted business operations
Poorly interpreted data can require extensive manual review, which is time-consuming and costly. Errors in extracting data slow down business operations.
Poor UX & damage to reputation
Data processing errors can damage your organization’s reputation, result in user dissatisfaction and loss of trust, and attract negative media attention. Errors such as false rejections —when OCR technology incorrectly flags valid documents as invalid— can frustrate users. If the OCR process is slowed down by complex conditions, languages, and layouts, this can impact the user experience and increase drop-off rates.
Missed opportunities for global expansion
The inability to recognize non-Latin scripts or keep pace with updates to international identity documents can hinder opportunities to expand globally and reach international clients or customers.
Data leaks
Errors in data classification could expose sensitive data.
Incode OCR technology guarantees accuracy and scalability
Incode’s purpose-built proprietary OCR technology uses machine learning to capture, classify, and process data from over 4,900 global identity documents with near-perfect accuracy.
From capturing high-quality images in suboptimal conditions to parsing complex fonts, elements, and special symbols, our document-specific OCR pipelines are robust, scalable, and constantly evolving.
Purpose-built for global IDs
Unlike general-purpose solutions, purpose-built technologies are optimized for a specific task. In our case, Incode’s proprietary OCR technology is purpose-built to extract and process data from thousands of different types of identity documents from around the world, guaranteeing near-perfect accuracy.
In fact, Incode has been shown to outperform Google’s general-purpose OCR solution in extracting and interpreting data from key fields on identity documents such as name, expiry date, and document number.
Recognizes complex fonts & elements
Our powerful machine learning (ML) models enhance OCR performance by adapting to document-specific variations, including complex fonts, diacritics, symbols, and barcodes.
Built for global scalability
Our proprietary OCR technology extracts Latin and non-Latin text from over 4,900 full document types from 200+ countries and territories with unparalleled accuracy. Our two-stage algorithm for classifying document types from around the world plays a key role in ensuring correct data extraction.
Ensures regulatory compliance
By improving accuracy, we help ensure that your organization complies with industry regulations, avoiding the penalties and reputational damage that breaches can bring.
Works at lightning speed
Thanks to our powerful machine learning (ML) models and adaptability, our capture SDK outperforms humans by evaluating multiple frames within seconds.
Real-time feedback & image optimization
Our capture SDK employs smart frame selection and provides real-time feedback to optimize image quality. This ensures accuracy and completeness, even on low-end devices or in poor visual conditions. Our intelligent system will automatically detect the ID orientation during the capture phase.
Stay one step ahead
Our pipelines are tailored for different document types, ensuring we can quickly adapt to new document structures, layouts, and security elements. Our in-house team of developers acts fast when updates are required, and our in-house team of reviewers keeps our database constantly updated.
How our OCR technology works
From capture to completion, this is our step-by-step guide to how Incode’s proprietary OCR technology uses machine learning to achieve near-perfect accuracy during identity verification.
Step 1: ID capture
Video stream processed
During the capture phase, our software development kit (SDK) processes a video stream of the incoming document via the user’s camera. The orientation of the document is automatically detected. For certain document types, NFC chips are read during the capture phase.
Quality estimation
During the capture phase, our machine learning (ML) model evaluates every frame of the video stream and estimates the quality of the image in a matter of milliseconds.
Real-time feedback
If the image quality is compromised —possibly due to low lighting conditions, the document not being entirely visible, a low-contrast background, or the image being out of focus— we provide real-time feedback so that the user can make adjustments before repeating the process —such as turning on an extra light, holding their phone further away to capture the entire document, or holding it more steadily to ensure the image is in focus.
Final quality check
If the ML model recognizes a frame in which the entire document is visible and recognizable, that frame is then sent to a server and checked by a larger ML quality estimation model. If the frame passes this final check, then the capturing process is complete.
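To make the capture loop concrete, here is a minimal Python sketch of a frame-selection flow like the one described above. The quality model, threshold value, and feedback messages are illustrative placeholders, not a reproduction of our production SDK.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class QualityReport:
    score: float              # 0.0 (unusable) to 1.0 (ideal)
    feedback: Optional[str]   # real-time hint for the user, if any

def show_feedback(message: str) -> None:
    # In a real SDK this would drive the on-screen capture UI.
    print(message)

def estimate_quality(frame: bytes) -> QualityReport:
    """Stand-in for the small on-device quality model. A real model scores
    focus, lighting, contrast, and document visibility for each frame in
    a matter of milliseconds."""
    return QualityReport(score=0.0, feedback="Hold the document steady")

def select_frame(frames: Iterable[bytes], threshold: float = 0.9) -> Optional[bytes]:
    """Scan the video stream and return the first frame that passes the
    on-device check; the caller then sends that frame to the larger
    server-side quality model for the final check."""
    for frame in frames:
        report = estimate_quality(frame)
        if report.score >= threshold:
            return frame
        if report.feedback:
            show_feedback(report.feedback)  # e.g. "Turn on an extra light"
    return None  # no good frame: fall back to prompting for a single photo
```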
User photo, if necessary
If the process is unsuccessful —again, possibly due to low lighting conditions, the document not being entirely visible, a low-contrast background, or the image being out of focus—, the user will be prompted to take a photo instead of our SDK analyzing multiple frames from the video stream. Their photo will then be sent to the larger ML quality estimation model for final approval.
Step 2: ID classification
Our two-stage classification system ensures precise document identification.
Candidate proposal
This is the process of generating a set of potential document types (or “candidates”).
In this phase, we generate potential document-type matches by leveraging a neural network to match the document’s key features. Here’s how the process works:
1. Feature Extraction: The system extracts key characteristics from the document, such as text placement, field positions, layout, colors, and other essential attributes.
2. Vector Representation: These features are used to generate a numerical vector representation of the document. This vector captures the essential details of the document that can help identify its type (e.g., passport, driver’s license, etc.).
3. Matching: The system compares this vector with a database of known document types. It calculates the similarity between the vector of the incoming document and those of previously classified documents in the database.
4. Candidate Generation: Based on this similarity matching, the system proposes a set of candidate document types that closely resemble the document being processed. These are the “candidate proposals.”
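Below is a minimal Python sketch of how embedding-based candidate proposal can work. The embedding network is stubbed out, and the embedding size and similarity metric (cosine over unit vectors) are assumptions for illustration, not our production system.

```python
import numpy as np

EMBED_DIM = 256  # hypothetical embedding size

def embed_document(image: np.ndarray) -> np.ndarray:
    """Stand-in for the feature-extraction network: maps a document image
    to a fixed-size vector capturing layout, colors, and field positions."""
    vec = np.random.rand(EMBED_DIM)  # placeholder for model inference
    return vec / np.linalg.norm(vec)

def propose_candidates(image: np.ndarray,
                       database: dict[str, np.ndarray],
                       top_k: int = 5) -> list[str]:
    """Compare the document's vector against known document types and
    return the top-k most similar ones: the 'candidate proposals'."""
    query = embed_document(image)
    scores = {
        doc_type: float(query @ ref_vec)  # cosine similarity of unit vectors
        for doc_type, ref_vec in database.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```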
Refinement
This step refines the document type by analyzing specific details, using text-based analysis and other distinguishing features. Here’s how it works:
1. Textual Analysis: The system inspects the text present on the document to identify specific words, phrases, or symbols that differentiate similar documents. For example, it may check for keywords like “permanent” vs. “temporary” on a residence permit or specific terms that distinguish a learner’s permit from a driver’s license.
2. Field-Specific Checks: It also looks at the format and content of certain fields (e.g., document number, issuing authority, expiration date) to ensure they match the expected pattern for a particular document type.
3. Final Document Type Selection: After analyzing these textual and format-specific details, the system refines its initial candidate proposals and selects the exact document type.
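As an illustration, here is a minimal Python sketch of this kind of rule-based refinement. The document types, keywords, and document-number pattern are hypothetical examples, not our actual rules.

```python
import re
from typing import Optional

# Hypothetical rules: real refinement rules are maintained per document
# type and are far more extensive.
REFINEMENT_RULES = {
    "us_learner_permit": {"keywords": ["LEARNER", "PERMIT"]},
    "us_driver_license": {"keywords": ["DRIVER LICENSE"],
                          "doc_number": re.compile(r"^[A-Z]\d{7}$")},
}

def refine(candidates: list[str], ocr_text: str,
           fields: dict[str, str]) -> Optional[str]:
    """Pick the exact document type from the candidate proposals by
    checking distinguishing keywords and field formats."""
    text = ocr_text.upper()
    for doc_type in candidates:  # candidates arrive ordered by similarity
        rules = REFINEMENT_RULES.get(doc_type, {})
        if not all(kw in text for kw in rules.get("keywords", [])):
            continue
        pattern = rules.get("doc_number")
        if pattern and not pattern.match(fields.get("document_number", "")):
            continue
        return doc_type
    return None  # no candidate survived refinement
```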
Step 3: ID OCR
Next, our Optical Character Recognition (OCR) technology extracts and interprets the text that appears on the document. This phase is split into two stages: detection and recognition.
Detection
During the detection stage, our algorithm identifies the precise locations of the words that appear on the document. It takes into account the information it has collected about the document type in order to recognize words of significance. This segmentation model is trained to determine word boundaries, enabling it to successfully process dense text.
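A minimal sketch of what the detection stage’s output and word ordering might look like; the segmentation model itself is stubbed out, and the line-grouping heuristic is an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass
class WordBox:
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels
    w: int  # width
    h: int  # height

def detect_words(image) -> list[WordBox]:
    """Stand-in for the segmentation model that finds word boundaries,
    even in dense text."""
    return []  # model inference goes here

def reading_order(boxes: list[WordBox], line_tolerance: int = 10) -> list[WordBox]:
    """Sort detected words top-to-bottom, then left-to-right: boxes whose
    vertical centers fall in the same `line_tolerance` band are treated
    as one line."""
    return sorted(boxes, key=lambda b: (round((b.y + b.h / 2) / line_tolerance), b.x))
```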
Recognition
During the recognition stage, our algorithm uses an autoregressive language model built on a vision transformer to recognize the detected words. This model outputs probabilistic predictions of the words’ content. The results from the ID classification stage provide the recognition model with information about the formatting of each field. The recognition model can then operate within each format to deliver near-perfect accuracy for structured fields or fields with special symbols.
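Here is a minimal Python sketch of the idea of constraining recognition output to a field’s expected format. The recognition model is stubbed with placeholder predictions, and the field patterns are hypothetical examples. Note how the top prediction (with a letter O in the year) fails the date pattern, so the format-aware reader correctly falls back to the all-digit alternative.

```python
import re
from typing import Optional

# Expected formats per field, supplied by the ID classification stage.
# These patterns are hypothetical examples, not real document rules.
FIELD_FORMATS = {
    "expiry_date": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
    "document_number": re.compile(r"^[A-Z0-9]{8,10}$"),
}

def recognize(crop) -> list[tuple[str, float]]:
    """Stand-in for the vision-transformer recognition model: returns
    candidate transcriptions with probabilities, best first."""
    return [("01/02/2O31", 0.52), ("01/02/2031", 0.48)]  # placeholder output

def read_field(crop, field: str) -> Optional[str]:
    """Pick the most probable transcription that matches the field's
    expected format, instead of blindly taking the top prediction."""
    pattern = FIELD_FORMATS.get(field)
    for text, _prob in recognize(crop):
        if pattern is None or pattern.match(text):
            return text
    return None
```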
Step 4: Barcode reader
Barcodes contain valuable user information, yet they can be challenging for OCR technologies to read. To mitigate this issue, we developed an ML model that restores and enhances poor-quality barcode images to make them easy to read.
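As a rough illustration of the restore-then-decode idea, here is a Python sketch. Our actual restoration model is ML-based; the classical denoising and sharpening below are stand-ins so the sketch stays self-contained, and the open-source zxing-cpp decoder (which handles the PDF417 symbology common on IDs) is an assumption for illustration, not necessarily the decoder we use.

```python
import cv2
import numpy as np
import zxingcpp  # open-source decoder; handles PDF417, common on IDs

def restore_barcode(image: np.ndarray) -> np.ndarray:
    """Placeholder for the ML restoration model described above: classical
    denoising and sharpening of a grayscale image stand in here so the
    sketch stays runnable."""
    denoised = cv2.fastNlMeansDenoising(image, h=10)
    sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
    return cv2.filter2D(denoised, -1, sharpen)

def read_barcode(image: np.ndarray) -> list[str]:
    """Try a direct decode first; if nothing is found, restore the image
    and try again."""
    results = zxingcpp.read_barcodes(image)
    if not results:
        results = zxingcpp.read_barcodes(restore_barcode(image))
    return [r.text for r in results]
```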
Step 5: Entity extraction & representation
We deliver near-perfect accuracy when identifying and extracting key entities like names, addresses, and document numbers. Our system processes the extracted text into structured data, accounting for differences in date formats, shifting field positions, and document-specific rules (like front vs. back address stickers).
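A minimal Python sketch of this kind of normalization step; the field names and date formats are illustrative assumptions, not an exhaustive set of document rules.

```python
from datetime import date, datetime
from typing import Optional

# Date formats seen across document types; illustrative, not exhaustive.
DATE_FORMATS = ("%d/%m/%Y", "%m/%d/%Y", "%Y-%m-%d", "%d %b %Y")

def parse_date(raw: str) -> Optional[date]:
    """Normalize a raw OCR date string, whatever format the document uses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None

def structure_entities(fields: dict[str, str]) -> dict[str, object]:
    """Turn raw extracted field text into structured, normalized data."""
    return {
        "full_name": " ".join(fields.get("name", "").split()).title(),
        "document_number": fields.get("document_number", "").replace(" ", ""),
        "expiry_date": parse_date(fields.get("expiry_date", "")),
    }
```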
Drive conversions and completion rates with our streamlined workflow
Our ML models handle all the heavy lifting, making every user interaction feel effortless. By delivering near-perfect results —even in suboptimal conditions—, we save our users time and minimize the need for manual intervention.
Features such as smart frame selection, automatic ID orientation detection, and real-time feedback help to ensure our process is straightforward and easy to navigate. By simplifying and speeding up the process, we boost completion rates and drive conversions.