The History of Optical Character Recognition (OCR)
If you’ve been keeping up with our blog, you should now have a clear understanding of what Optical Character Recognition (OCR) technology is and how it works. But how much do you know about its history? Can you guess how long it’s been around? Do you reckon it’s older than the World Wide Web? What about the PC? Below, we’ll answer all of these questions.
Yes, OCR Technology Has Been Around Longer Than Both the Web and the PC!
The World Wide Web was invented in 1989 by British computer scientist Tim Berners-Lee while he was working at CERN, and, according to the Computer History Museum, the first PC (the Kenbak-1) was released in 1971. OCR technology, by contrast, can be traced back to the early 20th century.
Laying the Groundwork (1914-1950s)
- 1914: Physicist Emanuel Goldberg invented a machine that could read characters and convert them into telegraph code. It is considered one of the earliest examples of OCR technology.
- 1931: Goldberg later developed what he called a “Statistical Machine”, an electromechanical machine for searching microfilm archives using an optical code recognition system.
- 1950s: In 1951, David H. Shepard was working as a cryptanalyst at the Armed Forces Security Agency (AFSA), now the U.S. National Security Agency (NSA), when he developed “Gismo” in his spare time. Built in an attic with his colleague Harvey Cook Jr., “Gismo” was a machine that could read all 26 letters of the alphabet as produced by a standard typewriter. In 1952, Shepard established his company, Intelligent Machines Research Co. (IMR). In 1959, he appeared on an episode of the U.S. game show “I’ve Got a Secret” with the secret “I invented a machine that reads and writes.”
Early Commercialization (1950s-1960s)
- 1954: U.S. magazine Reader’s Digest became the first business to install an OCR reader in its office. It was used to convert typewritten sales reports into punched cards. Other businesses soon followed suit.
- 1966: IBM introduced the IBM 1287 optical reader, notable as the first commercially sold scanner capable of reading handwritten numbers.
- 1960s: Ray Kurzweil, who would become a key figure in the history of OCR technology, began researching pattern recognition while still in high school. He won first prize at the 1965 International Science Fair for pattern recognition software he created that analyzed works of classical music and produced compositions imitating the style of a given composer.
The Birth of Kurzweil Computer Products & the Digitization of Library Resources (1970s-1980s)
- 1974: Ray Kurzweil founded Kurzweil Computer Products, Inc. and developed the first omni-font optical character recognition system, a computer program capable of recognizing text printed in any normal font.
- 1980: Ray Kurzweil sold Kurzweil Computer Products, Inc. to Xerox, which later renamed it ScanSoft.
- 1980s: OCR technology became widely used for digitizing print documents, particularly in libraries and offices. OCR systems were used to convert books, newspapers, and archival documents into searchable text. Universities and research institutions began experimenting with OCR to create digital catalogs and searchable archives.
The Rise of Digital OCR & Personal Computing (1990s-2000s)
- 1990s: In the late 1980s, Caere Corporation (later acquired by ScanSoft, which merged into Nuance) introduced OmniPage, one of the first OCR programs to run on personal computers. The company was then led by Robert Noyce, known for his role in founding Intel. As PC use grew, OmniPage gained popularity during the 1990s. During this decade, businesses and organizations were increasingly transitioning from paper-based workflows to digital systems, driving demand for OCR solutions like OmniPage.
- 1993: ABBYY released FineReader in July 1993, high-accuracy OCR software for multiple languages.
- Late 1990s-2000s: OCR was integrated into scanner software and became standard in PDF document processing. Cynthia Breazeal’s work in robotics, machine learning, and human-computer interaction laid the groundwork for more intuitive, AI-driven OCR applications. Isabel Meirelles’ research during the 2000s on information visualization and human-centered design influenced how OCR outputs would be presented to users and how users would interact with documents post-recognition.
AI-Powered OCR & Cloud-Based Solutions (2010s-2020s)
- 2015: Incode Technologies was founded by Ricardo Amper.
- 2018: Google released Tesseract 4.0, an open-source OCR engine. It marked a major technological leap in OCR because it incorporated deep learning-based text recognition using Long Short-Term Memory (LSTM) networks. The wider adoption of deep learning and neural networks around this time improved OCR recognition of handwritten text and complex layouts.
- 2020s: OCR is widely integrated into cloud services (e.g., Google Drive, Microsoft OneDrive) and AI-powered applications, making real-time text recognition accessible via smartphones and online platforms.
Incode Launches Its Proprietary OCR Technology
- 2023: Incode Technologies launched its proprietary OCR technology for identity verification. Purpose-built to extract and process data from over 4,900 types of identity documents around the world with near-perfect accuracy, it is designed to reduce data discrepancies, enhance fraud detection, and support global compliance.
Learn more about Incode’s proprietary OCR technology.