Scientific Advances Drive Innovative Technologies for ECM Recognition and Classification
November 13, 2009
“New scientific advances in artificial intelligence are allowing Optical Character Recognition (OCR) and Intelligent Document Recognition (IDR) products to significantly reduce the time and costs associated with identifying, sorting and extracting information from large and disparate volumes of paper documents – and have made the quest for a digital mailroom into a reality,” says David Rock, president of NovoDynamics, Inc.
NovoDynamics, an Ann Arbor, MI technology firm, has applied advanced science, including artificial intelligence, pixel-on-page analytics and an array of other innovative technologies, to help organizations burdened with paper documents make their business processes more effective. While a truly paperless environment is likely not achievable, applying next-generation OCR and IDR technology can make document processes much more cost-effective.
As Enterprise Content Management (ECM) and Business Process Management (BPM) vendors attempted to rise to the “paperless” challenge with new OCR/IDR solutions, serious obstacles remained. The available technology failed to meet customers’ needs for three reasons: it could not reliably recognize documents whose content was shifted or scaled, it could not detect the languages on a page in order to perform OCR automatically, and it could not extract information from degraded or low-resolution scanned images. Adoption of IDR solutions was slow because customers were not achieving the cost reductions and efficiency gains they expected.
“Globalization has added additional pressure for effective OCR and IDR solutions because companies are being forced to do more with less. Effective ECM solutions must identify, sort and extract information in multiple languages to provide greater efficiencies and cost reductions,” says Rock.
As a case in point, imagine a multinational financial services firm that receives documents from around the world in diverse languages. The documents may be originals, photocopies, faxes, faxes of faxes or even printouts from a home computer. The firm needs to scan the documents it receives and then have the computer automatically recognize, sort, extract, translate and tag them for integration into a business process or for later search and retrieval. Traditional OCR/IDR solutions fall far short of moving these digital documents through the system efficiently. Typical IDR solutions are difficult to train, requiring customers to pay for costly professional services during installation, and their low recognition accuracy forces customers to manually sort a significant number of their documents. Typical OCR solutions are unable to automatically identify the language on a page and cannot extract information from low-resolution or degraded documents.
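To make the manual-sorting burden concrete, the following is a minimal Python sketch of how a classification step might route low-confidence results to a manual review queue. The class, field names, sample results and 0.85 threshold are all hypothetical, chosen only for illustration; they do not describe any vendor's product or API.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    """One hypothetical classifier result for a scanned document."""
    doc_id: str
    doc_type: str      # e.g. "invoice" or "claim form"
    language: str      # language code reported by the recognizer
    confidence: float  # 0.0 (no confidence) to 1.0 (certain)


def route(results, threshold=0.85):
    """Split results into an automatic queue and a manual-review queue.

    Anything below the (assumed) confidence threshold is sent to a person.
    """
    automatic, manual = [], []
    for result in results:
        if result.confidence >= threshold:
            automatic.append(result)
        else:
            manual.append(result)
    return automatic, manual


# Invented sample data, for illustration only.
results = [
    Classification("doc-001", "invoice", "en", 0.97),
    Classification("doc-002", "claim form", "de", 0.62),
    Classification("doc-003", "invoice", "ja", 0.91),
]

automatic, manual = route(results)
manual_share = len(manual) / len(results)
print(f"{len(automatic)} routed automatically, {len(manual)} sent to manual sorting")
```

In a setup like this, every gain in recognition accuracy moves documents out of the manual queue, which is where the labor cost described above accumulates.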
The good news is that new technologies, built on a long legacy of scientific work in artificial intelligence, are now available to overcome the challenges faced by multinational financial services firms, large insurance companies, manufacturers, government organizations, the security sector, and others who handle large volumes of paper documents containing information critical to running their operations. Whether the mission is to stay competitive in business or to better serve a constituency with public sector services, automating more of the information processing saves time and money.
New artificial intelligence and image analysis technology embedded in OCR/IDR solutions is available today that eliminates a huge cost hurdle by providing automatic document recognition training, with no professional services or tedious, error-prone manual sorting needed. The computer does the work, not the customer. Degraded or rotated pages are handled as well. Pages whose content is shifted or scaled are recognized automatically, as is information stored on a page, whose language is detected automatically for conversion into computer text. Best of all, this new technology can be integrated quickly and easily into third-party solutions through an API.
This “next generation” technology radically changes the way businesses can manage mission-critical information. It sets the bar at a much higher level and no doubt will continue to be refined as new scientific discoveries are made. But, for now, the digital mailroom is within reach.