Project Description



eXtracToolbox is a set of tools for extracting information from scanned analogue (paper) documents. It uses artificial intelligence algorithms to discover, recognize, and extract information from unstructured content. These include, among others, Named Entity Recognition and Relation Detection algorithms.
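
To illustrate what entity recognition and relation detection mean on OCR text, here is a minimal sketch. It is a rule-based stand-in built on regular expressions, not the trained models or the API of eXtracToolbox. The entity labels, patterns, the `LINKED_TO` relation name, the `max_gap` threshold, and the sample sentence are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str   # entity type, e.g. "PERSON"
    text: str    # matched surface text
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity


# Hypothetical patterns; a real system would use trained models instead.
PATTERNS = {
    "DATE": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "PERSON": re.compile(r"\b(?:Mr|Ms)\.\s+([A-Z][a-z]+\s+[A-Z][a-z]+)"),
    "PARCEL_ID": re.compile(r"\bparcel\s+no\.\s*(\d+/\d+)", re.IGNORECASE),
}


def find_entities(text):
    """Return all pattern matches as entities, ordered by position."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(
                Entity(label, match.group(1), match.start(1), match.end(1))
            )
    return sorted(entities, key=lambda e: e.start)


def find_relations(entities, max_gap=40):
    """Link a PERSON to a PARCEL_ID that follows within max_gap characters."""
    relations = []
    for person in entities:
        if person.label != "PERSON":
            continue
        for parcel in entities:
            if parcel.label != "PARCEL_ID":
                continue
            if 0 <= parcel.start - person.end <= max_gap:
                relations.append((person.text, "LINKED_TO", parcel.text))
    return relations


sample = "On 2020-03-15 Mr. Jan Kowalski acquired parcel no. 12/4 in Krakow."
entities = find_entities(sample)
relations = find_relations(entities)
```

The proximity rule is deliberately naive: it shows only the shape of the task (entities first, then typed links between them), which is where a learned model would replace the hand-written patterns.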

eXtracToolbox includes predefined configurations for extracting information from legal and official documents (e.g. notarial deeds and notifications from land and mortgage registers) and from a wide range of documents included in geodetic surveys (e.g. field sketches and coordinate lists).

The toolbox is designed for integration with any file repository, database management system (DBMS), or service. It can return data in any format and over any protocol.

Key features:

  • It allows you to extract sets of information from any scanned raster document (with or without OCR),
  • It includes predefined configurations for official, legal and surveying documentation,
  • Its learning mechanisms make it capable of adapting to multiple types of documents,
  • It allows you to output the data in any format, e.g. XML, JSON, RDF or CSV, or to import it into a chosen database, e.g. PostgreSQL, MSSQL or Oracle,
  • It provides ways for easy and quick integration with any system and service,
  • It has a documented API and a flexible interface.
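
As an illustration of the output options listed above, the following minimal sketch serializes a few extracted values to JSON and CSV using only the Python standard library. It does not use the eXtracToolbox API. The record layout (`document`, `field`, `value`), the sample values, and the function names are assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical extraction results; the field names are illustrative only.
records = [
    {"document": "deed_001.tif", "field": "parcel_id", "value": "12/4"},
    {"document": "deed_001.tif", "field": "owner", "value": "Jan Kowalski"},
]

FIELDNAMES = ["document", "field", "value"]


def to_json(rows):
    """Serialize records as a JSON array, keeping non-ASCII characters."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv(rows):
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
```

Keeping the extracted values in a plain list of dictionaries means that each additional output format or database import is just another function over the same records.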