Applications and Tools

Tesseract

Tesseract is an open-source optical character recognition (OCR) engine. OCR extracts text from images and from documents that lack a text layer. Tesseract can write the recognized text to a plain-text file, a searchable PDF, or several other common formats. It is highly customizable and supports more than 100 languages, including multilingual documents and vertical text.
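A minimal Python sketch of driving the Tesseract command-line tool. It only assembles the argument list, following the CLI shape `tesseract imagename outputbase [options] [configfile...]`: `-l` takes language codes joined with `+`, and trailing config names such as `txt` or `pdf` select the output formats. The `tesseract_command` and `run_ocr` helpers are hypothetical, and actually running OCR assumes Tesseract and the needed language data are installed.

```python
import subprocess


def tesseract_command(image, output_base, languages=("eng",), formats=("txt",), psm=None):
    """Assemble a tesseract CLI call (hypothetical helper, not part of Tesseract)."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Positional arguments: the input image, then the output base name (no extension).
    cmd = ["tesseract", image, output_base]
    # Multiple languages are joined with '+', e.g. "eng+deu" for a bilingual page.
    cmd += ["-l", "+".join(languages)]
    if psm is not None:
        # Page segmentation mode, e.g. 5 = a single block of vertically aligned text.
        cmd += ["--psm", str(psm)]
    # Trailing config names choose output formats: "txt", "pdf", "hocr", "tsv", ...
    cmd += list(formats)
    return cmd


def run_ocr(image, output_base, **kwargs):
    """Run Tesseract; outputs appear as output_base plus '.txt', '.pdf', etc."""
    subprocess.run(tesseract_command(image, output_base, **kwargs), check=True)


print(" ".join(tesseract_command("scan.png", "scan", languages=("eng", "deu"), formats=("pdf", "txt"))))
# tesseract scan.png scan -l eng+deu pdf txt
```

For vertical Japanese text, one option is `languages=("jpn_vert",)` with `psm=5`, provided that language data file is installed.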

OpenRefine

OpenRefine, previously known as Google Refine, is a widely used tool for working with large, messy datasets in many formats. It can clean, reformat, or transform thousands of records at a time, and its basic functions have a short learning curve. It is particularly useful for those dealing with textual data in tables, supporting over 15 languages and the following formats:
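Much of that cleaning comes down to spotting variant spellings of the same value. The Python sketch below approximates the idea behind OpenRefine's "fingerprint" key-collision clustering (trim, lowercase, drop punctuation, fold accented characters to ASCII, then sort and de-duplicate tokens). It is an illustration written for this guide, not OpenRefine's own code or API, and the function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Normalization key in the spirit of OpenRefine's fingerprint method."""
    s = value.strip().lower()
    # Remove punctuation (Unicode categories starting with "P").
    s = "".join(ch for ch in s if not unicodedata.category(ch).startswith("P"))
    # Fold accents to ASCII, e.g. "é" -> "e". Note: this simple fold drops
    # non-Latin characters entirely.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Split on whitespace, de-duplicate and sort tokens, then rejoin.
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group values whose fingerprints collide; return only groups of 2+."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


records = ["Café Central", "cafe central", "Central, Café", "Bistro Nord"]
print(cluster(records))
# [['Café Central', 'cafe central', 'Central, Café']]
```

In OpenRefine itself, a reviewer would then pick one canonical value for each such cluster before applying the change to every matching record.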

CmdStan

CmdStan is the command-line interface to Stan, a probabilistic programming language for statistical modeling. It bundles the Stan compiler (stanc), a source-to-source compiler that translates Stan programs into C++, from which a fast, optimized executable is then built.
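A rough Python sketch of that two-stage workflow: `make` (run from the CmdStan home directory) calls stanc to generate C++ and compiles it into an executable, which is then run with the `sample` method. The model is modeled on the Bernoulli example shipped with CmdStan; the helper names are hypothetical, and POSIX paths are assumed.

```python
import json
import os
from pathlib import Path

# A Bernoulli model in the Stan language, modeled on CmdStan's bundled example.
BERNOULLI_STAN = """\
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  y ~ bernoulli(theta);
}
"""


def bernoulli_data_json(y):
    """Serialize 0/1 observations as the JSON data file this model expects."""
    return json.dumps({"N": len(y), "y": list(y)})


def build_command(model_path):
    """`make` target for a .stan file, run in the CmdStan home directory.

    make first runs stanc to translate the Stan program to C++, then compiles
    that C++ into an executable named after the model, without the suffix.
    """
    return ["make", str(Path(model_path).with_suffix(""))]


def sample_command(model_path, data_file, output_file="output.csv"):
    """Invoke the compiled model's `sample` method on a JSON data file."""
    # Prefix "./" for relative paths so the shell runs the local executable.
    exe = os.path.join(".", str(Path(model_path).with_suffix("")))
    return [exe, "sample", "data", f"file={data_file}", "output", f"file={output_file}"]
```

After saving `BERNOULLI_STAN` and the JSON data to disk, each list can be passed to `subprocess.run(..., check=True, cwd=<CmdStan home>)`. For `examples/bernoulli/bernoulli.stan`, this gives `make examples/bernoulli/bernoulli` followed by `./examples/bernoulli/bernoulli sample data file=... output file=output.csv`.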