OpenRefine

A tool that cleans, formats, or transforms thousands of records at a time.


OpenRefine requires an environment module

In order to use OpenRefine, you must first load the appropriate environment module:

module load openrefine

OpenRefine, previously known as Google Refine, is a tool for working with large, messy datasets in many formats. It can clean, format, or transform thousands of records at a time, and its basic functions have a short learning curve.

It is particularly useful for working with textual data in tables. It supports over 15 languages and the following formats:

  • Comma-separated values (CSV) or tab-separated values (TSV)
  • Text files
  • Fixed-width columns
  • JSON
  • XML
  • OpenDocument spreadsheet (ODS)
  • Excel spreadsheet (XLS or XLSX)
  • PC-Axis (PX)
  • MARC
  • RDF data (JSON-LD, N3, N-Triples, Turtle, RDF/XML)
  • Wikitext

Using OpenRefine on RCC Resources

To run OpenRefine on RCC Desktop, follow these steps:

  1. Open a Desktop session on Open OnDemand
  2. Start a Terminal in the newly loaded Desktop window
  3. Type the following commands in the Terminal:
$ module load openrefine
$ refine
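
If `refine` is not recognized, the module may not be loaded in the current Terminal. The following is a minimal sketch of a check you could run first. It assumes that `module load openrefine` is what places `refine` on your `PATH`; the helper name `have_cmd` is hypothetical and only for illustration.

```shell
# Sketch: check whether a command is available before launching it.
# Assumption: loading the openrefine module is what adds `refine` to PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd refine; then
    echo "refine found at: $(command -v refine)"
else
    echo "refine not found; run 'module load openrefine' first" >&2
fi
```

Because modules are loaded per shell session, the check has to be run in the same Terminal where you intend to start OpenRefine.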

Uploading Files

Before creating your first project, upload your files to your home directory. The easiest way to do this is to open the Files tab in the Open OnDemand interface and click the Upload button above your current files. If you need one of the other methods for uploading data, please refer to our documentation for transferring files.

Basic Functions

This section covers some of the basic commands of OpenRefine. For additional information, please consult the official documentation on the OpenRefine website.

Regardless of the input format, the data will usually open as a series of columns. Next to each column title, you will see a drop-down menu arrow. The most commonly used command in this menu is "Facets", which summarizes the values in a column and lets you filter rows by them.
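
To get a feel for what a text facet reports, the sketch below counts how often each distinct value appears in one column of a small CSV file. It is only an analogy built from standard command-line tools, not how OpenRefine works internally. The sample data and the `facet_counts` helper are hypothetical, and the approach assumes simple CSV without quoted commas or spaces inside values.

```shell
# Hypothetical sample data; the file and its columns are made up for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
id,city
1,Chicago
2,chicago
3,Chicago
4,Boston
5,Chicago
6,Boston
EOF

# Print "value: count" for each distinct value in a column, most frequent first.
# $1 = CSV file, $2 = 1-based column number. The header row is skipped.
facet_counts() {
    tail -n +2 "$1" | cut -d, -f"$2" | LC_ALL=C sort | uniq -c | sort -rn |
        awk '{print $2 ": " $1}'
}

facet_counts "$sample" 2
```

On the sample data this prints `Chicago: 3`, `Boston: 2`, and `chicago: 1`. Seeing `Chicago` and `chicago` listed separately is the kind of inconsistency a facet makes easy to spot.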

To transform the data, you can use the other commands in each column's drop-down menu (Edit Cells, Edit Columns, Transpose, Reconcile), each of which has its own variations and nuances.
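
As a rough illustration of what a simple cell transformation does, the sketch below trims leading and trailing whitespace and lowercases each line of its input, similar in spirit to the common cell cleanups OpenRefine offers. It is an analogy using standard tools, not OpenRefine's own implementation, and the `normalize` helper name is hypothetical.

```shell
# Sketch: trim surrounding whitespace and lowercase every input line.
normalize() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | tr '[:upper:]' '[:lower:]'
}

# Two messy example values, one per line.
printf '  Chicago \nBOSTON\n' | normalize
```

After this kind of cleanup, values that differed only in case or stray spaces collapse into a single value.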

Exporting Data

To save your data, proceed as you would with any other Open OnDemand application (File -> Save). To export your data, you can either use the built-in export button in OpenRefine or download the document from your home directory through the Open OnDemand Files interface.