OpenRefine
OpenRefine, previously known as Google Refine, is a widely used tool for working with large, messy datasets in many formats. It can clean, format, or transform thousands of records at a time, and its basic functions have a short learning curve. It is particularly useful for textual data in tables. It supports over 15 languages and the following formats:
- Comma-separated values (CSV) or tab-separated values (TSV)
- Text files
- Fixed-width columns
- JSON
- XML
- OpenDocument spreadsheet (ODS)
- Excel spreadsheet (XLS or XLSX)
- PC-Axis (PX)
- MARC
- RDF data (JSON-LD, N3, N-Triples, Turtle, RDF/XML)
- Wikitext
Running OpenRefine
To run OpenRefine on RCC Desktop, follow these steps:
- Open a Desktop session on Open OnDemand
- Start a Terminal in the newly loaded Desktop window
- Type the following commands in the Terminal:
$ module load openrefine
$ refine
Uploading files
Before creating your first project, upload your files to your home directory. The easiest way to do this is to open the Files tab in the Open OnDemand interface and click the Upload button above your current files. For other ways of uploading data, see: Data Transfer with SFTP, SCP, or RSYNC | FSU Research Computing Center
Basic Functions
Although OpenRefine has many uses, its approach is fairly basic, and users need to become comfortable with its terminology. Here we discuss some of the basic commands of OpenRefine. For additional information, please consult the program-specific documentation: OpenRefine user manual | OpenRefine
Most of the time, the data opens as a series of columns, whatever the input format. Each column title has a drop-down menu arrow. The most commonly used command in this menu is Facet, which comes in several types. A facet opens a panel in the left sidebar where you can filter the items in that column by the characters they contain, start with, or end with, by the numbers they hold, or by any combination of these, using regular expressions (RegEx). The drop-down menus also let you sort records and adjust which ones are visible.
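For readers new to facets and RegEx filters, the idea can be sketched outside OpenRefine in a few lines of Python. This is only an analogy, not OpenRefine's own code or API: the column values and the helper names text_facet and regex_filter are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical column values, invented for this illustration.
cities = ["Tallahassee", "tallahassee ", "Miami", "Tampa", "Miami", "Tampa Bay"]


def text_facet(values):
    """Count how often each distinct value occurs, much like a text facet."""
    return Counter(values)


def regex_filter(values, pattern):
    """Keep only the values in which the regular expression matches somewhere."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.search(v)]


facet = text_facet(cities)                      # distinct values with counts
starts_with_ta = regex_filter(cities, r"^Ta")   # values that start with "Ta"
ends_with_space = regex_filter(cities, r"\s$")  # values with trailing whitespace
```

The counts make near-duplicates such as "Tallahassee" and "tallahassee " easy to spot, which is the main reason facets are usually the first step when cleaning a column.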
To transform the data, use the other commands in each column's drop-down menu (Edit Cells, Edit Columns, Transpose, Reconcile), each of which has its own variations and nuances. With these commands, you can update all of your data using just a bit of RegEx and the built-in filters.
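As a rough picture of what a cell transformation does, the Python sketch below applies the same cleanup to every value in a column: it trims surrounding whitespace, collapses repeated whitespace, and converts the result to title case. It is an illustration written for this page under the assumption of simple text values. It is not how OpenRefine implements its Edit Cells commands, and the function name clean_cell and the sample values are hypothetical.

```python
import re


def clean_cell(value):
    """Trim the ends, collapse runs of whitespace to one space, then title-case."""
    collapsed = re.sub(r"\s+", " ", value.strip())
    return collapsed.title()


# Hypothetical messy column values, invented for this illustration.
names = ["  tallahassee", "MIAMI  beach ", "Tampa"]

# Apply the same transformation to every cell in the column.
cleaned = [clean_cell(n) for n in names]
```

In OpenRefine the equivalent change is applied to a whole column at once from the column's drop-down menu, so no scripting is needed for common cleanups like these.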
Exporting Data
To save your data, proceed as you would with any other Open OnDemand application. To export your data, either use the built-in Export button in OpenRefine or download the document from your home directory on the Open OnDemand main page.