Tips, tricks and tools for data visualisation — keeping it clean!
This is the first article in a three-part series on data visualization. It covers the tools, tips and tricks that help you clean and prepare your data; the later parts examine some essential tools for creating visually stimulating representations.
Long, text-driven articles still have their place and remain extremely important for SEO, but information and knowledge sharing has shifted towards graphical data representation as a more effective way to get a message across.
Why? Because visual content is far better received, engaged with and shared than basic text-driven data. As a result, data visualization has become increasingly important, and marketers and social media strategists need to understand and adopt the best ways to communicate their data in ways that are visually stimulating and enticing.
Data visualization ranges from cleaning and combining complex datasets to lavishly designed infographics, but to be effective you need to understand not only your audience but also how to use data visualization techniques to appeal to them.
The first step is to make sure your data is clean.
Before any sort of visual imagery can be created, you need to ensure that the data you are working with is clean. This could include putting the columns of a messy Excel file into the right order, reformatting numeric values and expanding or contracting acronyms. For example, “UK”, “U.K.” and “United Kingdom” are all the same value represented in different ways. “Cleaning” data so the information is consistent helps the workflow when you get down to the design stage.
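To make the “UK” example concrete, here is a minimal Python sketch of that kind of normalisation. The lookup table `COUNTRY_ALIASES` and the function `normalise_country` are hypothetical names invented for this illustration, and the approach (strip full stops, lowercase, look up) is only one simple way to do it.

```python
# Hypothetical lookup table: extend it with the variants found in your own data.
COUNTRY_ALIASES = {
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def normalise_country(value: str) -> str:
    """Map known spellings of a country to one canonical label."""
    # Drop full stops and surrounding spaces, then lowercase, so that
    # "U.K." and "UK" produce the same lookup key.
    key = value.replace(".", "").strip().lower()
    # Values that are not in the table are returned unchanged (just trimmed).
    return COUNTRY_ALIASES.get(key, value.strip())


rows = ["UK", "U.K.", "United Kingdom", "France"]
cleaned = [normalise_country(row) for row in rows]
print(cleaned)
```

All three spellings end up as “United Kingdom”, while “France” passes through untouched because it is not in the table.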
Let’s be honest, no one really likes cleaning up, but luckily there are a number of tools available to help with this tiresome process. Like cleaning up in the traditional sense, data cleaning can be taxing and time-consuming (a bit of a grudge chore, to be honest), but it’s necessary in order to get the best results from, and the best representation of, your data.
To keep calm and speed up the process, it’s a good idea to give these cleaning programs a try before we even explore the actual data visualization tools, so that your data is in tip-top condition when you begin.
1. Data Wrangler
Data Wrangler is a web-based tool that converts your messy, unclean data in a few steps into something you can use effectively in the visualization space. It was developed by the data vis team at Stanford, who are pretty clued up when it comes to new techniques for extracting data.
2. Open Refine
Like Data Wrangler, Open Refine (a rebrand of Google Refine) helps rework your data into a usable format, but instead of running as a web-based service it runs locally on your machine. This makes it a good tool to use if you are concerned about security.
Both Data Wrangler and Open Refine help clean up messy data, but they also go deeper. They are intelligent in that they can detect when a data point might be wrong. These programs are able to pick up missing commas, full stops or unit symbols, any of which could add or remove three zeros from an amount if used incorrectly. For example, 100,000 and 100K are the same value represented differently; these tools pick this up and flag the values for your attention.
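The 100,000 versus 100K case can be sketched in a few lines of Python. This is not how either tool works internally; `parse_amount` and `SUFFIX_MULTIPLIERS` are hypothetical names for this illustration, and the sketch assumes that a comma is a thousands separator and that only the suffixes K and M appear.

```python
import re

# Assumed suffixes for this sketch: K = thousand, M = million.
SUFFIX_MULTIPLIERS = {"k": 1_000, "m": 1_000_000}


def parse_amount(text: str) -> float:
    """Turn strings such as '100,000' or '100K' into a plain number."""
    # Treat commas as thousands separators and ignore case and outer spaces.
    cleaned = text.replace(",", "").strip().lower()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([km]?)", cleaned)
    if match is None:
        # Flag anything unexpected instead of guessing at its meaning.
        raise ValueError(f"Unrecognised amount: {text!r}")
    number, suffix = match.groups()
    return float(number) * SUFFIX_MULTIPLIERS.get(suffix, 1)


print(parse_amount("100,000") == parse_amount("100K"))
```

Raising an error on anything unrecognised mirrors the idea of flagging suspicious values for your attention rather than silently converting them.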
Consolidating acronyms is another great feature that these tools make painless. For example, SAPS and S.A.P.S will be merged because the tool identifies them as the same thing, which can be a massive time, and sanity, saver when you are trying to analyse results.
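The idea behind merging SAPS and S.A.P.S can be illustrated by reducing each value to a simplified key and grouping the values that share one. The sketch below is a simplification and not the actual algorithm of either tool; `fingerprint` and `cluster` are hypothetical helper names.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a value to a key: no punctuation, no spaces, upper case."""
    stripped = value.translate(str.maketrans("", "", string.punctuation))
    return "".join(stripped.split()).upper()


def cluster(values):
    """Group together the values that share the same fingerprint."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


clusters = cluster(["SAPS", "S.A.P.S", "S.A.P.S.", "saps", "SARS"])
for key, members in clusters.items():
    print(key, members)
```

Here the four spellings of SAPS fall into one group, while SARS stays in its own. A key this aggressive can also merge values that really are different, so it is worth reviewing each group before accepting a merge.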
3. Tabula
A big issue with cleaning up data is extracting data from PDF documents into CSV. Tabula has a simple interface: you select the area of the PDF that you want to extract data from, and it returns a preview and a CSV file. This can be extremely helpful because corporates and companies that are required to release reports or white papers generally do so in PDF format, which makes data collation and cleaning quite laborious if these reports include the data you need.