F5.5G Leap-forward Development of Broadband in Africa The Africa Broadband Forum 2024 (BBAF 2024) was successfully held in Cape Town, South Africa recently, under…
The Excel formulas SEO pros love: Deduplicate [Part 2]
Crunching data efficiently in Excel can free up surprising amounts of time and energy.
Data manipulation in Excel can be very time-consuming for entrepreneurs who do their own SEO. It can be doubly so if they aren’t frequent users of Excel.
Professional SEOs use Excel a great deal and so learn the tricks. It takes time but for them it is worth looking through specialist guides to using Excel for SEO purposes. They also often downloading and learn how to use specialist Excel add-ins like SEO Tools For Excel.
This doesn’t really work for a time-pressed entrepreneur unless they are already very confident with Excel.
In a previous article I noted that there were two formulas that would result in the biggest time saving.
In that article we walked through using the first formula, Concatentate. In this article we will walk-through the second, Deduplicate.
DEDUPLICATE
Introduction
Looking for and removing duplicate data can be very time-consuming. No one tool guarantees that it has a complete database of SEO data. Therefore to create a master list often the best available option is to draw data from a number of sources and combine it. This combined data list will then initially contain multiple duplicates which need to be removed. Once they have been removed, the final list will then offer the most complete view available.
This is frequently used to create a master list of URLs to work with when planning a site migration, and when creating a master list of inbound links to a site. It can also be very useful when looking at lists of search terms used and lists of PPC keywords that you are bidding on.
The Basics
DEDUPLICATE is very simple to use. Let’s work through a very simple example to see how it works. The screen shots and steps below are from Excel 2010.
Below is a very simple set of data. As you can see there is one row that is duplicated in both columns, the news-tips page. Then each individual column contains multiple duplicates.
The first step is to click anywhere in your data.
Next select the “Data” tab in your ribbon, and then click “Remove Duplicates” in the Data Tools section.
This will bring up the “Remove Duplicates” dialogue box as below. In the background you will also see that all of your data is now highlighted.
First you need to check that Excel has correctly understood if your data has headers or not. Note it will not highlight your headers, only the underlying data.
Secondly you need to choose the columns on which you would like to deduplicate.
Scenario 1
In this instance we are asking Excel to look at both columns and remove those rows where the entries in both columns are duplicated.
Clicking OK you will see your data move as the duplicates are removed. A new dialogue box will appear to tell you how many duplicates have been removed. As you can see in this case only one row was duplicated. Although the URL in one case was a hyperlink and in the other not Excel still saw them as duplicates.
Scenario 2
In this instance we just want to find a list of unique URLs regardless of the title tags. Therefore we only want to look to deduplicate the first column, “Web Page”, and so uncheck the “Title Tag” column.
In this case Excel finds and removes the two duplicates in the “Web Page” column, regardless of their entries in the “Title Tag column. Note that it will only retain the first instance that it finds.
Using deduplicate is as easy as that and it can save a huge amount of time.
For people who are looking to learn more about deduplicate and other useful Excel functions like how to highlight duplicated cells rather than removing them I would recommend looking at this article.