Locating duplicate values in Excel can be more challenging than it seems. While there are built-in tools like Conditional Formatting and “Remove Duplicates,” they often fall short for more complex, real-world tasks.
The approach you take depends entirely on your data and your goal. To find the right solution, you first need to define what you mean by “duplicate.”
Key Questions to Ask when Finding Duplicates
What constitutes a duplicate? Is it repeating words, entire cells, or entire rows?
If comparing cells:
- Do we mark all occurrences as duplicates, or just the second and subsequent ones?
- Should we treat “Apple” and “Apple ” (with extra spaces) as duplicates?
Where are the duplicates? In a single column, between two columns, or across multiple columns?
Are we dealing with implicit duplicates? (e.g., different formatting or extra spaces).
Let’s start with some common scenarios.
Using Conditional Formatting to Highlight Duplicates
The Conditional Formatting tool is useful for visually highlighting duplicate cells within a selected range.
How to use it:
- Select the range of cells you want to check.
- Go to the Home tab, click Conditional Formatting > Highlight Cells Rules > Duplicate Values.
- Choose your preferred formatting style.
The Caveat:
This tool finds duplicates within the entire selected range. This is a critical point that’s often overlooked. If you select two columns together, a duplicate found only in the first column will be highlighted, even if it has no match in the second column.
This can be problematic if you plan to delete the highlighted cells, as you might accidentally remove both instances from the same column. A simple workaround is to remove duplicates from each column individually first, but there are better methods, which we’ll cover next.
Using Formulas for Greater Control
Using a custom formula to find duplicates offers much more flexibility. The only difficulty is constructing the formula, but I’ll walk you through it.
Finding Duplicates Within a Single Range
You can use an array formula to check if any duplicates exist in a range:
=SUMPRODUCT((COUNTIF(range, range)-1)>0)
You might wonder how this is better than Conditional Formatting. The advantage is that formulas are easy to modify and enhance.
For example, you can make the formula “space-insensitive” by wrapping the ranges in a TRIM function, allowing it to find duplicates that only differ by extra spaces.
=SUMPRODUCT(--(TRIM(cell)=TRIM(range)))>1
This version checks how many times a single (trimmed) cell value appears in the (trimmed) range. If you use this as a custom rule in Conditional Formatting, it can detect these implicit duplicates.
Pro Tip: When using this in Conditional Formatting, use absolute references for the range (e.g., $A$1:$B$4) but a relative reference for the target cell (e.g., A1). This allows the rule to check each cell individually against the entire range.
A Faster Solution: The SEM Tools Add-in
I used to spend a significant amount of time using the methods above. While they work, I found them to be too time-consuming for regular use.
This led me to create several dedicated tools for finding duplicates in my Excel add-in, SEMTools. Here’s a look at what they can do:
- Find Duplicate Cells (Except First): Highlights all the second, third, and subsequent duplicates in a column, leaving the first occurrence unmarked.
- Find All Duplicate Cells: Highlights every instance of a duplicate, including the first one, making it easy to filter and review all of them.
- Find Duplicates (Ignoring Spaces): Treats phrases that only differ in spaces as duplicates. This tool first normalizes the spaces and then performs the duplicate search. (A separate “Remove Extra Spaces” tool is also available for cleanup).