How to Count and Label Duplicate Values Without COUNTIF or Range.Find
How can I use VBA to calculate and label all the duplicates in a given range in a way that’s faster than using COUNTIF or Range.Find?
When you’re dealing with large data sets (say +250K rows), speed matters.
Let’s say you have 322K rows with IDs. There are about 2000 IDs that show up only ONCE in the data, and the rest come up more than once — we want to find and label all of the IDs that only occur once.
The high-level strategy for this uses two Scripting.Dictionary objects, which I’m going to shorten to just “Dictionary” for the rest of this tutorial:
- Create a Dictionary of distinct IDs (dicDistincts)
- Create a Dictionary of IDs that occur more than once by using the Dictionary from step #1 — these are our duplicates! (dicDuplicates)
- Loop through all the IDs, marking IDs that are found in the duplicates Dictionary (dicDuplicates) as “Duplicate” and all others as “Unique”
Before we dive in, though, you need to make sure you have the Microsoft Scripting Runtime added to this project!
This 13-second gif walks through the steps, but in case it stops working:
- Open the VBA Editor window
- Click “Tools” from the File menu
- Select “References” from within the Tools menu
- Scroll down until you find “Microsoft Scripting Runtime”
- Check the box next to the “Microsoft Scripting Runtime”
- Click OK
Jackpot — you now have access to the Dictionary object! Let’s return to our regularly-scheduled programming…
We’re labeling uniques and duplicates here, and the super-commented code below does exactly that:
Here’s a link to the code above so you can review it side-by-side with the walk through below. Right-click, “Open in new window”.
You KNOW we’re gonna use that 4-step VBA process, so let’s review quickly:
Step 1 – Setup
Step 2 – Exploration
Step 3 – Execution
Step 4 – Cleanup
All of the Step 1 – Setup takes place on lines 21 through 29.
First, we assign wksIDs on line 20 to make it easy to reference this Worksheet. On line 21, we quickly find the last-occupied row and store it in lngLastRow using LastOccupiedRowNum — one of the staples of the VBA Toolbelt. (You’re using the VBA Toolbelt, right? If not, get it here.)
Let’s take a minute to closely review lines 22 and 23, where the CompareMode property of each Dictionary is set to vbTextCompare.
Dictionary keys in VBA default to CASE-SENSITIVE, which is something we would like to avoid here. In order to ensure a Dictionary ignores the difference between upper and lower case letters, you must set the CompareMode property to vbTextCompare like we do on line 22 and 23.
Finally, on lines 26 and 30, we create Variant Arrays representing the IDs themselves (varIDs) as well as the status of the IDs (varStatus), which will be “Unique” or “Duplicate”.
Since we know varStatus will be exactly as long as varID (each row needs to be labeled “Unique” or “Duplicate”), we initialize it to simply match varIDs to start.
Phew! We’re done with our Step 1 – Setup, and that means it’s time to start exploring this bitch.
Step 2 – Exploration takes place between lines 34 to 58, which is a single For Each loop through the Variant Array containing the IDs (varIDs).
First, on line 37, we assign the ID to a handy short variable, strID, using CStr to convert it to a String then leveraging Trim to remove any leading or trailing spaces.
Line 40 is essentially a guard clause. We don’t care about blank IDs, so we skip them.
The magic really starts between lines 44-54!
We initially check the distinct Dictionary (dicDistincts) to see if the ID has already been observed by using the Dictionary.Exists(Key) method, which returns a handy True or False.
If this particular ID is NOT already in the dicDistincts Dictionary, we add it.
If this particular ID is already in the dicDistincts Dictionary, we know we have a duplicate! In that case, we attempt to assign it to the duplicate-containing Dictionary, dicDuplicates.
Once again, on line 52, we check to see if this value already exists in the Dictionary. If so, we do nothing except move on to the next loop.
Wahoo! We’re done with the Step 2 – Exploration and will be diving into our deceptively simple Execution step.
Step 3 – Execution takes advantage of the dicDuplicates, the Dictionary we assembled on line 53, that contains every ID that is repeated. By working through the full column of IDs and checking to see if that ID exists in dicDuplicates, we can easily label the “Duplicate” and “Unique” IDs!
Step 3 – Execution kicks off on line 69, where we start by setting the lngIdx variable to 1. (We’ll be using this variable as a counter as we walk down the Variant Array containing all IDs.)
Line 70 is where we start looping through all of the IDs contained in the Variant Array containing them, varIDs. At each step, we check to see if this ID exists in the duplicates Dictionary, dicDuplicates, by using the Dictionary.Exists method, which returns True or False depending on (you guessed it) whether or not the ID is there.
If the ID exists in dicDuplicates, we know we’re dealing with a dupe — and that means setting the value in varStatus, the Variant Array we created to label each ID, to “Duplicate (on line 72). If dicDuplicates does NOT contain the ID, then we know it’s unique and label it as such!
Now that varStatus has “Unique” or “Duplicate” for every ID, all that’s left to do is write those contents out to the Worksheet, which we handle on line 80. Jackpot — our Execution step is done!
Not much to mop up here, so Step 4 – Cleanup is simply a message box letting the end user know that the macro is finished.
(In this particular case, I included a little timer to make sure the performance was good — that’s why you see dblStart and Timer in the mix.)
Here’s an 8 minute YouTube video demo and walkthrough that compares the performance of this method as well as an alternative, then covers the finer points from above:
Identifying and labeling uniques and duplicates with ease? If not, let me know and I’ll help you get what you need! And if you’d like more step-by-step, no-bullshit VBA guides delivered direct to your inbox, join my email newsletter below.
Get the VBA Toolbelt!
Quit digging through old projects and forums like a chump! Download the VBA Toolbelt and start with the most common Excel tasks already done for you.