How to Combine Multiple Excel Workbooks into One Worksheet with VBA
Stop me if you’ve heard this one before:
I need to make a pivot table but the data is spread out in lots of different Excel files…
Damnit!
This situation blows, since your analysis depends on a pivot table and combining a bunch of workbooks together by hand is terrible.
Fortunately, VBA will make short work of this pain in the ass. Here goes:
Let’s break this challenge down step-by-step using the 4-step VBA process as our guide:
Step 1 – Setup
Step 2 – Exploration
Step 3 – Execution
Step 4 – Cleanup
Our Step 1 – Setup is covered by lines 16-18 — short and sweet.
First, on line 16, we assign the folder name where the individual Excel files are stored. (You’ll want to change this to your folder, but in this example we are targeting C:\blog\example_data_in_here.)
Then, on lines 16-17, we create a new Workbook (where Dst is short for “destination”, i.e. output) to store the data from each individual file, then assign the first Worksheet in that Workbook as the Dst Worksheet.
Boom! One down, three to go.
Step 2 – Exploration begins on line 21, where we take advantage of the Dir function to loop through the directory we set up moments ago (StrDirContainingFiles) and identify every file that ends in “.xlsx”. (That’s what the asterisk character, “*”, is doing there at the end of the line.)
Lines 22 through 25 store each file name inside a Collection (named colFileNames), which will make it SUPER easy to iterate through each file a little bit later in the code.
Yes, we COULD have conducted the bulk of the code inside this Dir loop, but I prefer using a Collection here because it reduces the number of nested loops in our subroutine. Each nested loop you add is another layer of complexity for you to mentally keep track of… fuck that. Programming is hard enough — avoid deeply-nested loops whenever you can.
The “use a Collection” strategy also makes it really easy to verify that the loop worked and pulled in the data we expected, which you can check for yourself by un-commenting lines 27-31.
Let’s keep it moving though 🙂
With those file names stored neatly in colFileNames, we begin looping through it on line 35.
On line 38, strFilePath is assigned to be the original source folder string (strDirContainingFiles, which is “C:\blog\example_data_in_here” in this example), a backslash (“\”), and the file name from colFileNames.
We’ll immediately take advantage of that full file path to the Excel file on line 41, where we open that Workbook and save a reference to it as wbkSrc (where “Src” is short for “Source”). Line 42 assigns the target worksheet, named “data” in this example, to wksSrc.
Exploration continues on lines 46-47, where we take advantage of the LastOccupiedRowNum and LastOccupiedColNum functions (which are defined at the very bottom as well as in the VBA Toolbelt, which you should be using) to easily identify the last-occupied row and last-occupied column on the source Worksheet.
This is critical here! By dynamically determining the last column and last row on each loop, we can be confident that we’re getting all the data from each Worksheet.
With the last-occupied row and last-occupied column numbers stored in lngSrcLastRow and lngSrcLastCol respectively, we store the full data range on lines 48-51, starting from the top-left corner and extending to the bottom-right.
This is another great checkpoint: we can verify that all the data has been correctly identified and stored in rngSrc by un-commenting out lines 53-55 and calling rngSrc.Select to highlight all the cells. Smooth!
Here’s where things get a bit more interesting…
On lines 60-62, we check to see if this is NOT the first iteration.
Why? Good question!
On the first loop, we want to include the headers, but each subsequent time we do NOT want to include the headers.
Each loop after the first, we adjust rngSrc to skip the first row like this:
Set rngSrc = rngSrc.Offset(1, 0).Resize(rngSrc.Rows.Count – 1)
Here’s the two-step process in slow-motion:
- .Offset(1, 0): this shifts the whole range down one row, meaning the first row is no longer included, but also means we now have a blank row at the bottom of rngSrc
- .Resize(rngSrc.Rows.Count – 1): this adjusts the bottom row of rngSrc up one row by reducing the total count of rows that are included in the Range
Nice!
This puts us at another good checkpoint, where we can verify that the header is no longer part of rngSrc by un-commenting out lines 64-68.
Phew — with that, Step 2 – Exploration is complete!
Let’s dive into Step 3 – Execution, which kicks off on line 73.
Here, we again check the iteration. If this is the first loop, our target cell is easy — it’s A1, since the Worksheet is empty. (That’s what we handle on lines 74 and 75.) On the other hand, if this is NOT the first loop, then we:
- Again take advantage of LastOccupiedRowNum, which comes with the VBA Toolbelt and works on ANY Worksheet, to identify the last-occupied (on line 77)
- Set the target Range to be one cell down from the last-occupied row (on line 78)
The actual copy / paste step happens on line 80, where we call rngSrc.Copy and pass in rngDst (which we just set on line 78) as the Destination.
Almost there you guys, stay with me!
The last challenge within Step 3 – Execution is to add a column identifying which data file a given row came from by writing the Worksheet name into a far-right column. Let’s get to it!
Lines 86-89 cover another first loop special case — if this is the first time through, then we need to make sure we add a header name! By taking advantage of LastOccupiedColNum, which, again, is implemented for you both below AND in the VBA Toolbelt, we know that lngDstLastCol + 1 gives us the column right next to the last-occupied column.
We name this column header “Source Filename” on line 88.
Since we know that each row of data from the last paste (which happened on line 80) came from one of the different Excel files, we can take advantage of the Range.Value property to quickly write the file name to each of those rows.
First, we need to identify the first row of data that was just pasted in.
We used this same exact row number back on line 78, so we essentially copy that logic and assign lngDstFirstFileRow to be lngDstLastRow + 1.
Next, we need to figure out the last row that will get this file name data, and on line 103 we do just that.
Guess which function rides in to our rescue? Yep — it’s our old standby LastOccupiedRowNum.
Now that we know the first row and the last row of the range of cells that will need to be populated with the file name, all that’s left to do is get the right column number!
And wouldn’t you know it, LastOccupiedColNum, on line 104, assigns lngDstLastCol that exact value 🙂
Now that we know the three critical components for a Range:
- The first row, which is stored in lngDstFirstFileRow
- The last row, which is stored in lngDstLastRow
- The column those rows need to be applied in, which is stored in lngDstLastCol
We can write the file name easy peasy!
On lines 107-108, we use the values from #1, #2, and #3 above to store the target Range.
Damn son! That brings us to a great checkpoint — by un-commenting lines 112-113, we can easily verify (using Range.Select) that the correct Range is defined.
Finally, the last Execution task occurs on line 117, where we actually do the file name writing. Calling wbkSrc.Name returns the file name (in the first case, it will be “AT_Apr_16.xlsx”), which is why we assign the rngFile.Value to it.
Jackpot — Step 3 – Execution is all done!
And of course, we wrap up with Step 4 – Cleanup.
Nice and short here too: on line 122, we close the source data Workbook (with SaveChanges set to False, since we do not want to modify those files at all), and on line 127 we throw a quick “Data combined!” message box to let the user know the job is done!
Want to see this code in action? Here’s an 14-minute video guide:
Combining many individual Excel files into a single file with VBA smoothly? If not, let me know and I’ll help you get what you need! And if you’d like more step-by-step, no-bullshit VBA guides delivered direct to your inbox, join my email newsletter below.