How to Combine Multiple Excel Workbooks into One Worksheet with VBA

combined

Combined Data is Better Data

Stop me if you’ve heard this one before:

I need to make a pivot table but the data is spread out in lots of different Excel files…

Damnit!

This situation blows, since your analysis depends on a pivot table and combining a bunch of workbooks together by hand is terrible.

Fortunately, VBA will make short work of this pain in the ass. Here goes:


Here’s a link to the code above so you can review it side-by-side with the walk through below. Right-click, “Open in new window”.

Let’s break this challenge down step-by-step using the 4-step VBA process as our guide:

Step 1 – Setup
Step 2 – Exploration
Step 3 – Execution
Step 4 – Cleanup

Our Step 1 – Setup is covered by lines 16-18 — short and sweet.

First, on line 16, we assign the folder name where the individual Excel files are stored. (You’ll want to change this to your folder, but in this example we are targeting C:\blog\example_data_in_here.)

Then, on lines 16-17, we create a new Workbook (where Dst is short for “destination”, i.e. output) to store the data from each individual file, then assign the first Worksheet in that Workbook as the Dst Worksheet.

Boom! One down, three to go.

Step 2 – Exploration begins on line 21, where we take advantage of the Dir function to loop through the directory we set up moments ago (StrDirContainingFiles) and identify every file that ends in “.xlsx”. (That’s what the asterisk character, “*”, is doing there at the end of the line.)

Lines 22 through 25 store each file name inside a Collection (named colFileNames), which will make it SUPER easy to iterate through each file a little bit later in the code.

but-couldnt-you

Umm why not do the whole thing in this Dir loop?

Yes, we COULD have conducted the bulk of the code inside this Dir loop, but I prefer using a Collection here because it reduces the number of nested loops in our subroutine. Each nested loop you add is another layer of complexity for you to mentally keep track of… fuck that. Programming is hard enough — avoid deeply-nested loops whenever you can.

The “use a Collection” strategy also makes it really easy to verify that the loop worked and pulled in the data we expected, which you can check for yourself by un-commenting lines 27-31.

Let’s keep it moving though 🙂

With those file names stored neatly in colFileNames, we begin looping through it on line 35.

On line 38, strFilePath is assigned to be the original source folder string (strDirContainingFiles, which is “C:\blog\example_data_in_here” in this example), a backslash (“\”), and the file name from colFileNames.

We’ll immediately take advantage of that full file path to the Excel file on line 41, where we open that Workbook and save a reference to it as wbkSrc (where “Src” is short for “Source”). Line 42 assigns the target worksheet, named “data” in this example, to wksSrc.

Exploration continues on lines 46-47, where we take advantage of the LastOccupiedRowNum and LastOccupiedColNum functions (which are defined at the very bottom as well as in the VBA Toolbelt, which you should be using) to easily identify the last-occupied row and last-occupied column on the source Worksheet.

This is critical here! By dynamically determining the last column and last row on each loop, we can be confident that we’re getting all the data from each Worksheet.

With the last-occupied row and last-occupied column numbers stored in lngSrcLastRow and lngSrcLastCol respectively, we store the full data range on lines 48-51, starting from the top-left corner and extending to the bottom-right.

This is another great checkpoint: we can verify that all the data has been correctly identified and stored in rngSrc by un-commenting out lines 53-55 and calling rngSrc.Select to highlight all the cells. Smooth!

Here’s where things get a bit more interesting…

On lines 60-62, we check to see if this is NOT the first iteration.

jackie-chan-why

Why are we checking to see if this is the first loop?

Why? Good question!

On the first loop, we want to include the headers, but each subsequent time we do NOT want to include the headers.

Each loop after the first, we adjust rngSrc to skip the first row like this:

Set rngSrc = rngSrc.Offset(1, 0).Resize(rngSrc.Rows.Count – 1)

Here’s the two-step process in slow-motion:

Nice!

This puts us at another good checkpoint, where we can verify that the header is no longer part of rngSrc by un-commenting out lines 64-68.

Phew — with that, Step 2 – Exploration is complete!

Let’s dive into Step 3 – Execution, which kicks off on line 73.

Here, we again check the iteration. If this is the first loop, our target cell is easy — it’s A1, since the Worksheet is empty. (That’s what we handle on lines 74 and 75.) On the other hand, if this is NOT the first loop, then we:

The actual copy / paste step happens on line 80, where we call rngSrc.Copy and pass in rngDst (which we just set on line 78) as the Destination.

almost-there

Almost done with this one, I promise!

Almost there you guys, stay with me!

The last challenge within Step 3 – Execution is to add a column identifying which data file a given row came from by writing the Worksheet name into a far-right column. Let’s get to it!

Lines 86-89 cover another first loop special case — if this is the first time through, then we need to make sure we add a header name! By taking advantage of LastOccupiedColNum, which, again, is implemented for you both below AND in the VBA Toolbelt, we know that lngDstLastCol + 1 gives us the column right next to the last-occupied column.

We name this column header “Source Filename” on line 88.

Since we know that each row of data from the last paste (which happened on line 80) came from one of the different Excel files, we can take advantage of the Range.Value property to quickly write the file name to each of those rows.

First, we need to identify the first row of data that was just pasted in.

We used this same exact row number back on line 78, so we essentially copy that logic and assign lngDstFirstFileRow to be lngDstLastRow + 1.

Next, we need to figure out the last row that will get this file name data, and on line 103 we do just that.

Guess which function rides in to our rescue? Yep — it’s our old standby LastOccupiedRowNum.

Now that we know the first row and the last row of the range of cells that will need to be populated with the file name, all that’s left to do is get the right column number!

And wouldn’t you know it, LastOccupiedColNum, on line 104, assigns lngDstLastCol that exact value 🙂

Now that we know the three critical components for a Range:

  1. The first row, which is stored in lngDstFirstFileRow
  2. The last row, which is stored in lngDstLastRow
  3. The column those rows need to be applied in, which is stored in lngDstLastCol

We can write the file name easy peasy!

On lines 107-108, we use the values from #1, #2, and #3 above to store the target Range.

Damn son! That brings us to a great checkpoint — by un-commenting lines 112-113, we can easily verify (using Range.Select) that the correct Range is defined.

Finally, the last Execution task occurs on line 117, where we actually do the file name writing. Calling wbkSrc.Name returns the file name (in the first case, it will be “AT_Apr_16.xlsx”), which is why we assign the rngFile.Value to it.

Jackpot — Step 3 – Execution is all done!

And of course, we wrap up with Step 4 – Cleanup.

Nice and short here too: on line 122, we close the source data Workbook (with SaveChanges set to False, since we do not want to modify those files at all), and on line 127 we throw a quick “Data combined!” message box to let the user know the job is done!

Want to see this code in action? Here’s an 14-minute video guide:

Combining many individual Excel files into a single file with VBA smoothly? If not, let me know and I’ll help you get what you need! And if you’d like more step-by-step, no-bullshit VBA guides delivered direct to your inbox, join my email newsletter below.

Get the VBA Toolbelt!

Quit digging through old projects and forums like a chump! Download the VBA Toolbelt and start with the most common Excel tasks already done for you.

No spam, ever. Unsubscribe at any time. Powered by ConvertKit