How Can You Extract Several Values from a String?
How do you extract several values from a string with VBA?
Pulling values out of a string is something you probably already do all the time with Excel’s handy “Text to Columns” built-in.
As amazing as “Text to Columns” is, though, sometimes the info you’re trying to get is just nestled too deeply, in too complicated a manner, for “Text to Columns” to work. What then?
Let’s say you’re given some data that was spit out by someone in a different department’s arcane system, and it looks like this:
(L-8735:3[701.4][22/09/2015])(L-8844:5[458.32][30/10/2015])
(L-8744:5[935.2][22/09/2015])(L-8792:3[1402.8][29/10/2015])(L-8802:3[2104.2][29/10/2015])
(L-8753:4[14.9][28/09/2015])
You want to take ALL values and dates contained by square brackets and move them to their own cells, so you have something like this:
To pull this off we’re going to use Regular Expressions (regex or regexp for short) — but don’t worry, we’re going to break it down step-by-step and TOTALLY demystify the process.
Before we get to business, though, we need to first enable Regular Expressions in VBA.
From the VBA editor screen, select the “Tools” option, then click “References…”. In the next pop-up window, check the box next to “Microsoft VBScript Regular Expressions 5.5”, then click “OK” — that’s it! You now have access to the RegExp
object.
Now, let’s get on with the code:
Per usual, let’s walk down this script using our 4-step VBA process to guide the way:
Step 1 – Setup
Step 2 – Exploration
Step 3 – Execution
Step 4 – Cleanup
Lines 13-20 cover our Step 1 – Setup here — short and sweet. First, we set a reference to the worksheet (which is named “raw” in this case). Then, we set strPattern
, which is the string-matching pattern our RegExp
object will use, on the next line.
Let’s spend a little time on strPattern
— there’s a lot packed into a little space.
FIRST THINGS FIRST THOUGH — don’t think that I dug into my memory and just conjured this sucker up. I use Rubular to iteratively build matchers, and you should too. By seeing the match results in real time, you can quickly build up what you need and move on.
With that out of the way, let’s talk about what’s contained in that strPattern
variable:
- The outer
()
parenthesis means you want numbered groups rather than one big blob of matches - The
\[
(and the\]
later) finds square brackets. That slash is called an “escape”, which is important here because square brackets have a special “range” meaning in Regular Expressions - The
\S
tells yourRegExp
to match any non-whitespace character - The
*
tells yourRegExp
to collect zero or more of the previous\S
- The
?
tells yourRegExp
to be non-greedy, and break each group when the]
is found
Phew! Let’s move on.
Finally, we create the actually RegExp
object (called rgx
in this example) on lines 15-20. The most important step in the RegExp
object setup is on line 19, where we assign rgx.Pattern
to be the strPattern
we created above.
Lines 24-30 are our Step 2 – Exploration. We identify the last-occupied row, then create a Range
object that contains all the cells we want to investigate… Easy cheesy!
Lines 33-59 handle our Step 3 – Execution, and that phase kicks off with a For Each...Next
loop around the entire target Range
.
The first thing we do in our loop is establish the raw string we’d like to match against. From there, we leverage the RegExp.Test
method on our strRaw
, which returns True
if there are any matches and False
otherwise.
If RegExp.Test
returns True
, then we create an object (called objMatches
to hold all of our matches on line 42. Then, beginning on line 46, we loop through the objMatches
object, removing square brackets from each value, and eventually writing the output to the neighboring columns (on line 50). Whaddup!
The Step 4 – Cleanup is a cinch in this script, as we only need to let the user know our macro has finished — and that’s exactly what happens on line 58.
And with that, we can now extract an arbitrary number of square bracket-enclosed values from a string. Nice!