Thus far in my young career, I’ve had to parse a lot of data. The data in question is usually a mish-mash of numeric and non-numeric data such as “N120 X3.243 Y0.2 Z0.1 A90 B0” (in this case, I’m talking about G-Code). For most tasks, I have to write code to extract one or a few of the values, and optionally manipulate them or change them to something else. For the most part, I stuck with tokenizing and analyzing the data using whatever tools the language I was using at the time provided. When worst came to worst, I wrote my own routines to look for and extract what I needed. And then one day, I stumbled across regular expressions (aka regex or reg ex). The power they offer when faced with an ugly parsing task is phenomenal. I’m still not 100% comfortable with the syntax, and it’s a tad more difficult because there are different regex engines, many with their own nuances and gotchas. For example, the regex implementation in the .NET Framework is slightly different from JavaScript’s, which is slightly different from Python’s, and everybody wants to one-up Perl, but on the whole they’re all extremely similar. Up to this point, the regexes I’ve written have been pretty simple: I’m usually either testing for an occurrence or extracting a substring.
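To give a feel for the kind of extraction described above, here is a minimal Python sketch (not code from the original task) that pulls the letter/number “words” out of a G-Code line like the one quoted, and changes one of them. It assumes every word is a single letter followed by a plain signed decimal number; G-Code comments, parameters and expressions are not handled, and the helper names `parse_gcode_words` and `set_gcode_word` are hypothetical.

```python
import re

# One G-Code "word": an address letter followed by a signed decimal number.
# Assumption: only simple words like "X3.243" or "N120" appear on the line.
GCODE_WORD = re.compile(r"([A-Z])\s*([-+]?\d*\.?\d+)", re.IGNORECASE)


def parse_gcode_words(line):
    """Return a dict mapping each address letter to its numeric value."""
    words = {}
    for letter, number in GCODE_WORD.findall(line):
        words[letter.upper()] = float(number)
    return words


def set_gcode_word(line, letter, value):
    """Replace the value of the first word with the given letter, keeping the rest of the line."""
    # The lookbehind stops the letter from matching in the middle of another token.
    pattern = re.compile(
        rf"(?<![A-Z])({re.escape(letter)})\s*[-+]?\d*\.?\d+", re.IGNORECASE
    )
    return pattern.sub(lambda m: m.group(1) + str(value), line, count=1)


line = "N120 X3.243 Y0.2 Z0.1 A90 B0"
print(parse_gcode_words(line))
print(set_gcode_word(line, "X", 4.5))  # N120 X4.5 Y0.2 Z0.1 A90 B0
```

The capture groups do the tokenizing work that would otherwise take a hand-written loop: one group holds the address letter and the other holds the number.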
Yesterday, I finally tried a search and replace, regex-style. I had to go through a lot of phone numbers in formats like “Phone: (xxx)xxx-xxxx” or “Tel: (xxx)xxx-xxxx” and convert them to “xxx-xxx-xxxx”. The replacement was easy. Here’s how I did it in VBScript:
```vbscript
' Purpose: Format phone numbers like xxx-xxx-xxxx
' I: phone number combined with other text, formatted
'    like (xxx)xxx-xxxx
' O: phone number formatted like xxx-xxx-xxxx
Function FormatPhoneNumber(strPhoneString)
    Dim objRegEx        ' VBScript RegExp object
    Dim objMatches      ' Matches collection
    Dim objMatch        ' single Match in the collection
    Dim strOldPhoneNum  ' extracted phone# in old format
    Dim strNewPhoneNum  ' phone# in new format

    ' Init the return value to an empty string
    strNewPhoneNum = vbNullString

    ' Bail on this record if the phone number is blank
    If (strPhoneString = vbNullString) Then
        FormatPhoneNumber = strNewPhoneNum
        Exit Function   ' Exit Sub is a syntax error inside a Function
    End If

    ' Pull the phone number out from the format it's
    ' currently in (Ex: "Phone: (xxx)xxx-xxxx") and
    ' format it like this (Ex: "xxx-xxx-xxxx").
    Set objRegEx = New RegExp
    objRegEx.Pattern = "\((\d{3})\)(\d{3})-(\d{4})$"
    objRegEx.IgnoreCase = True
    objRegEx.Global = True

    Set objMatches = objRegEx.Execute(strPhoneString)
    If (objMatches.Count <> 0) Then
        ' Keep the matched text (the $ anchor allows only one match)
        For Each objMatch In objMatches
            strOldPhoneNum = objMatch.Value
        Next
        ' $1, $2 and $3 refer back to the three captured digit groups
        strNewPhoneNum = objRegEx.Replace(strOldPhoneNum, "$1-$2-$3")
    End If

    ' Cleanup
    Set objRegEx = Nothing
    Set objMatches = Nothing

    ' Return the formatted phone number
    FormatPhoneNumber = strNewPhoneNum
End Function
```
I had to capture the digits of the original phone number in three groups so that I could refer back to them as $1, $2 and $3 in the replacement string passed to the Replace() method. Handy indeed 🙂
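For comparison, here is a rough Python port of the same idea. It is a sketch, not a drop-in replacement for the VBScript function, and the name `format_phone_number` is hypothetical. It uses the same pattern and the same three capture groups. In Python the back-references are written `\1`, `\2`, `\3` instead of `$1`, `$2`, `$3`.

```python
import re

# Same pattern as the VBScript version: three capture groups, anchored to the
# end of the string so that any label in front ("Phone:", "Tel:", ...) is skipped.
PHONE_PATTERN = re.compile(r"\((\d{3})\)(\d{3})-(\d{4})$")


def format_phone_number(phone_string):
    """Return 'xxx-xxx-xxxx', or '' if the string doesn't end in (xxx)xxx-xxxx."""
    if not phone_string:
        return ""
    match = PHONE_PATTERN.search(phone_string)
    if match is None:
        return ""
    # expand() fills the template with the captured groups, like $1-$2-$3.
    return match.expand(r"\1-\2-\3")


print(format_phone_number("Phone: (555)123-4567"))  # 555-123-4567
print(format_phone_number("Tel: (555)123-4567"))    # 555-123-4567
```

Calling `match.expand()` on the match, rather than `re.sub()` on the whole input, returns only the reformatted number. This mirrors the VBScript version, which runs Replace() on the matched text alone. Running `re.sub()` over the full string would keep the “Phone: ” label in front.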