What are Regular Expressions?

Regular Expressions, sometimes abbreviated as regex, are, well … complicated at best. Just looking at them can make seasoned engineers want to run and hide. However, they are incredibly useful for building applications, and a good tool to have for coding interviews and challenges. This post is going to break down the basics of Regular Expressions.

Regular Expressions are essentially ways to search through a string of text. They allow us to search through that text to do things like validation, extracting certain pieces, advanced find-and-replace, and more. But at their core, they are used to search through a string of text in an advanced way. This website is a great resource if you want to practice your own Regular Expressions.

Let’s get started with a simple example of searching for a word in a sentence. The top line is the regex, and the bottom line is the sentence we’ll be testing it against:

/love/g
"I love learning new things"

A Regular Expression starts and ends with the two ‘/’ seen above, and everything between them is the pattern. The part after the second slash, the ‘g’ in our example, is what’s referred to as a flag. The ‘g’ flag stands for ‘global’ and allows multiple matches, so if we had a longer sentence, our Regular Expression would find every use of the word ‘love’. And yes, it is case sensitive. Removing the global flag would make the regex match only the first occurrence.

Another popular flag to use is ‘i’ which stands for ‘case insensitive’, so adding that in addition to the global flag would find all uses of the word ‘love’ regardless of upper or lower case.
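To see the flags in action, here is a small sketch in JavaScript. The sample text and variable names are made up for illustration, and String.prototype.match is just one of several ways to run a regex:

```javascript
// Illustrative sample text (not from the examples above).
const sentence = "I love learning. Love is all you need, so love more.";

// Without 'g': match() returns only the first match (plus its index).
const first = sentence.match(/love/);

// With 'g': match() returns every case-sensitive match.
const allExact = sentence.match(/love/g);

// With 'g' and 'i': upper and lower case variants are both found.
const allAnyCase = sentence.match(/love/gi);

console.log(first[0], first.index); // love 2
console.log(allExact);              // [ 'love', 'love' ]
console.log(allAnyCase);            // [ 'love', 'Love', 'love' ]
```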

Let’s update our example sentence to practice some more:

/e/g
"I love learning new things in my spare time, even while sleeping"

just go with it…

You can probably already guess that this regex will search for the character ‘e’, and there are a few of them in this sentence. If we also want to catch letters that repeat back to back, we can update the regex with the ‘+’, which means ‘one or more’ of the character before it:

/e+/g
"I love learning new things in my spare time, even while sleeping"

This will still match every ‘e’ in the string, but multiple ‘e’s that occur in a row are now treated as a single match, like the two ‘e’s in the word sl’ee’ping.
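If you want to check this yourself, a quick sketch along these lines (variable names are illustrative) compares the two patterns on the same sentence:

```javascript
const sentence = "I love learning new things in my spare time, even while sleeping";

// Each 'e' is its own match.
const singles = sentence.match(/e/g);

// One or more 'e's in a row count as a single match.
const runs = sentence.match(/e+/g);

console.log(singles.length);      // 10
console.log(runs.length);         // 9
console.log(runs.includes("ee")); // true (from "sleeping")
```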

Let’s look for more character options:

/ea?/g
"I love learning new things in my spare time, even while sleeping"

This will match the character ‘e’, and if it is followed by an ‘a’, the ‘a’ becomes part of the match as well, like the ‘ea’ in l’ea’rning. The character right before the ‘?’ is treated as optional.
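A small sketch of the optional character at work, again with illustrative variable names:

```javascript
const sentence = "I love learning new things in my spare time, even while sleeping";

// 'e' followed by an optional 'a'.
const matches = sentence.match(/ea?/g);

// Same number of matches as plain /e/g, but one of them now includes the 'a'.
console.log(matches.length);                    // 10
console.log(matches.filter((m) => m === "ea")); // [ 'ea' ]
```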

Some more helpful search items:

/\w/ //matches any word character
/\s/ //matches any white space
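Assuming the two descriptions above refer to the common shorthands ‘\w’ (word character) and ‘\s’ (white space), a quick sketch on a made-up string looks like this:

```javascript
// Illustrative sample text.
const text = "new things";

console.log(text.match(/\w/g).length); // 9  (every letter)
console.log(text.match(/\s/g).length); // 1  (the single space)

// Combined with '+', \w picks out whole words.
console.log(text.match(/\w+/g));       // [ 'new', 'things' ]
```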

To check for a specific number of repetitions, we put ‘{}’ right after the thing we want to repeat:

/\d{4}/ //matches any 4 digits in a row
/\d{4,}/ //matches 4 or more digits in a row
/\w{4,5}/ //matches any set of 4 or 5 word characters in a row
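Here is a sketch of the three quantifier forms, assuming ‘\d’ for digits and ‘\w’ for word characters; the sample strings are made up for illustration:

```javascript
// Illustrative sample text.
const numbers = "pin 1234, zip 902101, year 19";

// Exactly 4 digits: a longer run is cut off after 4.
console.log(numbers.match(/\d{4}/g));  // [ '1234', '9021' ]

// 4 or more digits: the whole run is matched.
console.log(numbers.match(/\d{4,}/g)); // [ '1234', '902101' ]

// 4 or 5 word characters: "cat" is too short to match.
console.log("tree house cat".match(/\w{4,5}/g)); // [ 'tree', 'house' ]
```

Notice that ‘{4}’ happily matches the first four digits of a longer number, which matters once you start using these patterns for validation.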

If we want to match characters that fall within a range, we can use ‘[]’:

/[a-z]/ //matches any lower case character from a-z
/[a-zA-Z]/ //matches any lower or upper case character from a-z ...which is all of them
/[0-9]/ //we can also validate numbers too!
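A short sketch of ranges inside ‘[]’, assuming the usual ‘a-z’, ‘A-Z’, and ‘0-9’ ranges and a made-up sample string:

```javascript
// Illustrative sample text.
const text = "Room 42B";

console.log(text.match(/[a-z]/g));    // [ 'o', 'o', 'm' ]
console.log(text.match(/[a-zA-Z]/g)); // [ 'R', 'o', 'o', 'm', 'B' ]
console.log(text.match(/[0-9]/g));    // [ '4', '2' ]
```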

If you want to match only at the beginning of the entire text, use the caret ‘^’ symbol:

/^I/g //matches the beginning of the text if it begins with an upper case 'I'

While the ‘^’ matches at the very beginning of the line, the ‘$’ symbol matches at the very end of a line.
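A quick sketch of both anchors using RegExp.prototype.test, which simply returns true or false. The ‘g’ flag is left off here because it isn’t needed for a yes/no check:

```javascript
const sentence = "I love learning new things";

console.log(/^I/.test(sentence));      // true:  the text begins with 'I'
console.log(/^love/.test(sentence));   // false: 'love' is not at the beginning
console.log(/things$/.test(sentence)); // true:  the text ends with 'things'
console.log(/love$/.test(sentence));   // false: 'love' is not at the end
```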

Now if the creative juices are flowing, you may be thinking about all the possible ways you could implement a regex, or maybe you’re realizing that you’ve been on the receiving end of one. Let’s put all of this together to validate a phone number.

The first thing you might get caught up on is that there are multiple ways a user on an application might write their number:

1234567890 //if this is how you write out your number, we can't be friends
123-456-7890
123 456 7890
(123) 456-7890 //this is how I personally do it ;)

If we’re building out an application, we need to find a way to accept all these different ways of writing out a phone number. We’ll use the ‘\d’ to help validate digits.

/\d{10}/g //this searches for 10 digits in a row, but will only work for the first example.

If the digits are separated by dashes or spaces, we need to check for groups of numbers. Reminder that those dashes or spaces will be optional.

/\d{3}-?\d{3}-?\d{4}/g //this matches groups of three with an optional dash, followed by a group of 4. This will match the first two examples.
/\d{3}[ -]?\d{3}[ -]?\d{4}/g //notice adding the space in the group with the dash now matches the first three examples.

With the final example, we need to account for parentheses, spaces, and dashes, as well as the digits themselves.

/\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}/g //This will match all four examples!
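To double-check the progression, here is a sketch that runs each of the four patterns against each of the four sample numbers. The variable names are illustrative, and the ‘g’ flag is dropped because test() only needs a yes/no answer (with ‘g’, a regex object remembers where its last match ended, which can give surprising results when it is reused):

```javascript
const samples = [
  "1234567890",
  "123-456-7890",
  "123 456 7890",
  "(123) 456-7890",
];

const patterns = [
  /\d{10}/,
  /\d{3}-?\d{3}-?\d{4}/,
  /\d{3}[ -]?\d{3}[ -]?\d{4}/,
  /\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}/,
];

// For each pattern, count how many of the samples it matches.
const matchCounts = patterns.map(
  (pattern) => samples.filter((sample) => pattern.test(sample)).length
);

console.log(matchCounts); // [ 1, 2, 3, 4 ]
```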

Those backslashes in front of each parenthesis are there because ‘(‘ and ‘)’ are special characters in a regex, and we need to add the ‘\’ to say we’re looking for a literal parenthesis.
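If you wanted to use this pattern for actual validation, one possible sketch wraps it in ‘^’ and ‘$’ so the whole input has to be a phone number. The anchors and the helper name isPhoneNumber are additions for illustration, not part of the pattern above:

```javascript
// Same pattern as above, anchored so it must match the entire input.
const PHONE = /^\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}$/;

// Hypothetical helper for illustration.
function isPhoneNumber(input) {
  return PHONE.test(input);
}

console.log(isPhoneNumber("(123) 456-7890"));      // true
console.log(isPhoneNumber("call 1234567890 now")); // false

// Without the anchors, a number buried in other text still matches.
const unanchored = /\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}/;
console.log(unanchored.test("call 1234567890 now")); // true
```

This is still a loose check: because each parenthesis is optional on its own, something like "(123 456-7890" with a missing closing parenthesis would pass too.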

If you are still reading this and haven’t started throwing full bottles of wine at your computer screen, congratulations: that is how regular expressions work. They are helpful tools to validate strings and can be used in real life on phone numbers and email addresses. Reminder that you can head here to practice and learn more!