Regular Expressions

Salman khalid
Jul 10, 2018

Regular expressions are very useful for extracting relevant information from text, whether that text is data coming from an API, log files, or documents. In this article I would like to walk through the practical details of regular expressions, starting from the basics. I took all of the examples from this link: https://regexone.com/

Example 1: In this example we have to write a pattern that matches all of the following strings:

1. abcdefg
2. abcde
3. abc
Pattern: abc

Explanation: The pattern should be a single string that matches all of the strings above. So it will be “abc”, because it is present in every one of them.

‘\d’ can be used in place of any single digit from 0 to 9. The preceding backslash differentiates it from the plain letter ‘d’.

Example 2: In this example, ‘\d’ can be used to create a pattern that matches the following strings:

1. abc123xyz
2. define “123”
3. var g = 123;
Pattern: \d\d\d

Explanation: As mentioned earlier, we can use ‘\d’ to match a digit anywhere in a string. In the above example I used three consecutive \d to match the three-digit number. Alternatively, I could also have used the literal 123, since every given string contains it.
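
To try Example 2 outside of the browser, here is a minimal sketch using Python's built-in re module. The choice of Python and the variable names are my own assumptions; the patterns are the ones from the example.

```python
import re

# Sample strings from Example 2.
samples = ['abc123xyz', 'define "123"', 'var g = 123;']

# \d\d\d matches three consecutive digits anywhere in the string.
three_digits = re.compile(r"\d\d\d")

# search() scans the whole string and returns the first match.
found = [three_digits.search(s).group() for s in samples]
print(found)  # ['123', '123', '123']

# The literal pattern 123 finds the same text in these particular strings.
literal_found = [re.search(r"123", s).group() for s in samples]
```

The raw-string prefix r"..." keeps Python from interpreting the backslash itself, so the regex engine sees \d unchanged.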

WildCard (.) Metacharacter

It is represented by a dot ‘.’ and matches any single character, such as a letter, digit, or white space. To match a literal dot, you need to escape it with a backslash: \.

Example 3:

1. cat.
2. 896.
3. ?=+.
Pattern: ...\.

In the above example we have three letters in the first string, three digits in the second, and three special characters in the last, each followed by a dot. So the pattern is three wildcard dots followed by an escaped dot.
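
A short Python sketch of Example 3 (the extra string "abc1" is a hypothetical test case I added to show what the escaped dot rules out):

```python
import re

# Three wildcards followed by a literal (escaped) dot.
pattern = re.compile(r"...\.")

samples = ["cat.", "896.", "?=+."]

# fullmatch() requires the whole string to fit the pattern.
results = [bool(pattern.fullmatch(s)) for s in samples]
print(results)  # [True, True, True]

# "abc1" ends in a digit instead of a literal dot, so it does not match.
no_dot = pattern.fullmatch("abc1")
print(no_dot)  # None
```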

Square Brackets([abc]):

The ‘[qwe]’ notation is used to match a single character out of those listed within the brackets. This particular set will match either q, w, or e. Let's look at another example for more detail.

Example 4:

1. fat
2. cat
3. rat
Pattern: [fcr]at

Likewise, we can use the ‘^’ notation at the start of the brackets to exclude characters when matching an input string.

Example 5:

1. Match fog
2. Match jog
3. Skip log
Pattern: [^l]og

In the above example we want to match the first two strings but skip the last one, so we use ‘[^l]’ to match any first character except ‘l’.
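
Examples 4 and 5 can be checked together with a small Python sketch. The word "bat" is a hypothetical extra input, added only to show a character that is not in the set.

```python
import re

# [fcr] matches exactly one of f, c, or r.
fcr_at = re.compile(r"[fcr]at")

# [^l] matches exactly one character that is NOT l.
not_l_og = re.compile(r"[^l]og")

set_results = {w: bool(fcr_at.fullmatch(w)) for w in ["fat", "cat", "rat", "bat"]}
print(set_results)
# {'fat': True, 'cat': True, 'rat': True, 'bat': False}

negated_results = {w: bool(not_l_og.fullmatch(w)) for w in ["fog", "jog", "log"]}
print(negated_results)
# {'fog': True, 'jog': True, 'log': False}
```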

Character Ranges (a-z):

With the help of ‘-’ we can define sequential ranges of characters inside brackets, covering letters and digits. It is very useful when we have to match any character from a contiguous run, such as [a-z] or [0-9].

Example 6:

1. Ana
2. Bob
3. Cpc
Pattern: [^a-c][n-p][a-c]
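
The sketch below runs the pattern from Example 6 in Python. It also compares it with a stricter variant, [A-C][n-p][a-c], which is my own suggestion rather than part of the example: the negated range [^a-c] accepts any character other than a, b, or c, so it lets through more than the capital letters A to C. The string "Xna" is a hypothetical input used to show the difference.

```python
import re

# Pattern from Example 6: first character is anything except a, b, or c.
pattern = re.compile(r"[^a-c][n-p][a-c]")

# Stricter variant (an assumption, not from the example): first character A-C only.
strict = re.compile(r"[A-C][n-p][a-c]")

names = ["Ana", "Bob", "Cpc"]
pattern_ok = all(pattern.fullmatch(n) for n in names)
strict_ok = all(strict.fullmatch(n) for n in names)
print(pattern_ok, strict_ok)  # True True

# "Xna" passes the negated range but fails the stricter one.
loose_hit = bool(pattern.fullmatch("Xna"))
strict_hit = bool(strict.fullmatch("Xna"))
print(loose_hit, strict_hit)  # True False
```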

Curly Braces a{5}:

We can also specify how many times a character repeats using curly braces. For example, to match the string ‘aaaaa’ we could write out aaaaa, but there is a better way of achieving this: the pattern a{5}. A range such as a{1,3} matches between one and three repetitions.

Example 7:

1. wazzzzzup
2. wazzzup
Pattern: waz{3,5}up
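
Here is a quick Python check of Example 7, using the pattern waz{3,5}up so that the repetition count applies to the letter z (the part that actually varies between the two strings). The string "wazup" is a hypothetical counterexample with too few z characters.

```python
import re

# z{3,5} means: between three and five consecutive z characters.
pattern = re.compile(r"waz{3,5}up")

samples = ["wazzzzzup", "wazzzup", "wazup"]
matches = [bool(pattern.fullmatch(s)) for s in samples]
print(matches)  # [True, True, False]

# a{5} is shorthand for writing the character five times.
five_a = bool(re.fullmatch(r"a{5}", "aaaaa"))
print(five_a)  # True
```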

Asterisk and Plus (* and +)

These two characters are quite similar, with one small difference: the former means 0 or more repetitions and the latter means 1 or more.

Example 8:

1. aaaabcc
2. aabbbbc
3. aacc
4. Skip this one a
Pattern: a{2,4}b*c+
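
A minimal Python sketch of Example 8, assuming the string to skip is the single letter "a":

```python
import re

# a{2,4}: two to four a's; b*: zero or more b's; c+: one or more c's.
pattern = re.compile(r"a{2,4}b*c+")

samples = ["aaaabcc", "aabbbbc", "aacc", "a"]
matches = [bool(pattern.fullmatch(s)) for s in samples]
print(matches)  # [True, True, True, False]
```

The third string shows the difference between the two quantifiers: b* is happy with no b at all, while c+ still needs at least one c.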

Optional Characters (?)

The optional metacharacter is very useful when a certain character may or may not occur. For example, the pattern ab?c will match either “abc” or “ac”, because the b is considered optional.

Example 9:

1. 1 Record Found?
2. 2 Records Found?
3. 200 Records Found?
Pattern: \d+ Records? Found\?
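
The following Python sketch checks Example 9 with the pattern \d+ Records? Found\?, which is worded to fit the three strings listed. The string "No Records Found." is a hypothetical input added to show a non-match.

```python
import re

# s? makes the plural s optional; \? matches a literal question mark.
pattern = re.compile(r"\d+ Records? Found\?")

samples = [
    "1 Record Found?",
    "2 Records Found?",
    "200 Records Found?",
    "No Records Found.",  # no leading number and no question mark
]
matches = [bool(pattern.search(s)) for s in samples]
print(matches)  # [True, True, True, False]
```

Note that the question mark plays two roles here: unescaped it is the optional metacharacter, escaped it is an ordinary character.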

White Spaces (\s)

When dealing with real-world cases, we may encounter multiple types of white space. The most common are space (␣), new line (\n), tab (\t), and carriage return (\r). All of these cases can be handled with just \s.

Example 10:

1. 1. abc
2. 2. abc
3. 3. abc
Pattern: \d\.\s+abc
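
A small Python sketch of Example 10. The exact spacing in the sample strings is my assumption (several spaces, a tab, and a long run of spaces), and "4.abc" is a hypothetical input with no white space at all.

```python
import re

# A digit, a literal dot, one or more white-space characters, then abc.
pattern = re.compile(r"\d\.\s+abc")

samples = ["1.   abc", "2.\tabc", "3.           abc", "4.abc"]
matches = [bool(pattern.fullmatch(s)) for s in samples]
print(matches)  # [True, True, True, False]
```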

Starting and ending (^ and $)

We can use the hat (^) and dollar ($) to anchor a pattern to the start and end of a line. Previously, our patterns could match anywhere inside the string; with anchors we explicitly require the match to begin at the start of the line and finish at its end.

Example 11:

1. Match Mission: successful
2. Skip Last Mission: unsuccessful
3. Skip Next Mission: successful upon capture of target
Pattern: ^Mission: successful$
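
To see what the anchors add, this Python sketch runs Example 11 twice: once with the anchored pattern and once with the same text unanchored. The unanchored variant is included only for comparison.

```python
import re

samples = [
    "Mission: successful",
    "Last Mission: unsuccessful",
    "Next Mission: successful upon capture of target",
]

anchored = re.compile(r"^Mission: successful$")
unanchored = re.compile(r"Mission: successful")

anchored_hits = [bool(anchored.search(s)) for s in samples]
unanchored_hits = [bool(unanchored.search(s)) for s in samples]

print(anchored_hits)    # [True, False, False]
print(unanchored_hits)  # [True, False, True]
```

Without the anchors, the third string matches too, because the phrase appears in the middle of it.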

Match groups ‘()’

Regular expressions allow us to not just match text but also to extract information for further processing. This is done by defining groups of characters and capturing them using the special parentheses ( and ) metacharacters.

Example 12:

1. file_record_transcript.pdf
2. file_07241999.pdf
Pattern: ^(file.+)\.pdf$

In the above example, we want to capture the file name without its extension, so the extension is written outside the parentheses.
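
In Python, the captured text is available through the match object's group() method. A minimal sketch of Example 12:

```python
import re

# The parentheses capture everything before the ".pdf" extension.
pattern = re.compile(r"^(file.+)\.pdf$")

samples = ["file_record_transcript.pdf", "file_07241999.pdf"]

# group(0) is the whole match; group(1) is the first capture group.
names = [pattern.search(s).group(1) for s in samples]
print(names)  # ['file_record_transcript', 'file_07241999']
```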

Nested Groups

Nested groups can be used where we have to extract extra information from a string along with its primary information.

For instance, in the following example we extract the year along with the entire date.

Example 13:

1. Jan 1987
2. May 1969
3. Aug 2011
Pattern: ^(\w+\s?(\d+))$

Example 14:

1. 1280x720
2. 1920x1600
3. 1024x768
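Pattern: ((\d+)x(\d+))

In the above example, the outer group captures the whole resolution, while the two inner groups extract the width and the height separately. The x is a literal character, so it needs no backslash.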
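
The sketch below shows how nested groups are numbered in Python for Examples 13 and 14. Groups are counted by the position of their opening parenthesis, so the outer group comes first. The resolution pattern is written with a plain x between the two numbers.

```python
import re

# Example 13: group 1 is the full date, group 2 is the year inside it.
date_pattern = re.compile(r"^(\w+\s?(\d+))$")
date_groups = date_pattern.search("Jan 1987").groups()
print(date_groups)  # ('Jan 1987', '1987')

# Example 14: group 1 is the full resolution, groups 2 and 3 are width and height.
size_pattern = re.compile(r"((\d+)x(\d+))")
size_groups = size_pattern.search("1280x720").groups()
print(size_groups)  # ('1280x720', '1280', '720')

# Captured text is always a string; convert it when numbers are needed.
width, height = int(size_groups[1]), int(size_groups[2])
```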

Conditional Operator (|):

This operator, also called alternation, can be used where we need to match either one alternative or another. In the following example we select either ‘cats’ or ‘dogs’.

1. I love cats
2. I love dogs
Pattern: I love (cats|dogs)
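
A final Python sketch for the alternation example. The sentence "I love logs" is a hypothetical input that fits neither alternative.

```python
import re

# (cats|dogs) matches either word and captures whichever one was found.
pattern = re.compile(r"I love (cats|dogs)")

cats = pattern.search("I love cats").group(1)
dogs = pattern.search("I love dogs").group(1)
print(cats, dogs)  # cats dogs

# Neither alternative is present, so search() returns None.
no_match = pattern.search("I love logs")
print(no_match)  # None
```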
