A regular expression (regex) engine scans a piece of text character by character, from left to right. When it finds a character that matches the first part of the expression, it tries to match the rest of the expression against the characters that immediately follow.
Match text
abc
Match the string “abc”
Alternation
abc|def
Match the string “abc” OR “def”
Note that the | symbol applies to the entire expression on either side of it. Use Grouping (see below) to OR only specific parts of the expression, e.g. “Hello (Dave|David)!”
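The difference between top-level and grouped alternation can be checked with a minimal sketch like the one below. It assumes Python's standard re module; the variable names and sample strings are illustrative only.

```python
import re

# Top-level alternation: the whole left side OR the whole right side.
either = re.compile(r"abc|def")
print(bool(either.search("xxabcxx")))  # True
print(bool(either.search("xxdefxx")))  # True

# Without grouping, | splits the entire pattern: "Hello Dave" OR "David!"
ungrouped = re.compile(r"Hello Dave|David!")
# With grouping, only the two names are alternatives.
grouped = re.compile(r"Hello (Dave|David)!")

print(bool(ungrouped.fullmatch("David!")))      # True
print(bool(grouped.fullmatch("David!")))        # False
print(bool(grouped.fullmatch("Hello David!")))  # True
```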
Character Sets
Square brackets [] are used to match one character from a series of possible characters
ab[cd]ef[gh ]ij
Matches a string in which each bracketed position holds any one of the listed characters, e.g.: “abcefgij” or “abdefgij” or “abdefhij” or “abdef ij”
“abcdefgij”, for example, will not match, because each set matches exactly one character.
A ^ as the first character inside the brackets negates the character set, so it matches any single character OTHER than those listed
ab[^cd]ef[gh]ij
“abzefgij” would match, “abdefgij” would not.
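The character set examples above can be tried out with a short sketch, assuming Python's re module. fullmatch is used here so that the whole string has to fit the pattern; the variable names are illustrative.

```python
import re

char_set = re.compile(r"ab[cd]ef[gh ]ij")
negated = re.compile(r"ab[^cd]ef[gh]ij")

# Each bracketed position accepts exactly one of the listed characters.
for text in ["abcefgij", "abdefhij", "abdef ij", "abcdefgij"]:
    print(repr(text), bool(char_set.fullmatch(text)))

# [^cd] accepts any single character except c or d.
for text in ["abzefgij", "abdefgij"]:
    print(repr(text), bool(negated.fullmatch(text)))
```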
Wildcards
. = Match any single character (in most regex engines, anything except a line break by default)
...
Will match any 3 characters, but there must be 3 characters.
(If you need to use a period as a literal character in an expression, escape it with a backslash: “\.”)
...\.
“qur.” would match, “qurk” wouldn’t match
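As a rough illustration of the wildcard and the escaped period, here is a small sketch that assumes Python's re module; the pattern names are made up for the example.

```python
import re

three_any = re.compile(r"...")      # any three characters
dot_literal = re.compile(r"...\.")  # any three characters, then a literal period

print(bool(three_any.fullmatch("a b")))     # True
print(bool(three_any.search("ab")))         # False: fewer than 3 characters
print(bool(dot_literal.fullmatch("qur.")))  # True
print(bool(dot_literal.fullmatch("qurk")))  # False
```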
See also the “Quantifiers – 0 or More, 1 or More” section below for multicharacter wildcard options
Ranges
Use a hyphen inside square brackets to specify a range of allowed characters (remember, the set still matches only a single character)
[a-c] is the same as saying [abc]
[a-cA-D] is the same as saying [abcABCD]
[a-z]
[A-Z]
[a-zA-Z]
[0-9]
[a-zA-Z0-9]
Some typical ranges that are often used
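One way to convince yourself that a range is just shorthand for listing the characters is to compare what each pattern accepts. The sketch below assumes Python's re and string modules; matching_chars is a hypothetical helper written for this example, not part of any library.

```python
import re
import string

def matching_chars(pattern):
    """Return the set of printable ASCII characters the pattern matches."""
    return {c for c in string.printable if re.fullmatch(pattern, c)}

print(matching_chars(r"[a-c]") == matching_chars(r"[abc]"))        # True
print(matching_chars(r"[a-cA-D]") == matching_chars(r"[abcABCD]"))  # True

# A range still matches only one character at a time.
print(bool(re.fullmatch(r"[a-c]", "ab")))  # False
```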
Shorthand Character Classes
\w
“word character” class, represents regex range [A-Za-z0-9_]
\d
“digit character” class, represents regex range [0-9]
\s
“whitespace character”, represents regex range [ \t\r\n\f\v]
(Matches a single space, tab, carriage return, line break, form feed, or vertical tab)
\W
“non-word character” class, represents regex range [^A-Za-z0-9_] (any character not included in the range represented by \w )
\D
“non-digit character” class, represents regex range [^0-9] (any character not included in the range represented by \d )
\S
“non-whitespace character”, represents regex range [^ \t\r\n\f\v] (any character that is not included in the range represented by \s )
An example: \d\s\w\w
Matches a digit character, followed by a whitespace character, followed by 2 word characters
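The \d\s\w\w example can be checked with the sketch below, which assumes Python's re module. One caveat specific to Python 3: on ordinary text strings, \w, \d and \s also match non-ASCII (Unicode) characters unless the re.ASCII flag is given, so the ranges listed above describe the ASCII behaviour.

```python
import re

pattern = re.compile(r"\d\s\w\w")

print(bool(pattern.fullmatch("7 ab")))    # True
print(bool(pattern.fullmatch("7\tx_")))   # True: tab is whitespace, _ is a word character
print(bool(pattern.fullmatch("a ab")))    # False: "a" is not a digit
print(bool(pattern.fullmatch("7 a!")))    # False: "!" is not a word character

# The upper-case classes are the negations.
print(bool(re.fullmatch(r"\D\S\W", "a!?")))  # True

# In Python 3, \w is Unicode-aware unless re.ASCII is used.
print(bool(re.fullmatch(r"\w", "é")))            # True
print(bool(re.fullmatch(r"\w", "é", re.ASCII)))  # False
```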
Grouping
()
Groups a section of the expression. E.g. “Hello (Dave|David)!”
will match “Hello Dave!” and “Hello David!”.
Quantifiers – Fixed
Specify how many times the preceding expression must match
{}
Specify the number of repetitions of the preceding character, class or group
“\w{4}” would match exactly 4 word characters
“\w{2,5}” would match a minimum of 2 and a maximum of 5 word characters. The quantifier is greedy: when a min and max are specified, it will match as many characters as it possibly can.
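The fixed quantifiers, including the greedy behaviour of a min/max range, can be sketched as follows, assuming Python's re module. The sample strings are illustrative.

```python
import re

# Exactly four word characters.
print(bool(re.fullmatch(r"\w{4}", "abcd")))   # True
print(bool(re.fullmatch(r"\w{4}", "abc")))    # False
print(bool(re.fullmatch(r"\w{4}", "abcde")))  # False

# Between two and five: greedy, so it takes five when five are available.
greedy = re.search(r"\w{2,5}", "abcdefgh")
print(greedy.group())  # abcde

# It settles for fewer when the run of word characters is shorter.
shorter = re.search(r"\w{2,5}", "abc defg")
print(shorter.group())  # abc

# No run of at least two word characters: no match at all.
print(re.search(r"\w{2,5}", "a b"))  # None
```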
Quantifiers – Optional
Specify that the previous character or group is optional
?
Some examples:
“abc?d” will match for “abd” and “abcd”
“Hello (cheeky )?boy”, will match for “Hello boy” and “Hello cheeky boy”
“Hello (loud |cheeky )?boy”, will match for “Hello boy”, “Hello cheeky boy” and “Hello loud boy”
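A quick way to try the optional quantifier on both a single character and a group, again as a sketch that assumes Python's re module:

```python
import re

optional_char = re.compile(r"abc?d")
optional_group = re.compile(r"Hello (loud |cheeky )?boy")

for text in ["abd", "abcd", "abccd"]:
    print(repr(text), bool(optional_char.fullmatch(text)))

for text in ["Hello boy", "Hello cheeky boy", "Hello loud boy", "Hello loud cheeky boy"]:
    print(repr(text), bool(optional_group.fullmatch(text)))
```

The last string in each loop is rejected: ? allows the preceding item zero times or once, never twice.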
Quantifiers – 0 or More, 1 or More
*
Preceding character or group doesn’t need to appear, or can appear an unlimited number of times
+
Preceding character or group must appear 1 or more times
Examples
“abc*d” will match for “abd”, “abcd”, “abcccccd”, etc
“abc+d” will match for “abcd”, “abcccccd”, etc, but not for “abd”
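The * and + examples above can be compared side by side with the sketch below, which assumes Python's re module. The last lines combine * with the . wildcard to get a multi-character wildcard; that pattern and its sample string are illustrative additions.

```python
import re

star = re.compile(r"abc*d")  # zero or more "c"
plus = re.compile(r"abc+d")  # one or more "c"

for text in ["abd", "abcd", "abcccccd"]:
    print(repr(text), bool(star.fullmatch(text)), bool(plus.fullmatch(text)))

# ".*" means "any character, any number of times".
anything_between = re.compile(r"a.*z")
print(bool(anything_between.fullmatch("a--anything--z")))  # True
print(bool(anything_between.fullmatch("az")))              # True
```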
Anchors (start and end markers)
^
Start of string marker: will not match if the tested string has characters before this point
$
End of string marker: will not match if the tested string continues past this point
Examples
“^bcd ef” will match “bcd ef”, “bcd efghi”, etc, but not match for “abcd ef”
“^bcd ef$” will match “bcd ef”, but not match for “abcd ef” or “bcd efghi”
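The anchor examples can be reproduced with a sketch like this one, assuming Python's re module. search is used rather than fullmatch so that the anchors themselves do the work of pinning the match to the start or end.

```python
import re

start_only = re.compile(r"^bcd ef")
start_and_end = re.compile(r"^bcd ef$")

for text in ["bcd ef", "bcd efghi", "abcd ef"]:
    print(repr(text), bool(start_only.search(text)), bool(start_and_end.search(text)))
```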
Special character escaping
Characters that have a special meaning in regex must be escaped with a backslash when you want to match the literal character:
\\
\.
\?
\*
\+
\{
\(
\)
\^
\$
\[
\|
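To see why escaping matters, the sketch below compares an unescaped and an escaped + sign, assuming Python's re module. It also uses re.escape, a standard-library helper that escapes a whole literal string for you; the sample strings are illustrative.

```python
import re

# Unescaped, "+" is a quantifier: one or more "1", then "1=2".
print(bool(re.search(r"1+1=2", "1+1=2")))   # False
print(bool(re.search(r"1+1=2", "11=2")))    # True

# Escaped, "\+" is a literal plus sign.
print(bool(re.search(r"1\+1=2", "1+1=2")))  # True

# re.escape builds a pattern that matches the given text literally.
literal = "1+1=2 (really?)"
escaped = re.escape(literal)
print(bool(re.fullmatch(escaped, literal)))  # True
```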