Regular expressions work by moving through a piece of text character by character, from left to right. When the regex engine finds a character that matches the first piece of the expression, it tries to match the rest of the expression against the characters that immediately follow.

Match text

abc

Match the string “abc”
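As a rough illustration of the left-to-right scan, here is a minimal sketch using Python's built-in re module. The sample text is made up for the example.

```python
import re

text = "xxabcxxabc"  # made-up sample text

# search() scans left to right and reports the first place the pattern fits
first = re.search(r"abc", text)
print(first.span())  # (2, 5)

# findall() keeps scanning after each match and collects all of them
all_matches = re.findall(r"abc", text)
print(all_matches)  # ['abc', 'abc']
```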

Alternation

abc|def

Match the string “abc” OR “def”
Note that the | symbol applies to the entire expression before or after itself. Use Grouping (see below) to OR specific parts of the regex, e.g. “Hello (Dave|David)!”
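The difference between grouped and ungrouped alternation can be sketched in Python like this. The names loose and grouped are just labels for the example.

```python
import re

# Without a group, | splits the whole pattern: "Hello Dave" OR "David!"
loose = re.compile(r"Hello Dave|David!")
# With a group, only the name alternates
grouped = re.compile(r"Hello (Dave|David)!")

loose_hit = loose.fullmatch("David!") is not None
grouped_miss = grouped.fullmatch("David!") is not None
grouped_hit = grouped.fullmatch("Hello David!") is not None

print(loose_hit)     # True
print(grouped_miss)  # False
print(grouped_hit)   # True
```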

Character Sets

Square brackets [] are used to match one character from a series of possible characters

ab[cd]ef[gh ]ij

Match a string where each set is filled by exactly one of its characters, e.g.: “abcefgij” or “abdefgij” or “abdefhij” or “abdef ij”
“abcdefgij”, for example, will not match.

A ^ as the first character inside the brackets negates the character set, so it will allow any character OTHER than the characters listed

ab[^cd]ef[gh]ij

“abzefgij” would match, “abdefgij” would not.
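A quick way to check the examples above is to run them through Python's re module. This is only a sketch; fullmatch() is used so the whole string has to fit the pattern.

```python
import re

char_set = re.compile(r"ab[cd]ef[gh ]ij")
negated = re.compile(r"ab[^cd]ef[gh]ij")

# One character from each set is required, so "abcdefgij" fails
set_results = {
    s: char_set.fullmatch(s) is not None
    for s in ["abcefgij", "abdefgij", "abdefhij", "abdef ij", "abcdefgij"]
}
print(set_results)

# The negated set accepts anything except c or d in third position
negated_results = {
    s: negated.fullmatch(s) is not None for s in ["abzefgij", "abdefgij"]
}
print(negated_results)
```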

Wildcards

. = Match any single character (in most regex flavours, anything except a line break)

...

Will match any 3 characters, but there must be exactly 3 characters.
(If you need to use a period as a literal character in a regex, you escape it with a backslash: “\.” )

...\.

“qur.” would match, “qurk” wouldn’t match

See also the “Quantifiers – 0 or More, 1 or More” section below for multicharacter wildcard options
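The wildcard examples can be tried out in Python as follows. Note the last two lines: in Python's re module the dot does not match a newline unless the re.DOTALL flag is passed, and other regex flavours may differ.

```python
import re

three_ok = re.fullmatch(r"...", "abc") is not None      # exactly 3 characters
three_short = re.fullmatch(r"...", "ab") is not None    # only 2 characters
dot_literal = re.fullmatch(r"...\.", "qur.") is not None
dot_missing = re.fullmatch(r"...\.", "qurk") is not None

print(three_ok, three_short)     # True False
print(dot_literal, dot_missing)  # True False

# Python-specific behaviour: . skips newlines unless DOTALL is set
newline_default = re.fullmatch(r".", "\n") is not None
newline_dotall = re.fullmatch(r".", "\n", re.DOTALL) is not None
print(newline_default, newline_dotall)  # False True
```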

Ranges

Use a hyphen to specify a range of character values that are allowed (remember we are still only matching to a single character)

[a-c] is the same as saying [abc]

[a-cA-D] is the same as saying [abcABCD]

Some typical ranges that are often used:

[a-z]
[A-Z]
[a-zA-Z]
[0-9]
[a-zA-Z0-9]
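To see that a range is just shorthand for listing the characters, here is a small Python sketch. The test strings are made up.

```python
import re

# [a-c] and [abc] pick out the same characters
as_range = re.findall(r"[a-c]", "abcd")
as_list = re.findall(r"[abc]", "abcd")
print(as_range == as_list, as_range)  # True ['a', 'b', 'c']

# Two ranges in one set: a-c plus A-D
mixed = re.findall(r"[a-cA-D]", "abcdABCDE")
print(mixed)  # ['a', 'b', 'c', 'A', 'B', 'C', 'D']

# A range still matches only a single character
single_only = re.fullmatch(r"[0-9]", "42") is not None
print(single_only)  # False
```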

Shorthand Character Classes

\w

“word character” class, represents regex range [A-Za-z0-9_]

\d

“digit character” class, represents regex range [0-9]

\s

“whitespace character”, represents regex range [ \t\r\n\f\v]
(Matches a single space, tab, carriage return, line break, form feed, or vertical tab)

\W

“non-word character” class,  represents regex range [^A-Za-z0-9_] (any character not included in the range represented by \w )

\D

“non-digit character” class, represents regex range [^0-9] (any character not included in the range represented by \d )

\S

“non-whitespace character”, represents regex range [^ \t\r\n\f\v] (any character that is not included in the range represented by \s )

An example: \d\s\w\w

Matches a digit character, followed by a whitespace character, followed by 2 word characters
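The example can be checked in Python as sketched below. One assumption to flag: for text strings Python's \w, \d and \s also match non-English letters and digits by default, so the re.ASCII flag is passed here to get exactly the ranges listed above.

```python
import re

# re.ASCII restricts \d, \s and \w to the ASCII ranges described above
pattern = re.compile(r"\d\s\w\w", re.ASCII)

results = {
    s: pattern.fullmatch(s) is not None
    for s in ["7 ab", "7\tx_", "7ab", "x ab"]
}
print(results)
# "7 ab" and "7\tx_" match; "7ab" has no whitespace and "x ab" has no digit
```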

Grouping

()

Groups a section of the regex. E.g. Hello (Dave|David)!
will match “Hello Dave!” and “Hello David!”
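In Python (and most other regex flavours) a group also captures the text it matched, which can be handy once the match succeeds. A minimal sketch:

```python
import re

pattern = re.compile(r"Hello (Dave|David)!")

match = pattern.fullmatch("Hello David!")
# group(0) is the whole match, group(1) is the first bracketed group
whole = match.group(0)
name = match.group(1)
print(whole)  # Hello David!
print(name)   # David

no_match = pattern.fullmatch("Hello Daniel!")
print(no_match)  # None
```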

Quantifiers – Fixed

Specify the number of characters to match using the last expression

{}

Specify how many times the previous character definition must be matched
“\w{4}” would match exactly 4 word characters
“\w{2,5}” would match a minimum of 2 and a maximum of 5 word characters. The quantifier is greedy: when a min and max are specified, it will match as many characters as it possibly can.
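The greedy behaviour is easy to see in Python. This sketch uses match(), which only requires the pattern to fit at the start of the string, so the leftover characters show how far the quantifier went.

```python
import re

# Greedy: 8 word characters are available, so {2,5} takes the maximum of 5
greedy = re.match(r"\w{2,5}", "abcdefgh").group()
print(greedy)  # abcde

# Fewer than the minimum means no match at all
too_few = re.match(r"\w{2,5}", "a")
print(too_few)  # None

# {4} demands exactly 4
exact_ok = re.fullmatch(r"\w{4}", "abcd") is not None
exact_bad = re.fullmatch(r"\w{4}", "abc") is not None
print(exact_ok, exact_bad)  # True False
```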

Quantifiers – Optional

Specify that the previous character definition is optional

?

Some examples:

“abc?d” will match for “abd” and “abcd”
“Hello (cheeky )?boy”, will match for “Hello boy” and “Hello cheeky boy”
“Hello (loud |cheeky )?boy”, will match for “Hello boy”, “Hello cheeky boy” and “Hello loud boy”
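These examples can be verified with a short Python sketch. The extra strings “abccd” and “Hello quiet boy” are added here only to show cases that fail.

```python
import re

optional_char = re.compile(r"abc?d")
optional_group = re.compile(r"Hello (loud |cheeky )?boy")

char_results = {
    s: optional_char.fullmatch(s) is not None for s in ["abd", "abcd", "abccd"]
}
print(char_results)  # ? allows zero or one c, not two

group_results = {
    s: optional_group.fullmatch(s) is not None
    for s in ["Hello boy", "Hello cheeky boy", "Hello loud boy", "Hello quiet boy"]
}
print(group_results)
```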

Quantifiers – 0 or More, 1 or More

*

Preceding character doesn’t need to appear, or can appear an unlimited number of times

+

Preceding character must appear 1 or more times

Examples

“abc*d” will match for “abd”, “abcd”, “abcccccd”, etc
“abc+d” will match for “abcd”, “abcccccd”, etc, but not for “abd”
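A small Python sketch of the same examples, plus the common “.*” combination (a wildcard repeated 0 or more times), which is included here as an extra illustration:

```python
import re

star = re.compile(r"abc*d")
plus = re.compile(r"abc+d")

samples = ["abd", "abcd", "abcccccd"]
star_results = [star.fullmatch(s) is not None for s in samples]
plus_results = [plus.fullmatch(s) is not None for s in samples]
print(star_results)  # [True, True, True]
print(plus_results)  # [False, True, True]

# .* matches any run of characters, including an empty one
any_run = re.fullmatch(r"a.*z", "a123z") is not None
empty_run = re.fullmatch(r"a.*z", "az") is not None
print(any_run, empty_run)  # True True
```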

Anchors (start and end markers)

^

Start of string marker – will not match if the tested string has any characters before this point

$

End of string marker – will not match if the tested string continues past this point

Examples

“^bcd ef” will match “bcd ef”, “bcd efghi”, etc, but not match for “abcd ef”
“^bcd ef$” will match “bcd ef”, but not match for “abcd ef” or “bcd efghi”
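The anchor examples can be sketched in Python with search(), which normally looks for the pattern anywhere in the string, so any failure here is caused by the anchors themselves.

```python
import re

start_only = re.compile(r"^bcd ef")
both_ends = re.compile(r"^bcd ef$")

samples = ["bcd ef", "bcd efghi", "abcd ef"]
start_results = [start_only.search(s) is not None for s in samples]
both_results = [both_ends.search(s) is not None for s in samples]
print(start_results)  # [True, True, False]
print(both_results)   # [True, False, False]

# Without anchors the same text is found in the middle of the string
unanchored = re.search(r"bcd ef", "abcd ef") is not None
print(unanchored)  # True
```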

Special character escaping

Characters used as special characters in regex that you need to escape if you want to specify that actual character in your regex:

\\
\.
\?
\*
\+
\{
\(
\)
\^
\$
\[
\|
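If you are building a pattern from text you don't control, escaping each special character by hand is error prone. In Python, re.escape() does this for you; the sketch below assumes Python's re module, and other languages have their own equivalents. The sample text is made up.

```python
import re

literal = "1+1=2 (really?)"  # made-up text full of special characters

# Used as-is, + ( ) and ? are treated as regex syntax, so it fails to match itself
unescaped_hit = re.fullmatch(literal, literal) is not None
print(unescaped_hit)  # False

# re.escape() adds the backslashes, so the text is matched literally
escaped_hit = re.fullmatch(re.escape(literal), literal) is not None
print(escaped_hit)  # True

escaped_dot = re.escape("a.b")
print(escaped_dot)  # a\.b
```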

Feel free to comment if you can add help to this page or point out issues and solutions you have found. I do not provide support on this site; if you need help with a problem, head over to Stack Overflow.
