Excel Function: REGEXTEST (2/3)

Character Quantity

Quantifiers allow you to specify how many times a character can appear. To specify a quantity or a range of quantities, add {} after the concerned character.

For example:

{2}: the character must appear exactly 2 times
{3,}: the character must appear 3 times or more
{1,4}: the character must appear between 1 and 4 times

The regex "Composer{0,}" checks here if the text contains this word with none, one, or multiple "r":

=REGEXTEST(A2,"Composer{0,}")

excel function regextest character class quantity 2

There are also 3 handy shortcuts for quantifying a character:

*: the character must appear 0 times or more (equivalent to {0,})
+: the character must appear 1 time or more (equivalent to {1,})
?: the character must appear 0 or 1 time (equivalent to {0,1})

The regex "Composer*" is therefore identical to the previous one:

=REGEXTEST(A2,"Composer*")

excel function regextest character class quantifier 2

Testing a Structure

To test the structure of a text (for example a reference, a URL, a phone number, etc.), you will need to use several elements seen so far in a single expression.

For example, to validate the regex "^XLP-[0-9]{4}-" the text must:

^XLP-: start with "XLP-"
[0-9]{4}: be followed by 4 characters in the range 0 to 9
-: be followed by a "-"

=REGEXTEST(A2,"^XLP-[0-9]{4}-")

excel function regextest ref 2

Special Characters

Metacharacters are special characters that have a specific role (you already know ^ $ [ ] | ? * + { }).

To cancel the effect of a metacharacter and treat it as a simple character, precede it with a \ (except when the metacharacter is inside a character class).

Thanks to \, the regex " \?$" can check if the text ends with " ?":

=REGEXTEST(A2," \?$")

excel function regextest backslash 2

The metacharacters are \ ^ $ . [ ] | ( ) ? * + { }.

Also note that \t equals a tab, \r a carriage return, and \n a line break.

Character Class Shortcuts

As you now know, character classes define allowed or disallowed characters. There are some handy shortcuts to simplify this:

\d: a digit (equivalent to [0-9])
\D: anything but a digit (equivalent to [^0-9])
\w: an alphanumeric character or a "_" (equivalent to [a-zA-Z0-9_], also including Unicode letters, like accented characters)
\W: anything but an alphanumeric character or a "_" (equivalent to [^a-zA-Z0-9_], excluding Unicode letters)
\s: a whitespace character, that is, a space, a line break, a carriage return, or a tab [ \n\r\t]
\S: anything but a whitespace character (equivalent to [^ \n\r\t])
\b: a word boundary (start or end of a word), between a \w and a \W
.: any character except a line break (equivalent to [^\n])

The regex "^[A-Z]{2}.\w{4}$" checks if the text starts with 2 characters from A-Z, followed by any character (except line break), followed by (and ending with) 4 alphanumeric characters:

=REGEXTEST(A2,"^[A-Z]{2}.\w{4}$")

excel function regextest shortcuts class 2

Case-Insensitive Search

To search for the word "excel" without considering case sensitivity, add (?i) at the beginning of the regex "(?i)excel":

=REGEXTEST(A2,"(?i)excel")

excel function regextest case insensitive 2

Or, even simpler, you can directly enter the value 1 as the third argument of the REGEXTEST function:

=REGEXTEST(A2,"excel",1)

excel function regextest argument case insensitive 2

Email Address Validation

A regex to validate an email address (simplified version) could look like "^[a-z0-9_.-]+@[a-z0-9_.-]{2,}\.[a-z]{2,}$".

To validate this regex, the text must:

^[a-z0-9_.-]+: start with 1 or more lowercase letters, digits, "_" or "." or "-"
@: be followed by "@"
[a-z0-9_.-]{2,}: be followed by 2 or more lowercase letters, digits, "_" or "." or "-"
\.: be followed by a dot
[a-z]{2,}$: be followed (and end) with 2 or more characters from a-z

=REGEXTEST(A2,"^[a-z0-9_.-]+@[a-z0-9_.-]{2,}\.[a-z]{2,}$")

excel function regextest email 2

The shortcut \w is not used here instead of [a-zA-Z0-9_] because \w also includes accented characters, which is not desirable here.