Excel Function: REGEXEXTRACT

The REGEXEXTRACT function extracts one or more parts of a text based on a regular expression.

Usage:

=REGEXEXTRACT(text, regex)

or:

=REGEXEXTRACT(text, regex, return, case)

If you do not know what a regular expression is or how to write one, first start with the full tutorial that accompanies the REGEXTEST function before continuing.

Extract the first found word

The REGEXEXTRACT function here returns the first found word among the 2 possibilities ("fun" or "terrifying"):

=REGEXEXTRACT(A2,"fun|terrifying")

excel function regexextract word

Optional arguments

The third argument allows you to specify the expected return type:

For example, if you enter the value 1 in the last example, it will return a horizontal array with all found matches:

=REGEXEXTRACT(A2,"fun|terrifying",1)

excel function regexextract words

If you do not want to consider case sensitivity, enter the value 1 as the fourth argument (by default the value is 0, meaning case sensitivity is considered).

Extract a reference

The REGEXEXTRACT function here returns a part of the reference number corresponding to the expression "[A-Z]\d{2,}" (an uppercase letter followed by 2 or more digits):

=REGEXEXTRACT(A2,"[A-Z]\d{2,}")

excel function regexextract reference

Extract a number

The regex "\d+" will extract here all digits until encountering a non-digit character:

=REGEXEXTRACT(A2,"\d+")

excel function regexextract numbers

If you enter the value 1 as the third argument, you will get a horizontal array with the different extracted numbers:

=REGEXEXTRACT(A2,"\d+",1)

excel function regexextract numbers array

Extract at the end

To extract the digits at the end of the text, add a $ at the end of the regex:

=REGEXEXTRACT(A2,"\d*$")

excel function regexextract numbers end

Extract multiple parts

To extract different groups of digits, add parentheses around each part of the text to extract. In this case, we will for example extract only groups 2 and 3:

=REGEXEXTRACT(A2,"\d+-(\d+)-(\d+)",2)

excel function regexextract multiple strings

You can see here that the 2 groups to capture have been placed between () and that the value 2 was passed as the third argument of the function to consider only the capture groups.

Complex extraction

To extract a value in data that is not regular at all, first search for a constant (the goal here is to extract the price):

excel complex extraction regexextract

You can see that the price is always preceded by the word "price".

The following formula extracts the price:

=REGEXEXTRACT(A2,"price\W*(\d+(?:\.\d{1,2})?)",2,1)

To better understand:

excel function regexextract complex extraction

Another option to avoid worrying about capturing parentheses (and leaving the third argument to its default value) is to use an assertion:

=REGEXEXTRACT(A2,"(?<=price\W{0,3})\d+(\.\d{1,2})?",0,1)

excel function regexextract extraction assertion

Backreference

The following formula will find the first number that appears twice in a row in the list of numbers by using a backreference:

=REGEXEXTRACT(A2,"(\d+)\W+\1",2)

excel function regexextract backreference