Excel Function: REGEXEXTRACT

The REGEXEXTRACT function extracts one or more parts of a text based on a regular expression.

Usage:

=REGEXEXTRACT(text, regex)

or:

=REGEXEXTRACT(text, regex, return, case)

If you do not know what a regular expression is or how to write one, first start with the full tutorial that accompanies the REGEXTEST function before continuing.

Extract the first found word

The REGEXEXTRACT function here returns the first found word among the 2 possibilities ("fun" or "terrifying"):

=REGEXEXTRACT(A2,"fun|terrifying")

excel function regexextract word

Optional arguments

The third argument allows you to specify the expected return type:

0: Return the first found string
1: Return all found strings as an array
2: Return capture groups as an array (to use this option you must use parentheses in the regex to define the elements to capture)

For example, if you enter the value 1 in the last example, it will return a horizontal array with all found matches:

=REGEXEXTRACT(A2,"fun|terrifying",1)

excel function regexextract words

If you do not want to consider case sensitivity, enter the value 1 as the fourth argument (by default the value is 0, meaning case sensitivity is considered).

Extract a reference

The REGEXEXTRACT function here returns a part of the reference number corresponding to the expression "[A-Z]\d{2,}" (an uppercase letter followed by 2 or more digits):

=REGEXEXTRACT(A2,"[A-Z]\d{2,}")

excel function regexextract reference

Extract a number

The regex "\d+" will extract here all digits until encountering a non-digit character:

=REGEXEXTRACT(A2,"\d+")

excel function regexextract numbers

If you enter the value 1 as the third argument, you will get a horizontal array with the different extracted numbers:

=REGEXEXTRACT(A2,"\d+",1)

excel function regexextract numbers array

Extract at the end

To extract the digits at the end of the text, add a $ at the end of the regex:

=REGEXEXTRACT(A2,"\d*$")

excel function regexextract numbers end

Extract multiple parts

To extract different groups of digits, add parentheses around each part of the text to extract. In this case, we will for example extract only groups 2 and 3:

=REGEXEXTRACT(A2,"\d+-(\d+)-(\d+)",2)

excel function regexextract multiple strings

You can see here that the 2 groups to capture have been placed between () and that the value 2 was passed as the third argument of the function to consider only the capture groups.

Complex extraction

To extract a value in data that is not regular at all, first search for a constant (the goal here is to extract the price):

excel complex extraction regexextract

You can see that the price is always preceded by the word "price".

The following formula extracts the price:

=REGEXEXTRACT(A2,"price\W*(\d+(?:\.\d{1,2})?)",2,1)

To better understand:

price: Search for "price" (note that the fourth argument is 1, so case is ignored)
\W*: 0, 1, or more non-alphanumeric characters
(: Opening parenthesis to capture the price (+ third argument at 2)
\d+: One or more digits (the first part of the price)
(?:: The separator and the decimal are not always present, so you must group the separator and decimal in parentheses (and later specify that it can appear 0 or 1 times) but to avoid the function returning the value of these parentheses, "?:" was added to specify that it is a non-capturing group.
\.\d{1,2}: A dot and 1 or 2 digits
)?: Closing parenthesis of the decimal indicating that it can appear 0 or 1 time
): Closing parenthesis of the price

excel function regexextract complex extraction

Another option to avoid worrying about capturing parentheses (and leaving the third argument to its default value) is to use an assertion:

=REGEXEXTRACT(A2,"(?<=price\W{0,3})\d+(\.\d{1,2})?",0,1)

excel function regexextract extraction assertion

Backreference

The following formula will find the first number that appears twice in a row in the list of numbers by using a backreference:

=REGEXEXTRACT(A2,"(\d+)\W+\1",2)

excel function regexextract backreference