Excel Function: REGEXEXTRACT
The REGEXEXTRACT function extracts one or more parts of a text based on a regular expression.
Usage:
=REGEXEXTRACT(text, regex)
or:
=REGEXEXTRACT(text, regex, return, case)
Extract the first found word
The REGEXEXTRACT function here returns the first found word among the 2 possibilities ("fun" or "terrifying"):
=REGEXEXTRACT(A2,"fun|terrifying")

Optional arguments
The third argument allows you to specify the expected return type:
- 0: Return the first found string
- 1: Return all found strings as an array
- 2: Return capture groups as an array (to use this option you must use parentheses in the regex to define the elements to capture)
For example, if you enter the value 1 in the last example, it will return a horizontal array with all found matches:
=REGEXEXTRACT(A2,"fun|terrifying",1)

If you do not want to consider case sensitivity, enter the value 1 as the fourth argument (by default the value is 0, meaning case sensitivity is considered).
Extract a reference
The REGEXEXTRACT function here returns a part of the reference number corresponding to the expression "[A-Z]\d{2,}" (an uppercase letter followed by 2 or more digits):
=REGEXEXTRACT(A2,"[A-Z]\d{2,}")

Extract a number
The regex "\d+" will extract here all digits until encountering a non-digit character:
=REGEXEXTRACT(A2,"\d+")

If you enter the value 1 as the third argument, you will get a horizontal array with the different extracted numbers:
=REGEXEXTRACT(A2,"\d+",1)

Extract at the end
To extract the digits at the end of the text, add a $ at the end of the regex:
=REGEXEXTRACT(A2,"\d*$")

Extract multiple parts
To extract different groups of digits, add parentheses around each part of the text to extract. In this case, we will for example extract only groups 2 and 3:
=REGEXEXTRACT(A2,"\d+-(\d+)-(\d+)",2)

You can see here that the 2 groups to capture have been placed between () and that the value 2 was passed as the third argument of the function to consider only the capture groups.
Complex extraction
To extract a value in data that is not regular at all, first search for a constant (the goal here is to extract the price):

You can see that the price is always preceded by the word "price".
The following formula extracts the price:
=REGEXEXTRACT(A2,"price\W*(\d+(?:\.\d{1,2})?)",2,1)
To better understand:
- price: Search for "price" (note that the fourth argument is 1, so case is ignored)
- \W*: 0, 1, or more non-alphanumeric characters
- (: Opening parenthesis to capture the price (+ third argument at 2)
- \d+: One or more digits (the first part of the price)
- (?:: The separator and the decimal are not always present, so you must group the separator and decimal in parentheses (and later specify that it can appear 0 or 1 times) but to avoid the function returning the value of these parentheses, "?:" was added to specify that it is a non-capturing group.
- \.\d{1,2}: A dot and 1 or 2 digits
- )?: Closing parenthesis of the decimal indicating that it can appear 0 or 1 time
- ): Closing parenthesis of the price

Another option to avoid worrying about capturing parentheses (and leaving the third argument to its default value) is to use an assertion:
=REGEXEXTRACT(A2,"(?<=price\W{0,3})\d+(\.\d{1,2})?",0,1)

Backreference
The following formula will find the first number that appears twice in a row in the list of numbers by using a backreference:
=REGEXEXTRACT(A2,"(\d+)\W+\1",2)
