Regular Expressions: Capturing Groups
What is a capturing group in Regex?
View Answer:
How is a capturing group represented in regex?
View Answer:
let re = /(abc)/;
In this case, (abc) is the capturing group. When a match is found for the pattern within the parentheses, that match is "captured" for future use.
Here's an example that uses a captured group:
let str = 'abc123abc456';
let re = /(abc)\d+/g;
let match;
while ((match = re.exec(str)) !== null) {
  console.log('found ' + match[1] + ' at position ' + match.index);
}
In this code, we're searching for instances of 'abc' followed by one or more digits. When a match is found, 'abc' (the capturing group) is printed to the console along with the index at which it was found.
It's important to note that the match array will contain one item for each capturing group, in addition to one item for the entire match. The first item (index 0) is always the entire match, and the subsequent items (index 1 and above) correspond to each capturing group in the order they appear in the regular expression.
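As a small illustration of that layout, the sketch below uses a hypothetical pattern with two capturing groups (the sample string and variable names are made up for this example) and prints each slot of the match array.

```javascript
// Hypothetical sample: a key=value pair with one capturing group per side.
const pairPattern = /(\w+)=(\d+)/;
const pairMatch = pairPattern.exec('width=300');

console.log(pairMatch[0]); // width=300 (entire match)
console.log(pairMatch[1]); // width (first group)
console.log(pairMatch[2]); // 300 (second group)
console.log(pairMatch.length); // 3 (entire match plus two groups)
```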
Why are capturing groups useful?
View Answer:
What's a non-capturing group in regex?
View Answer:
Here's a JavaScript code example that demonstrates the usage of a non-capturing group in a regular expression:
const regex = /(?:ab)+c/;
const string = 'ababcabc';
const matches = string.match(regex);
console.log(matches); // Output: [ 'ababc' ]
In the above example, the regular expression /(?:ab)+c/ matches "ab" repeated one or more times, followed by a "c". The non-capturing group (?:ab) groups the "ab" pattern so that + applies to it as a unit, without storing it as a separate group. The result therefore contains only the full match 'ababc', with no extra entry for the group.
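To make the difference visible, the following sketch runs the same input through a capturing and a non-capturing version of the pattern and compares the resulting arrays. The variable names are illustrative only.

```javascript
const sample = 'ababcabc';

// Capturing version: the group adds a second slot to the result.
const withCapture = sample.match(/(ab)+c/);
console.log(withCapture.length); // 2
console.log(withCapture[1]); // ab (the last repetition of the group)

// Non-capturing version: only the entire match is stored.
const withoutCapture = sample.match(/(?:ab)+c/);
console.log(withoutCapture.length); // 1
console.log(withoutCapture[0]); // ababc
```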
What is a capturing group relative to a match method in regular expressions?
View Answer:
// Example: repeating word pattern
console.log('Gogogo now!'.match(/(go)+/gi)); // "Gogogo"
// Example: group of domains
let regexp = /(\w+\.)+\w+/g;
console.log('site.com my.site.com'.match(regexp)); // site.com,my.site.com
// Example: email
let emailRegexp = /[-.\w]+@([\w-]+\.)+[\w-]+/g; // separate name so it does not redeclare regexp
console.log('my@mail.com @ his@site.com.uk'.match(emailRegexp));
// my@mail.com, his@site.com.uk
How does the regex engine memorize the matches in parentheses?
View Answer:
let str = '<h1>Hello, world!</h1>';
let tag = str.match(/<(.*?)>/);
console.log(tag[0]); // <h1>
console.log(tag[1]); // h1
How does a nested group of parentheses work in regular expressions?
View Answer:
let str = '<span class="my">';
let regexp = /<(([a-z]+)\s*([^>]*))>/;
let result = str.match(regexp);
console.log(result[0]); // <span class="my">
console.log(result[1]); // span class="my"
console.log(result[2]); // span
console.log(result[3]); // class="my"
What happens when an optional group does not exist in a regex match?
View Answer:
let match = 'a'.match(/a(z)?(c)?/);
console.log(match.length); // 3
console.log(match[0]); // a (whole match)
console.log(match[1]); // undefined
console.log(match[2]); // undefined
// Second case: (z)? is still absent, but (c)? takes part in the match
let matchAc = 'ac'.match(/a(z)?(c)?/);
console.log(matchAc.length); // 3
console.log(matchAc[0]); // ac (whole match)
console.log(matchAc[1]); // undefined, because there's nothing for (z)?
console.log(matchAc[2]); // c
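One possible way to cope with groups that did not take part in the match is sketched here with destructuring defaults. The '(none)' placeholder and the variable names are arbitrary choices for this example.

```javascript
// Destructuring defaults apply only when a slot is undefined,
// which is exactly what an unmatched optional group produces.
const [whole, z = '(none)', c = '(none)'] = 'ac'.match(/a(z)?(c)?/);

console.log(whole); // ac
console.log(z); // (none)
console.log(c); // c
```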
How are parenthesized groups returned when we use str.matchAll()?
View Answer:
// Using Array.from to create a new array
let results = '<h1> <h2>'.matchAll(/<(.*?)>/gi);
// results - is not an array, but an iterable object
console.log(results); // [object RegExp String Iterator]
console.log(results[0]); // undefined (*)
results = Array.from(results); // let's turn it into array <--
console.log(results[0]); // <h1>,h1 (1st tag)
console.log(results[1]); // <h2>,h2 (2nd tag)
// Using a LOOP to get our results - recommended
let loopResults = '<h1> <h2>'.matchAll(/<(.*?)>/gi);
for (let result of loopResults) {
  console.log(result);
  // first console.log: <h1>,h1
  // second: <h2>,h2
}
// Destructuring: take the first two matches straight from the iterator
let [tag1, tag2] = '<h1> <h2>'.matchAll(/<(.*?)>/gi);
console.log(tag1[0]); // <h1>
console.log(tag1[1]); // h1
console.log(tag1.index); // 0
console.log(tag1.input); // <h1> <h2>
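One detail worth keeping in mind is that matchAll expects a regular expression with the g flag and throws a TypeError otherwise. The sketch below, with illustrative variable names, shows that failure and then collects only the group contents through the mapping argument of Array.from.

```javascript
let errorName = null;
try {
  '<h1> <h2>'.matchAll(/<(.*?)>/i); // no "g" flag, so this call throws
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // TypeError

// With the "g" flag, map every match to the contents of its first group.
const tagNames = Array.from('<h1> <h2>'.matchAll(/<(.*?)>/gi), (m) => m[1]);
console.log(tagNames); // [ 'h1', 'h2' ]
```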
What are named capturing groups used for in regex?
View Answer:
Syntax: let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/
// Basic Approach
let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
let str = "2019-04-30";
let groups = str.match(dateRegexp).groups;
console.log(groups.year); // 2019
console.log(groups.month); // 04
console.log(groups.day); // 30
// Complex Approach
let dateRegexpGlobal = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;
let dates = "2019-10-30 2020-01-01";
let results = dates.matchAll(dateRegexpGlobal);
for (let result of results) {
  let {year, month, day} = result.groups;
  console.log(`${day}.${month}.${year}`);
  // first console.log: 30.10.2019
  // second: 01.01.2020
}
How do capturing groups work in a replacement string?
View Answer:
// Basic Example:
let str = 'John Bull';
let regexp = /(\w+) (\w+)/;
console.log(str.replace(regexp, '$2, $1')); // Bull, John
// More complex example using named capturing groups
let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;
let dates = '2019-10-30, 2020-01-01';
console.log(dates.replace(dateRegexp, '$<day>.$<month>.$<year>'));
// 30.10.2019, 01.01.2020
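When the replacement needs logic that a replacement string cannot express, replace also accepts a function. As a rough sketch (the helper name toDottedDate is made up here), the function below reads the named groups from its last argument, which is where JavaScript passes the groups object when the pattern contains named groups.

```javascript
const isoDate = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;

// Hypothetical helper: the final argument is the named-groups object.
function toDottedDate(...args) {
  const { year, month, day } = args[args.length - 1];
  // Drop a leading zero from the day, e.g. "01" becomes "1".
  return `${Number(day)}.${month}.${year}`;
}

const converted = '2019-10-30, 2020-01-01'.replace(isoDate, toDottedDate);
console.log(converted); // 30.10.2019, 1.01.2020
```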
How do non-capturing groups work in regular expressions?
View Answer:
let str = 'Gogogo John!';
// ?: excludes 'go' from capturing
let regexp = /(?:go)+ (\w+)/i;
let result = str.match(regexp);
console.log(result[0]); // Gogogo John (full match)
console.log(result[1]); // John
console.log(result.length); // 2 (no more items in the array)
How do you create a non-capturing group?
View Answer:
let re = /(?:abc)\d+/;
In this case, (?:abc) is a non-capturing group. This will match the characters 'abc' followed by one or more digits, but 'abc' will not be a separate item in the resulting matches.
If you use this in an exec call, you will see that the result contains only the entire match:
let str = 'abc123abc456';
let re = /(?:abc)\d+/g;
let match;
while ((match = re.exec(str)) !== null) {
  console.log('found ' + match[0] + ' at position ' + match.index);
}
In this code, we're searching for instances of 'abc' followed by one or more digits. Because (?:abc) is a non-capturing group, the match array holds only the entire match (match[0]), which is printed to the console.
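Non-capturing groups are also handy for alternation, where the parentheses are needed for grouping but the alternative that matched is of no interest. The following sketch uses a hypothetical list of titles to show that only the surname ends up in the match array.

```javascript
// (?:Mr|Ms|Dr) groups the alternatives without storing which one matched.
const titlePattern = /(?:Mr|Ms|Dr)\. (\w+)/;
const titleMatch = titlePattern.exec('Please welcome Dr. Smith');

console.log(titleMatch[0]); // Dr. Smith (entire match)
console.log(titleMatch[1]); // Smith (the only capturing group)
console.log(titleMatch.length); // 2
```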
How do we refer back to a capturing group within the same regex pattern?
View Answer:
let re = /(\b\w+\b)\s+\1/;
In this pattern, (\b\w+\b) is a capturing group that matches a word, and \1 is a backreference that refers to the contents of the first (and in this case, only) capturing group. The \s+ matches one or more whitespace characters.
Here's an example of using this pattern:
let str = 'Hello hello, how are you?';
let re = /(\b\w+\b)\s+\1/gi; // the 'i' flag makes it case-insensitive
let match;
while ((match = re.exec(str)) !== null) {
  console.log('found repeated word ' + match[1] + ' at position ' + match.index);
}
In this code, we're searching for repeated words in the string. The capturing group (\b\w+\b) matches a word, and \1 refers to that word. If the same word is found again immediately after one or more whitespace characters, it's a match. The matched word and the position of the match are then printed to the console.
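One caveat with this pattern is that \1 only requires the captured text to appear next, so it can also match the start of a longer word. A possible refinement, sketched below with illustrative variable names, is to add a word boundary after the backreference.

```javascript
const loose = /(\b\w+\b)\s+\1/i;
const strict = /(\b\w+\b)\s+\1\b/i; // \b after \1 demands a complete word

console.log(loose.test('the then')); // true: \1 matches the "the" inside "then"
console.log(strict.test('the then')); // false
console.log(strict.test('Hello hello, how are you?')); // true
```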
Can we name capturing groups in regex?
View Answer:
Yes, JavaScript regular expressions support named capturing groups. You can give a capturing group a name by using the syntax (?<name>...).
Here's an example:
let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
In this case, (?<year>\d{4}), (?<month>\d{2}), and (?<day>\d{2}) are named capturing groups. They will match four digits for the year, two digits for the month, and two digits for the day, respectively.
You can then refer to these groups by name when examining the match results:
let str = 'Today is 2023-06-30';
let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
let match = re.exec(str);
if (match !== null) {
  let {year, month, day} = match.groups;
  console.log(`Year: ${year}, Month: ${month}, Day: ${day}`);
}
In this code, we're searching for a date in the format 'yyyy-mm-dd'. The named capturing groups (?<year>\d{4}), (?<month>\d{2}), and (?<day>\d{2}) match the year, month, and day, respectively. If a match is found, the year, month, and day are printed to the console.
Named capturing groups were added to JavaScript in ES2018 and may not be supported in older environments. They are supported in Node.js 10 and later, and in current versions of all major browsers.
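Naming a group does not take it out of the numbered sequence: the same text is still available by index. The short sketch below (variable names are illustrative) shows both access paths, and also that the groups property is undefined when a pattern has no named groups.

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = datePattern.exec('Today is 2023-06-30');

console.log(dateMatch.groups.year); // 2023 (access by name)
console.log(dateMatch[1]); // 2023 (the same group, by number)
console.log(dateMatch.index); // 9

// Without any named groups, the groups property is undefined.
const plainMatch = /(\d{4})/.exec('2023');
console.log(plainMatch.groups); // undefined
```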
How can you refer to a named capturing group?
View Answer:
let re = /(?<word>\b\w+\b)\s+\k<word>/;
In this pattern, (?<word>\b\w+\b) is a named capturing group that matches a word, and \k<word> is a backreference that refers to the contents of the named capturing group. The \s+ matches one or more whitespace characters.
Here's an example of using this pattern:
let str = 'Hello hello, how are you?';
let re = /(?<word>\b\w+\b)\s+\k<word>/gi; // the 'i' flag makes it case-insensitive
let match;
while ((match = re.exec(str)) !== null) {
  console.log('found repeated word ' + match.groups.word + ' at position ' + match.index);
}
In this code, we're searching for repeated words in the string. The named capturing group (?<word>\b\w+\b) matches a word, and \k<word> refers to that word. If the same word is found again immediately after one or more whitespace characters, it's a match. The matched word and the position of the match are then printed to the console.
To refer to named capturing groups in the replacement part of a replace call, you can use the $<name> syntax, like so:
let str = 'Hello hello, how are you?';
let re = /(?<word>\b\w+\b)\s+\k<word>/gi;
let newStr = str.replace(re, '$<word>');
console.log(newStr); // Hello, how are you?
In this code, we're replacing each instance of a repeated word with a single instance of that word. The $<word> syntax refers to the named capturing group.
What is the 'balancing group definition' in regex?
View Answer:
How do you use capturing groups in lookaheads and lookbehinds?
View Answer:
Here's an example that pairs a capturing group with a lookahead; note that the group sits outside the (?=...) assertion:
let re = /(\d+)(?=\sUSD)/;
let str = 'The price is 100 USD';
let match = re.exec(str);
if (match) {
  console.log('Amount:', match[1]); // Amount: 100
}
In this case, the pattern matches one or more digits (\d+) only if they're followed by a space and 'USD'. The lookahead assertion does not consume any characters, so the match for it is not included in the match result.
Capturing groups can likewise be combined with lookbehind assertions, written as (?<=...) for positive lookbehind (asserts that what precedes matches the pattern inside the parentheses) or (?<!...) for negative lookbehind (asserts that what precedes does not match the pattern).
Here's an example that pairs a capturing group with a lookbehind; the group again sits outside the assertion:
let re = /(?<=USD\s)(\d+)/;
let str = 'The price is USD 100';
let match = re.exec(str);
if (match) {
  console.log('Amount:', match[1]); // Amount: 100
}
In this case, the pattern matches one or more digits only if they're preceded by 'USD' and a space. Again, the lookbehind assertion does not consume any characters, so the match for it is not included in the match result.
Lookbehind assertions were added to JavaScript in ES2018 and are not supported in every environment. They are supported in Node.js 10 and later and in current versions of the major browsers, but older engines (older Safari releases in particular) reject them with a SyntaxError.
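A capturing group can also be placed inside the assertion itself. In that case the group still stores its text even though the assertion consumes nothing. The sketch below assumes a hypothetical price string with two possible currency codes.

```javascript
// The group (USD|EUR) lives inside the lookahead: it is captured,
// but it does not become part of the overall match.
const pricePattern = /\d+(?=\s(USD|EUR))/;
const priceMatch = pricePattern.exec('The price is 100 EUR');

console.log(priceMatch[0]); // 100 (entire match, currency not consumed)
console.log(priceMatch[1]); // EUR (captured inside the lookahead)
```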
Can we nest capturing groups?
View Answer:
Here's an example of a nested capturing group...
let re = /(a(b)c)/;
In this case, (a(b)c) is a capturing group that contains another capturing group (b).
Here's how you might use this regular expression:
let str = 'abc';
let re = /(a(b)c)/;
let match = re.exec(str);
if (match !== null) {
  console.log('Match 0: ' + match[0]); // Match 0: abc
  console.log('Match 1: ' + match[1]); // Match 1: abc
  console.log('Match 2: ' + match[2]); // Match 2: b
}
In this code, we're searching for the pattern (a(b)c). When a match is found, the match results are printed to the console.
Note that match[0] contains the entire match, match[1] contains the first (outer) capturing group, and match[2] contains the second (inner) capturing group. The outer group includes the characters matched by the inner group, and the inner group includes only the characters it directly matches.
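Groups are numbered by the position of their opening parenthesis, counting from left to right. The sketch below applies that rule to a hypothetical date pattern in which one outer group wraps two inner ones and is followed by a sibling group.

```javascript
// Opening parentheses, left to right: 1 = year-month, 2 = year, 3 = month, 4 = day.
const nestedPattern = /((\d{4})-(\d{2}))-(\d{2})/;
const parts = nestedPattern.exec('2023-06-30');

console.log(parts[0]); // 2023-06-30 (entire match)
console.log(parts[1]); // 2023-06 (outer group)
console.log(parts[2]); // 2023 (first inner group)
console.log(parts[3]); // 06 (second inner group)
console.log(parts[4]); // 30 (group after the nested pair)
```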
What are branch reset groups in regex?
View Answer:
What are the common use cases for capturing groups?
View Answer:
Can capturing groups impact regex performance?
View Answer:
How can you reference capturing groups in replacement text?
View Answer:
let str = 'Hello, world!';
let re = /(Hello), (world)/;
let newStr = str.replace(re, '$2, $1');
console.log(newStr); // "world, Hello!"
In this example, we're swapping the words 'Hello' and 'world'. The regular expression /(Hello), (world)/ contains two capturing groups, and the replacement string '$2, $1' references those groups in reverse order. The trailing '!' is not part of the match, so it is carried over unchanged.
If your regular expression includes named capturing groups, you can reference them by name in the replacement string using the $<name> syntax:
let str = 'Hello, world!';
let re = /(?<greeting>Hello), (?<object>world)/;
let newStr = str.replace(re, '$<object>, $<greeting>');
console.log(newStr); // "world, Hello!"
In this code, the regular expression /(?<greeting>Hello), (?<object>world)/ contains two named capturing groups. The replacement string '$<object>, $<greeting>' references those groups by name in reverse order. As before, the trailing '!' lies outside the match and is kept as is.
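If the captured text has to be transformed rather than just rearranged, a replacer function can be passed instead of a replacement string. The following is a minimal sketch; the parameter names whole, first, and second are arbitrary, and the groups arrive as positional arguments after the full match.

```javascript
const swapped = 'Hello, world!'.replace(
  /(Hello), (world)/,
  // Arguments: the entire match, then one argument per capturing group.
  (whole, first, second) => `${second.toUpperCase()}, ${first.toLowerCase()}`
);

console.log(swapped); // WORLD, hello!
```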