Catastrophic Backtracking
Regular Expressions: Catastrophic Backtracking
What is the meaning of catastrophic backtracking in regular expressions?
View Answer:
// Example: Catastrophic Backtracking
let regexp = /^(\d+)*$/;
let str = '012345678901234567890123456789z';
// will take a very long time (careful!)
console.log(regexp.test(str));
What causes Catastrophic Backtracking in Regular Expressions?
View Answer:
How can you detect Catastrophic Backtracking in your code?
View Answer:
How do you prevent catastrophic backtracking in the regex engine?
View Answer:
const regex = /(\d+)+([a-z]+)+/i;
This will work in most cases, but because it has nested quantifiers, it might suffer from catastrophic backtracking if the string is long and doesn't match the pattern.
Here's how we can improve it:
1. Remove unnecessary quantifiers:
const regex = /\d+[a-z]+/i;
In this case, we don't need the +
after the groups, because \d+
already matches one or more digits, and [a-z]+
matches one or more letters.
2. Simulating atomic grouping:
Atomic groups are not supported in JavaScript, but can be simulated to some extent. If we know that once we have a match of numbers, there's no need to backtrack into it, we can use a positive lookahead to simulate an atomic group:
const regex = /(?=\d+)(\d+)(?=[a-z]+)([a-z]+)/i;
In this case, (?=\d+)
is a positive lookahead that asserts that what follows is one or more digits. Once this is satisfied, the engine won't backtrack into this group. The same applies to (?=[a-z]+)
, which asserts that what follows is one or more letters.
3. Simulating possessive quantifiers:
Possessive quantifiers are also not natively supported in JavaScript. However, they can be simulated using a positive lookahead. A possessive quantifier, once it matches something, won't give it back. This can be useful to prevent backtracking:
const regex = /(?=(\d+))\d+(?=(\d+))[a-z]+/i;
In this case, (?=(\d+))\d+
is simulating a possessive quantifier: it matches one or more digits and doesn't allow backtracking into this group. The same does not apply to the letter group as it is not preceded by a lookahead.
Keep in mind these are complex solutions for problems that might be easier solved by simplifying and optimizing your regex pattern to your specific needs, so they should only be used when necessary.
Note: Please note, JavaScript doesn't natively support atomic groups, so in practical cases, you can use other strategies like replacing *
with *?
to make it non-greedy, or use lookahead and lookbehind assertions.