Greedy Regular Expression
A possessive quantifier is an advanced
feature of some regex flavors (PCRE, Java and the JGsoft engine) that tells
the engine not to backtrack into whatever the quantifier has already matched. To understand how this
works, we need to understand two concepts of regex engines: greediness and backtracking.
Greediness means that, in general, regex quantifiers will try to
consume as many characters as they can. Let's say our pattern is .* (the dot is a special
construct in regexes which means any character, by default except a line break; the star means match zero or
more times), and our target is aaaaaaaab. The entire string will be consumed, because the
entire string is the longest match that satisfies the pattern.
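To see greediness in action, here is a small Java sketch (Java being one of the flavors mentioned above) using the standard java.util.regex package. The class name GreedyDemo and the helper firstMatch are made-up names for this illustration only, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the text of the first match of `regex` in `input`, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The greedy .* takes every character it can, so the match is the whole string.
        System.out.println(firstMatch(".*", "aaaaaaaab")); // prints aaaaaaaab
    }
}
```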
Let’s understand “Backtracking” by changing the regex
Now let's say we change the pattern to .*b. When the regex engine tries to match it against aaaaaaaab, the .* will again consume the
entire string. However, the engine has now reached the end of the
string and the pattern is not yet satisfied (the .* consumed everything, but the pattern still has
to match a b afterwards), so it will backtrack, giving characters back one at a time, and try to
match b after each step. The first backtrack makes the .* give up the final character, so it now holds aaaaaaaa; the b in the pattern then matches the final b, and the
pattern succeeds.
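Here is a rough Java sketch of that backtracking loop. The hand-rolled dotStarThenB method is a simplified, hypothetical model of what the engine does for .*b: it only tries a match starting at position 0 and assumes single-line input, so its "dot" accepts any character. The engineCapture method asks the real java.util.regex engine the same question for comparison. Both names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BacktrackDemo {
    // Simplified model of a backtracking engine handling .*b from position 0
    // (assumes single-line input, so the "dot" here accepts any character).
    // Returns how many characters .* keeps when b finally matches, or -1 if it never does.
    static int dotStarThenB(String input) {
        int kept = input.length();            // greedy: .* first takes the whole input
        while (kept >= 0) {
            // Try to match the literal b right after what .* currently holds.
            if (kept < input.length() && input.charAt(kept) == 'b') {
                return kept;
            }
            kept--;                           // backtrack: .* gives back one character
        }
        return -1;                            // nothing left to give back: no match
    }

    // The real engine's answer: what (.*) captures when (.*)b matches from position 0.
    static String engineCapture(String input) {
        Matcher m = Pattern.compile("(.*)b").matcher(input);
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String target = "aaaaaaaab";
        System.out.println(dotStarThenB(target));  // prints 8
        System.out.println(engineCapture(target)); // prints aaaaaaaa
    }
}
```

Both calls agree that .* ends up holding the first eight characters after giving back exactly one.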
Possessive quantifiers are also greedy,
but, as mentioned, once they have matched, the engine will not backtrack into them to make them
give characters back. So if we change our pattern to .*+b (match any character zero or more times,
possessively, followed by a b) and try to match aaaaaaaab, the .*+ will again consume the whole string, but since
it is possessive, its backtracking information is discarded, the b cannot be
matched, and the pattern fails.

Just a short explanation to understand possessive quantifiers and
backtracking in regular expressions. Hope
this helps you understand regular expressions better.
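If you want to try the possessive version yourself, here is a small Java sketch that runs the greedy and possessive patterns against the same target. PossessiveDemo and fullMatch are hypothetical names used only for this illustration; the patterns themselves use Java's standard possessive syntax (*+).

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // True if `regex` matches the whole of `input`.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String target = "aaaaaaaab";
        // Greedy: .* backtracks one character so that b can match.
        System.out.println(fullMatch(".*b", target));  // prints true
        // Possessive: .*+ keeps the whole string and never gives the final b back.
        System.out.println(fullMatch(".*+b", target)); // prints false
        // a*+ stops at the b on its own, so it never needs to give anything back.
        System.out.println(fullMatch("a*+b", target)); // prints true
    }
}
```

The last pattern shows where possessive quantifiers are handy: a*+ could never usefully hand a character back to b anyway, so discarding the backtracking positions doesn't change what matches. It only means that on a failing input (say aaaaaaaac) the engine gives up sooner instead of retrying every shorter run of a's.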
Happy Testing!!!