Python regex to match a floating point number
Here is an expression that matches a float with up to two decimal places.
((?<!\.|\d)\d+(?:\.\d{1,2})?)
So let me explain it in steps:
'(?<!..)' is a negative lookbehind, i.e. at that position the preceding text must not match the regex enclosed in '(?<! )'. In this case that is '\.|\d', which stands for a literal dot or a digit. The lookbehind is not part of the match; the regex that needs to match starts with '\d+'.
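To see that the lookbehind only checks the preceding character and contributes nothing to the matched text, here is a small sketch. The sample strings are made up for illustration, and only the lookbehind plus '\d+' part of the expression is used.

```python
import re

# Negative lookbehind followed by the digits that form the actual match.
pattern = re.compile(r"(?<!\.|\d)\d+")

m = pattern.search("abc 42")
print(m.group())  # 42 (only the digits, nothing from the lookbehind)
print(m.span())   # (4, 6) (the match starts at the first digit)

# Digits directly after a dot are rejected, so there is no match here.
no_match = pattern.search(".5")
print(no_match)   # None
```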
So what we are saying is: match one or more digits, but make sure those digits are not preceded by a literal dot or a digit. It is easy to see why the digits should not be preceded by a literal '.' (dot): for example, we don't want .99 to match. But why do we need the digit part? The reason is quite subtle: without it, the regex will still find a match inside .99.
This is because when the regex engine tries to match .99, the first attempt fails: \d+ could match 99, but since 99 is preceded by a literal '.', the negative lookbehind rejects it. The engine then shifts to the next character and makes \d+ match only the rightmost 9 in 99.
Since the rightmost 9 is preceded by a 9 and not by a dot, a lookbehind that only excludes a dot is satisfied, so \d+ ends up matching just that 9.
This is a false positive, as we definitely don't want to match that, so we make the negative lookbehind include a digit as well to rule out this result.
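The difference can be checked directly in Python. This is a minimal sketch: the names dot_only and dot_or_digit and the sample strings are mine, chosen only for illustration, and the outer capturing group is left out so that findall returns the whole match.

```python
import re

# Lookbehind that only rules out a preceding dot (the incomplete version).
dot_only = re.compile(r"(?<!\.)\d+(?:\.\d{1,2})?")
# Lookbehind that rules out a preceding dot or digit (the version explained above).
dot_or_digit = re.compile(r"(?<!\.|\d)\d+(?:\.\d{1,2})?")

text = ".99"
false_positive = dot_only.findall(text)
fixed = dot_or_digit.findall(text)
print(false_positive)  # ['9'] (the rightmost 9 slips through)
print(fixed)           # [] (nothing in .99 matches)

# Ordinary numbers are still matched as expected.
found = dot_or_digit.findall("price 12.5 and 7")
print(found)           # ['12.5', '7']
```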