语法
. any character except newline
\w \d \s word, digit, whitespace
\W \D \S not word, digit, whitespace
[abc] any of a, b, or c
[^abc] not a, b, or c
[a-g] character between a & g
Anchors
^abc$ start / end of the string
\b word boundary
Escaped characters
\. \* \\ escaped special characters
\t \n \r tab, linefeed, carriage return
\u00A9 unicode escaped ©
Groups & Lookaround
(abc) capture group
\1 backreference to group #1
(?:abc) non-capturing group
(?=abc) positive lookahead
(?!abc) negative lookahead
Quantifiers & Alternation
a* a+ a? 0 or more, 1 or more, 0 or 1
a{5} a{2,} exactly five, two or more
a{1,3} between one & three
a+? a{2,}? match as few as possible
ab|cd match ab or cd
一些函数
re.findall(pattern, str)
返回匹配的字符串列表,pattern中使用( )可以只返回( )中的部分re.match(pattern, string , flags)
从字符串的开始匹配,返回一个匹配对象,失败返回 None,常用于整句匹配(从头匹配)re.search(pattern, str)
返回第一个匹配对象,可以使用( ),res.group(n)
,res.span(n)
,res.start(n)
,res.end(n)
re.sub(pattern, repl, str)
替换,可以使用( ) + \1
反向引用re.compile(pattern, flags)
编译正则表达式,保存以便后用,避免重复编写,设置参数 flags 如下,可多选flags = re.M|re.S
re.I
忽略大小写re.M
多行模式,换行重新开始匹配re.S
.号可匹配一切字符包括换行符re.DOTALL
效果同re.S
适用于跨行匹配的情况
re.split(pattern, string, maxsplit, flags)
匹配的子串来分割字符串,返回 List,maxsplit 设置分隔次数(分隔)
re.sub 示例
s = "don ' t be , shy ."
s = re.sub(r'\s\'\s', "'", s)
s = re.sub(r'\s([\.\!\?\,])', r"\1", s)
print(s)
don't be, shy.
工具网站
https://rubular.com/r/hinmQKDuRO