String overview

Python handles text data with objects of type str, also called strings.

type('abc')

str

A string is an immutable sequence of Unicode code points. The built-in function ord() returns the code point of a single character:

ord('a'), ord('b'), ord('c')

(97, 98, 99)

The built-in function str() converts objects of other types to strings:

str(3.14)

'3.14'

String literals can be written in several ways:

Single quotes

Inside a single-quoted literal, a single quote must be written as \':

'it\'s a book'

"it's a book"

Double quotes

Inside a double-quoted literal, a double quote must be written as \":

"it's a \"book\""

'it\'s a "book"'

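Triple quotes

A triple-quoted literal is delimited by either three single quotes ''' or three double quotes """. Single and double quotes inside the string need no escaping, but a quote must not sit directly against a closing delimiter of the same kind:

# Note the space inside the final run of four double quotes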
'''it's a book''', \
"""it's a "book" """

("it's a book", 'it\'s a "book" ')

A triple-quoted string may span several lines; each line break is stored as \n:

'''
it's a 
book
'''

"\nit's a \nbook\n"

To keep the \n out of the string, end each line with the line-continuation character \:

'''\
it's a \
book\
'''

"it's a book"

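String literals may carry the prefix u (or U), r (or R), or f (or F).

u marks a Unicode string (the default, so it can be omitted); r marks a raw string, in which a backslash is treated as an ordinary character; f marks a formatted string literal. r and f can be combined.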

'a\nb{1+1}'

'a\nb{1+1}'

r'a\nb{1+1}'

'a\\nb{1+1}'

f'a\nb{1+1}'

'a\nb2'

rf'a\nb{1+1}'

'a\\nb2'

Note that the prefix b (or B) produces a bytes object rather than a string:

type(b'abc')

bytes

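Strings are immutable sequences, so the full slice below returns the original string object (this is how CPython behaves) rather than a copy: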

a = 'abc'
b = a[:]
a is b

True
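As a small illustration of the immutability described above, the sketch below shows that item assignment fails and that an apparent modification really builds a new string. The variable names are illustrative and do not come from the original text.

```python
s = 'abc'

# Item assignment is rejected because str objects cannot be changed in place.
try:
    s[0] = 'x'
    mutated = True
except TypeError:
    mutated = False

# "Changing" a string means building a new object from pieces of the old one.
t = 'x' + s[1:]
print(mutated, s, t)
```

The original string s is left untouched; only the new name t refers to the modified text.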


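String concatenation

Whitespace concatenation

Two adjacent string literals are joined automatically, no matter how many spaces (including zero) separate them: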

'Py''thon', 'Py'    'thon'

('Python', 'Python')

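Wrapping the literals in () even lets the concatenation span several lines, which is very useful for long strings (or regular expressions):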

('Py'
f'thon{3}'
r'\Go')

'Python3\\Go'

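Concatenation with the `+` operator

The + operator concatenates strings much like whitespace concatenation does, but its operands may also be variables: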

a = 'Py'
b = 'thon'
'Py' + b, a + b

('Python', 'Python')

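The + operator can also be combined with the assignment operator = to concatenate and assign in one step:

# Concatenate a and b, then assign the result to a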
a += b
print(a)
# Concatenate b and a, then assign the result to b
b += a
b

Python


'thonPython'

Escape characters

The escape character \ does not stand for itself. To write a literal \ character, it has to escape itself:

'\\'

'\\'

'\'

  File "<ipython-input-2-d44a383620ab>", line 1
    '\'
       ^
SyntaxError: EOL while scanning string literal

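The error reported above is SyntaxError: EOL while scanning string literal (Python 3.10 and later report it as an unterminated string literal). The reason is that \' stands for the literal single-quote character ', the one that can be printed to the screen, and not for the ' that delimits the string. (Don't worry: this sentence puzzles almost every beginner on first reading.) The Python parser therefore reaches EOL (End Of Line) before it finds the second ' that would close the string.

Suppose you want to write the string He said, it's fine. Enclosing it in double quotes " works, but enclosing it in single quotes causes trouble, because the parser treats the ' after it as the end of the string.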

'He said, it's fine.'

  File "<ipython-input-3-2bcf2ca6dd95>", line 1
    'He said, it's fine.'
                 ^
SyntaxError: invalid syntax

So you need the escape character \:

# Either write it like this:
print('He said, it\'s fine.')
# or like this:
print("He said, it's fine.")
# or, whichever kind of quote delimits the string,
# make a habit of writing the quotes inside it as \' and \" ...
"He said, it\'s fine."

He said, it's fine.
He said, it's fine.


"He said, it's fine."

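Combined with other characters, the escape character \ forms sequences with a special meaning:

Escape sequence	Meaning
`\(at end of line)`	Line continuation
`\\`	Backslash
`\'`	Single quote
`\"`	Double quote
`\a`	Bell
`\b`	Backspace
`\n`	Line feed
`\v`	Vertical tab
`\t`	Horizontal tab
`\r`	Carriage return
`\f`	Form feed
`\ooo`	Character with octal value ooo (one to three octal digits)
`\xhh`	Character with hexadecimal value hh (exactly two hex digits)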

The line-continuation character joins two physical lines of code (or of a string literal) into one logical line:

for i in \
range(3): # the two lines are equivalent to: for i in range(3):
    print(i)

0
1
2

'hello \
world'

'hello world'

Examples of octal and hexadecimal escapes:

# Octal escapes
'\101', '\102'

('A', 'B')

# Hexadecimal escapes
'\x41', '\x42'

('A', 'B')

# Decimal code points
chr(65),chr(66)

('A', 'B')
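To make the difference between an escape sequence and a raw string concrete, here is a minimal sketch that compares lengths. The Windows-style path is a made-up example value.

```python
escaped = 'a\nb'   # three characters: a, line feed, b
raw = r'a\nb'      # four characters: a, backslash, n, b
lengths = (len(escaped), len(raw))

# A raw literal and a literal with doubled backslashes denote the same string.
same_path = r'C:\new\table' == 'C:\\new\\table'
print(lengths, same_path)
```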

The escape character \ is used even more widely in regular expressions; see the book 《正则指引》 for details.

str.count counts occurrences

The string method str.count(), as described by Python's official documentation:

help(str.count)

Help on method_descriptor:

count(...)
    S.count(sub[, start[, end]]) -> int
    
    Return the number of non-overlapping occurrences of substring sub in
    string S[start:end].  Optional arguments start and end are
    interpreted as in slice notation.

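Returns the number of non-overlapping occurrences of the substring sub within the slice s[start:end]. The optional arguments start and end are interpreted as in slice notation.

With only sub given, the search runs from the first character to the end of the string. A second argument, if given, is start: the search runs from start to the end of the string. A third argument, if given, is end: the search runs from start up to end - 1 (the character at index end is not included).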

'python'.count('0')

0

'pyyython'.count('yy')

1

'pythonpythonn'.count('n',5)

3

'pythonpythonn'.count('n',5,7)

1
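Because str.count() only counts non-overlapping matches, overlapping occurrences need a small loop. The helper below, count_overlapping, is a hypothetical name introduced for this sketch; it is built on str.find() and is not part of the standard library.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing matches to overlap."""
    if not sub:
        raise ValueError('sub must be a non-empty string')
    total = 0
    start = text.find(sub)
    while start != -1:
        total += 1
        # Resume one character later so that overlapping matches are seen.
        start = text.find(sub, start + 1)
    return total

non_overlapping = 'pyyython'.count('yy')
overlapping = count_overlapping('pyyython', 'yy')
print(non_overlapping, overlapping)
```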

str.replace replaces substrings

The string method str.replace(), as described by Python's official documentation:

help(str.replace)

Help on method_descriptor:

replace(self, old, new, count=-1, /)
    Return a copy with all occurrences of substring old replaced by new.
    
      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.
    
    If the optional argument count is given, only the first count occurrences are
    replaced.

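Returns a copy of the string with all occurrences of the substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.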

'python python'.replace('p','C')

'Cython Cython'

'python python'.replace('py','Cpy',1)

'Cpython python'

If old is the empty string, new is inserted between all characters and at both ends:

'python python'.replace('','C')

'CpCyCtChCoCnC CpCyCtChCoCnC'

If new is the empty string, every occurrence of old is removed:

'python python'.replace('p','')

'ython ython'

str.expandtabs expands tab characters

The string method str.expandtabs(), as described by Python's official documentation:

help(str.expandtabs)

Help on method_descriptor:

expandtabs(self, /, tabsize=8)
    Return a copy where all tab characters are expanded using spaces.
    
    If tabsize is not given, a tab size of 8 characters is assumed.

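Returns a copy of the string in which every tab character is replaced by one or more spaces, depending on the current column and the given tab size. A tab stop occurs every tabsize characters (the default of 8 gives tab stops at columns 0, 8, 16 and so on).

To expand the string, the current column is set to zero and the string is examined character by character. If the character is a tab (\t), one or more spaces are inserted into the result until the current column equals the next tab stop. (The tab character itself is not copied.)

If the character is a newline (\n) or a carriage return (\r), it is copied and the current column is reset to zero. Any other character is copied unchanged and the current column is incremented by one, regardless of how the character is displayed when printed.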

'01\t012\t0123\t01234'.expandtabs()

'01      012     0123    01234'

'01\t012\t0123\t01234'.expandtabs(4)

'01  012 0123    01234'

'\n\t01\r2\t0123\t01234'.expandtabs(4)

'\n    01\r2   0123    01234'
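The column-tracking procedure described above can be sketched in a few lines. The function expand_tabs is a hypothetical re-implementation written for illustration only; real code should simply call str.expandtabs().

```python
def expand_tabs(text, tabsize=8):
    """Re-implement str.expandtabs() by tracking the current column."""
    result = []
    column = 0
    for ch in text:
        if ch == '\t':
            if tabsize > 0:
                # Pad with spaces up to the next tab stop.
                pad = tabsize - (column % tabsize)
                result.append(' ' * pad)
                column += pad
            # With tabsize <= 0 the tab is dropped without padding.
        elif ch in '\n\r':
            result.append(ch)
            column = 0  # a new line starts at column zero
        else:
            result.append(ch)
            column += 1
    return ''.join(result)

sample = '\n\t01\r2\t0123\t01234'
print(repr(expand_tabs(sample, 4)))
```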

str.split splits a string

The string method str.split(), as described by Python's official documentation:

help(str.split)

Help on method_descriptor:

split(self, /, sep=None, maxsplit=-1)
    Return a list of the words in the string, using sep as the delimiter string.
    
    sep
      The delimiter according which to split the string.
      None (the default value) means split according to any whitespace,
      and discard empty strings from the result.
    maxsplit
      Maximum number of splits to do.
      -1 (the default value) means no limit.

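Returns a list of the words in the string, using sep as the delimiter string. If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings. If maxsplit is given, at most maxsplit splits are done (so the list has at most maxsplit+1 elements). If maxsplit is not specified or is -1, there is no limit on the number of splits (all possible splits are made).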

'a.bc'.split('.')

['a', 'bc']

'a.b.c'.split('.',maxsplit=1)

['a', 'b.c']

'a.b..c'.split('.')

['a', 'b', '', 'c']

'a.b..c'.split('..')

['a.b', 'c']

''.split('.')

['']

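If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result contains no empty strings at the start or end even if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting only of whitespace with a None separator returns [].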

'a b  c '.split()

['a', 'b', 'c']

'  \n '.split()

[]

''.split()

[]
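The two splitting algorithms are easiest to tell apart when they are applied to the same input. The sketch below uses a made-up sample line and compares split() with split(' ').

```python
line = '  alpha   beta  '

by_whitespace = line.split()   # runs of whitespace act as one separator
by_space = line.split(' ')     # every single space is a separator

# With sep=None and maxsplit set, the remainder keeps its inner and trailing whitespace.
limited = '   1   2   3   '.split(maxsplit=1)
print(by_whitespace, by_space, limited)
```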

str.rsplit splits from the right

The string method str.rsplit(), as described by Python's official documentation:

help(str.rsplit)

Help on method_descriptor:

rsplit(self, /, sep=None, maxsplit=-1)
    Return a list of the words in the string, using sep as the delimiter string.
    
      sep
        The delimiter according which to split the string.
        None (the default value) means split according to any whitespace,
        and discard empty strings from the result.
      maxsplit
        Maximum number of splits to do.
        -1 (the default value) means no limit.
    
    Splits are done starting at the end of the string and working to the front.

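Returns a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done, starting from the right. If sep is not specified or is None, any whitespace string is a separator. Apart from splitting from the right, rsplit() behaves like split().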

'p\nyth   on '.rsplit()

['p', 'yth', 'on']

'p\nyth on'.rsplit('y')

['p\n', 'th on']

'p\nyth on'.rsplit(maxsplit=1)

['p\nyth', 'on']

'p\nyth   on '.rsplit(maxsplit=2)

['p', 'yth', 'on']

Adjacent delimiters are deemed to delimit empty strings:

'\n\np\nyth on'.rsplit('\n')

['', '', 'p', 'yth on']
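The direction of splitting only matters once maxsplit limits the number of splits. A minimal sketch, using a made-up file name, contrasts the two directions:

```python
filename = 'archive.tar.gz'

# Split once from the right: everything before the last dot, then the extension.
stem, ext = filename.rsplit('.', maxsplit=1)

# Split once from the left: the first component, then the rest.
first, rest = filename.split('.', maxsplit=1)
print(stem, ext, first, rest)
```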

str.partition splits into three parts

The string method str.partition(), as described by Python's official documentation:

help(str.partition)

Help on method_descriptor:

partition(self, sep, /)
    Partition the string into three parts using the given separator.
    
    This will search for the separator in the string.  If the separator is found,
    returns a 3-tuple containing the part before the separator, the separator
    itself, and the part after it.
    
    If the separator is not found, returns a 3-tuple containing the original string
    and two empty strings.

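Splits the string at the first occurrence of sep and returns a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, the 3-tuple contains the string itself followed by two empty strings.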

'python'.partition('m')

('python', '', '')

'python'.partition('th')

('py', 'th', 'on')

'python'.partition('ht')

('python', '', '')

str.rpartition splits into three parts from the right

The string method str.rpartition(), as described by Python's official documentation:

help(str.rpartition)

Help on method_descriptor:

rpartition(self, sep, /)
    Partition the string into three parts using the given separator.
    
    This will search for the separator in the string, starting at the end. If
    the separator is found, returns a 3-tuple containing the part before the
    separator, the separator itself, and the part after it.
    
    If the separator is not found, returns a 3-tuple containing two empty strings
    and the original string.

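Splits the string at the last occurrence of sep and returns a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, the 3-tuple contains two empty strings followed by the string itself.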

'python'.rpartition('n')

('pytho', 'n', '')

'pythhhon'.rpartition('hh')

('pyth', 'hh', 'on')

When the separator is not found, the string itself comes last in the returned 3-tuple:

'python'.rpartition('ht')

('', '', 'python')
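Since both methods always return exactly three items, they unpack cleanly. The sketch below parses a key=value line with partition() and takes the last path component with rpartition(); parse_setting and the sample values are hypothetical names chosen for this illustration.

```python
def parse_setting(line):
    """Split 'key = value' at the first '='; return None if there is no '='."""
    key, sep, value = line.partition('=')
    if not sep:
        return None
    return key.strip(), value.strip()

setting = parse_setting('path = /usr/bin:/a=b')
missing = parse_setting('no separator here')

# rpartition splits at the last separator, which suits path-like strings.
directory, _, basename = 'a/b/c.txt'.rpartition('/')
print(setting, missing, directory, basename)
```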

str.splitlines splits by line

The string method str.splitlines(), as described by Python's official documentation:

help(str.splitlines)

Help on method_descriptor:

splitlines(self, /, keepends=False)
    Return a list of the lines in the string, breaking at line boundaries.
    
    Line breaks are not included in the resulting list unless keepends is given and
    true.

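Returns a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.

This method splits on the following line boundaries. In particular, the boundaries are a superset of universal newlines.

\n Line feed
\r Carriage return
\r\n Carriage return + line feed
\v or \x0b Line tabulation
\f or \x0c Form feed
\x1c File separator
\x1d Group separator
\x1e Record separator
\x85 Next line
\u2028 Line separator
\u2029 Paragraph separator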

'ab c\n\nde fg\rkl\r\n'.splitlines()

['ab c', '', 'de fg', 'kl']

'ab c\n\nde fg\rkl\r\n'.splitlines(keepends=True)

['ab c\n', '\n', 'de fg\r', 'kl\r\n']

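Applied to the empty string, this method returns an empty list, and a terminal line break does not add an extra empty string to the result, unlike split('\n'):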

''.splitlines()

[]

"One line\nTwo lines\n".splitlines()

['One line', 'Two lines']

'One line\nTwo lines\n'.split('\n')

['One line', 'Two lines', '']
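The less common boundaries in the list above are also honoured, which is a frequent source of surprise. The following sketch uses a made-up string containing a form feed and a line separator:

```python
text = 'one\x0ctwo\u2028three\nfour'

all_boundaries = text.splitlines()   # breaks at \x0c, \u2028 and \n
newline_only = text.split('\n')      # breaks at \n only
print(all_boundaries, newline_only)
```

When only \n should count as a line break, split('\n') is the safer choice.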

str.strip removes characters from both ends

The string method str.strip(), as described by Python's official documentation:

help(str.strip)

Help on method_descriptor:

strip(self, chars=None, /)
    Return a copy of the string with leading and trailing whitespace remove.
    
    If chars is given and not None, remove characters in chars instead.

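Returns a copy of the string with leading and trailing characters removed. The chars argument is a string specifying the set of characters to remove. If it is omitted or None, whitespace is removed. The chars argument is not a single prefix or suffix; rather, every character it contains is stripped from both ends: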

' python '.strip()

'python'

' python '.strip('p')

' python '

' python '.strip('p n')

'ytho'

' pythonnnn '.strip('p n')

'ytho'

str.lstrip removes leading characters

The string method str.lstrip(), as described by Python's official documentation:

help(str.lstrip)

Help on method_descriptor:

lstrip(self, chars=None, /)
    Return a copy of the string with leading whitespace removed.
    
    If chars is given and not None, remove characters in chars instead.

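Returns a copy of the string with leading characters removed. The chars argument is a string specifying the set of characters to remove. If it is omitted or None, whitespace is removed. The chars argument is not a single prefix; rather, every character it contains is stripped from the start: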

'  python  '.lstrip()

'python  '

'  python  '.lstrip('y p')

'thon  '

'  python  '.lstrip('py')

'  python  '

'ppppython  '.lstrip('y p')

'thon  '

str.rstrip removes trailing characters

The string method str.rstrip(), as described by Python's official documentation:

help(str.rstrip)

Help on method_descriptor:

rstrip(self, chars=None, /)
    Return a copy of the string with trailing whitespace removed.
    
    If chars is given and not None, remove characters in chars instead.

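Returns a copy of the string with trailing characters removed. The chars argument is a string specifying the set of characters to remove. If it is omitted or None, whitespace is removed. The chars argument is not a single suffix; rather, every character it contains is stripped from the end: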

' python '.rstrip()

' python'

' python '.rstrip('n o')

' pyth'

' python '.rstrip('n')

' python '

' pythonnnnn'.rstrip('no')

' pyth'
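Because chars is a set of characters rather than a prefix or suffix, the strip methods can remove more than intended. The sketch below contrasts lstrip() with a hypothetical helper, strip_prefix, that removes one exact prefix (Python 3.9 and later also offer str.removeprefix() for this). The URL-like value is made up.

```python
def strip_prefix(text, prefix):
    """Remove prefix once if text starts with it; otherwise return text unchanged."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text

url = 'www.wow.example'
as_char_set = url.lstrip('w.')          # strips every leading 'w' or '.'
as_prefix = strip_prefix(url, 'www.')   # strips the exact prefix only
print(as_char_set, as_prefix)
```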

str.find finds the lowest index

The string method str.find(), as described by Python's official documentation:

help(str.find)

Help on method_descriptor:

find(...)
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.

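Returns the lowest index at which the substring sub is found within the slice s[start:end]. The optional arguments start and end are interpreted as in slice notation. Returns -1 if sub is not found.

'pythonpython'.find('y')

1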

'pythonpython'.find('pt')

-1

'pythonpython'.find('y',5)

7

'pythonpython'.find('y',5,7)

-1

str.rfind finds the highest index

The string method str.rfind(), as described by Python's official documentation:

help(str.rfind)

Help on method_descriptor:

rfind(...)
    S.rfind(sub[, start[, end]]) -> int
    
    Return the highest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.

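Returns the highest (rightmost) index at which the substring sub is found, such that sub is contained within s[start:end]. The optional arguments start and end are interpreted as in slice notation. Returns -1 if sub is not found.

'python python'.rfind('on')

11

'python python'.rfind('on',1,7)

4

'python python'.rfind('on',7)

11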

'python python'.rfind('no')

-1

str.index finds the lowest index

The string method str.index(), as described by Python's official documentation:

help(str.index)

Help on method_descriptor:

index(...)
    S.index(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Raises ValueError when the substring is not found.

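Returns the lowest index at which the substring sub is found within the slice s[start:end]. The optional arguments start and end are interpreted as in slice notation. Like find(), but raises ValueError when sub is not found.

'pythonpython'.index('y')

1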

'pythonpython'.index('pt')

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-3-4252de1acf66> in <module>
----> 1 'pythonpython'.index('pt')


ValueError: substring not found

'pythonpython'.index('y',5)

7

'pythonpython'.index('y',5,7)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-5-35705122f03a> in <module>
----> 1 'pythonpython'.index('y',5,7)


ValueError: substring not found

str.rindex finds the highest index

The string method str.rindex(), as described by Python's official documentation:

help(str.rindex)

Help on method_descriptor:

rindex(...)
    S.rindex(sub[, start[, end]]) -> int
    
    Return the highest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Raises ValueError when the substring is not found.

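Returns the highest (rightmost) index at which the substring sub is found, such that sub is contained within s[start:end]. The optional arguments start and end are interpreted as in slice notation. Like rfind(), but raises ValueError instead of returning -1 when sub is not found.

'python python'.rindex('on')

11

'python python'.rindex('on',1,7)

4

'python python'.rindex('on',7)

11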

'python python'.rindex('no')

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-5-92aeb174dba9> in <module>
----> 1 'python python'.rindex('no')


ValueError: substring not found
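The two failure conventions, -1 for find()/rfind() and ValueError for index()/rindex(), call for different handling. The sketch below shows a common pitfall with find() and one way to wrap index(); find_or_none is a hypothetical helper name.

```python
text = 'python'

pos = text.find('py')       # a match at index 0
wrong_test = bool(pos)      # False, although the substring was found
right_test = pos != -1      # compare against -1 explicitly
contains = 'py' in text     # use "in" when the index is not needed

def find_or_none(haystack, sub):
    """Return the lowest index of sub, or None when it does not occur."""
    try:
        return haystack.index(sub)
    except ValueError:
        return None

print(pos, wrong_test, right_test, contains, find_or_none(text, 'pt'))
```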

str.join joins strings

The string method str.join(), as described by Python's official documentation:

help(str.join)

Help on method_descriptor:

join(self, iterable, /)
    Concatenate any number of strings.
    
    The string whose method is called is inserted in between each given string.
    The result is returned as a new string.
    
    Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

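Returns a string that is the concatenation of the strings in iterable. A TypeError is raised if iterable contains any non-string value. The string on which the method is called serves as the separator between the elements.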

'~'.join('abc')

'a~b~c'

'acb'.join(['1','2'])

'1acb2'

''.join(['1','2'])

'12'

'-'.join({'1':1,'2':2})

'1-2'

'-'.join(['1',2])

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-9-fc346e5ca62e> in <module>
----> 1 '-'.join(['1',2])


TypeError: sequence item 1: expected str instance, int found

'-'.join(b'abc')

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-8-9d04d7060926> in <module>
----> 1 '-'.join(b'abc')


TypeError: sequence item 0: expected str instance, int found
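One common way around the TypeError shown above is to convert every item with str() first. A minimal sketch with made-up values:

```python
values = [1, 2.5, None]

# map(str, ...) turns each item into a string before joining.
joined = ', '.join(map(str, values))

# Iterating over bytes yields integers, so convert those as well.
byte_values = '-'.join(str(b) for b in b'abc')
print(joined, byte_values)
```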

str.startswith starts with the given string?

The string method str.startswith(), as described by Python's official documentation:

help(str.startswith)

Help on method_descriptor:

startswith(...)
    S.startswith(prefix[, start[, end]]) -> bool
    
    Return True if S starts with the specified prefix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    prefix can also be a tuple of strings to try.

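Returns True if the string starts with the specified prefix, otherwise False. prefix can also be a tuple of prefixes to look for. With the optional start, the test begins at that position. With the optional end, the comparison stops before that position.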

'a.b.a.c'.startswith('ab')

False

'a.b.a.c'.startswith('a.')

True

'a.b.a.c'.startswith('ab',2)

False

'a.b.a.c'.startswith('a.',4)

True

'a.b.a.c'.startswith('a',1,4)

False

str.endswith ends with the given string?

The string method str.endswith(), as described by Python's official documentation:

help(str.endswith)

Help on method_descriptor:

endswith(...)
    S.endswith(suffix[, start[, end]]) -> bool
    
    Return True if S ends with the specified suffix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    suffix can also be a tuple of strings to try.

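Returns True if the string ends with the specified suffix, otherwise False. suffix can also be a tuple of suffixes to look for. With the optional start, the test begins at that position. With the optional end, the comparison stops before that position.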

'python.exe'.endswith('.exe')

True

'python.exe'.endswith(('.exe','.txt'), 5)

True

'python.exe'.endswith(('.py','.txt'), 5)

False

'python.exe'.endswith(('.exe','.txt'), 5,9)

False
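Passing a tuple of suffixes is handy for filtering file names. The list of names below is made up for the sketch, and lower() is used here only to make the test case-insensitive.

```python
names = ['a.py', 'b.txt', 'c.PY', 'd.md']

# Keep the names that end in any of the wanted extensions, ignoring case.
wanted = [n for n in names if n.lower().endswith(('.py', '.md'))]
print(wanted)
```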

str.ljust left-justifies

The string method str.ljust(), as described by Python's official documentation:

help(str.ljust)

Help on method_descriptor:

ljust(self, width, fillchar=' ', /)
    Return a left-justified string of length width.
    
    Padding is done using the specified fill character (default is a space).

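Returns a string of length width with the original string left-justified in it. Padding uses the specified fillchar (an ASCII space by default). If width is less than or equal to len(s), the original string is returned.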

'python'.ljust(1)

'python'

'python'.ljust(10,'~')

'python~~~~'

'python'.ljust(10)

'python    '

str.center centers

The string method str.center(), as described by Python's official documentation:

help(str.center)

Help on method_descriptor:

center(self, width, fillchar=' ', /)
    Return a centered string of length width.
    
    Padding is done using the specified fill character (default is a space).

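Returns a string of length width with the original string centered in it. Padding on both sides uses the specified fillchar (an ASCII space by default). If width is less than or equal to the length of the string, the original string is returned: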

'Python'.center(1)

'Python'

'Python'.center(10)

'  Python  '

'Python'.center(20,'~')

'~~~~~~~Python~~~~~~~'

str.rjust right-justifies

The string method str.rjust(), as described by Python's official documentation:

help(str.rjust)

Help on method_descriptor:

rjust(self, width, fillchar=' ', /)
    Return a right-justified string of length width.
    
    Padding is done using the specified fill character (default is a space).

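Returns a string of length width with the original string right-justified in it. Padding uses the specified fillchar (an ASCII space by default). If width is less than or equal to len(s), the original string is returned.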

'python'.rjust(1)

'python'

'python'.rjust(10,'~')

'~~~~python'

'python'.rjust(10)

'    python'
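Used together, ljust() and rjust() are enough to line up a small text table. The rows below are made-up sample data, and the column widths are arbitrary choices for this sketch.

```python
rows = [('apple', 3), ('kiwi', 12), ('watermelon', 7)]

# Size the first column to the longest name.
width = max(len(name) for name, _ in rows)

# Names are left-justified with dots, quantities right-justified in 4 columns.
lines = [name.ljust(width, '.') + str(qty).rjust(4) for name, qty in rows]
print('\n'.join(lines))
```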

str.format formats

The string method str.format(), as described by Python's official documentation:

help(str.format)

Help on method_descriptor:

format(...)
    S.format(*args, **kwargs) -> str
    
    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').

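Performs a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}. Each replacement field contains either the numeric index of a positional argument or the name of a keyword argument. The method returns a copy of the string in which each replacement field is replaced with the string value of the corresponding argument.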

"The sum of 1 + 2 is {0}".format(1+2)

'The sum of 1 + 2 is 3'

To include a brace character in the literal text, escape it by doubling:

"{{python}}".format()

'{python}'

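Positional and keyword arguments can be combined flexibly. Positional arguments are numbered 0, 1, 2, ... and the numbers may be left out of the replacement fields; a keyword argument is referenced by writing its name in the field: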

'{} and {} are both {age} years old.\
'.format('A','B',age=18)

'A and B are both 18 years old.'

'{1} and {0} are both {age} years old.\
'.format('A','B',age=18)

'B and A are both 18 years old.'

'{age} and {} are both {} years old.\
'.format('A','B',age=18)

'18 and A are both B years old.'

'{0} and {0} are both {0} years old.\
'.format('A','B',age=18)

'A and A are both A years old.'

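Normally, the job of formatting a value is done by the __format__() method of the value itself. In some cases, however, it is desirable to force a type to be formatted as a string, overriding its own definition of formatting. Converting the value to a string before calling __format__() bypasses the normal formatting logic.

Three conversion flags are currently supported: '!s' calls str() on the value, '!r' calls repr(), and '!a' calls ascii().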

'{1!s} and {0!r} are both {age!a} years old.\
'.format('A','B',age=18)

"B and 'A' are both 18 years old."

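A replacement field can also contain a format specification that describes how the value should be presented, including details such as field width, alignment, padding and decimal precision. Each value type can define its own "format mini-language" or interpretation of the specification.

Examples of various formats:

# Formatting a complex number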
('The complex number {0} is formed'
 ' from the real part {0.real} '
 'and the imaginary part {0.imag}.').format(3-5j)

'The complex number (3-5j) is formed from the real part 3.0 and the imaginary part -5.0.'

# Index into an argument inside the field; slicing is not possible there
'X: {0[0]}; Y: {0[1]}'.format([1,2,3])

'X: 1; Y: 2'

# Slice before formatting instead
a = [1,2,3]
'X: {0}; Y: {1}'.format(a[:2],a[-2:])

'X: [1, 2]; Y: [2, 3]'

# Right-align
'{:>20}'.format('right aligned')

'       right aligned'

# Center, padding with ~
'{:~^20}'.format('centered')

'~~~~~~centered~~~~~~'

# A more elaborate layout
for i, w in zip('<^>', ['left', 'center', 'right']):
    print('{0:{fill}{align}20}'.format(w, fill=i, align=i))

left<<<<<<<<<<<<<<<<
^^^^^^^center^^^^^^^
>>>>>>>>>>>>>>>right

# Pad a number with leading zeros
'{:05}'.format(12)

'00012'

# Set the precision
'{:f}; {:+.1f}'.format(3.14, -3.14)

'3.140000; -3.1'

# Format in different bases
"int: {0:d}; hex: {0:x}; oct: {0:o}; \
bin: {0:b}".format(42)

'int: 42; hex: 2a; oct: 52; bin: 101010'

# Keep the base prefix
"int: {0:d}; hex: {0:#x}; oct: {0:#o}; \
bin: {0:#b}".format(42)

'int: 42; hex: 0x2a; oct: 0o52; bin: 0b101010'

# Make numbers easier to read
'{:,}'.format(1234567890)

'1,234,567,890'

# Format as a percentage
'Correct answers: {:.2%}'.format(5/6)

'Correct answers: 83.33%'

# Type-specific formatting
import datetime
d = datetime.datetime(2010, 7, 4, 12, 15, 58)
'{:%Y-%m-%d %H:%M:%S}'.format(d)

'2010-07-04 12:15:58'

# Format an IP address as hexadecimal
octets = [192, 168, 0, 1]
'{:02X}{:02X}{:02X}{:02X}'.format(*octets)

'C0A80001'

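Summary of the "format mini-language"

Meaning of the alignment options:

'<' Forces the field to be left-aligned within the available space (the default for most objects).
'>' Forces the field to be right-aligned within the available space (the default for numbers).
'=' Forces the padding to be placed after the sign (if any) but before the digits. This is used for printing fields in the form "+000000120". This alignment option is only valid for numeric types. It becomes the default when '0' immediately precedes the field width.
'^' Forces the field to be centered within the available space.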

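Options that are only valid for numeric types:

'+' Indicates that a sign should be used for both positive and negative numbers.
'-' Indicates that a sign should be used only for negative numbers (the default behavior).
space Indicates that a leading space should be used on positive numbers and a minus sign on negative numbers.
'#' Causes the "alternate form" to be used for the conversion. The alternate form is defined differently for different types. For integers, when binary, octal or hexadecimal output is used, this option adds the respective prefix '0b', '0o' or '0x' to the output value.
',' Signals the use of a comma as the thousands separator. For a locale-aware separator, use the 'n' integer presentation type instead.
'_' Signals the use of an underscore as the thousands separator for floating-point presentation types and for the integer presentation type 'd'. For the integer presentation types 'b', 'o', 'x' and 'X', an underscore is inserted every 4 digits. Specifying this option with any other presentation type is an error.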

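The presentation type determines how the data is shown:

's' String format. This is the default type for strings and may be omitted.
'b' Binary format. Outputs the number in base 2.
'c' Character. Converts the integer to the corresponding Unicode character before printing.
'd' Decimal integer. Outputs the number in base 10.
'o' Octal format. Outputs the number in base 8.
'x' Hex format. Outputs the number in base 16, using lowercase letters for the digits above 9.
'X' Hex format. Outputs the number in base 16, using uppercase letters for the digits above 9.
'n' Number (for integers). The same as 'd', except that it uses the current locale setting to insert the appropriate number separator characters.
'e' Exponent notation. Prints the number in scientific notation using the letter 'e' to indicate the exponent. The default precision is 6.
'E' Exponent notation. Same as 'e' except that it uses an uppercase 'E' as the separator character.
'f' Fixed-point notation. Displays the number as a fixed-point number. The default precision is 6.
'F' Fixed-point notation. Same as 'f', but converts nan to NAN and inf to INF.
'g' General format. For a given precision p >= 1, this rounds the number to p significant digits and then formats the result in either fixed-point format or scientific notation, depending on its magnitude.
'G' General format. Same as 'g' except that it switches to 'E' if the number gets too large. The representations of infinity and NaN are uppercased, too.
'n' Number (for floating-point values). The same as 'g', except that it uses the current locale setting to insert the appropriate number separator characters.
'%' Percentage. Multiplies the number by 100 and displays it in fixed ('f') format, followed by a percent sign.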
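A few of the options summarised above are easier to remember once seen in action. The following sketch exercises the '=' alignment, the '_' grouping option and a nested replacement field; the values and the keyword names width and prec are arbitrary choices made for the illustration.

```python
signed = '{:0=+10}'.format(120)    # zero padding goes between sign and digits
implied = '{:+010}'.format(120)    # a leading '0' before the width implies '='

grouped = '{:_}'.format(1234567)           # underscore every 3 decimal digits
hex_grouped = '{:_x}'.format(0xdeadbeef)   # underscore every 4 hex digits

# Width and precision can themselves come from arguments via nested fields.
nested = '{:{width}.{prec}f}'.format(3.14159, width=8, prec=2)
char = '{:c}'.format(65)
print(signed, implied, grouped, hex_grouped, repr(nested), char)
```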

str.format_map formats from a mapping

The string method str.format_map(), as described by Python's official documentation:

help(str.format_map)

Help on method_descriptor:

format_map(...)
    S.format_map(mapping) -> str
    
    Return a formatted version of S, using substitutions from mapping.
    The substitutions are identified by braces ('{' and '}').

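Similar to str.format(**mapping), except that mapping is used directly. This is useful, for example, when mapping is a subclass of dict:

# Create a dict subclass that returns the key itself when the key is missing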
class Default(dict):
    def __missing__(self, key):
        return key
d = Default(a=1)
d['a'], d['b']

(1, 'b')

# The key 'country' is missing, so the key name itself is substituted
'{name} was born in {country}'.format_map(
    Default(name='Guido'))

'Guido was born in country'

Comparison with format():

'{a} is {age}'.format_map({'a':'A', 'age':18})

'A is 18'

'{a} is {age}'.format(**{'a':'A', 'age':18})

'A is 18'

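f-string formatted string literals

An f-string is a formatted string literal: a string literal prefixed with 'f' or 'F'. It may contain replacement fields, which are expressions delimited by braces {}. A formatted string literal is evaluated at run time, whereas other string literals always have a constant value.

Expressions in formatted string literals are treated like regular Python expressions surrounded by parentheses, with a few exceptions.

An empty expression is not allowed, and lambda and assignment expressions := (added in Python 3.8) must be wrapped in explicit parentheses.

f'{(a := 1+1)}' # requires Python 3.8 or later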

'2'

f'{(lambda x:1)}'

'<function <lambda> at 0x000001D70B06CA60>'

Replacement expressions can contain line breaks (for example in triple-quoted strings), but they cannot contain comments.

a = 3; b = 2
f'''3+2\
-5=
{a +
b - 5}'''

'3+2-5=\n0'

Each expression is evaluated in the context where the formatted string literal appears, in order from left to right.

f'{1+2 > 3}'

'False'

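An equals sign '=' may follow the expression (new in version 3.8). When it is present, the output contains the expression text, the '=', any whitespace around the '=', and the evaluated value. By default, '=' causes the repr() of the value to be used, unless a format is specified.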

foo = "bar"
f"{ foo = }"

" foo = 'bar'"

A conversion field introduced by an exclamation mark '!' may follow: '!s' calls str() on the result, '!r' calls repr(), and '!a' calls ascii().

foo = "bar"
f"{foo = !s}"

'foo = bar'

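A format specifier introduced by a colon ':' may also follow. The "format mini-language" is the same as the one used by the str.format() method; see the str.format section for details.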

foo = 3.14
f"{foo:.4f}"

'3.1400'

f'{123:#o}'

'0o173'

a=5/6
f'{a:.2%}'

'83.33%'

Before Python 3.12, backslashes are not allowed inside the expression part and raise an error:

f"newline: {ord('\n')}"

  File "<ipython-input-23-30c78f70325d>", line 1
    f"newline: {ord('\n')}"
    ^
SyntaxError: f-string expression part cannot include a backslash

To include a value that needs a backslash escape, create a temporary variable:

newline = ord('\n')
f"newline: {newline}"

'newline: 10'

A formatted string literal cannot be used as a docstring, even if it contains no expressions:

def foo():
    f"Not a docstring"

print(foo.__doc__)

None
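Format specifiers in f-strings may contain nested replacement fields, so width and precision can be computed at run time. The sketch below assumes Python 3.8 or later for the '=' form; all variable names and values are illustrative.

```python
value = 3.14159
width, precision = 10, 3

nested = f'{value:{width}.{precision}f}'   # width and precision from variables
converted = f'{value!r:>10}'               # conversion first, then the format spec
debug = f'{value=:.2f}'                    # '=' combined with an explicit format
print(repr(nested), repr(converted), debug)
```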

String operators

The `*` operator

The * operator repeats a string n times (n being an integer) and concatenates the copies:

'Python' * 3

'PythonPythonPython'

If n is an integer less than 1, the result is the empty string:

'Python' * -1

''

The * operator can be combined with = to repeat, concatenate and assign in one step:

a = 'py'
a *= 3
a

'pypypy'

Because strings are iterable, * can also be used to unpack a string:

(*'Python',)

('P', 'y', 't', 'h', 'o', 'n')

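The `%` operator

Applying the % operator to a string is what the official documentation calls "printf-style string formatting". It is an older formatting method; the documentation points to formatted string literals and str.format() as ways to avoid its quirks, but knowing it helps when reading other people's code.

A conversion specifier contains two or more characters and has the following components, which must occur in this order:

The '%' character, which marks the start of the specifier.
Mapping key (optional), consisting of a parenthesised sequence of characters.
Conversion flags (optional), which affect the result of some conversion types.
Minimum field width (optional). If specified as '*' (asterisk), the actual width is read from the next element of the values tuple, and the object to convert comes after the minimum field width and the optional precision.
Precision (optional), given as a '.' (dot) followed by the precision. If specified as '*' (asterisk), the actual precision is read from the next element of the values tuple, and the value to convert comes after the precision.
Length modifier (optional).
Conversion type.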

'hi %r' % 'python'

"hi 'python'"

'%s %r' % ('hi','python')

"hi 'python'"

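The conversion flags are:

Flag	Meaning
'#'	The value conversion uses the "alternate form".
'0'	The conversion is zero-padded for numeric values.
'-'	The converted value is left-adjusted (overrides the '0' conversion if both are given).
' '	(a space) A blank is left before a positive number (or empty string) produced by a signed conversion.
'+'	A sign character ('+' or '-') precedes the conversion (overrides the "space" flag).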

'A is %#x' % 18

'A is 0x12'

'A is %    d' % 18

'A is  18'

'A is %05o' % 18

'A is 00022'

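The conversion types are:

Conversion	Meaning
'd'	Signed integer decimal.
'i'	Signed integer decimal.
'o'	Signed octal value.
'x'	Signed hexadecimal (lowercase).
'X'	Signed hexadecimal (uppercase).
'e'	Floating-point exponential format (lowercase).
'E'	Floating-point exponential format (uppercase).
'f'	Floating-point decimal format.
'F'	Floating-point decimal format.
'g'	Floating-point format. Uses lowercase exponential format if the exponent is less than -4 or not less than the precision, decimal format otherwise.
'G'	Floating-point format. Uses uppercase exponential format if the exponent is less than -4 or not less than the precision, decimal format otherwise.
'c'	Single character (accepts an integer or a single-character string).
'r'	String (converts any Python object using repr()).
's'	String (converts any Python object using str()).
'a'	String (converts any Python object using ascii()).
'%'	No argument is converted; results in a '%' character in the output.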

'%f' % 3.14

'3.140000'

'%.3e' % 3.14

'3.140e+00'

'%.1f%%' % (3.14*100)

'314.0%'

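When the right-hand argument is a dictionary (or another mapping type), the formats in the string must include a parenthesised mapping key into that dictionary, inserted immediately after the '%' character. The mapping key selects the value to be formatted from the mapping: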

'%(language)s has %(number)03d quote types.' %\
{'language': "Python", "number": 2}

'Python has 002 quote types.'
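The '*' form of width and precision described in the component list above, and the usual pitfall of formatting a tuple, can be sketched as follows. All values are made up for the illustration.

```python
padded = '%*d' % (5, 42)           # width 5 is read from the tuple
rounded = '%.*f' % (2, 3.14159)    # precision 2 is read from the tuple
left = '%-*s|' % (6, 'ab')         # '-' flag with a width taken from the tuple

# A tuple on the right is unpacked as the argument list, so a tuple that
# should be formatted as one value must be wrapped in another tuple.
pair = (1, 2)
as_text = '%s' % (pair,)
print(repr(padded), rounded, left, as_text)
```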

str.encode encodes to bytes

The string method str.encode(), as described by Python's official documentation:

help(str.encode)

Help on method_descriptor:

encode(self, /, encoding='utf-8', errors='strict')
    Encode the string using the codec registered for encoding.
    
    encoding
      The encoding in which to encode the string.
    errors
      The error handling scheme to use for encoding errors.
      The default is 'strict' meaning that encoding errors raise a
      UnicodeEncodeError.  Other possible values are 'ignore', 'replace' and
      'xmlcharrefreplace' as well as any other name registered with
      codecs.register_error that can handle UnicodeEncodeErrors.

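Returns the string encoded as a bytes object. The default encoding is 'utf-8'. errors may be given to select a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError (more precisely, its subclass UnicodeEncodeError).

The following compares the 'utf-8' and 'gbk' encodings: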

'嗨 python'.encode()

b'\xe5\x97\xa8 python'

'嗨 python'.encode('gbk')

b'\xe0\xcb python'

'▲ python'.encode('gbk')

b'\xa1\xf8 python'

'🔺 python'.encode()

b'\xf0\x9f\x94\xba python'

'🔺 python'.encode('gbk') # gbk cannot encode 🔺

---------------------------------------------------------------------------

UnicodeEncodeError                        Traceback (most recent call last)

<ipython-input-15-60e87a9208be> in <module>
----> 1 '🔺 python'.encode('gbk')


UnicodeEncodeError: 'gbk' codec can't encode character '\U0001f53a' in position 0: illegal multibyte sequence

Further reading:

To decode a bytes object back into a string, use bytes.decode:

help(bytes.decode)

Signature: bytes.decode(self, /, encoding='utf-8', errors='strict')
Docstring:
Decode the bytes using the codec registered for encoding.

encoding
  The encoding with which to decode the bytes.
errors
  The error handling scheme to use for the handling of decoding errors.
  The default is 'strict' meaning that decoding errors raise a
  UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
  as well as any other name registered with codecs.register_error that
  can handle UnicodeDecodeErrors.
Type:      method_descriptor

b'\xf0\x9f\x94\xba python'.decode()

'🔺 python'
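The errors argument mentioned in the help text decides what happens to characters the codec cannot represent. The sketch below uses the 'ascii' codec instead of 'gbk' purely to keep the outputs easy to predict; the handler names are the standard ones.

```python
text = '🔺 python'

ignored = text.encode('ascii', errors='ignore')             # drop the character
replaced = text.encode('ascii', errors='replace')           # substitute '?'
as_xml = text.encode('ascii', errors='xmlcharrefreplace')   # numeric XML reference
escaped = text.encode('ascii', errors='backslashreplace')   # Python-style escape

# UTF-8 can represent every code point, so a round trip is lossless.
round_trip = text.encode('utf-8').decode('utf-8')
print(ignored, replaced, as_xml, escaped, round_trip == text)
```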

str.capitalize capitalizes the first character

The string method str.capitalize(), as described by Python's official documentation:

help(str.capitalize)

Help on method_descriptor:

capitalize(self, /)
    Return a capitalized version of the string.
    
    More specifically, make the first character have upper case and the rest lower
    case.

Returns a copy of the string with its first character capitalized and the rest lowercased:

'pyTHON'.capitalize()

'Python'

The first character is uppercased only if it is a cased letter; the rest of the string is lowercased either way:

'嗨 pyTHON'.capitalize()

'嗨 python'

str.casefold removes case distinctions

The string method str.casefold(), as described by Python's official documentation:

help(str.casefold)

Help on method_descriptor:

casefold(self, /)
    Return a version of the string suitable for caseless comparisons.

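Returns a casefolded copy of the string. Casefolded strings can be used for caseless matching.

Casefolding is similar to lowercasing but more aggressive, because it is intended to remove all case distinctions in a string. For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() leaves 'ß' unchanged, whereas casefold() converts it to "ss".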

'pYthOn'.casefold()

'python'

'ß'.casefold()

'ss'

'ß'.lower()

'ß'
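A typical use of casefold() is a caseless equality test. The helper same_ignoring_case is a hypothetical name, and the German sample words are chosen only to show where lower() falls short.

```python
def same_ignoring_case(a, b):
    """Compare two strings without regard to case, using casefold()."""
    return a.casefold() == b.casefold()

with_casefold = same_ignoring_case('Straße', 'STRASSE')
with_lower = 'Straße'.lower() == 'STRASSE'.lower()
print(with_casefold, with_lower)
```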

str.lower converts to lowercase

The string method str.lower(), as described by Python's official documentation:

help(str.lower)

Help on method_descriptor:

lower(self, /)
    Return a copy of the string converted to lowercase.

Returns a copy of the string with all cased characters converted to lowercase.

'PyThon'.lower()

'python'

'嗨 PyThon'.lower()

'嗨 python'

'PyThon Γ'.lower()

'python γ'

str.title capitalizes each word

The string method str.title(), as described by Python's official documentation:

help(str.title)

Help on method_descriptor:

title(self, /)
    Return a version of the string where each word is titlecased.
    
    More specifically, words start with uppercased characters and all remaining
    cased characters have lower case.

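Returns a titlecased version of the string, in which each word starts with an uppercase letter and the remaining letters are lowercase.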

'hi python'.title()

'Hi Python'

'嗨python'.title()

'嗨Python'

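The algorithm uses a simple, language-independent definition of a word as a group of consecutive letters. The definition works in many contexts, but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result: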

"they're bill's friends from the UK".title()

"They'Re Bill'S Friends From The Uk"

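One possible workaround for the apostrophe problem, sketched here under the assumption that words are separated by whitespace only, is string.capwords() from the standard library. It splits on whitespace and capitalizes each piece, so apostrophes no longer start a new word.

```python
import string

phrase = "they're bill's friends from the UK"

with_title = phrase.title()              # apostrophes act as word boundaries
with_capwords = string.capwords(phrase)  # only whitespace separates words
print(with_title)
print(with_capwords)
```

Note that both approaches still lowercase the rest of each word, so "UK" becomes "Uk" either way.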
str.upper converts to uppercase

The string method str.upper(), as described by Python's official documentation:

help(str.upper)

Help on method_descriptor:

upper(self, /)
    Return a copy of the string converted to uppercase.

Returns a copy of the string with all cased characters converted to uppercase.

'嗨python'.upper()

'嗨PYTHON'

'πpython'.upper()

'ΠPYTHON'

str.swapcase swaps case

The string method str.swapcase(), as described by Python's official documentation:

help(str.swapcase)

Help on method_descriptor:

swapcase(self, /)
    Convert uppercase characters to lowercase and lowercase characters to uppercase.

Returns a copy of the string with uppercase characters converted to lowercase and vice versa. Note that s.swapcase().swapcase() == s is not necessarily true.

'PythoN'.swapcase()

'pYTHOn'

'pYTHOn'.swapcase()

'PythoN'

'ß'.swapcase() # the German lowercase letter ß is equivalent to ss

'SS'

'SS'.swapcase()

'ss'

'ß'.swapcase().swapcase() == 'ß'

False

str.zfill pads with zeros

The string method str.zfill(), as described by Python's official documentation:

help(str.zfill)

Help on method_descriptor:

zfill(self, width, /)
    Pad a numeric string with zeros on the left, to fill a field of the given width.
    
    The string is never truncated.

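Returns a copy of the string left-filled with ASCII '0' digits to make a string of length width. A leading sign prefix ('+'/'-') is handled by inserting the padding after the sign character rather than before it. If width is less than or equal to len(s), the original string is returned.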

"42a".zfill(5)

'0042a'

"-42".zfill(5)

'-0042'

"-42".zfill(1)

'-42'

str.translate translates via a table

The string method str.translate(), as described by Python's official documentation:

help(str.translate)

Help on method_descriptor:

translate(self, table, /)
    Replace each character in the string using the given translation table.
    
      table
        Translation table, which must be a mapping of Unicode ordinals to
        Unicode ordinals, strings, or None.
    
    The table must implement lookup/indexing via __getitem__, for instance a
    dictionary or list.  If this operation raises LookupError, the character is
    left untouched.  Characters mapped to None are deleted.

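Returns a copy of the string in which each character has been mapped through the given translation table. The table must be an object that implements indexing via __getitem__(), typically a mapping or a sequence. When indexed by a Unicode ordinal (an integer), the table object can do any of the following: return a Unicode ordinal or a string, to map the character to one or more other characters; return None, to delete the character from the result; or raise a LookupError exception, to map the character to itself.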

ord('p'),ord('C')

(112, 67)

'python'.translate({112:67})

'Cython'

'python'.translate({112:'Cp'})

'Cpython'

'python'.translate({112:None})

'ython'

You can use str.maketrans() to create a translation table from character-to-character mappings in different formats.

table = str.maketrans('pto','123')
'python'.translate(table)

'1y2h3n'
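
One common use of the None mapping is deleting a whole class of characters. The sketch below builds a table by hand with dict.fromkeys(), so every value defaults to None; the variable names and the sample text are illustrative.

```python
import string

# Map every ASCII punctuation code point to None so translate() deletes it.
delete_punctuation = dict.fromkeys(map(ord, string.punctuation))

text = 'Hello, world! (Python 3.9)'
cleaned = text.translate(delete_punctuation)
print(cleaned)
```

Characters that are missing from the dictionary raise KeyError, which is a LookupError, so they are left untouched.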

str.maketrans build a translation table

String method str.maketrans(). It is a static method (it takes no self). Python's built-in help describes it as follows:

help(str.maketrans)

Help on built-in function maketrans:

maketrans(x, y=None, z=None, /)
    Return a translation table usable for str.translate().
    
    If there is only one argument, it must be a dictionary mapping Unicode
    ordinals (integers) or characters to Unicode ordinals, strings or None.
    Character keys will be then converted to ordinals.
    If there are two arguments, they must be strings of equal length, and
    in the resulting dictionary, each character in x will be mapped to the
    character at the same position in y. If there is a third argument, it
    must be a string, whose characters will be mapped to None in the result.

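Returns a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters (strings of length 1) to Unicode ordinals, strings (of arbitrary length), or None. Character keys are then converted to ordinals.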

str.maketrans({97:'123'})

{97: '123'}

str.maketrans({'a':97})

{97: 97}

str.maketrans({'a':None})

{97: None}

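If there are two arguments, they must be strings of equal length; in the resulting dictionary, each character in x is mapped to the character at the same position in y.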

str.maketrans('abc','123')

{97: 49, 98: 50, 99: 51}

If there is a third argument, it must be a string whose characters are mapped to None in the result.

str.maketrans('ab','12','xy')

{97: 49, 98: 50, 120: None, 121: None}
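
The tables above only become useful once they are passed to str.translate(). As a minimal sketch, the following applies a three-argument table and a dictionary-form table to short sample strings; the variable names are illustrative.

```python
# Two-string form plus a third string of characters to delete.
table = str.maketrans('ab', '12', 'xy')
result = 'abxyc'.translate(table)

# Dictionary form: character keys are converted to ordinals.
dict_table = str.maketrans({'a': 'A', 'b': None})
dict_result = 'abc'.translate(dict_table)

print(result, dict_result)
```

In both cases 'c' has no entry in the table, so it passes through unchanged.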

str.isalnum letters or digits?

String method str.isalnum(). Python's built-in help describes it as follows:

help(str.isalnum)

Help on method_descriptor:

isalnum(self, /)
    Return True if the string is an alpha-numeric string, False otherwise.
    
    A string is alpha-numeric if all characters in the string are alpha-numeric and
    there is at least one character in the string.

Returns True if all characters in the string are alphanumeric and there is at least one character, False otherwise.

''.isalnum()

False

'python123'.isalnum()

True

'python 123'.isalnum()

False

'γ'.isalnum()

True

str.isalpha letters (including Chinese characters)?

String method str.isalpha(). Python's built-in help describes it as follows:

help(str.isalpha)

Help on method_descriptor:

isalpha(self, /)
    Return True if the string is an alphabetic string, False otherwise.
    
    A string is alphabetic if all characters in the string are alphabetic and there
    is at least one character in the string.

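Returns True if all characters in the string are alphabetic and there is at least one character, False otherwise.

Alphabetic characters are those defined in the Unicode character database as "Letter", i.e., those with a general category property of "Lm", "Lt", "Lu", "Ll", or "Lo". Note that this is different from the "Alphabetic" property defined in the Unicode Standard.

Letters in this sense include Chinese characters and the like.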

''.isalpha()

False

'γ'.isalpha()

True

'嗨你好'.isalpha()

True

'嗨！你好'.isalpha()

False
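
The category rule can be checked with the standard unicodedata module. This is only a sketch for a handful of sample characters; LETTER_CATEGORIES and is_letter_by_category are illustrative names, not part of Python.

```python
import unicodedata

# The five general categories that the description above lists as "Letter".
LETTER_CATEGORIES = {'Lm', 'Lt', 'Lu', 'Ll', 'Lo'}

def is_letter_by_category(ch):
    """Return True if the single character ch has a letter general category."""
    return unicodedata.category(ch) in LETTER_CATEGORIES

# Print the category next to both checks so they can be compared by eye.
for ch in ['a', 'γ', '嗨', '！', '1']:
    print(ch, unicodedata.category(ch), is_letter_by_category(ch), ch.isalpha())
```

For these samples the category check and isalpha() agree; Chinese characters fall under "Lo" (other letter), which is why they count as letters here.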

str.isdecimal decimal characters?

String method str.isdecimal(). Python's built-in help describes it as follows:

help(str.isdecimal)

Help on method_descriptor:

isdecimal(self, /)
    Return True if the string is a decimal string, False otherwise.
    
    A string is a decimal string if all characters in the string are decimal and
    there is at least one character in the string.

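Returns True if all characters in the string are decimal characters and there is at least one character, False otherwise.

Decimal characters are those that can be used to form numbers in base 10. Formally, a decimal character is a character in the Unicode general category "Nd".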

''.isdecimal()

False

'3.14'.isdecimal()

False

'０1２3'.isdecimal()

True

'5²'.isdecimal()

False

'python'.isdecimal()

False

b'100'.isdecimal()

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-13-52e1682babfd> in <module>
----> 1 b'100'.isdecimal()


AttributeError: 'bytes' object has no attribute 'isdecimal'

str.isdigit digits?

String method str.isdigit(). Python's built-in help describes it as follows:

help(str.isdigit)

Help on method_descriptor:

isdigit(self, /)
    Return True if the string is a digit string, False otherwise.
    
    A string is a digit string if all characters in the string are digits and there
    is at least one character in the string.

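Returns True if all characters in the string are digits and there is at least one character, False otherwise.

Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits that cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.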

'一'.isdigit()

False

'3.14'.isdigit()

False

'１２３'.isdigit()

True

b'123'.isdigit()

True

'5²'.isdigit()

True

str.isnumeric numeric characters?

String method str.isnumeric(). Python's built-in help describes it as follows:

help(str.isnumeric)

Help on method_descriptor:

isnumeric(self, /)
    Return True if the string is a numeric string, False otherwise.
    
    A string is numeric if all characters in the string are numeric and there is at
    least one character in the string.

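Returns True if all characters in the string are numeric characters and there is at least one character, False otherwise.

Numeric characters include digit characters and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal, or Numeric_Type=Numeric.

Numeric characters in this sense include Roman numerals, Chinese numerals, and so on.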

'②'.isnumeric()

True

'3.14'.isnumeric()

False

'5²'.isnumeric()

True

'Ⅷ'.isnumeric()

True

'一'.isnumeric()

True

'壹'.isnumeric()

True
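
The three checks str.isdecimal(), str.isdigit(), and str.isnumeric() accept progressively larger sets of characters. A short sketch, using a hypothetical helper numeric_profile, puts the three answers side by side for characters that already appear in the examples above.

```python
def numeric_profile(ch):
    """Return (isdecimal, isdigit, isnumeric) for the given string."""
    return ch.isdecimal(), ch.isdigit(), ch.isnumeric()

# An ordinary digit, a superscript digit, and a Chinese numeral.
for ch in ['5', '²', '一']:
    print(ch, numeric_profile(ch))
```

When the goal is to validate input before calling int(), isdecimal() is the strictest of the three; the other two also accept characters such as '²' or '一' that int() rejects.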

str.islower lowercase?

String method str.islower(). Python's built-in help describes it as follows:

help(str.islower)

Help on method_descriptor:

islower(self, /)
    Return True if the string is a lowercase string, False otherwise.
    
    A string is lowercase if all cased characters in the string are lowercase and
    there is at least one cased character in the string.

Returns True if all cased characters in the string are lowercase and there is at least one cased character, False otherwise.

'嗨'.islower()

False

'嗨 Abc'.islower()

False

'嗨 abc'.islower()

True

str.isupper uppercase?

String method str.isupper(). Python's built-in help describes it as follows:

help(str.isupper)

Help on method_descriptor:

isupper(self, /)
    Return True if the string is an uppercase string, False otherwise.
    
    A string is uppercase if all cased characters in the string are uppercase and
    there is at least one cased character in the string.

Returns True if all cased characters in the string are uppercase and there is at least one cased character, False otherwise.

'Γ'.isupper()

True

'嗨 AB'.isupper()

True

'嗨 Ab'.isupper()

False

str.istitle title-cased string?

String method str.istitle(). Python's built-in help describes it as follows:

help(str.istitle)

Help on method_descriptor:

istitle(self, /)
    Return True if the string is a title-cased string, False otherwise.
    
    In a title-cased string, upper- and title-case characters may only
    follow uncased characters and lowercase characters only cased ones.

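Returns True if the string is title-cased and contains at least one character; for example, uppercase characters may only follow uncased characters, and lowercase characters may only follow cased ones. Returns False otherwise.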

'Abc Py'.istitle()

True

'嗨 A11'.istitle()

True

'嗨 Abc'.istitle()

True

'嗨 ABC'.istitle()

False
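
The rule that lowercase characters may only follow cased ones has a side effect: digits and apostrophes are uncased, so they start a new "word". The sketch below shows this with a few sample strings (chosen for illustration) and with str.title(), which produces strings that satisfy the same rule.

```python
# For each sample: is it title-cased, what does title() produce,
# and is that result title-cased?
for s in ['A1B', 'A1b', "they're"]:
    titled = s.title()
    print(repr(s), s.istitle(), repr(titled), titled.istitle())
```

In 'A1b' the lowercase b follows the uncased digit 1, so istitle() is False, and title() turns the string into 'A1B'.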

str.isascii ASCII characters?

String method str.isascii(). Python's built-in help describes it as follows:

help(str.isascii)

Help on method_descriptor:

isascii(self, /)
    Return True if all characters in the string are ASCII, False otherwise.
    
    ASCII characters have code points in the range U+0000-U+007F.
    Empty string is ASCII too.

Returns True if the string is empty or all characters in the string are ASCII, False otherwise. ASCII characters have code points in the range U+0000-U+007F.

''.isascii()

True

'python'.isascii()

True

'python.3'.isascii()

True

'嗨 python'.isascii()

False
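
The code point range can be checked by hand with ord(). The following sketch compares such a manual check with str.isascii() for a few samples; is_ascii_manual is an illustrative name, not a built-in.

```python
def is_ascii_manual(s):
    """Return True if every code point is in U+0000-U+007F (also True for '')."""
    return all(ord(ch) < 128 for ch in s)

# The manual check and the built-in method should give the same answer.
for s in ['', 'python.3', '嗨 python']:
    print(repr(s), is_ascii_manual(s), s.isascii())
```

all() over an empty sequence is True, which matches the rule that the empty string counts as ASCII.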

str.isidentifier valid identifier?

String method str.isidentifier(). Python's built-in help describes it as follows:

help(str.isidentifier)

Help on method_descriptor:

isidentifier(self, /)
    Return True if the string is a valid Python identifier, False otherwise.
    
    Call keyword.iskeyword(s) to test whether string s is a reserved identifier,
    such as "def" or "class".

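Returns True if the string is a valid identifier, False otherwise. Reserved words such as for also count as identifiers here; use keyword.iskeyword() to detect them.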

''.isidentifier()

False

'1mycode'.isidentifier()

False

'_mycode'.isidentifier()

True

'123'.isidentifier()

False

'_123'.isidentifier()

True

'变量名'.isidentifier()

True

'for'.isidentifier()

True
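
Because reserved words pass isidentifier(), a check for names that can actually be assigned to needs both tests. This is a minimal sketch; is_usable_name is a hypothetical helper, while keyword.iskeyword() is the standard library function mentioned in the help text.

```python
import keyword

def is_usable_name(s):
    """Return True if s is a valid identifier and not a reserved keyword."""
    return s.isidentifier() and not keyword.iskeyword(s)

# 'for' is a valid identifier syntactically, but it is reserved.
for name in ['_mycode', '1mycode', '变量名', 'for']:
    print(repr(name), name.isidentifier(), is_usable_name(name))
```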

str.isprintable printable characters?

String method str.isprintable(). Python's built-in help describes it as follows:

help(str.isprintable)

Help on method_descriptor:

isprintable(self, /)
    Return True if the string is printable, False otherwise.
    
    A string is printable if all of its characters are considered printable in
    repr() or if it is empty.

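Returns True if all characters in the string are printable or the string is empty, False otherwise.

Nonprintable characters are those defined in the Unicode character database as "Other" or "Separator", except for the ASCII space (0x20), which is considered printable.

Note that printable characters in this context are those that need not be escaped when repr() is called on a string. This has no bearing on how strings written to sys.stdout or sys.stderr are handled.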

''.isprintable()

True

' '.isprintable()

True

'\n'.isprintable()

False

'\python'.isprintable()

True

'py\thon'.isprintable()

False
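
When isprintable() returns False, it is often useful to know which characters caused it. The sketch below collects them; nonprintable_chars is an illustrative helper name.

```python
def nonprintable_chars(s):
    """Return the characters of s that str.isprintable() rejects."""
    return [ch for ch in s if not ch.isprintable()]

print(nonprintable_chars('py\thon\n'))   # the tab and the newline
print(nonprintable_chars('嗨 python'))   # nothing is rejected here
```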

str.isspace whitespace?

String method str.isspace(). Python's built-in help describes it as follows:

help(str.isspace)

Help on method_descriptor:

isspace(self, /)
    Return True if the string is a whitespace string, False otherwise.
    
    A string is whitespace if all characters in the string are whitespace and there
    is at least one character in the string.

Returns True if the string contains only whitespace characters and there is at least one character, False otherwise.

''.isspace()

False

' '.isspace()

True

'\n\t\r\f'.isspace()

True

' \\'.isspace()

False

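str.removeprefix remove a prefix

String method str.removeprefix().

New in version 3.9.

str.removeprefix(prefix, /): if the string starts with the prefix string prefix, returns string[len(prefix):]; otherwise, returns a copy of the original string.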

'TestHook'.removeprefix('Test')

'Hook'

'BaseTestCase'.removeprefix('Test')

'BaseTestCase'

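str.removesuffix remove a suffix

String method str.removesuffix().

New in version 3.9.

str.removesuffix(suffix, /): if the string ends with the suffix string suffix and that suffix is not empty, returns string[:-len(suffix)]; otherwise, returns a copy of the original string.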

'MiscTests'.removesuffix('Tests')

'Misc'

'TmpDirMixin'.removesuffix('Tests')

'TmpDirMixin'
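
For interpreters older than 3.9, the slicing rules quoted above for removeprefix() and removesuffix() can be written out directly. The helpers remove_prefix and remove_suffix below are hypothetical stand-ins, a sketch rather than the library implementation. The last two lines contrast them with lstrip() and rstrip(), which treat their argument as a set of characters instead of a substring.

```python
def remove_prefix(s, prefix):
    """Drop prefix from the start of s if present, otherwise return s."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s

def remove_suffix(s, suffix):
    """Drop suffix from the end of s if present, otherwise return s."""
    # The non-empty check matters: s[:-0] would be the empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

# lstrip/rstrip keep removing any of the given characters, not one substring.
print(repr(remove_prefix('Testset', 'Test')), repr('Testset'.lstrip('Test')))
print(repr(remove_suffix('AssetTests', 'Tests')), repr('AssetTests'.rstrip('Tests')))
```

The strip methods remove far more than the intended affix in these samples, which is the usual reason to prefer the prefix and suffix methods.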
