Python Basics, Part 2: The str Data Type

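Original post

红旗下的小兵 2019-07-30 15:25:53 Category: Python

© Copyright belongs to the author. This is an original work by 红旗下的小兵 on the 51CTO blog. Contact the author for permission before reposting; unauthorized reposting may lead to legal action.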

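This article looks at the methods and functions most often used with the string type str. It covers the following topics:

The ord() function
The chr() function
The str.encode method
The bytes.decode method
The len() function
Inserting variables into a string
String slicing
(Supplement) Other string methods
The isdigit() function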

In Python 3, strings are Unicode text, so a Python string can hold characters from many languages. For example:

print("中英文的字符串str") # 中英文的字符串str

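The ord() function

ord() takes a single character and returns its integer Unicode code point (for ASCII characters this is the same as the ASCII code):

print(ord("M")) # 77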

The chr() function

chr() does the reverse: it turns an integer code point into the corresponding character:

print(chr(66)) # B
print(chr(25991)) # 文
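As a small sketch of how ord() and chr() relate, the snippet below round-trips one character and shifts a letter by one code point. The variable names are only illustrative.

```python
# ord() and chr() are inverses of each other for a single character.
ch = "文"
code = ord(ch)        # integer code point of the character
same_ch = chr(code)   # back to the original character
print(code, hex(code), same_ch)

# Moving one code point forward from "A" gives the next letter.
next_letter = chr(ord("A") + 1)
print(next_letter)
```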

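The str.encode() method

Python's string type is str, which is held in memory as Unicode text; how many bytes one character needs depends on the encoding. To send a string over the network or save it to disk, you first have to turn the str into bytes.

A str can be encoded into bytes with a chosen encoding by calling encode().

A str that contains only English (ASCII) characters can be encoded as ASCII:

print("abc".encode("ascii")) # b'abc'

A str that contains Chinese characters can be encoded as UTF-8:

print("我是中文".encode("utf-8"))
# b'\xe6\x88\x91\xe6\x98\xaf\xe4\xb8\xad\xe6\x96\x87'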
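A minimal sketch of what happens when the encoding cannot represent the text: encoding Chinese characters as ASCII raises UnicodeEncodeError, while UTF-8 handles them and decodes back to the same string. The variable names are illustrative.

```python
text = "我是中文"

# ASCII has no code for Chinese characters, so this raises an error.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("cannot encode as ASCII:", exc.reason)

# UTF-8 can encode the text, and decoding restores the original str.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
print(len(utf8_bytes), round_trip == text)
```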

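The bytes.decode() method

Data read from the network or from disk arrives as bytes. To turn bytes back into a str, call decode():

print(b'abc'.decode("ascii")) # abc
print(b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')) # 中文

Note that a bytes literal is written with a b prefix in front of single or double quotes: b'' or b"".

s = "abc"
data = b'abc'

In the code above, "abc" is a string (str), while b'abc' is a sequence of bytes (bytes).

If the bytes contain a sequence that cannot be decoded, decode() raises UnicodeDecodeError:

print(b'\xeddas4\xb8\xad\xe6\x96\x87'.decode('utf-8')) # raises UnicodeDecodeError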

errors='ignore'

If only a few bytes are invalid, you can pass errors='ignore'. The invalid bytes are then silently dropped and only the valid ones are decoded:

print(b'\xedhehe\xb8\xad\xe6\x96\x87'.decode('utf-8', errors='ignore')) # hehe文
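For comparison, here is a sketch that decodes the same bytes with errors='ignore' and with errors='replace', another standard error handler. With 'replace', each undecodable byte becomes the Unicode replacement character U+FFFD, so you can still see where data was lost.

```python
raw = b'\xedhehe\xb8\xad\xe6\x96\x87'

# 'ignore' drops the bytes that cannot be decoded.
ignored = raw.decode('utf-8', errors='ignore')

# 'replace' keeps a visible marker (U+FFFD) where decoding failed.
replaced = raw.decode('utf-8', errors='replace')

print(ignored)
print(replaced)
```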

Summary: UTF-8 is the most commonly used encoding, although Python supports many others.

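The len() function

len() returns the number of characters in a string (similar to the length property of a string in JS):

print(len("abc,呵呵 ")) # 7

It can also count the number of bytes in a bytes object:

print(len(b'abc')) # 3
print(len('我是中文'.encode('utf-8'))) # 12

As the code shows, a Chinese character usually takes 3 bytes once encoded as UTF-8, while an English letter takes only 1.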
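To make the difference between characters and bytes concrete, this sketch compares len() of a one-character str with len() of its UTF-8 encoding. The sample characters are chosen for illustration and are written as escapes so the snippet does not depend on the file's encoding.

```python
# One character each: ASCII letter, accented Latin letter, Chinese character, emoji.
samples = ["a", "\u00e9", "\u4e2d", "\U0001F600"]

# Each sample is a single character, but its UTF-8 size differs.
sizes = {}
for ch in samples:
    sizes[ch] = len(ch.encode("utf-8"))
    print(repr(ch), len(ch), sizes[ch])
```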

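One important point:

Python source code is a text file, so when the code contains Chinese, be sure to save the file as UTF-8. To make the interpreter read the source as UTF-8, these two lines are commonly placed at the top of the file:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

Line 1: a comment that tells Linux/macOS that this file is a Python executable; Windows ignores it.

Line 2: tells the Python interpreter to read the source as UTF-8. Python 3 already assumes UTF-8 when no encoding is declared, so this line mainly matters for Python 2 or for files saved in another encoding. If the declared encoding does not match how the file was actually saved, Chinese text in the source may come out garbled.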

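Inserting variables into a string (similar to template strings in ES6). A later article covers how to insert a dict into a string template.

Inserting variables into a string is something you do all the time in development. Here is how it works in Python.

A small demo first:

name = 'lxc'
age = 20
height = '170'
print("name:%s,age:%d,height:%s" % (name,age,height)) # name:lxc,age:20,height:170

In the code above, %s inside the string is a placeholder replaced by a string (the most common one) and %d is a placeholder replaced by an integer. The % after the string is the formatting operator, and the tuple in parentheses at the end supplies the values in order. (In an ES6 template string this would be written as `I am ${variable}`.)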

%s converts a value of any type to a string:

state = True
print("is:%s" % (state)) # is:True

Commonly used placeholders:

Placeholder    Replaced by
%d             integer
%f             floating-point number
%s             string
%x             hexadecimal integer

To put a literal % in the string, write %%. The doubled sign is an escape for a single % (much like \ escapes a character in a JS regular expression):

state = True
print("is:%s%%" % (state)) # is:True%
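The demos above only exercise %s and %d, so here is a short sketch that also uses %f, %x and %%. For comparison it builds the same text with an f-string, which is not covered in this article but has been available since Python 3.6. The values are made up for illustration.

```python
price = 3.14159
count = 255
rate = 80

# %.2f keeps two decimal places, %x prints the integer in hexadecimal,
# and %% produces one literal percent sign.
line = "price:%.2f, count:%d, hex:%x, rate:%d%%" % (price, count, count, rate)
print(line)

# The same output written as an f-string.
line_f = f"price:{price:.2f}, count:{count:d}, hex:{count:x}, rate:{rate}%"
print(line_f)
```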

String operators:

+ concatenates two strings:

print("我叫"+"lxc") # 我叫lxc

* repeats a string the given number of times:

print("*"*3) # ***

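String slicing:

Slicing follows the same rule as range(): the interval is closed on the left and open on the right.

name = 'hello world'
print(name[0:5]) # hello  from index 0 up to, but not including, index 5; indexes start at 0
# Slice out "world"
# Option 1
print(name[6:len(name)]) # len(name) is the number of characters (11), one past the last index (10)
# Option 2
print(name[-5:])
# Option 3
print(name[6:]) # from index 6 to the end
# Option 4
print(name[6:11])
# Option 5
print(name[-5:11])

Two points about the code above. If nothing follows the colon, the slice runs to the end of the string. A negative index counts from the end of the string, so -5 is the fifth character from the end; it is not a step. The step is an optional third value, as in name[::2].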
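A brief sketch of the optional third slice value, the step, alongside negative indexes. It uses the same 'hello world' string as the demo above; the other variable names are illustrative.

```python
name = 'hello world'

every_other = name[::2]      # step 2: every second character
reversed_name = name[::-1]   # step -1: walk the string backwards
last_word = name[-5:]        # negative index: start 5 characters from the end
without_last = name[:-1]     # everything except the last character

print(every_other, reversed_name, last_word, without_last)
```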

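Supplement:

Other string methods

1. lower(), upper()

lower(): converts the letters in the string to lowercase and returns a new string.

upper(): converts the letters in the string to uppercase and returns a new string.

print('吕星辰lxc'.upper()) # 吕星辰LXC
print('吕星辰LXC'.lower()) # 吕星辰lxc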

2. title(), capitalize()

title(): capitalizes the first letter of every word in the string and returns a new string. A new word starts after any character that has no upper or lower case, such as a space or a Chinese character.

str1 = 'lxc吕星辰lxc'
new_str = str1.title()
print(new_str) # Lxc吕星辰Lxc

str2 = 'lxc'
new_str = str2.title()
print(new_str) # Lxc

str3 = 'lxc lxc'
new_str = str3.title()
print(new_str) # Lxc Lxc

capitalize(): capitalizes only the first character of the string, lowercases all the others, and returns a new string.

str1 = 'lxc吕星辰lxc'
new_str = str1.capitalize()
print(new_str) # Lxc吕星辰lxc

str3 = 'lxc lxc'
new_str = str3.capitalize()
print(new_str) # Lxc lxc

3. swapcase

Swaps the case of every letter in the string (uppercase becomes lowercase and lowercase becomes uppercase) and returns a new string.

str1 = 'lXc吕星辰Lxc'
new_str = str1.swapcase()
print(new_str) # LxC吕星辰lXC

str3 = 'Lxc Lxc'
new_str = str3.swapcase()
print(new_str) # lXC lXC

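4. count()

(1) Returns the number of times a substring occurs in the string, or 0 if it does not occur.

s = 'abcdefagabc'
n = s.count('abc')
print(n) # 2

s = 'abc'
n = s.count('e')
print(n) # 0

(2) You can also say where in the string to start and stop counting. Both values are string indexes, and the range is closed on the left and open on the right.

s = 'abcdeab'
n = s.count('ab',0,2)
print(n) # 1

In the code above, counting starts at index 0 and stops at index 2 (index 2 itself is excluded); within that range the substring ab occurs once.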

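5. startswith(), endswith()

startswith(): checks whether the string starts with the given substring. You can also give the start and end of the range to check, again closed on the left and open on the right.

endswith(): checks whether the string ends with the given substring. It accepts the same optional start and end, with the same left-closed, right-open rule.

# Does the string start with ab? True if so, otherwise False
s = 'abcdeab'
result = s.startswith('ab')
print(result) # True

# Does the string end with ab? True if so, otherwise False
s = 'abcdeab'
result = s.endswith('ab')
print(result) # True

# With start and end bounds (left-closed, right-open): the range [2, 3) holds only 'c'
s = 'abcdeab'
result = s.startswith('cd',2,3)
print(result) # False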
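Two related details, sketched below with made-up values: widening the range to [2, 4) makes the bounded check succeed, and both methods also accept a tuple of candidates, returning True if any one of them matches.

```python
# The range [2, 4) of 'abcdeab' is 'cd', so this check succeeds.
in_range = 'abcdeab'.startswith('cd', 2, 4)

# A tuple lets you test several suffixes or prefixes in one call.
filename = 'photo.PNG'
is_image = filename.lower().endswith(('.png', '.jpg', '.gif'))
is_http = 'https://example.com'.startswith(('http://', 'https://'))

print(in_range, is_image, is_http)
```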

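6. find(), rfind(), index(), rindex()

find(): checks whether the string contains the substring. If it does, find() returns the index where the substring first starts; otherwise it returns -1. You can also give start and end positions for the search (left-closed, right-open). index() works like find(), except that it raises ValueError when the substring is not found.

s = 'abcde'
print(s.find('ab')) # 0

s = 'abcde'
print(s.index('f')) # raises ValueError: substring not found

rfind() and rindex(): search from the right and return the index where the last occurrence of the substring starts. When the substring is not found, rfind() returns -1 and rindex() raises ValueError.

s = 'abcde'
print(s.rfind('cde')) # 2

s = 'abcde'
print(s.rindex('ce')) # raises ValueError: substring not found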
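A small sketch of how the two failure styles look in practice: find() is checked against -1, while index() is wrapped in try/except. It also shows find() and rfind() returning different positions when the substring occurs more than once. The strings are illustrative.

```python
s = 'abcde'

# find() signals "not found" with -1, so no exception handling is needed.
pos = s.find('f')

# index() signals "not found" by raising ValueError.
try:
    s.index('f')
    found = True
except ValueError:
    found = False

# With repeated substrings, find() reports the first match and rfind() the last.
first_ab = 'abcab'.find('ab')
last_ab = 'abcab'.rfind('ab')

print(pos, found, first_ab, last_ab)
```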

7. in

The in operator can also be used to test whether a substring occurs in a string; it returns True or False.

s = 'abcde'
print('ab' in s) # True

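8. replace

replace(old, new, count): replaces occurrences of the substring old with new and returns a new string. count limits how many occurrences are replaced; without count, all of them are replaced. If old is not found, the result equals the original string.

s = 'abcdeab'
new_str = s.replace('ab','AB')
print(new_str) # ABcdeAB

In the code above, every occurrence of the substring ab is replaced by AB.

s = 'abcdeab'
new_str = s.replace('ab','AB',1)
print(new_str) # ABcdeab

In the code above, the count is 1, so only the first occurrence of ab is replaced.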

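9. split

Splits the string at the given separator and returns the pieces as a list. You can also set the maximum number of splits.

# 1
s = 'ab,cd,eab,sdf'
parts = s.split(',')
print(parts) # ['ab', 'cd', 'eab', 'sdf']

# 2
str1 = '1 2 3'
parts = str1.split(" ",maxsplit=1)
print(parts) # ['1', '2 3']

In the second example above, the maximum number of splits is 1, so str1 is split at a space only once.

A small demo: read several integers separated by spaces and turn them into a tuple.

r = input()
t = tuple(int(x) for x in r.split()) # split() with no argument splits on any run of whitespace
print(t)

In the code above, split() breaks the input at whitespace into a list of substrings, int() converts each one to an integer, and tuple() turns the result into a tuple.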
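When the input may contain extra spaces, split(" ") and split() with no argument behave differently. The sketch below uses a made-up input line instead of input() to show the difference.

```python
line = " 1  2 3 "

# An explicit separator treats every single space as a boundary,
# so leading, trailing and doubled spaces produce empty strings.
by_space = line.split(" ")

# With no argument, runs of whitespace count as one separator
# and leading/trailing whitespace is ignored.
by_whitespace = line.split()

numbers = tuple(int(x) for x in by_whitespace)
print(by_space, by_whitespace, numbers)
```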

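10. join

Joins the strings in an iterable into one string, with the separator placed between them. Note that every element of the iterable must be a string.

# Join the characters of a string with underscores
s = 'abcd'
new_str = '_'.join(s)
print(new_str) # a_b_c_d

# Join the elements of a tuple with hyphens; returns a string
t = ("1","2",'3')
new_str = '-'.join(t)
print(new_str) # 1-2-3

# Join the elements of a set with &; returns a string
# Note: a set is unordered, so the order in the printed result can vary
letters = set(['a','b','c'])
new_str = '&'.join(letters)
print(new_str) # for example c&b&a

# Join the keys of a dict with colons; returns a string
d = {'name':'lxc','age':20}
new_str = ':'.join(d)
print(new_str) # name:age

# Join the elements of a list with commas; returns a string
items = ['1','2','3']
new_str = ','.join(items)
print(new_str) # 1,2,3

# An integer is not iterable, so join raises an error
num = 123
new_str = '_'.join(num) # raises TypeError: can only join an iterable
print(new_str)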
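Because join() only accepts strings, a list of numbers has to be converted first. The sketch below shows the failure and one common fix, and uses sorted() to get a predictable order out of a set. The names are illustrative.

```python
nums = [1, 2, 3]

# Joining integers directly fails: the elements must be strings.
try:
    ','.join(nums)
    joined_directly = True
except TypeError:
    joined_directly = False

# Convert each element with str() before joining.
joined = ','.join(str(n) for n in nums)

# sorted() returns a list in a fixed order, so the result is repeatable.
sorted_join = '&'.join(sorted({'c', 'a', 'b'}))

print(joined_directly, joined, sorted_join)
```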

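11. strip, lstrip, rstrip

strip(): removes whitespace from both ends of the string. If you pass an argument, it is treated as a set of characters, and any of those characters are removed from both ends.

lstrip(): removes whitespace from the left end.

rstrip(): removes whitespace from the right end.

s = ' lxc '
print(s.strip()) # 'lxc'
print(s.lstrip()) # 'lxc ' (the space on the right remains)
print(s.rstrip()) # ' lxc' (the space on the left remains)

s = 'lxclxc'
print(s.strip('l')) # xclxc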
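Two details of strip() that are easy to miss, sketched with made-up strings: without an argument it removes any whitespace (tabs and newlines too), and with an argument it removes any of the listed characters from the ends, in any order, rather than one exact substring.

```python
# No argument: spaces, tabs and newlines are all stripped.
padded = '  \t lxc \n'
clean = padded.strip()

# With an argument: every leading/trailing character found in the set
# 'w', '.', 'm', 'o', 'c' is removed until a character outside it is hit.
url = 'www.example.com'
stripped = url.strip('w.moc')

# Characters in the middle are never touched.
middle_kept = 'xxlxcxx'.strip('x')

print(repr(clean), stripped, middle_kept)
```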

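Supplement:

isdigit() checks whether every character in the string is a digit. It takes no arguments and returns a bool.

The code below extracts the digits from a string, using isdigit() to test each character.

s = '12lkjll2jlj3llj21l'
new_str = ''
for i in s:
    if i.isdigit():
        new_str += i
print(new_str) # 122321

# This can also be solved with the higher-order function filter, covered in a later article.
s = '12lkjll2jlj3llj21l'
print(list(filter(lambda param: param.isdigit(),s)))
# ['1', '2', '2', '3', '2', '1']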
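A few edge cases are worth keeping in mind when using isdigit() to validate input. In this sketch a minus sign, a decimal point and an empty string all make the result False; the sample values are illustrative.

```python
samples = ['123', '-1', '1.5', '', '12a']

# isdigit() is True only when the string is non-empty and all characters are digits.
results = {text: text.isdigit() for text in samples}
print(results)

# The filter version can also be joined straight back into one string.
digits = ''.join(filter(str.isdigit, '12lkjll2jlj3llj21l'))
print(digits)
```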

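(Supplement) A small demo:

Replace the middle 4 digits of a phone number with asterisks *

tel_number = '13355458818' # 133****8818
def fn(param):
    # Only handle an 11-character string made up of digits
    if isinstance(param,str) and param.isdigit() and len(param) == 11:
        # Build the result by slicing, so only positions 3 to 6 are masked
        # even if the same 4 digits also appear elsewhere in the number
        res = param[:3] + '*'*4 + param[7:]
        return res
r = fn(tel_number)
print(r) # 133****8818

In the code above, the condition inside the function is only there for practice; in real code, the rules for a valid phone number should be checked with a regular expression.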
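As a rough sketch of the regular-expression approach mentioned above: the pattern here, 11 digits starting with 1, is only an assumption for illustration and not a complete rule for real phone numbers. The names PHONE_RE and mask_phone are hypothetical.

```python
import re

# Assumed pattern for illustration: a leading 1 followed by 10 more digits.
PHONE_RE = re.compile(r'1\d{10}')

def mask_phone(number):
    """Return the number with digits 4 to 7 masked, or None if it is not valid."""
    # fullmatch() requires the whole string to match the pattern.
    if isinstance(number, str) and PHONE_RE.fullmatch(number):
        return number[:3] + '****' + number[7:]
    return None

print(mask_phone('13355458818'))
print(mask_phone('1335545881'))
```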

That covers the string methods and how to use them. This article will be updated and expanded over time.
