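Table of Contents
- Preface
- I. Converting Between Encodings
- 1. Converting GBK to UTF-8
- 2. Copying a Source Folder to a Target Folder
- 3. Copying a Source Folder to a Target Folder While Converting Files from GBK to UTF-8
- Summary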
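Preface
Recently, for work, I needed to convert the source files in a given folder from GBK to UTF-8. To save time, my first thought was to do it in Python. I am writing the approach down here, and I hope this article helps others as well.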
I. Converting Between Encodings
1. Converting GBK to UTF-8
Partial code (example):
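import codecs
import os

# Note: postfix (the set of extensions to convert) is defined in the
# complete script in section 3.

class CCopyFile:
    def __init__(self, src, dst):
        def ReadFile(filePath, encoding):
            # Read the file and decode its contents with the given encoding
            with codecs.open(filePath, "rb", encoding) as f:
                return f.read()

        def WriteFile(filePath, contents, encoding):
            # Encode the text with the given encoding and write it out
            with codecs.open(filePath, "wb", encoding) as f:
                f.write(contents)

        def UTF8_2_GBK(src, dst):
            contents = ReadFile(src, encoding="utf-8")
            WriteFile(dst, contents, encoding="gb18030")

        def GBK_2_UTF8(src, dst):
            # gb18030 is a superset of GBK, so it also decodes GBK files
            contents = ReadFile(src, encoding="gb18030")
            WriteFile(dst, contents, encoding="utf-8")

        def CopyFile(src, dst):
            # Byte-for-byte copy; the original encoding is preserved
            with open(src, 'rb') as readStream:
                contents = readStream.read()
            with open(dst, 'wb') as writeStream:
                writeStream.write(contents)

        # Match the extension: files with a selected extension go through
        # GBK_2_UTF8, all other files are copied unchanged.
        # To copy every file unchanged, comment out this check and call
        # CopyFile directly.
        # Notes:
        # 1. GBK_2_UTF8 copies the file and converts it from GBK to UTF-8.
        # 2. CopyFile copies the file as-is and keeps its original encoding.
        if os.path.splitext(src)[1].lstrip('.') in postfix:
            GBK_2_UTF8(src, dst)
        else:
            CopyFile(src, dst)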
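For comparison, the single-file conversion step can also be sketched with the built-in `open` instead of `codecs.open`. This is only a minimal illustration, not the code used in this article: the helper name `gbk_to_utf8`, the `demo` function, and the temporary file names are all hypothetical. Passing `newline=""` is assumed here so that line endings are carried over without translation.

```python
import os
import tempfile


def gbk_to_utf8(src, dst):
    """Re-encode a text file from GBK/GB18030 to UTF-8 (hypothetical helper)."""
    # newline="" turns off newline translation, so CRLF/LF endings are kept.
    with open(src, "r", encoding="gb18030", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


def demo():
    """Write a small GBK file, convert it, and return the converted bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo_gbk.c")
        dst = os.path.join(tmp, "demo_utf8.c")
        with open(src, "wb") as f:
            f.write("// 中文注释\r\n".encode("gbk"))
        gbk_to_utf8(src, dst)
        with open(dst, "rb") as f:
            return f.read()


if __name__ == "__main__":
    # Compare the converted bytes with the expected UTF-8 encoding.
    print(demo() == "// 中文注释\r\n".encode("utf-8"))
```

Reading as text and writing as text keeps the decode and encode steps explicit, which makes it easy to swap in a different source or target encoding.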
2. Copying a Source Folder to a Target Folder
Partial code (example):
# Copy the whole source folder into the target folder.
# Both folders must already exist; otherwise the function returns False.
def CopyDir(srcPath, targetPath):
    if os.path.isdir(srcPath) and os.path.isdir(targetPath):
        filelist_src = os.listdir(srcPath)
        for file in filelist_src:
            path = os.path.join(os.path.abspath(srcPath), file)
            if os.path.isdir(path):
                # Subfolder: create it in the target if needed, then recurse
                path1 = os.path.join(os.path.abspath(targetPath), file)
                if not os.path.exists(path1):
                    os.mkdir(path1)
                CopyDir(path, path1)
            else:
                # Regular file: copy it (converting the encoding if required)
                path1 = os.path.join(targetPath, file)
                CCopyFile(path, path1)
        return True
    else:
        return False
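The same traversal can be sketched without explicit recursion by using `os.walk`. The snippet below is a hedged alternative rather than the approach used in this article; `iter_copy_pairs` and `demo` are hypothetical names. It only produces the (source file, target file) pairs and creates the missing target folders, leaving the actual copy or conversion to the caller.

```python
import os
import tempfile


def iter_copy_pairs(src_root, dst_root):
    """Yield (source_file, target_file) pairs, creating target folders on the way."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        # Mirror the folder's position relative to the source root.
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in sorted(filenames):
            yield os.path.join(dirpath, name), os.path.join(dst_dir, name)


def demo():
    """Build a tiny tree in a temp folder and list the target paths produced."""
    with tempfile.TemporaryDirectory() as tmp:
        src_root = os.path.join(tmp, "src")
        dst_root = os.path.join(tmp, "out")
        os.makedirs(os.path.join(src_root, "sub"))
        for rel in ("a.c", os.path.join("sub", "b.h")):
            with open(os.path.join(src_root, rel), "wb") as f:
                f.write(b"int x;\n")
        pairs = list(iter_copy_pairs(src_root, dst_root))
        names = sorted(
            os.path.relpath(dst, dst_root).replace(os.sep, "/") for _src, dst in pairs
        )
        return names, os.path.isdir(os.path.join(dst_root, "sub"))


if __name__ == "__main__":
    print(demo())
```

Unlike `CopyDir`, this sketch creates the target root itself if it is missing, which may or may not be the behavior you want.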
3. Copying a Source Folder to a Target Folder While Converting Files from GBK to UTF-8
The complete code is as follows:
import os
import codecs

# Set the paths
srcPath = r'D:\share\python_study\srcCode'
targetPath = r'D:\share\python_study\out'

# Extensions (without the dot) of the files to convert
postfix = {'h', 'c'}

class CCopyFile:
    def __init__(self, src, dst):
        def ReadFile(filePath, encoding):
            # Read the file and decode its contents with the given encoding
            with codecs.open(filePath, "rb", encoding) as f:
                return f.read()

        def WriteFile(filePath, contents, encoding):
            # Encode the text with the given encoding and write it out
            with codecs.open(filePath, "wb", encoding) as f:
                f.write(contents)

        def UTF8_2_GBK(src, dst):
            contents = ReadFile(src, encoding="utf-8")
            WriteFile(dst, contents, encoding="gb18030")

        def GBK_2_UTF8(src, dst):
            # gb18030 is a superset of GBK, so it also decodes GBK files
            contents = ReadFile(src, encoding="gb18030")
            WriteFile(dst, contents, encoding="utf-8")

        def CopyFile(src, dst):
            # Byte-for-byte copy; the original encoding is preserved
            with open(src, 'rb') as readStream:
                contents = readStream.read()
            with open(dst, 'wb') as writeStream:
                writeStream.write(contents)

        # Match the extension: files with a selected extension go through
        # GBK_2_UTF8, all other files are copied unchanged.
        # To copy every file unchanged, comment out this check and call
        # CopyFile directly.
        # Notes:
        # 1. GBK_2_UTF8 copies the file and converts it from GBK to UTF-8.
        # 2. CopyFile copies the file as-is and keeps its original encoding.
        if os.path.splitext(src)[1].lstrip('.') in postfix:
            GBK_2_UTF8(src, dst)
        else:
            CopyFile(src, dst)

# Copy the whole source folder into the target folder.
# Both folders must already exist; otherwise the function returns False.
def CopyDir(srcPath, targetPath):
    if os.path.isdir(srcPath) and os.path.isdir(targetPath):
        filelist_src = os.listdir(srcPath)
        for file in filelist_src:
            path = os.path.join(os.path.abspath(srcPath), file)
            if os.path.isdir(path):
                # Subfolder: create it in the target if needed, then recurse
                path1 = os.path.join(os.path.abspath(targetPath), file)
                if not os.path.exists(path1):
                    os.mkdir(path1)
                CopyDir(path, path1)
            else:
                # Regular file: copy it (converting the encoding if required)
                path1 = os.path.join(targetPath, file)
                CCopyFile(path, path1)
        return True
    else:
        return False

if __name__ == '__main__':
    nRet = CopyDir(srcPath, targetPath)
    if nRet:
        print('Copy Dir OK!')
    else:
        print('Copy Dir Failed!')
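The script above hard-codes the two paths and the extension set. If you would rather pass them on the command line, one possible sketch with the standard `argparse` module follows. The argument names `src`, `dst`, and `--ext`, and the helpers `build_parser` and `parse_cli`, are assumptions made for illustration and are not part of the original script.

```python
import argparse


def build_parser():
    """Build a parser for the source folder, target folder, and extensions."""
    parser = argparse.ArgumentParser(
        description="Copy a folder tree, converting selected files from GBK to UTF-8."
    )
    parser.add_argument("src", help="source folder")
    parser.add_argument("dst", help="target folder (must already exist)")
    parser.add_argument(
        "--ext",
        nargs="+",
        default=["h", "c"],
        help="extensions, without the dot, of the files to convert",
    )
    return parser


def parse_cli(argv=None):
    """Return (src, dst, extension_set); argv=None means use sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    return args.src, args.dst, set(args.ext)


if __name__ == "__main__":
    # Example arguments; drop the list to read the real command line instead.
    print(parse_cli(["srcCode", "out", "--ext", "h", "c", "cpp"]))
```

The three returned values could then take the place of `srcPath`, `targetPath`, and `postfix` before `CopyDir` is called.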
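Summary
That is all for today. This article only briefly showed how to convert files from GBK to UTF-8 with Python.
Readers interested in automatically detecting a file's encoding and converting it to a target encoding can refer to the companion article [source code + tool]: python + tkinter GUI, automatic file encoding conversion tool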
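As a rough idea of what automatic detection can look like, here is a minimal trial-decoding sketch that uses only the standard library. It is an assumption-laden illustration and not the method used by the tool mentioned above; the function name `decode_best_effort` and the candidate order are hypothetical. UTF-8 is tried first because its decoder is strict, so GBK bytes usually fail it and fall through to `gb18030`.

```python
def decode_best_effort(data, candidates=("utf-8", "gb18030")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


if __name__ == "__main__":
    for raw in ("中文".encode("utf-8"), "中文".encode("gbk")):
        text, enc = decode_best_effort(raw)
        print(enc, text)
```

This is a heuristic: pure ASCII data is reported as UTF-8, and a few short GBK byte sequences can happen to be valid UTF-8, so a dedicated detection library may be preferable for untrusted input.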