导航

问题场景

1、使用Pandas处理文件,报错 ​​UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation by​

Traceback (most recent call last):
File "D:\workspace_py\pandas_learning\venv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "D:\workspace_py\pandas_learning\venv\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "D:\workspace_py\pandas_learning\venv\lib\site-packages\pandas\io\parsers\readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "D:\workspace_py\pandas_learning\venv\lib\site-packages\pandas\io\parsers\readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "D:\workspace_py\pandas_learning\venv\lib\site-packages\pandas\io\parsers\readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "D:\workspace_py\pandas_learning\venv\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 69, in __init__
self._reader = parsers.TextReader(self.handles.handle, **kwds)
File "pandas\_libs\parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 734, in pandas._libs.parsers.TextReader._get_header
File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte

2、目前的代码是这样的:

info = pd.read_csv("xxx.csv", delimiter=",", encoding="utf-8", names=["xxx","xxx"])

解决办法

其实这个问题非常容易解决:

方法一:只需要将endoding改成你的csv文件的编码格式就可以了。

方法二:将csv文件格式成你想要的格式,跟代码保持一致即可。

那查看csv文件的编码格式呢?

右击文件,使用Notepad++打开:

问题解决:UnicodeDecodeError: ‘utf-8‘ codec can‘t decode byte 0xcf in position 0: invalid continuation by_编码格式

查看右下角:

问题解决:UnicodeDecodeError: ‘utf-8‘ codec can‘t decode byte 0xcf in position 0: invalid continuation by_ico_02

修改代码:

info = pd.read_csv("xxx.csv", delimiter=",", encoding="gb2312", names=["xxx","xxx"])

这样子就不会报错了。当然,你也可以!

当然,你将csv文件换成其他格式也是可以的,比如改成utf-8格式:

问题解决:UnicodeDecodeError: ‘utf-8‘ codec can‘t decode byte 0xcf in position 0: invalid continuation by_pandas_03