Reading Specific Columns from a CSV File in Python

Reposted

数据小探 2023-08-30 21:08:56


1. Introduction

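Sometimes you do not need every column in a file, and Python can be used to pick out just the ones you want. This article demonstrates two general-purpose ways to select specific columns from a CSV file.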

2. Using Column Index Values

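One way to select specific columns from a CSV file is to use the columns' index values. This works well when the indexes of the columns you want to keep are easy to identify, or when you are processing several input files whose columns are all in the same positions.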


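In the supplier_data.csv file used here, suppose you want to keep the supplier name and cost columns. You can use their index values, row_list[0] and row_list[3], to write the supplier name and cost from every row to the output file.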
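As a quick illustration of what indexing a row looks like, here is a minimal sketch that works on an in-memory string instead of a real file. The sample rows and the Part Number header are made up for the example; only the positions of the supplier name (index 0) and cost (index 3) come from the text above.

```python
import csv
import io

# Hypothetical sample data; the real supplier_data.csv is not reproduced here.
sample_csv = (
    "Supplier Name,Invoice Number,Part Number,Cost,Purchase Date\n"
    "Supplier X,001-1001,2341,$500.00,1/20/14\n"
    "Supplier Y,50-9501,7009,$250.00,1/30/14\n"
)

# csv.reader accepts any iterable of lines, so a StringIO works like a file.
rows = list(csv.reader(io.StringIO(sample_csv)))

# Keep index 0 (supplier name) and index 3 (cost) from every row,
# including the header row.
selected = [[row[0], row[3]] for row in rows]

for row in selected:
    print(row)
```

Each row comes back from csv.reader as a list of strings, which is why plain list indexing is enough to pick out a column.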

Creating the script

Enter the following code in a text editor and save the file as 6csv_reader_column_by_index.py:

#!/usr/bin/env python3

import sys
import csv

input_file = sys.argv[1]
output_file = sys.argv[2]

my_columns = [0, 3]

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        for row_list in filereader:
            row_list_output = []
            for index_value in my_columns:
                row_list_output.append(row_list[index_value])
            filewriter.writerow(row_list_output)

Notes on the script

my_columns = [0, 3]

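This line creates the list variable my_columns, which holds the index values of the two columns to keep. Defining the indexes in one place means that if they ever need to change, only the definition of my_columns has to be edited.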

for row_list in filereader:
    row_list_output = []
    for index_value in my_columns:
        row_list_output.append(row_list[index_value])
    filewriter.writerow(row_list_output)

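This block is the control flow that does the filtering. The first line loops over the whole input file, so the lines beneath it run once for every row. The second line creates the list row_list_output, which holds the values to keep from the current row. The third line iterates over the index values in my_columns. The fourth line uses the append method to add the value at each of those indexes to row_list_output. Finally, row_list_output is written to the output file as one row, and the loop moves on to the next row until the input is exhausted.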
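The same loop can be written more compactly with a list comprehension. The sketch below is one possible variant, not the article's script: select_columns is a hypothetical helper name, the input is a made-up in-memory string, and lineterminator="\n" is set only to keep the result easy to inspect.

```python
import csv
import io


def select_columns(rows, column_indexes):
    """Yield, for each input row, a list with only the requested columns."""
    for row in rows:
        # Raises IndexError if a row is shorter than the largest index.
        yield [row[index] for index in column_indexes]


# Hypothetical input standing in for a real CSV file.
source = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n")
target = io.StringIO()

reader = csv.reader(source)
writer = csv.writer(target, lineterminator="\n")
writer.writerows(select_columns(reader, [0, 3]))

result = target.getvalue()
print(result)
```

Because select_columns is a generator, rows are still processed one at a time, just as in the nested for loops.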

Running the script

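On the command line, run the script with the input file path as the first argument and the output file path as the second, in the form python3 6csv_reader_column_by_index.py <input_file> <output_file>, then press Enter.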

Viewing the result

The output file contains only the supplier name and cost columns for every row of the input, including the header row.

3. Using Column Headers

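The second way to select specific columns from a CSV file is to use the column headers. This works well when the headers of the columns you want to keep are easy to identify, or when you are processing several input files in which the column positions vary but the headers stay the same.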


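For example, suppose you want to keep the Invoice Number and Purchase Date columns from supplier_data.csv. The script looks up the index that corresponds to each of those headers, then uses the indexes to write the invoice number and purchase date from every row to the output file.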

Creating the script

Enter the following code in a text editor and save the file as 7csv_reader_column_by_name.py:

#!/usr/bin/env python3

import sys
import csv

input_file = sys.argv[1]
output_file = sys.argv[2]

my_columns = ['Invoice Number', 'Purchase Date']
my_columns_index = []
with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        header = next(filereader, None)
        for index_value in range(len(header)):
            if header[index_value] in my_columns:
                my_columns_index.append(index_value)
        for row_list in filereader:
            row_list_output = []
            for index_value in my_columns_index:
                row_list_output.append(row_list[index_value])
            filewriter.writerow(row_list_output)

Notes on the script

my_columns = ['Invoice Number', 'Purchase Date']
my_columns_index = []

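The first of these two lines creates the list variable my_columns, which holds the names of the two columns to keep. The second creates the empty list my_columns_index, which will later hold the index values of those two columns.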

header = next(filereader, None)

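This line handles the header row: next reads the first row from filereader. The extra argument None is the default value that next returns when the reader has no rows left (for example, when the input file is empty), instead of raising StopIteration. In that case header would be None, and the len(header) call that follows would fail with a TypeError.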
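To make the role of the default argument concrete, the following small sketch calls next on an empty reader and on a reader that has a header row. Both inputs are made-up in-memory strings, and the explicit None check is an addition for illustration, not part of the article's script.

```python
import csv
import io

# An empty input: there is no first row to read.
empty_reader = csv.reader(io.StringIO(""))
empty_header = next(empty_reader, None)  # returns the default, None

# A hypothetical input with a header row.
full_reader = csv.reader(io.StringIO("Invoice Number,Purchase Date\n001-1001,1/20/14\n"))
full_header = next(full_reader, None)  # returns the first row as a list

for header in (empty_header, full_header):
    if header is None:
        print("input has no header row")
    else:
        print("header columns:", header)
```

Checking for None before using the header is one way to give a clear message for an empty input file.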

for index_value in range(len(header)):
    if header[index_value] in my_columns:
        my_columns_index.append(index_value)

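These lines work out the index values of the two columns to keep. The first line uses len to get the number of elements in the header row and combines it with range to build the loop. The second line uses an if statement to check whether each column header is in my_columns, that is, whether it is a column you want to keep. When the condition holds, the third line uses the append method to add that column's index to my_columns_index.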
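The header lookup can also be sketched with enumerate and a dictionary. This is an alternative, not the article's code: find_column_indexes is a hypothetical helper, and it differs from the loop above in two ways. It returns the indexes in the order of the requested names instead of header order, and it raises an error when a requested name is missing instead of silently skipping it. The Part Number header in the usage lines is an assumed placeholder.

```python
def find_column_indexes(header, wanted):
    """Return the index of each wanted column name, in the order of `wanted`."""
    # Map each header name to its position; if a name is repeated,
    # the last position wins.
    positions = {name: index for index, name in enumerate(header)}
    missing = [name for name in wanted if name not in positions]
    if missing:
        raise ValueError(f"columns not found in header: {missing}")
    return [positions[name] for name in wanted]


# Hypothetical header row for illustration.
header = ["Supplier Name", "Invoice Number", "Part Number", "Cost", "Purchase Date"]
my_columns_index = find_column_indexes(header, ["Invoice Number", "Purchase Date"])
print(my_columns_index)
```

Failing loudly on a missing header can be helpful when several input files are processed and one of them uses a different spelling for a column name.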

for row_list in filereader:
    row_list_output = []
    for index_value in my_columns_index:
        row_list_output.append(row_list[index_value])
    filewriter.writerow(row_list_output)

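This last block is the control flow that does the filtering. It follows the same approach as the previous method, so it is not explained again here. Note that the header row has already been consumed by next, so this loop writes only the data rows to the output file.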
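For comparison, the standard library's csv.DictReader and csv.DictWriter can select columns by name without tracking indexes by hand. The sketch below is an alternative to the article's script, not a restatement of it: it also writes a header row, it uses made-up in-memory data, and lineterminator="\n" is set only to keep the result easy to inspect.

```python
import csv
import io

my_columns = ["Invoice Number", "Purchase Date"]

# Hypothetical sample data standing in for supplier_data.csv.
source = io.StringIO(
    "Supplier Name,Invoice Number,Part Number,Cost,Purchase Date\n"
    "Supplier X,001-1001,2341,$500.00,1/20/14\n"
    "Supplier Y,50-9501,7009,$250.00,1/30/14\n"
)
target = io.StringIO()

# DictReader uses the first row as keys for every following row.
reader = csv.DictReader(source)
# extrasaction="ignore" drops any keys that are not listed in fieldnames.
writer = csv.DictWriter(
    target, fieldnames=my_columns, extrasaction="ignore", lineterminator="\n"
)

writer.writeheader()
for record in reader:
    writer.writerow(record)

output_text = target.getvalue()
print(output_text)
```

With real files, source and target would be replaced by file objects opened with newline='' as in the scripts above.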