PyTorch: reading txt files, PyTorch: reading a Hive database

Reposted

码海无压 2023-11-09 10:01:44


Note: this post shows how to load each patient's images and label based on the patient information stored in an Excel sheet.

Preface

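Patient information is often kept in an Excel spreadsheet, while the patients' images are stored separately in folders. To train a deep learning model on such data, we first have to match each image, by its file name, to the corresponding patient row in the spreadsheet and read the information we need from it, such as the label (diseased or not).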
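At its core this is a join between image file names and table rows. A minimal sketch of that join in plain Python, assuming file names of the form `<id>-0...` (left eye) and `<id>-1...` (right eye) as in the code later in this post (which additionally drops the first character of the CSV ID before matching). `pair_images_with_labels` is a hypothetical helper; indexing the file names in a dict means the folder is scanned once instead of once per row:

```python
import csv
import io


def pair_images_with_labels(csv_file, file_names, id_col=1, label_col=14):
    """Return (left_file, right_file, label) for every CSV patient that has both eye images.

    Assumes file names look like "<id>-0..." (left eye) and "<id>-1..." (right eye)
    with no '-' inside <id>; id_col and label_col are 0-based column indices.
    """
    # Index the file names once: patient id -> {"0": left-eye file, "1": right-eye file}
    eyes = {}
    for name in sorted(file_names):
        pid, sep, rest = name.partition('-')
        if sep and rest[:1] in ('0', '1'):
            eyes.setdefault(pid, {}).setdefault(rest[:1], name)  # keep the first match per eye

    pairs = []
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for row in reader:
        found = eyes.get(row[id_col], {})
        if '0' in found and '1' in found:  # skip patients missing either image
            pairs.append((found['0'], found['1'], int(row[label_col])))
    return pairs


# Usage with an in-memory table instead of a real CSV file and image folder
table = io.StringIO("idx,pid,label\n0,A1,1\n1,A2,0\n2,A3,1\n")
files = ["A1-0.png", "A1-1.png", "A2-0.png", "A3-1.png", "A3-0.png"]
print(pair_images_with_labels(table, files, id_col=1, label_col=2))
# [('A1-0.png', 'A1-1.png', 1), ('A3-0.png', 'A3-1.png', 1)]
```

Patient A2 is dropped because only its left-eye image exists, which mirrors the rule in this post that a patient is used only when both eyes are present.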

I. Define a custom dataset class

Define a class that loads the images and labels and splits the samples into a training set and a validation set.
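The class in this post splits the matched patients 8:2 in the order they appear in the CSV. If the table happens to be sorted (for example by diagnosis), that order can leave the validation set unrepresentative. An alternative, not the post's method, is a shuffled split with a fixed seed, so that a 'train' instance and a 'val' instance built separately still receive disjoint, reproducible index sets. A minimal sketch with a hypothetical `split_indices` helper:

```python
import random


def split_indices(n, train_ratio=0.8, seed=0):
    """Shuffle indices 0..n-1 with a fixed seed and split them into (train, val) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # same seed -> same permutation in every instance
    cut = int(n * train_ratio)
    return idx[:cut], idx[cut:]


# Usage: select the per-split labels (or file paths) by index
labels = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
train_idx, val_idx = split_indices(len(labels))
train_labels = [labels[i] for i in train_idx]
print(len(train_idx), len(val_idx))  # 8 2
```

Using a private `random.Random(seed)` instead of the global `random.seed` keeps the split independent of any other code that draws random numbers.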

II. Usage steps

1. Import the required libraries

import os
import cv2
import csv

2. Define the dataset class

The code is as follows (example):

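To use it on your own data, set the path of your image folder and the column indices of the label and the patient ID in the spreadsheet. The label file is read with Python's csv module, so the Excel sheet must first be exported as a CSV file (here tzcmetadata.csv).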
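Instead of hard-coding `target_column=14`, the column index can be looked up from the CSV header row by name. A minimal sketch; `column_index` and the column names `patient_id` and `diagnosis` are hypothetical:

```python
import csv
import io


def column_index(csv_file, column_name):
    """Return the 0-based index of column_name in the CSV header row."""
    header = next(csv.reader(csv_file))
    try:
        return header.index(column_name)
    except ValueError:
        raise KeyError(f"column {column_name!r} not found; header is {header}") from None


# Usage with an in-memory CSV; with a real file, open it with newline=''
table = io.StringIO("no,patient_id,diagnosis\n1,P001,0\n")
print(column_index(table, "diagnosis"))  # 2
```

The returned index can then be passed as `target_column`. If Excel saved the file as UTF-8 with a byte-order mark, opening it with `encoding='utf-8-sig'` keeps the first header name from starting with `\ufeff`.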

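class dataset:
    """
    Reads the patient table and pairs each patient's left-eye and right-eye
    images with the patient's label. Because it implements __getitem__ and
    __len__, it can be passed directly to torch.utils.data.DataLoader.
    """

    def __init__(self,
                 img_transforms,  # preprocessing applied to each image (or None)
                 dataset_root,  # root folder containing the label file and the images
                 label_file='tzcmetadata.csv',  # label file: the Excel sheet exported as CSV
                 num_classes=2,  # total number of classes
                 target_column=14,  # 0-based index of the label column in the CSV
                 mode='train'):  # 'train' for the training split; anything else (e.g. 'val') selects the validation split
        self.dataset_root = dataset_root
        self.img_transforms = img_transforms
        self.mode = mode.lower()
        self.num_classes = num_classes
        labelcsv = os.path.join(dataset_root, label_file)  # full path of the label file
        self.file_list_0 = []  # left-eye image paths
        self.file_list_1 = []  # right-eye image paths
        self.label_list = []  # one label per patient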
        
        # Folder that holds the patient images. Listing and path building must use
        # the same folder; adjust it to your own directory layout.
        img_dir = os.path.join(dataset_root, 'OCT-Enface', 'Fundus_Checked')
        filelist = sorted(os.listdir(img_dir))  # list the folder once, not once per CSV row

        with open(labelcsv, 'r', newline='') as f:  # read the label table
            reader = csv.reader(f)
            next(reader)  # skip the header row
            for line in reader:  # one row per patient
                patientid = line[1]  # patient ID
                label = int(line[target_column])  # patient label
                prefix = patientid[1:]  # image names omit the first character of the ID
                tmp0 = None  # left-eye image, name starts with "<prefix>-0"
                tmp1 = None  # right-eye image, name starts with "<prefix>-1"
                for file in filelist:
                    if tmp0 is None and file.startswith(prefix + '-0'):
                        tmp0 = os.path.join(img_dir, file)
                    elif tmp1 is None and file.startswith(prefix + '-1'):
                        tmp1 = os.path.join(img_dir, file)
                    if tmp0 is not None and tmp1 is not None:
                        break
                # Keep a patient only when both eyes were found, so that the image
                # lists and the label list stay aligned index by index.
                if tmp0 is not None and tmp1 is not None:
                    self.file_list_0.append(tmp0)
                    self.file_list_1.append(tmp1)
                    self.label_list.append(label)
                    
        total = len(self.label_list)  # number of patients with both images
        split = int(total * 0.8)  # 8:2 train/validation split, in CSV order (no shuffling)

        if self.mode == 'train':
            self.label_list = self.label_list[:split]
            self.file_list_0 = self.file_list_0[:split]
            self.file_list_1 = self.file_list_1[:split]
        else:
            self.label_list = self.label_list[split:]
            self.file_list_0 = self.file_list_0[split:]
            self.file_list_1 = self.file_list_1[split:]

        # Summary: split name, sample count, sum of labels (positives for 0/1 labels), largest label
        print(self.mode, 'Total:', len(self.label_list), sum(self.label_list),
              max(self.label_list, default=None))

    def __getitem__(self, idx):
        # A dataset is essentially an indexed list of samples, so the simplest way
        # to fetch a sample is to take it out of the lists by its index.
        label = self.label_list[idx]
        file0 = self.file_list_0[idx]
        file1 = self.file_list_1[idx]
        img0 = cv2.imread(file0)  # H x W x 3 uint8 array in BGR order, or None if unreadable
        img1 = cv2.imread(file1)
        # Fail loudly instead of passing None on to the transforms.
        for path, img in ((file0, img0), (file1, img1)):
            if img is None:
                raise FileNotFoundError('Cannot read image: ' + path)
        if self.img_transforms is not None:
            img0 = self.img_transforms(img0)
            img1 = self.img_transforms(img1)
        return img0, img1, label

    def __len__(self):
        return len(self.label_list)
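The class hands each image straight from cv2.imread to img_transforms, and cv2.imread returns an H x W x 3 uint8 array in BGR channel order. As a minimal sketch, a hypothetical `to_chw_float` transform (not part of the original post) could convert that array into the RGB, channels-first, float layout that PyTorch convolution layers usually expect:

```python
import numpy as np


def to_chw_float(img_bgr):
    """Hypothetical img_transforms callable: BGR HWC uint8 -> RGB CHW float32 in [0, 1]."""
    if img_bgr.ndim != 3 or img_bgr.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    rgb = img_bgr[:, :, ::-1]           # OpenCV stores BGR; reverse the channels to get RGB
    chw = np.transpose(rgb, (2, 0, 1))  # HWC -> CHW
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0


# Usage: a 4 x 5 image whose blue channel (index 0 in BGR) is saturated
img = np.zeros((4, 5, 3), dtype=np.uint8)
img[:, :, 0] = 255
out = to_chw_float(img)
print(out.shape, out.dtype)  # (3, 4, 5) float32
```

It would be passed as the first argument, e.g. `dataset(to_chw_float, '/path/to/data', mode='train')` with a placeholder path. Where a tensor is needed, `torch.from_numpy` wraps the result; batching the (img0, img1, label) tuples with a `torch.utils.data.DataLoader` also requires all images to have the same size, so a resize step may be needed first.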