Issue link


https://github.com/numpy/numpy/issues/3477


Today I used np.genfromtxt to load a CSV file of less than 5 GB. It filled all 30 GB of RAM plus 10 GB of swap and still failed with out of memory.

At first I suspected the open() call was slow; it turned out to be np.genfromtxt.

Be warned!

np.genfromtxt is very slow, and it needs roughly 10x the file size in memory.
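The difference is visible even on a toy input. Here is a minimal sketch using tracemalloc with a small synthetic CSV standing in for the real file; the printed numbers are illustrative only, not a benchmark:

```python
import io
import tracemalloc

import numpy as np

# Tiny synthetic CSV in place of the real 5 GB file.
rows, cols = 1000, 4
text = "\n".join(
    ",".join(str(float(r * cols + c)) for c in range(cols)) for r in range(rows)
)

# Peak allocation while parsing with np.genfromtxt.
tracemalloc.start()
a = np.genfromtxt(io.StringIO(text), delimiter=",", dtype="float32")
peak_gen = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

# Peak allocation while parsing the same text with np.loadtxt.
tracemalloc.start()
b = np.loadtxt(io.StringIO(text), delimiter=",", dtype="float32")
peak_load = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

# Both parsers produce identical arrays; only their memory behaviour differs.
assert np.array_equal(a, b)
print(f"genfromtxt peak: {peak_gen} B, loadtxt peak: {peak_load} B")
```

Since the output arrays are identical, np.loadtxt is a drop-in replacement whenever the CSV has no missing values (handling missing values is the main feature np.genfromtxt adds).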

The problematic code:

with open(bin_file, 'r') as f:
    # Reading a 5 GB file takes roughly 10x its size in memory!
    csv = np.genfromtxt(f, delimiter=",", dtype='float32')
    csv = csv.reshape((count, shape_x, shape_y, shape_c))
    print(csv.shape)
    for i in range(count):
        dset[i, :, :, :] = csv[i, :, :, :]

The fix

Use np.loadtxt instead (passing dtype='float32' explicitly, since np.loadtxt defaults to float64 while the original np.genfromtxt call used float32).

def write_h5(file):
    print("load file ", file)
    # Old version: needed ~10x the file size in memory.
    # csv = np.genfromtxt(f, delimiter=",", dtype='float32')
    csv = np.loadtxt(file, delimiter=',', dtype='float32')
    print("load completed!")

    csv = csv.reshape((total_count, shape_x, shape_y, shape_c))

    h5f = h5py.File(file_name, 'w')
    print("create h5file ", file_name)
    h5f.create_dataset(name='data', data=csv)
    print("create h5file dataset")
    h5f.close()
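As a self-contained round trip of the load-and-reshape path above (the shape values are hypothetical stand-ins for total_count, shape_x, shape_y, shape_c, and an in-memory CSV replaces the real multi-gigabyte file; the h5py step is omitted so the sketch has no extra dependencies):

```python
import io

import numpy as np

# Hypothetical shapes standing in for total_count, shape_x, shape_y, shape_c.
count, sx, sy, sc = 2, 3, 4, 1

# Build a small CSV in memory instead of reading the real file.
flat = np.arange(count * sx * sy * sc, dtype="float32").reshape(count, -1)
buf = io.StringIO()
np.savetxt(buf, flat, delimiter=",")
buf.seek(0)

# Same steps as write_h5, minus the HDF5 output.
csv = np.loadtxt(buf, delimiter=",", dtype="float32")
csv = csv.reshape((count, sx, sy, sc))
print(csv.shape)  # (2, 3, 4, 1)
```

The reshaped array is what write_h5 then hands to h5py's create_dataset in one call, so no per-row copy loop is needed.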