Python numpy loadtxt: what happens when the file does not exist, and how to use numpy loadtxt

Reposted

mob64ca140e4022 2024-01-03 15:36:13

1. Loading data with loadtxt()

--loadtxt(fname, dtype, delimiter, converters, usecols)

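When the Iris dataset is imported with NumPy's loadtxt function and dtype is set to float, the fifth column is a problem: it holds the species name, which is clearly not a floating-point value. The converters parameter of loadtxt() is used to map the fifth column to float data through a conversion function.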

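----fname: path to the file, e.g. C:/Dataset/iris.txt
----dtype: data type of the result, e.g. float, str
----delimiter: the separator between values, e.g. ','
----converters: a dictionary mapping column indices to conversion functions, e.g. {1: fun} means the column with index 1 (the second column) is passed through fun
----usecols: which columns to read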

import numpy as np


def iris_type(s):
    # Depending on the NumPy version and the encoding argument, the converter
    # receives either bytes or str, so normalize to str before the lookup.
    if isinstance(s, bytes):
        s = s.decode()
    it = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
    return it[s]


# The dataset is small enough to read in one go; normally the data would be read in incrementally
path = './iris.txt'  # path to the data file
data = np.loadtxt(path, dtype=float, delimiter=',', converters={4: iris_type})
# print(data)
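
The same idea can be tried without the data file. The following is a minimal sketch, assuming a few Iris-style rows held in an in-memory io.StringIO buffer instead of ./iris.txt; the sample rows, the helper name load_iris_rows and the missing file name are illustrative assumptions, not part of the original article. It also shows what happens when the file does not exist: loadtxt does not return a value but raises an exception (FileNotFoundError in current NumPy, which is a subclass of OSError), so the helper catches OSError.

```python
import io
import numpy as np

LABELS = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}


def iris_type(s):
    # The converter may receive bytes or str depending on NumPy version/encoding.
    if isinstance(s, bytes):
        s = s.decode()
    return LABELS[s.strip()]


def load_iris_rows(source):
    """Hypothetical helper: return the parsed array, or None if the file cannot be opened."""
    try:
        return np.loadtxt(source, dtype=float, delimiter=',',
                          converters={4: iris_type})
    except OSError as exc:  # FileNotFoundError is a subclass of OSError
        print(f"could not read {source!r}: {exc}")
        return None


# Illustrative sample rows (assumed data in the usual Iris text format)
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
data = load_iris_rows(sample)
print(data.shape)   # (3, 5)
print(data[:, 4])   # class labels, stored as floats

# usecols selects columns; here only the first two feature columns are read
sample.seek(0)
features = np.loadtxt(sample, delimiter=',', usecols=(0, 1))
print(features.shape)  # (3, 2)

# A path that does not exist: the helper reports the error and returns None
missing = load_iris_rows('./no_such_file_hypothetical.txt')
print(missing)  # None
```

Returning None on a missing file is only one possible design; letting the exception propagate is equally reasonable when the caller cannot continue without the data.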

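2. numpy.reshape(a, newshape, order='C')

--numpy.reshape(a, newshape, order='C')

----a: the array to reshape. Either the function form numpy.reshape(a, newshape, order='C') or the method form a.reshape(newshape, order='C') can be used
----newshape: the new shape, an int or a tuple of ints; e.g. (2, 3) means 2 rows and 3 columns. The new shape must be compatible with the original one, i.e. the product of its dimensions must equal the number of elements in a
----order: one of {'C', 'F', 'A'}. The elements of a are read in this index order and placed into the reshaped array in the same index order. If order is not given, the default is 'C'.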

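How the order options work:

(1) 'C' means reading/writing the elements in C-like index order: the last axis index changes fastest and the first axis index changes slowest. For a 2-D array this simply means reading across and writing across, one row at a time.
(2) 'F' means reading/writing the elements in Fortran-like index order: the first axis index changes fastest and the last axis index changes slowest. Elements are read down and written down, one column at a time. Note that the 'C' and 'F' options take no account of the memory layout of the underlying array; they refer only to the order of indexing.
(3) With 'A', the result depends on how the original array a is stored in memory: if a is Fortran contiguous, the result is the same as with 'F'; otherwise it is the same as with 'C'.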

import numpy as np


a = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

b = np.reshape(a, (12), order='C')
print(b)
# [ 1  2  3  4  5  6  7  8  9 10 11 12]

c = np.reshape(a, (12), order='F')
print(c)
# [ 1  5  9  2  6 10  3  7 11  4  8 12]

d = np.reshape(a, (6, 2), order='C')
print(d)

e = np.reshape(a, (6, 2), order='F')
print(e)

# Flatten into a single row (1-D array); -1 lets NumPy infer the length
f = np.reshape(a, (-1))
print(f)

# Reshape into a single column; -1 lets NumPy infer the number of rows
g = np.reshape(a, (-1, 1))
print(g)

"""
[ 1  2  3  4  5  6  7  8  9 10 11 12]  ----  b
[ 1  5  9  2  6 10  3  7 11  4  8 12]  ----  c
[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [11 12]]  ----  d
[[ 1  3]
 [ 5  7]
 [ 9 11]
 [ 2  4]
 [ 6  8]
 [10 12]]  ----  e
[ 1  2  3  4  5  6  7  8  9 10 11 12]  ----  f
[[ 1]
 [ 2]
 [ 3]
 [ 4]
 [ 5]
 [ 6]
 [ 7]
 [ 8]
 [ 9]
 [10]
 [11]
 [12]]  ---- g
"""
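
The example above covers 'C' and 'F' but not 'A'. The following sketch, which assumes a small 2x3 array and uses np.asfortranarray to obtain a Fortran-contiguous copy of it, illustrates that 'C' ignores the memory layout while 'A' follows it. The variable names are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # C-contiguous: [[0 1 2], [3 4 5]]
f = np.asfortranarray(a)         # same values, Fortran (column-major) memory layout

flat_c_from_f = np.reshape(f, 6, order='C')   # 'C' ignores the memory layout
flat_a_from_c = np.reshape(a, 6, order='A')   # a is C-contiguous, so 'A' acts like 'C'
flat_a_from_f = np.reshape(f, 6, order='A')   # f is Fortran-contiguous, so 'A' acts like 'F'

print(flat_c_from_f)  # [0 1 2 3 4 5]
print(flat_a_from_c)  # [0 1 2 3 4 5]
print(flat_a_from_f)  # [0 3 1 4 2 5]

# An incompatible shape (4 * 2 != 6) raises ValueError
try:
    np.reshape(a, (4, 2))
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("ValueError:", exc)
```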

3. numpy.asarray()

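Both array and asarray convert structured data (lists, tuples and so on) into an ndarray.

The main difference shows up when the source is already an ndarray: array still makes a copy that occupies new memory, whereas asarray does not (as long as no dtype or memory-order conversion is needed). In other words, for b = np.array(a) and c = np.asarray(a), b is a real copy and stays the same when a is modified in place, while c changes along with a.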
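
A minimal sketch of this difference, assuming a small integer array (the variable names are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(a)      # copies by default
c = np.asarray(a)    # returns a itself: already an ndarray, no conversion needed

a[0] = 99            # modify a in place

print(b)                       # [1 2 3]
print(c)                       # [99  2  3]
print(c is a)                  # True
print(np.shares_memory(a, b))  # False

# When the source is not an ndarray (e.g. a list), asarray has to build a new array,
# so later changes to the list are not reflected in it
lst = [1, 2, 3]
d = np.asarray(lst)
lst[0] = 99
print(d)                       # [1 2 3]
```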

4. numpy.split()

numpy.split(ary, indices_or_sections, axis=0)

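ary: the array to split
indices_or_sections: if an integer, the array is divided into that many equal parts (an error is raised if an equal division is not possible); if a 1-D array of indices, it gives the positions along the axis at which to split. Each piece includes its start index and excludes its end index, so [1, 3] yields ary[:1], ary[1:3] and ary[3:]
axis: the axis along which to split. The default is 0, which divides the rows into groups; with 1 the columns are divided into groups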

import numpy as np

data = [[5.1, 3.5, 1.4, 0.2, 'Iris-setosa'], [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'], 
        [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'], [7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'], 
        [6.4, 3.2, 4.5, 1.5, 'Iris-versicolor'], [6.9, 3.1, 4.9, 1.5, 'Iris-versicolor'], 
        [5.5, 2.3, 4.0, 1.3, 'Iris-versicolor'], [6.3, 3.3, 6.0, 2.5, 'Iris-virginica'], 
        [5.8, 2.7, 5.1, 1.9, 'Iris-virginica'], [7.1, 3.0, 5.9, 2.1, 'Iris-virginica']]


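# The mixed list is converted to an array of strings before splitting.
# Split the columns at indices 1 and 3: columns [:1], [1:3] and [3:]
x, y, w = np.split(data, [1, 3], axis=1)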
print(x)
print(y)
print(w)

"""
[['5.1']
 ['4.9']
 ['4.7']
 ['7.0']
 ['6.4']
 ['6.9']
 ['5.5']
 ['6.3']
 ['5.8']
 ['7.1']]
[['3.5' '1.4']
 ['3.0' '1.4']
 ['3.2' '1.3']
 ['3.2' '4.7']
 ['3.2' '4.5']
 ['3.1' '4.9']
 ['2.3' '4.0']
 ['3.3' '6.0']
 ['2.7' '5.1']
 ['3.0' '5.9']]
[['0.2' 'Iris-setosa']
 ['0.2' 'Iris-setosa']
 ['0.2' 'Iris-setosa']
 ['1.4' 'Iris-versicolor']
 ['1.5' 'Iris-versicolor']
 ['1.5' 'Iris-versicolor']
 ['1.3' 'Iris-versicolor']
 ['2.5' 'Iris-virginica']
 ['1.9' 'Iris-virginica']
 ['2.1' 'Iris-virginica']]
"""
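
The example above uses a list of indices. The following sketch, assuming a small 3x4 integer array, shows the integer form of indices_or_sections, the effect of axis, and what happens when an equal division is not possible. np.array_split is a separate NumPy function that permits uneven pieces; it is mentioned here only for comparison.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Integer: split the 4 columns into 2 equal parts of 2 columns each
left, right = np.split(arr, 2, axis=1)
print(left.shape, right.shape)   # (3, 2) (3, 2)

# Index list: the pieces are arr[:, :1], arr[:, 1:3] and arr[:, 3:]
p0, p1, p2 = np.split(arr, [1, 3], axis=1)
print(p1)
# [[ 1  2]
#  [ 5  6]
#  [ 9 10]]

# axis=0 divides the rows instead
rows = np.split(arr, 3, axis=0)
print(len(rows), rows[0].shape)  # 3 (1, 4)

# 4 columns cannot be divided into 3 equal parts: np.split raises ValueError
try:
    np.split(arr, 3, axis=1)
    split_failed = False
except ValueError as exc:
    split_failed = True
    print("ValueError:", exc)

# np.array_split allows an uneven division
uneven = np.array_split(arr, 3, axis=1)
print([p.shape for p in uneven])  # [(3, 2), (3, 1), (3, 1)]
```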
