Notes on Functions in pandas, numpy, and Other Third-Party Python Data-Processing Libraries (Continuously Updated)

Original post by 小虎AI实验室, 2023-04-30 07:56:09

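© Copyright belongs to the author. This is an original work by 51CTO blog author 小虎AI实验室. Contact the author for permission before reposting; unauthorized reposting may result in legal action.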

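Introduction

While studying, I often forget how certain functions in pandas, numpy, and other Python libraries are used, so I am collecting and organizing them here to make them easier to look up next time.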

numpy

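nonzero(a)
: Returns the indices of the non-zero elements of array a, as a tuple containing one index array per dimension.
Example: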

>>> import numpy as np
>>> x = np.array([[1, 0, 0], [0, 2, 0], [1, 1, 0]])
>>> x
array([[1, 0, 0],
       [0, 2, 0],
       [1, 1, 0]])
>>> np.nonzero(x)
(array([0, 1, 2, 2]), array([0, 1, 0, 1]))
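The index tuple above is mostly useful as input to further indexing. The following is a minimal sketch of that use; the variable names rows, cols, values, and coords are illustrative and not part of the original note.

```python
import numpy as np

x = np.array([[1, 0, 0], [0, 2, 0], [1, 1, 0]])

# np.nonzero returns one index array per dimension.
rows, cols = np.nonzero(x)

# Using the two arrays as an index picks out the non-zero values themselves.
values = x[rows, cols]
print(values)  # [1 2 1 1]

# np.argwhere reports the same positions as one (row, col) pair per element.
coords = np.argwhere(x)
print(coords.tolist())  # [[0, 0], [1, 1], [2, 0], [2, 1]]
```

np.argwhere is convenient for reading positions, while the tuple from np.nonzero is the form that can be passed straight back into x[...].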

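in1d
: Tests whether each element of a 1-D array is also present in a second array.
Returns a boolean array the same length as ar1 that is True where an element of ar1 is in ar2 and False otherwise. Passing invert=True flips the result. Note that np.in1d is deprecated as of NumPy 2.0 in favor of np.isin.
Example: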

>>> test = np.array([0, 1, 2, 5, 0])
>>> states = [0, 2]
>>> mask = np.in1d(test, states)
>>> mask
array([ True, False,  True, False,  True])
>>> test[mask]
array([0, 2, 0])
>>> mask = np.in1d(test, states, invert=True)
>>> mask
array([False,  True, False,  True, False])
>>> test[mask]
array([1, 5])
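For newer NumPy versions, the same membership test can be written with np.isin. The sketch below reuses the test and states values from the example above; the names inverted and grid are illustrative additions.

```python
import numpy as np

test = np.array([0, 1, 2, 5, 0])
states = [0, 2]

# np.isin gives the same membership mask as np.in1d for 1-D input.
mask = np.isin(test, states)
print(test[mask])  # [0 2 0]

# invert=True flips the mask; it matches negating the mask with ~.
inverted = np.isin(test, states, invert=True)
print(test[inverted])  # [1 5]
assert np.array_equal(inverted, ~mask)

# Unlike in1d, isin keeps the shape of a multi-dimensional input.
grid = np.array([[0, 1], [2, 5]])
print(np.isin(grid, states).tolist())  # [[True, False], [True, False]]
```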

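train_test_split

sklearn.model_selection.train_test_split(*arrays, **options)
: Splits arrays or matrices into random training and test subsets. This function comes from scikit-learn, not numpy.
Example: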

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
...
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]
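To see why test_size=0.33 on 5 samples yields 2 test rows and 3 training rows, here is a simplified NumPy-only sketch of a shuffled hold-out split. The helper holdout_split and its seed parameter are hypothetical and are not scikit-learn's implementation; the sketch only assumes that a fractional test size is rounded up to a whole number of samples, and its random order will not match the rows shown above.

```python
import math

import numpy as np


def holdout_split(X, y, test_size=0.33, seed=42):
    """Hypothetical helper: a simplified shuffled hold-out split."""
    n_samples = len(X)
    # A fractional test_size is rounded up to a whole number of samples.
    n_test = math.ceil(test_size * n_samples)
    # Shuffle the row indices, then cut them into a test part and a train part.
    order = np.random.default_rng(seed).permutation(n_samples)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


X, y = np.arange(10).reshape((5, 2)), np.arange(5)
X_train, X_test, y_train, y_test = holdout_split(X, y)
print(X_train.shape, X_test.shape)  # (3, 2) (2, 2)
```

Because X and y are indexed with the same shuffled indices, each row of X stays paired with its label, which is the property the real function also guarantees.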

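GroupKFold

sklearn.model_selection.GroupKFold(n_splits=3)
: A K-fold variant with non-overlapping groups: samples from the same group never appear in both the training set and the test set of a split. The default n_splits=3 shown here is from an older scikit-learn release; since version 0.22 the default is 5.
Example: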

>>> from sklearn.model_selection import GroupKFold
>>> X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
>>> y = np.array([1, 2, 3, 4])
>>> groups = np.array([0, 0, 2, 2])
>>> group_kfold = GroupKFold(n_splits=2)
>>> group_kfold.get_n_splits(X, y, groups)
2
>>> print(group_kfold)
GroupKFold(n_splits=2)
>>> for train_index, test_index in group_kfold.split(X, y, groups):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
...     print(X_train, X_test, y_train, y_test)
...
TRAIN: [0 1] TEST: [2 3]
[[1 2]
 [3 4]] [[5 6]
 [7 8]] [1 2] [3 4]
TRAIN: [2 3] TEST: [0 1]
[[5 6]
 [7 8]] [[1 2]
 [3 4]] [3 4] [1 2]
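
The idea of holding out whole groups can be sketched in plain NumPy. The helper simple_group_folds below is hypothetical and deliberately simplified: it assigns groups to folds round-robin, whereas the real GroupKFold also tries to balance the folds, so the fold order and contents may differ from scikit-learn's output.

```python
import numpy as np


def simple_group_folds(groups, n_splits):
    """Hypothetical helper: yield (train_idx, test_idx) with whole groups held out."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    # Assign each distinct group to exactly one fold, round-robin.
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    fold_of_sample = np.array([fold_of_group[g] for g in groups])
    for fold in range(n_splits):
        test_idx = np.flatnonzero(fold_of_sample == fold)
        train_idx = np.flatnonzero(fold_of_sample != fold)
        yield train_idx, test_idx


groups = np.array([0, 0, 2, 2])
splits = list(simple_group_folds(groups, n_splits=2))
for train_idx, test_idx in splits:
    # No group appears on both sides of a split.
    assert not set(groups[train_idx]) & set(groups[test_idx])
    print("TRAIN:", train_idx, "TEST:", test_idx)
```

This kind of split matters when samples within a group are correlated (for example, several measurements from one subject), since letting a group leak into both sides would make the test score look better than it should.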
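
shuffle

numpy.random.shuffle(x)
: Shuffles the sequence x in place and returns None. The contents of x are reordered; no shuffled copy is returned.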
Example:

>>> arr = np.arange(10)
>>> np.random.shuffle(arr)
>>> arr
array([1, 7, 5, 2, 9, 4, 3, 6, 0, 8])  # random

Multi-dimensional arrays are only shuffled along the first axis:

>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.shuffle(arr)
>>> arr
array([[3, 4, 5],
       [6, 7, 8],
       [0, 1, 2]])  # random
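
Because shuffle works in place, a common mistake is to assign its return value. The short sketch below contrasts it with np.random.permutation, which returns a shuffled copy, and shows a seeded Generator for reproducible results; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(10)

# shuffle reorders arr in place and returns None.
result = np.random.shuffle(arr)
print(result)  # None

# permutation leaves its input untouched and returns a shuffled copy.
original = np.arange(10)
shuffled = np.random.permutation(original)
print(original.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Two Generators created with the same seed shuffle in the same order.
a = np.arange(10)
b = np.arange(10)
np.random.default_rng(0).shuffle(a)
np.random.default_rng(0).shuffle(b)
assert np.array_equal(a, b)
```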