Python Study Notes: Selecting Data That Meets Conditions in NumPy with select, where, choose, and nonzero

Reposted

mob604757037cf3 2021-09-30 14:45:00

I. The np.select function

1. Introduction

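np.select builds a result element by element: for each position it takes the value from the choice whose condition is true there.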

Syntax:

import numpy as np
np.select(condlist, choicelist, default=0)
# returns an ndarray (not a Python list)

Parameters (condlist and choicelist must be written as lists):

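condlist -- list of boolean conditions that decide, per element, which choice is used
choicelist -- list of arrays (or expressions) to take values from; same length as condlist
default -- value used where none of the conditions is true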
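As a quick check of the signature above, the following minimal sketch (the names values and picked are illustrative, not from the original notes) suggests that np.select returns an ndarray and that default falls back to 0 when it is omitted.

```python
import numpy as np

values = np.array([1, 5, 10])  # illustrative data

# Only elements below 5 match; everything else falls back to the default (0).
picked = np.select([values < 5], [values * 2])

print(type(picked).__name__)  # ndarray
print(picked.tolist())        # [2, 0, 0]
```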

2. The traditional loop approach

Looping over an array with explicit conditionals is slow; np.select can do the same job in a single vectorized call.

a = np.array([1,2,3,4,5,6,7,8,9,10])
result = []
for i in a:
    if i < 6:
        i += 10
    else:
        i = 100
    result.append(i)
print(result)
# [11, 12, 13, 14, 15, 100, 100, 100, 100, 100]

3. A single condition

a = np.array([1,2,3,4,5,6,7,8,9,10])
result2 = np.select([a < 6], [a + 10], default=100)
print(result2)
# [ 11  12  13  14  15 100 100 100 100 100]

Where an element satisfies the condition, the operation is applied; otherwise the default value is returned.
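With only one condition, the same result can also be obtained with np.where. The sketch below compares the two calls on the array from the example above; the names via_select and via_where are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# One condition, one choice, one fallback value.
via_select = np.select([a < 6], [a + 10], default=100)
via_where = np.where(a < 6, a + 10, 100)

print(np.array_equal(via_select, via_where))  # True
```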

4. Multiple conditions and multiple operations

a = np.array([[1,2,3,4,5],
              [6,7,8,9,10],
              [11,12,13,14,15],
              [16,17,18,19,20],
              [21,22,23,24,25]])
b = np.array(range(25)).reshape(5,5) + 1  # same values as a, built more compactly (not used below)
result2 = np.select([a<6, np.logical_and(a>10, a<16), a>20],
                    [a+10, a**2, a*10],
                    default=100)
result2
'''
array([[ 11,  12,  13,  14,  15],
       [100, 100, 100, 100, 100],
       [121, 144, 169, 196, 225],
       [100, 100, 100, 100, 100],
       [210, 220, 230, 240, 250]])
'''

Each condition applies its operation only where it is true; elements that satisfy none of the conditions receive the default value.

# An element that satisfies several conditions at once
result3 = np.select([a<12, np.logical_and(a>10, a<16), a>20],
                    [a+10, a**2, a*10],
                    default=100)
result3
# Look at the element 11: it matches both a<12 and 10<a<16
'''
array([[ 11,  12,  13,  14,  15],
       [ 16,  17,  18,  19,  20],
       [ 21, 144, 169, 196, 225],
       [100, 100, 100, 100, 100],
       [210, 220, 230, 240, 250]])
'''

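When an element satisfies more than one condition, the first matching condition in condlist wins: conditions are checked in order, condition one, then condition two, and so on.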
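To make the ordering rule concrete, the following sketch swaps the two overlapping conditions and inspects the element 11. The names cond_low, cond_mid, first, and swapped are illustrative and do not appear in the original notes.

```python
import numpy as np

a = np.arange(1, 26).reshape(5, 5)

cond_low = a < 12                 # true for 1..11
cond_mid = (a > 10) & (a < 16)    # true for 11..15

# Same conditions and choices, only the order differs.
first = np.select([cond_low, cond_mid], [a + 10, a ** 2], default=100)
swapped = np.select([cond_mid, cond_low], [a ** 2, a + 10], default=100)

print(first[2, 0])    # 21  (11 + 10, because a < 12 is listed first)
print(swapped[2, 0])  # 121 (11 ** 2, because the range test is listed first)
```

Elements that match only one condition are unaffected by the order.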

II. The np.where function

1. Introduction

np.where returns x where the condition holds and y where it does not.

Syntax:

np.where(condition, x, y)

2. Calling with three arguments

If all three arrays are one-dimensional, the call is equivalent to:

[xv if c else yv for c, xv, yv in zip(condition, x, y)]
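The equivalence can be checked directly. This is a small sketch with made-up inputs; by_loop and by_where are illustrative names.

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Pure-Python version from the text versus the vectorized call.
by_loop = [xv if c else yv for c, xv, yv in zip(condition, x, y)]
by_where = np.where(condition, x, y)

print(by_where.tolist())             # [1, 20, 3, 40]
print(by_loop == by_where.tolist())  # True
```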

One-dimensional example

a = np.arange(10) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.where(a, 1, -1) # array([-1,  1,  1,  1,  1,  1,  1,  1,  1,  1])
np.where(a > 5, a, a*10) # array([ 0, 10, 20, 30, 40, 50,  6,  7,  8,  9])

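Multi-dimensional arrays work the same way: each output element is taken from x where the condition holds and from y otherwise.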

condition = [[True, False],
            [True, True]]
x = [[1, 2], [3, 4]]
y = [[9, 8], [7, 6]]
np.where(condition, x, y)
'''
array([[1, 8],
       [3, 4]])
'''

3. Calling with only condition

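When x and y are omitted, np.where returns the coordinates of the elements that satisfy the condition (the non-zero ones), which is equivalent to np.asarray(condition).nonzero().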
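A short sketch of that equivalence, assuming a small 1-D array; from_where and from_nonzero are illustrative names.

```python
import numpy as np

a = np.array([2, 4, 6, 8, 10])
condition = a > 5

from_where = np.where(condition)                # tuple with one index array
from_nonzero = np.asarray(condition).nonzero()  # the documented equivalent

print(len(from_where))                                  # 1
print(from_where[0].tolist())                           # [2, 3, 4]
print(np.array_equal(from_where[0], from_nonzero[0]))   # True
```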

# The comparison a > 5 broadcasts the scalar 5 against every element of a
a = np.array([2,4,6,8,10])
np.where(a > 5) # (array([2, 3, 4], dtype=int64),)
a[np.where(a > 5)] # array([ 6,  8, 10])

Multi-dimensional array

a = np.arange(27).reshape(3,3,3)
'''
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
'''

np.where(a > 5 )
'''
(array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       dtype=int64),
 array([2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2],
       dtype=int64),
 array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
       dtype=int64))
'''

a[np.where(a >5)]
# array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26])

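np.where returns the coordinates of every matching element. Because the input is a three-dimensional array, the result is a tuple of three arrays, one per axis.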
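Reading the three arrays in parallel gives one (plane, row, column) coordinate per match. The sketch below uses a stricter threshold than the example above so the output stays short; the names planes, rows, cols, and coords are illustrative.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# One index array per axis, all of the same length.
planes, rows, cols = np.where(a > 23)

# Zip them to get one coordinate triple per matching element.
coords = list(zip(planes.tolist(), rows.tolist(), cols.tolist()))
print(coords)  # [(2, 2, 0), (2, 2, 1), (2, 2, 2)]
```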

III. The np.choose function

1. Introduction

Likewise, np.choose picks elements according to an index array, and it runs faster than an explicit for / if / else loop.

Syntax:

np.choose(a, choices, out=None, mode='raise')

Parameters:

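a -- integer array whose values lie between 0 and n-1, where n is the number of choices
choices -- sequence of n arrays to pick from; a and every choice must be broadcastable to a common shape
out -- optional; array that receives the result
mode -- 'raise' (default): any value of a outside 0 to n-1 raises an error
         'clip': values below 0 become 0, values above n-1 become n-1
         'wrap': values are taken modulo n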
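The three modes can be compared on an index array that deliberately contains out-of-range values. This is a minimal sketch; idx, clipped, wrapped, and raised are illustrative names.

```python
import numpy as np

choices = [10, 20, 30]          # n = 3, so valid indices are 0, 1, 2
idx = np.array([0, 2, 4, -1])   # 4 and -1 are out of range

clipped = np.choose(idx, choices, mode='clip')
wrapped = np.choose(idx, choices, mode='wrap')
print(clipped.tolist())  # [10, 30, 30, 10]
print(wrapped.tolist())  # [10, 30, 20, 30]

raised = False
try:
    np.choose(idx, choices)  # mode='raise' is the default
except ValueError:
    raised = True
print(raised)  # True
```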

2. Hands-on examples

When a and choices have the same number of dimensions (1-D)

result = np.array([0,0,0,0,0])
a = np.choose([4,2,1,3,0], [11,22,33,44,55], out=result)
print(a) # [55 33 22 44 11]
print(result) # [55 33 22 44 11], because out=result receives the same values

Each element of a is an index into choices.

When a and choices have the same number of dimensions (2-D)

d = np.choose([[4,2,1,3,0],[3,4,2,0,1],[0,2,1,4,3]],
             [[11,22,33,32,31],[44,55,66,65,64],[77,88,99,98,97],[111,222,333,332,331],[444,555,666,665,664]])
 
print(d)
'''
[[444  88  66 332  31]
 [111 555  99  32  64]
 [ 11  88  66 665 331]]
'''

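The outer and inner indices are matched up: the value in a selects which choice array to read from (outer index), and the column position selects the element inside it (inner index).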

When a has more dimensions than choices

b = np.choose([[4,2,1,3,0],[3,4,2,0,1],[0,2,1,4,3]],[11,22,33,44,55])
print(b)
'''
[[55 33 22 44 11]
 [44 55 33 11 22]
 [11 33 22 55 44]]
'''

When a has fewer dimensions than choices

c = np.choose([4,2,1,3,0],
              [[11,22,33,32,31],[44,55,66,65,64],[77,88,99,98,97],[111,222,333,332,331],[444,555,666,665,664]])
print(c) # [444  88  66 332  31]

The values in a select the outer index of choices, while the inner index is simply the position, running 0, 1, 2, 3, 4 from left to right.

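Consequently, the number of elements in each inner array of choices must still match the length of a (or be broadcastable to it); otherwise an error is raised.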
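One way to read the rule above is as paired fancy indexing: the row comes from a and the column from the position. The sketch below checks that reading against np.choose for the last example; choices_arr, by_choose, and by_index are illustrative names.

```python
import numpy as np

a = np.array([4, 2, 1, 3, 0])
choices = [[11, 22, 33, 32, 31],
           [44, 55, 66, 65, 64],
           [77, 88, 99, 98, 97],
           [111, 222, 333, 332, 331],
           [444, 555, 666, 665, 664]]
choices_arr = np.array(choices)

by_choose = np.choose(a, choices)
# Row index taken from a, column index taken from the position 0..4.
by_index = choices_arr[a, np.arange(a.size)]

print(by_choose.tolist())                    # [444, 88, 66, 332, 31]
print(np.array_equal(by_choose, by_index))   # True
```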

IV. The np.nonzero function

np.nonzero returns the positions (array indices) of the non-zero elements of an array.

The return value is a tuple of one-dimensional index arrays, one per dimension of the input (so two arrays for a 2-D input).

x = np.array([[3, 0, 0], [0, 4, 0], [5, 6, 0]])
print(x)
np.nonzero(x) # (array([0, 1, 2, 2], dtype=int64), array([0, 1, 0, 1], dtype=int64))
x[np.nonzero(x)] # array([3, 4, 5, 6])

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
np.nonzero(a > 3)
'''
(array([1, 1, 1, 2, 2, 2], dtype=int64),
 array([0, 1, 2, 0, 1, 2], dtype=int64))
'''
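When one (row, column) pair per element is more convenient than one array per axis, the tuple can be transposed. The sketch below does this for the first example and compares the result with np.argwhere; pairs is an illustrative name.

```python
import numpy as np

x = np.array([[3, 0, 0], [0, 4, 0], [5, 6, 0]])

# Turn (row_indices, col_indices) into a list of [row, col] pairs.
pairs = np.transpose(np.nonzero(x))

print(pairs.tolist())                          # [[0, 0], [1, 1], [2, 0], [2, 1]]
print(np.array_equal(pairs, np.argwhere(x)))   # True
```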