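Preface:
The previous article, "A Powerful Tool for Processing JSON Data: JMESPath (Part 1), Getting Started", covered the basics of jmespath: extracting target data from a JSON document with . and [ ], what the pipe | does, and selecting multiple fields with [ ] and { }.
This article continues with more advanced JMESPath usage: filtering data, and transforming data with built-in functions.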
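1. Data filtering
For list data, jmespath can filter elements by evaluating an expression against each one. The syntax is [?expression], and the expression can use:
- comparison operators: == != < <= > >=
- logical operators and grouping: || && ( )
- literal delimiters: ` ` for JSON literals such as numbers, and ' ' for raw strings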
In [1]: from jmespath import search
In [2]: data = {
...: "locations": [
...: {"name": "Seattle", "state": "WA", "size": 83.78},
...: {"name": "New York", "state": "NY", "size": 302.6},
...: {"name": "Bellevue", "state": "WA", "size": 37.51},
...: {"name": "Olympia", "state": "WA", "size": 20.09}
...: ]
...: }
In [3]: search("locations[?state == 'WA']", data)
Out[3]:
[{'name': 'Seattle', 'state': 'WA', 'size': 83.78},
{'name': 'Bellevue', 'state': 'WA', 'size': 37.51},
{'name': 'Olympia', 'state': 'WA', 'size': 20.09}]
In [4]: search("locations[?(state=='WA' && name=='Olympia') || state=='NY'] ", data)
Out[4]:
[{'name': 'New York', 'state': 'NY', 'size': 302.6},
{'name': 'Olympia', 'state': 'WA', 'size': 20.09}]
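In the expressions above, ?state == 'WA' is a filter expression. For each element of the list under locations, it checks whether the element's state field equals WA, and only the elements for which the check is true are kept.
The second query, ?(state=='WA' && name=='Olympia') || state=='NY', combines comparisons with logical operators: it keeps elements whose state is WA and whose name is Olympia, plus elements whose state is NY. The result contains exactly those two elements, in their original list order.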
To explain the literal delimiters, we change data so that it contains an element {"name": "TEST", "state": "TEST"}, whose name and state are equal:
In [1]: from jmespath import search
In [2]: data = {
...: "locations": [
...: {"name": "TEST", "state": "TEST", "size": 20.09},
...: {"name": "New York", "state": "NY", "size": 302.6},
...: {"name": "Bellevue", "state": "WA", "size": 37.51}
...: ]
...: }
In [3]: search("locations[?name == state]", data)
Out[3]: [{'name': 'TEST', 'state': 'TEST', 'size': 20.09}]
In [4]: search("locations[?size>`40`]", data)
Out[4]: [{'name': 'New York', 'state': 'NY', 'size': 302.6}]
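In a filter expression, unquoted names refer to the value of the corresponding field in the current element. The expression ?name == state therefore selects the elements of locations whose name equals their state. For the same reason, the earlier state == 'WA' filter needs WA wrapped in single quotes ' ', because the quotes make it a string literal rather than a field name. To compare numbers, use the other literal delimiter, backticks ` `: ?size>`40` selects the elements whose size is greater than 40.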
jmespath also lets you call built-in functions inside a filter:
In [5]: search("locations[?contains(name, 'New')]", data)
Out[5]: [{'name': 'New York', 'state': 'NY', 'size': 302.6}]
The expression ?contains(name, 'New') keeps the elements whose name contains the substring 'New'.
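2. Built-in functions
jmespath provides a rich set of built-in functions for simple data processing, covering type conversion, predicates, and aggregation. In function arguments, the special character @ passes the current node to the function, somewhat like self in Python. The supported functions are: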
General | type | not_null | length | to_array | to_string | to_number |
Predicates | contains | starts_with | ends_with |
Numbers | abs | ceil | floor |
Lists | avg | min | max | sum | sort | reverse | map | join |
Objects | min_by | max_by | sort_by | merge | keys | values |
Each built-in function is described below.
2.1 General functions
2.1.1 type
Returns the JSON type of a value as a string. Examples:
Given | Expression | Result |
"foo" | type(@) | "string" |
true | type(@) | "boolean" |
false | type(@) | "boolean" |
null | type(@) | "null" |
123 | type(@) | "number" |
123.05 | type(@) | "number" |
["abc"] | type(@) | "array" |
{"abc": "123"} | type(@) | "object" |
2.1.2 not_null
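Returns the first argument that does not evaluate to null, or null if every argument is null. An empty array [] is not null, which is why the first example returns [].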
Given | Expression | Result |
{"a": null, "b": null, "c": [], "d": "foo"} | not_null(no_exist, a, b, c, d) | [] |
{"a": null, "b": null, "c": [], "d": "foo"} | not_null(a, b, `null`, d, c) | "foo" |
{"a": null, "b": null, "c": [], "d": "foo"} | not_null(a, b) | null |
2.1.3 length
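Returns the length of a value: the number of characters in a string, the number of elements in an array, or the number of key/value pairs in an object. Examples: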
Given | Expression | Result |
n/a | length(`abc`) | 3 |
"current" | length(@) | 7 |
"current" | length(not_there) | <error: invalid-type> |
["a", "b", "c"] | length(@) | 3 |
[] | length(@) | 0 |
{} | length(@) | 0 |
{"foo": "bar", "baz": "bam"} | length(@) | 2 |
2.1.4 to_array
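Converts a value to an array: an array is returned unchanged, and any other value is wrapped in a one-element array. Examples: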
Expression | Result |
to_array(`[1, 2]`) | [1, 2] |
to_array(`"string"`) | ["string"] |
to_array(`0`) | [0] |
to_array(`true`) | [true] |
to_array(`{"foo": "bar"}`) | [{"foo": "bar"}] |
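2.1.5 to_string
Returns a string unchanged and serializes any other value as JSON text. Example:
Expression | Result |
to_string(`2`) | "2" |
2.1.6 to_number
Returns a number unchanged and parses a string that holds a valid number. Any other input returns null, including a string that is not a number.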
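2.2 Predicate functions
2.2.1 contains
Checks whether the subject contains a value. For a string, it tests whether the search value occurs as a substring. For an array, it tests whether any element equals the search value. A backtick literal that is not valid JSON, such as `foobar` below, is legacy shorthand for the string "foobar". jmespath.py still accepts it but issues a deprecation warning, so new expressions should write 'foobar' or `"foobar"` instead. Examples: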
Given | Expression | Result |
n/a | contains(`foobar`, `foo`) | true |
n/a | contains(`foobar`, `not`) | false |
n/a | contains(`foobar`, `bar`) | true |
n/a | contains(`false`, `bar`) | <error: invalid-type> |
n/a | contains(`foobar`, `123`) | false |
["a", "b"] | contains(@, `a`) | true |
["a"] | contains(@, `a`) | true |
["a"] | contains(@, `b`) | false |
["foo", "bar"] | contains(@, `foo`) | true |
["foo", "bar"] | contains(@, `b`) | false |
2.2.2 starts_with
Checks whether a string starts with the given prefix. Examples:
Given | Expression | Result |
"foobarbaz" | starts_with(@, `foo`) | true |
"foobarbaz" | starts_with(@, `baz`) | false |
"foobarbaz" | starts_with(@, `f`) | true |
2.2.3 ends_with
Checks whether a string ends with the given suffix. Examples:
Given | Expression | Result |
"foobarbaz" | ends_with(@, `baz`) | true |
"foobarbaz" | ends_with(@, `foo`) | false |
"foobarbaz" | ends_with(@, `z`) | true |
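2.3 Computation functions
2.3.1 Numbers: abs, ceil, floor
abs returns the absolute value, ceil rounds up to the nearest integer, and floor rounds down to the nearest integer. Examples: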
Expression | Result |
abs(`1`) | 1 |
abs(`-1`) | 1 |
abs(`abc`) | <error: invalid-type> |
Expression | Result |
ceil(`1.001`) | 2 |
ceil(`1.9`) | 2 |
ceil(`1`) | 1 |
ceil(`abc`) | <error: invalid-type> |
Expression | Result |
floor(`1.001`) | 1 |
floor(`1.9`) | 1 |
floor(`1`) | 1 |
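2.3.2 Lists: avg, min, max, sum
Compute the average, minimum, maximum and sum of an array. avg and sum require an array of numbers, while min and max accept an array of numbers or an array of strings. Examples: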
Given | Expression | Result |
[10, 15, 20] | avg(@) | 15 |
[10, false, 20] | avg(@) | <error: invalid-type> |
[false] | avg(@) | <error: invalid-type> |
false | avg(@) | <error: invalid-type> |
Given | Expression | Result |
[10, 15] | min(@) | 10 |
["a", "b"] | min(@) | "a" |
["a", 2, "b"] | min(@) | <error: invalid-type> |
[10, false, 20] | min(@) | <error: invalid-type> |
Given | Expression | Result |
[10, 15] | max(@) | 15 |
["a", "b"] | max(@) | "b" |
["a", 2, "b"] | max(@) | <error: invalid-type> |
[10, false, 20] | max(@) | <error: invalid-type> |
Given | Expression | Result |
[10, 15] | sum(@) | 25 |
[10, false, 20] | sum(@) | <error: invalid-type> |
[10, false, 20] | sum([].to_number(@)) | 30 |
[] | sum(@) | 0 |
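2.3.3 Lists: sort, reverse, map, join
sort sorts an array, and reverse reverses an array or a string. map applies an expression reference to every element of an array, and join concatenates an array of strings with a separator. sort only accepts an array of numbers or an array of strings, and any other input is a type error. Examples: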
Given | Expression | Result |
["b", "a", "c"] | sort(@) | ["a", "b", "c"] |
[1, "a", "c"] | sort(@) | <error: invalid-type> |
[false, [], null] | sort(@) | <error: invalid-type> |
[[], {}, false] | sort(@) | <error: invalid-type> |
{"a": 1, "b": 2} | sort(@) | <error: invalid-type> |
false | sort(@) | <error: invalid-type> |
Given | Expression | Result |
[0, 1, 2, 3, 4] | reverse(@) | [4, 3, 2, 1, 0] |
[] | reverse(@) | [] |
["a", "b", "c", 1, 2, 3] | reverse(@) | [3, 2, 1, "c", "b", "a"] |
"abcd" | reverse(@) | "dcba" |
Given | Expression | Result |
{"array": [{"foo": "a"}, {"foo": "b"}, {}, [], {"foo": "f"}]} | map(&foo, array) | ["a", "b", null, null, "f"] |
[[1, 2, 3, [4]], [5, 6, 7, [8, 9]]] | map(&[], @) | [[1, 2, 3, 4], [5, 6, 7, 8, 9]] |
Given | Expression | Result |
["a", "b"] | join(`, `, @) | "a, b" |
["a", "b"] | join('', @) | "ab" |
["a", false, "b"] | join(`, `, @) | <error: invalid-type> |
[false] | join(`, `, @) | <error: invalid-type> |
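2.3.4 Sorting by key: min_by, max_by, sort_by
These functions take an array (typically of objects) and an expression reference such as &age, which is evaluated against each element. min_by and max_by return the element with the smallest or largest result, and sort_by returns the array sorted by it. The tables below assume a people array whose objects have age, age_str, bool and name fields, with ages 10, 20, 30, 40 and 50. Passing a bare field such as age instead of the expression reference &age is a type error.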
Expression | Result |
min_by(people, &age) | {"age": 10, "age_str": "10", "bool": true, "name": 3} |
min_by(people, &age).age | 10 |
min_by(people, &to_number(age_str)) | {"age": 10, "age_str": "10", "bool": true, "name": 3} |
min_by(people, &age_str) | <error: invalid-type> |
min_by(people, age) | <error: invalid-type> |
Expression | Result |
max_by(people, &age) | {"age": 50, "age_str": "50", "bool": false, "name": "d"} |
max_by(people, &age).age | 50 |
max_by(people, &to_number(age_str)) | {"age": 50, "age_str": "50", "bool": false, "name": "d"} |
max_by(people, &age_str) | <error: invalid-type> |
max_by(people, age) | <error: invalid-type> |
Expression | Result |
sort_by(people, &age)[].age | [10, 20, 30, 40, 50] |
sort_by(people, &age)[0] | {"age": 10, "age_str": "10", "bool": true, "name": 3} |
sort_by(people, &to_number(age_str))[0] | {"age": 10, "age_str": "10", "bool": true, "name": 3} |
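2.3.5 Objects: merge, keys, values
merge combines several objects into one, and when a key appears more than once the value from the later argument wins. keys and values return an object's keys and values as arrays. The specification does not guarantee the order of keys and values.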
Expression | Result |
merge(`{"a": "b"}`, `{"c": "d"}`) | {"a": "b", "c": "d"} |
merge(`{"a": "b"}`, `{"a": "override"}`) | {"a": "override"} |
merge(`{"a": "x", "b": "y"}`, `{"b": "override", "c": "z"}`) | {"a": "x", "b": "override", "c": "z"} |
Given | Expression | Result |
{"foo": "baz", "bar": "bam"} | keys(@) | ["foo", "bar"] |
{} | keys(@) | [] |
false | keys(@) | <error: invalid-type> |
["b", "a", "c"] | keys(@) | <error: invalid-type> |
Given | Expression | Result |
{"foo": "baz", "bar": "bam"} | values(@) | ["baz", "bam"] |
["a", "b"] | values(@) | <error: invalid-type> |
false | values(@) | <error: invalid-type> |
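Summary:
This article covered advanced jmespath usage: filtering data and the built-in functions. Filter expressions select the target elements of a JSON document, and functions then transform the extracted results. Together they give jmespath considerably more data-processing power than path extraction alone.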