Arrays
awk arrays are associative arrays.
Format
array_name[index-expression]
Example:
weekdays["mon"]="Monday"
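index-expression
Arrays provide key/value (k/v) storage.
The index can be any string; string literals must be enclosed in double quotes.
If an array element does not exist yet, referencing it makes awk create the element and initialize its value to the empty string.
To test whether an element exists in an array, use the form "index in array".
Example: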
[root@longwang ~]# awk 'BEGIN{weekdays["mon"]="Monday";weekdays["tue"]="Tuesday";print weekdays["mon"]}'
Monday
[root@longwang ~]# awk '!line[$0]++' /etc/fstab

#
# /etc/fstab
# Created by anaconda on Fri Sep 4 21:10:02 2020
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
/dev/mapper/cl-root / xfs defaults 0 0
UUID=232316f3-6dff-487a-9fae-af211542097b /boot ext4 defaults 1 2
/dev/mapper/cl-data /data xfs defaults 0 0
/dev/mapper/cl-swap swap swap defaults 0 0
[root@longwang ~]# awk '{print !line[$0]++,$0,line[$0]}' /etc/fstab
1  1
1 # 1
1 # /etc/fstab 1
1 # Created by anaconda on Fri Sep 4 21:10:02 2020 1
0 # 2
1 # Accessible filesystems, by reference, are maintained under '/dev/disk/'. 1
1 # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info. 1
0 # 3
1 # After editing this file, run 'systemctl daemon-reload' to update systemd 1
1 # units generated from this file. 1
0 # 4
1 /dev/mapper/cl-root / xfs defaults 0 0 1
1 UUID=232316f3-6dff-487a-9fae-af211542097b /boot ext4 defaults 1 2 1
1 /dev/mapper/cl-data /data xfs defaults 0 0 1
1 /dev/mapper/cl-swap swap swap defaults 0 0 1
[root@longwang ~]# awk '{!line[$0]++;print $0,line[$0]}' /etc/fstab
 1
# 1
# /etc/fstab 1
# Created by anaconda on Fri Sep 4 21:10:02 2020 1
# 2
# Accessible filesystems, by reference, are maintained under '/dev/disk/'. 1
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info. 1
# 3
# After editing this file, run 'systemctl daemon-reload' to update systemd 1
# units generated from this file. 1
# 4
/dev/mapper/cl-root / xfs defaults 0 0 1
UUID=232316f3-6dff-487a-9fae-af211542097b /boot ext4 defaults 1 2 1
/dev/mapper/cl-data /data xfs defaults 0 0 1
/dev/mapper/cl-swap swap swap defaults 0 0 1
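The '!line[$0]++' pattern works as a duplicate filter because the post-increment returns the old count: 0 the first time a line is seen, so the negation is true and the line is printed, and non-zero on every repeat. A minimal sketch on made-up input (the array name seen and the sample lines are illustrative, not taken from the sessions above):

```shell
# Print each distinct line once, in first-seen order.
# seen[$0]++ yields the previous count, so !seen[$0]++ is true
# only on the first occurrence of a line.
out=$(printf 'a\nb\na\nc\nb\n' | awk '!seen[$0]++')
echo "$out"
```

Unlike sort -u, this keeps the original order of the input.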
Example: testing whether an array index exists
[root@longwang ~]# awk 'BEGIN{arr["i"]="x";arr["j"]="y";print "i" in arr, "y" in arr}'
1 0
[root@longwang ~]# awk 'BEGIN{arr["i"]="x";arr["j"]="y"; if("i" in arr){print "exists"}else{print "does not exist"}}'
exists
[root@longwang ~]# awk 'BEGIN{arr["i"]="x";arr["j"]="y"; if("abc" in arr){print "exists"}else{print "does not exist"}}'
does not exist
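The difference between "index in array" and a plain reference matters because a reference creates the element as a side effect. The following sketch (the array name arr and key "k" are placeholders) checks membership before and after comparing the element with the empty string:

```shell
out=$(awk 'BEGIN {
    print (("k" in arr) ? "before: present" : "before: absent")
    if (arr["k"] == "") print "reference: empty string"   # this lookup creates arr["k"]
    print (("k" in arr) ? "after: present" : "after: absent")
}')
echo "$out"
```

For this reason a test such as if (arr[key] == "") is a poor existence check; it silently grows the array.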
To iterate over every element of an array, use a for loop:
for(var in array) {for-body}
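Note: var takes the value of each index of array in turn (not the element values); the traversal order is unspecified.
Example: iterating over an array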
[root@longwang ~]# awk 'BEGIN{weekd["mon"]="Monday";weekd["tue"]="Tuesday";for(i in weekd){print i,weekd[i]}}'
tue Tuesday
mon Monday
[root@longwang ~]# awk 'BEGIN{stude[1]="yuzong";stude[2]="kuanzong";stude[3]="lizong";for(x in stude){print x":"stude[x]}}'
1:yuzong
2:kuanzong
3:lizong
[root@longwang ~]# awk 'BEGIN{a["x"]="welcome";a["y"]="to";a["z"]="tiantang";for(i in a){print i,a[i]}}'
x welcome
y to
z tiantang
[root@longwang ~]# awk -F: '{user[$1]=$3}END{for(i in user){print "username:"i,"uid: "user[i]}}' /etc/passwd
username:longe uid: 1002
username:longwang uid: 1001
username:www uid: 1008
username:sshd uid: 74
username:systemd-resolve uid: 193
username:shutdown uid: 6
username:bin uid: 1
username:saslauth uid: 995
username:tss uid: 59
username:mail uid: 8
username:halt uid: 7
username:adm uid: 3
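As the first session shows (tue printed before mon), for (i in array) does not promise any particular order. When a stable order is needed, one portable option is to pipe the output through sort; this is a sketch with an extra illustrative key "wed" added to the sample data:

```shell
out=$(awk 'BEGIN {
    weekd["tue"] = "Tuesday"; weekd["mon"] = "Monday"; weekd["wed"] = "Wednesday"
    for (k in weekd) print k, weekd[k]      # order here is unspecified
}' | sort)                                   # sort makes it deterministic
echo "$out"
```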
Example: count how many times each connection state occurs on the host
[root@longwang ~]# awk 'NR!=1{print $1}' access_log | sort| uniq -c |sort -nr| head -n3
4870 172.20.116.228
3429 172.20.116.208
2834 172.20.0.222
[root@iZwz98 ~]# ss -tan | awk 'NR!=1{state[$1]++}END{for(i in state){print i,state[i]}}'
LISTEN 3
ESTAB 2
TIME-WAIT 1
[root@iZwz98 ~]# netstat -tan | awk '/^tcp/{state[$NF]++}END{for(i in state){print i,state[i]}}'
LISTEN 3
ESTABLISHED 2
Example: count requests per client IP in an access log
[root@centos8 ~]#awk '{ip[$1]++}END{for(i in ip){print i,ip[i]}}' /var/log/httpd/access_log
172.20.0.200 1482
172.20.21.121 2
172.20.30.91 29
172.16.102.29 864
[root@longwang ~]# awk '{ip[$1]++}END{for(i in ip){print ip[i],i}}' access_log | sort -nr | head -n 3
4870 172.20.116.228
3429 172.20.116.208
2834 172.20.0.222
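The counting idiom used in these sessions can be tried without a real log file. The sketch below feeds four invented log lines (the addresses and request paths are made up) through the same array[$1]++ pattern and ranks the result:

```shell
out=$(printf '%s\n' \
        '10.0.0.1 GET /' \
        '10.0.0.2 GET /' \
        '10.0.0.1 GET /a' \
        '10.0.0.1 GET /b' |
      awk '{ hits[$1]++ }                             # one counter per client IP
           END { for (ip in hits) print hits[ip], ip }' |
      sort -nr)                                       # highest count first
echo "$out"
```

Printing the count first lets sort -nr rank the lines numerically, which is why the second session above swaps the two columns.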
Example: block every IP that appears 1000 or more times in the access log
[root@centos8 ~]#awk '{ip[$1]++}END{for(i in ip){if(ip[i]>=1000) {system("iptables -A INPUT -s "i" -j REJECT")}}}' nginx.access.log-20210424
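Because system() runs the firewall command immediately, it can be useful to preview the generated commands first. One possible dry-run variant replaces system() with print; the threshold variable limit and the sample addresses are assumptions made for this sketch:

```shell
out=$(printf '%s\n' 10.0.0.1 10.0.0.1 10.0.0.1 10.0.0.2 |
      awk -v limit=3 '
          { hits[$1]++ }
          END {
              for (ip in hits)
                  if (hits[ip] >= limit)
                      # print the command instead of executing it
                      print "iptables -A INPUT -s " ip " -j REJECT"
          }')
echo "$out"
```

Once the printed commands look right, the print can be swapped back to system(), or the output piped to a shell.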
Example: multidimensional arrays (the arr[i][j] syntax is a gawk extension, "arrays of arrays")
[root@longwang ~]# awk 'BEGIN{
> arr[1][1]=11
> arr[1][2]=12
> arr[1][3]=13
> arr[2][1]=21
> arr[2][2]=22
> arr[2][3]=23
> for (i in arr)
> for (j in arr[i])
> print arr[i][j]
> }'
11
12
13
21
22
23
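Where gawk's arr[i][j] form is not available, POSIX awk offers a simulated multidimensional array: arr[i,j] joins the subscripts with SUBSEP into a single string key. A minimal sketch of that alternative (not the method used in the session above):

```shell
out=$(awk 'BEGIN {
    arr[1,1] = 11; arr[1,2] = 12; arr[2,1] = 21
    for (i = 1; i <= 2; i++)
        for (j = 1; j <= 2; j++)
            if ((i, j) in arr)          # membership test for a combined key
                print i, j, arr[i,j]
}')
echo "$out"
```

Here arr[2,2] was never assigned, so the (i, j) in arr test skips it without creating it.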
awk functions
awk functions are either built-in or user-defined.
Official documentation:
https://www.gnu.org/software/gawk/manual/gawk.html#Functions
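Common built-in functions
Numeric functions:
rand(): returns a random number N with 0 <= N < 1
srand(): sets the seed used by rand(); called without an argument, it seeds from the current time
int(): returns the integer part of a number (truncated toward zero)
Example: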
[root@longwang ~]# awk 'BEGIN{srand()print rand()}'
awk: cmd. line:1: BEGIN{srand()print rand()}
awk: cmd. line:1: ^ syntax error
[root@longwang ~]# awk 'BEGIN{srand();print rand()}'
0.40168
[root@longwang ~]# awk 'BEGIN{srand();print rand()}'
0.0469583
[root@longwang ~]# awk 'BEGIN{srand();print rand()}'
0.994006
[root@longwang ~]# awk 'BEGIN{srand(); for (i=1;i<=10;i++)print int(rand()*100)}'
72
45
82
98
52
81
75
23
84
66
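srand() also accepts an explicit seed, which makes a random sequence repeatable; this is handy for testing. The sketch below defines a hypothetical shell helper, roll, that prints five dice values (1 to 6) for a given seed; the helper name and the seed 42 are arbitrary choices:

```shell
roll() {
    awk -v seed="$1" 'BEGIN {
        srand(seed)                                  # fixed seed: same sequence each run
        for (i = 0; i < 5; i++)
            printf "%d ", int(rand() * 6) + 1        # scale [0,1) to the integers 1..6
        print ""
    }'
}
first=$(roll 42)
second=$(roll 42)
echo "$first"
echo "$second"
```

The two lines printed should be identical; the actual numbers depend on the awk implementation.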
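String functions:
length([s]): returns the length of string s ($0 if s is omitted)
sub(r,s,[t]): searches string t for the pattern r and replaces the first match with s (t defaults to $0)
gsub(r,s,[t]): searches string t for the pattern r and replaces every match with s (t defaults to $0)
split(s,array,[r]): splits string s on the separator r (FS if omitted) and stores the pieces in array; the first index is 1, the second is 2, and so on
Example: print the length of each user name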
[root@longwang ~]# cut -d: -f1 /etc/passwd | awk '{print length()}'
4
3
6
3
2
4
8
4
4
8
5
3
6
4
16
15
[root@longwang ~]# awk -F: '{print length($1)}' /etc/fstab
0
1
12
38
1
74
76
1
74
33
1
75
93
75
75
Example: sub() replaces only the first match
[root@longwang ~]# echo "2020:08:08 08:08:08" | awk 'sub(/:/,"-",$1)'
2020-08:08 08:08:08
[root@longwang ~]# echo "2020:08:08 08:08:08" | awk '{sub(/:/,"-",$1);print $0}'
2020-08:08 08:08:08
Example: gsub() replaces every match
[root@longwang ~]# echo "2020:08:08 08:08:08" | awk 'gsub(/:/,"-",$0)'
2020-08-08 08-08-08
[root@longwang ~]# echo "2020:08:08 08:08:08" | awk '{gsub(/:/,"-",$0)print $0}'
awk: cmd. line:1: {gsub(/:/,"-",$0)print $0}
awk: cmd. line:1: ^ syntax error
[root@longwang ~]# echo "2020:08:08 08:08:08" | awk '{gsub(/:/,"-",$0);print $0}'
2020-08-08 08-08-08
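The forms awk 'sub(...)' and awk 'gsub(...)' without braces print the line because both functions return the number of substitutions made; used as a pattern, a non-zero result is true and triggers the default print action. A short sketch that captures the return value (the variable name n is arbitrary):

```shell
out=$(echo "2020:08:08 08:08:08" |
      awk '{ n = gsub(/:/, "-"); print n, $0 }')   # n = number of replacements in $0
echo "$out"
```

A consequence is that the brace-less form prints nothing for lines where no substitution happened.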
Example: count connections per remote IP with split()
[root@centos8 ~]#netstat -tn | awk '/^tcp/{split($5,ip,":");count[ip[1]]++}END{for(i in count){print i,count[i]}}'
10.0.0.1 1
10.0.0.6 1
10.0.0.7 673
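split() returns the number of pieces it produced, and the pieces are stored under the indexes 1..n. A small sketch on an invented address:port string (the array name part is arbitrary):

```shell
out=$(awk 'BEGIN {
    n = split("10.0.0.7:51234", part, ":")   # part[1] = address, part[2] = port
    print n, part[1], part[2]
}')
echo "$out"
```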
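awk can run shell commands:
system("cmd")
In awk, placing expressions next to each other (separated by a space) concatenates them as strings. To use an awk variable in the command passed to system(), keep the variable outside the quotes and separate it from the string literals with spaces; in other words, enclose everything except awk variables in double quotes ("").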
[root@longwang ~]# awk 'BEGIN{system("hostname")}'
longwang.local
[root@longwang ~]# awk 'BEGIN{score=100; system("echo your score is " score)}'
your score is 100
[root@centos8 ~]#netstat -tn | awk '/^tcp/{split($5,ip,":");count[ip[1]]++}END{for(i in count){if(count[i]>=10) {system("iptables -A INPUT -s "i" -j REJECT")}}}'
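system() returns the exit status of the command it ran, so an awk program can react to failures. A minimal sketch using the standard true and false utilities (the variable names ok and bad are arbitrary):

```shell
out=$(awk 'BEGIN {
    ok  = system("true")     # exits with status 0
    bad = system("false")    # exits with a non-zero status (1)
    print ok, bad
}')
echo "$out"
```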
Time functions
Official documentation: Time Functions
https://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions
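systime(): returns the number of seconds from January 1, 1970 (the epoch) to the current time
strftime(): formats a timestamp according to the given format string
Example: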
[root@longwang ~]# awk 'BEGIN{print systime()}'
1619259442
[root@longwang ~]# awk 'BEGIN{print strftime("%Y-%m-%dT%H:%M",systime()-3600)}'
2021-04-24T17:18
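strftime() also accepts an explicit timestamp as its second argument, which gives a reproducible result. The sketch below formats 3600 seconds after the epoch with TZ=UTC set, so the local time zone does not affect the output. It assumes an awk that provides strftime(), such as gawk; the function is not part of POSIX awk:

```shell
# 3600 seconds after 1970-01-01 00:00 UTC is 01:00 on the same day.
out=$(TZ=UTC awk 'BEGIN { print strftime("%Y-%m-%dT%H:%M", 3600) }')
echo "$out"
```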
User-defined functions
Syntax of a user-defined function:
function name ( parameter, parameter, ... ) {
    statements
    return expression
}
Example:
[root@centos8 ~]#cat func.awk
function max(x,y) {
    # return the larger of the two arguments
    return x>y ? x : y
}
BEGIN{print max(a,b)}
[root@centos8 ~]#awk -v a=30 -v b=20 -f func.awk
30
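Variables assigned inside an awk function are global unless they are listed as parameters. A common convention is to declare extra, unused parameters to serve as local variables, separated from the real ones by additional spaces. A sketch of that convention (the name bigger is illustrative):

```shell
out=$(awk '
function max(x, y,    bigger) {      # bigger: extra parameter used as a local
    bigger = (x > y) ? x : y
    return bigger
}
BEGIN {
    print max(30, 20)
    # bigger was local to max(), so it is still unset here
    print (bigger == "" ? "bigger is unset outside max" : "bigger leaked")
}')
echo "$out"
```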
awk scripts
An awk program can be saved in a script file and then run with -f or executed directly.
Example:
[root@centos8 ~]#cat passwd.awk
{if($3>=1000)print $1,$3}
[root@centos8 ~]#awk -F: -f passwd.awk /etc/passwd
nobody 65534
wang 1000
long 1001
Example: an executable script with a shebang line
[root@centos8 ~]#cat test.awk
#!/bin/awk -f
#this is an awk script
{if($3>=1000)print $1,$3}
[root@centos8 ~]#chmod +x test.awk
[root@centos8 ~]#./test.awk -F: /etc/passwd
nobody 65534
wang 1000
long 1001
Passing arguments to an awk script
Format:
awkfile var=value var2=value2... Inputfile
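Note: variables assigned this way are not available in the BEGIN block. Each assignment takes effect only when awk reaches it in the argument list, just before it starts reading the input file that follows. To give a variable its value before BEGIN runs, use the -v option; each variable set on the command line needs its own -v.
Example: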
[root@longwang ~]# cat test2.awk
#!/bin/awk -f
{if($3 >=min && $3 <=max)print $1,$3}
[root@longwang ~]# chmod +x test2.awk
[root@longwang ~]# ./test2.awk -F: min=100 max=200 /etc/passwd
systemd-resolve 193
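The timing difference between -v and a plain var=value argument can be observed directly. In this sketch the variable names early and late are invented; early is set with -v, late is given as an operand before "-" (standard input):

```shell
out=$(echo "one line" | awk -v early=1 '
BEGIN { print "BEGIN:", early, (late == "" ? "(unset)" : late) }
      { print "main:", early, late }
' late=2 -)
echo "$out"
```

The BEGIN line shows late as unset, while the main rule sees late=2 because the assignment is processed before standard input is read.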
Example: find client IPs that accessed the nginx service more than 3 times within the last hour
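[root@longwang ~]# vim check_nginx_log.awk
#!/usr/bin/awk -f
# Fields are split on '"' (see -F below); the script takes $4 as the
# timestamp and $12 as the client IP, which depends on the log format.
BEGIN{
    # Start of the window: one hour ago, formatted like the log timestamps
    beg=strftime("%Y-%m-%dT%H:%M",systime()-3600);
    # End of the window
    end=strftime("%Y-%m-%dT%H:%M",systime()-60);
}
# Select only the log entries inside this time window
$4 > beg && $4 < end{
    # Use the IP as the array index and the hit count as the value
    count[$12]+=1;
}
END{
    # Each index taken from the array is an IP
    for(i in count){
        # Report the IP when its count is greater than 3
        if(count[i]>3){
            print count[i]" "i;
        }
    }
}
[root@longwang ~]# awk -F'"' -f check_nginx_log.awk /apps/nginx/logs/acces.log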
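The window test in this script compares timestamps as strings. That works because a zero-padded year-month-day-hour-minute format sorts in the same order lexicographically as it does chronologically. A self-contained sketch with fixed bounds and invented log lines (the two-column layout here is a simplification, not the nginx log format used above):

```shell
out=$(printf '%s\n' \
        '2021-04-24T16:59 10.0.0.1' \
        '2021-04-24T17:30 10.0.0.2' \
        '2021-04-24T18:05 10.0.0.3' |
      awk -v beg='2021-04-24T17:00' -v end='2021-04-24T18:00' '
          $1 > beg && $1 < end { print $2 }   # string comparison on the timestamp
      ')
echo "$out"
```

Only the 17:30 entry falls inside the window. The same comparison would not be reliable for formats such as 24/Apr/2021, where text order and time order differ.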