Awk in Detail (Part 1): Regular Expressions, Fields, and Filtering


mashaoli 2016-10-27 13:30:20 Category: Systems Programming in Depth


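© Copyright belongs to the author. This is an original work by 51CTO blog author mashaoli. Reproduction is not permitted and will be pursued legally.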

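What is a regular expression?

A regular expression is a pattern that describes a set of strings. It is used to filter the output of commands and the contents of files, to edit text and configuration files, and so on.

Features of regular expressions

1. Ordinary characters, such as space, underscore (_), A-Z, a-z, 0-9.

2. Metacharacters, which are expanded beyond their literal meaning:

(.) matches any single character within a line.

(*) matches zero or more occurrences of the character immediately preceding it.

[ character(s) ] matches any one of the listed characters; use (-) to give a range, for example [a-f], [1-5].

^ matches the beginning of a line.

$ matches the end of a line.

\ is the escape character.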
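As a rough illustration of the metacharacters listed above, the sketch below runs one pattern per metacharacter over a handful of made-up lines. The sample text and the variable names are assumptions chosen for the example, not taken from any real file.

```shell
# Hypothetical sample lines, one per line.
sample='localhost
local
abc
ac
a.c
cab'

# Each variable collects the sample lines selected by one pattern.
dot=$(printf '%s\n' "$sample" | awk '/a.c/')     # a, any one character, c
star=$(printf '%s\n' "$sample" | awk '/ab*c/')   # a, zero or more b, c
cls=$(printf '%s\n' "$sample" | awk '/[hx]/')    # contains h or x
bol=$(printf '%s\n' "$sample" | awk '/^c/')      # starts with c
eol=$(printf '%s\n' "$sample" | awk '/l$/')      # ends with l
esc=$(printf '%s\n' "$sample" | awk '/a\.c/')    # a, a literal dot, c

printf '%s\n' '--- /a.c/' "$dot" '--- /ab*c/' "$star" '--- /[hx]/' "$cls"
printf '%s\n' '--- /^c/' "$bol" '--- /l$/' "$eol" '--- /a\.c/' "$esc"
```

Comparing the /a.c/ and /a\.c/ results shows the difference between the wildcard dot and an escaped, literal dot.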

Awk is a text filter. Its simplest command-line form is:

# awk 'script' filename

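Here 'script' is a set of commands that awk runs against the file named after it. Awk reads the file one line at a time and runs the script on each line in turn, repeating this for every line. The 'script' takes the form '/pattern/ action': pattern is a regular expression, and action is what awk does with each line that matches.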
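A minimal sketch of the '/pattern/ action' form, using three made-up input lines. It also shows two standard awk defaults: a pattern with no action prints the matching line, and an action with no pattern runs on every line. The input and variable names are illustrative assumptions.

```shell
# Hypothetical input, three lines.
input='alpha
beta
gamma'

# Pattern and action: print the lines that contain "am".
both=$(printf '%s\n' "$input" | awk '/am/ { print }')

# Pattern only: the default action is to print the matching line.
pattern_only=$(printf '%s\n' "$input" | awk '/am/')

# Action only: it runs on every line; NR is the current line number.
action_only=$(printf '%s\n' "$input" | awk '{ print NR ": " $0 }')

printf '%s\n' "$both" "$pattern_only" "$action_only"
```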

How to use awk in Linux

A simple example:

Print every line of a file, for example every line of /etc/hosts.

# awk '//{print}' /etc/hosts

Using a pattern:

Given the pattern localhost, awk prints every line of /etc/hosts that contains localhost.

# awk '/localhost/{print}' /etc/hosts

The (.) wildcard in a pattern: /l.c/ matches any string made of l, one arbitrary character, then c, as in loc, localhost, and localnet.

# awk '/l.c/{print}' /etc/hosts

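Using (*) in a pattern: /l*c/ means zero or more l characters followed by c, so it selects every line that contains a c, such as lines with localhost, localnet, lines, or capable.

# awk '/l*c/{print}' /etc/hosts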
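To make the difference between (.) and (*) concrete, here is a small sketch on made-up hosts-style lines (the addresses and host names are assumptions). Because (*) allows zero occurrences of l, /l*c/ selects the same lines as plain /c/.

```shell
# Hypothetical hosts-style input.
hosts='127.0.0.1 localhost
10.0.0.5 fileserver
10.0.0.6 cache'

# l, any single character, c: only "localhost" contains "loc".
dot=$(printf '%s\n' "$hosts" | awk '/l.c/ { print }')

# zero or more l, then c: any line that contains a c at all.
star=$(printf '%s\n' "$hosts" | awk '/l*c/ { print }')

# Plain /c/ selects exactly the same lines as /l*c/.
plain=$(printf '%s\n' "$hosts" | awk '/c/ { print }')

printf '%s\n' '--- /l.c/' "$dot" '--- /l*c/' "$star"
```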

Awk looks for the longest match it can find.

Note that (*) applies only to the character before it, so /t*t/ matches just a run of t characters. To match a string that starts with t and ends with t, write /t.*t/. Take this line:

this is tecmint, where you get the best good tutorials, how to's, guides, tecmint.

With /t.*t/, the candidate matches include:

this is t
this is tecmint
this is tecmint, where you get t
this is tecmint, where you get the best good t
this is tecmint, where you get the best good tutorials, how t
this is tecmint, where you get the best good tutorials, how to's, guides, t
this is tecmint, where you get the best good tutorials, how to's, guides, tecmint

Awk chooses the last, longest option:

this is tecmint, where you get the best good tutorials, how to's, guides, tecmint
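The longest-match behavior can be observed with awk's standard match() function, which sets RSTART and RLENGTH to the position and length of the matched text. This is a sketch on a shortened, made-up line, assuming the "starts with t, ends with t" pattern is written /t.*t/. For contrast it also runs /t*t/, which only matches a run of t characters.

```shell
# Shortened, hypothetical test line.
line='this is tecmint, ok'

# t, any characters, t: the leftmost match stretches to the last t.
greedy=$(printf '%s\n' "$line" |
    awk '{ if (match($0, /t.*t/)) print substr($0, RSTART, RLENGTH) }')

# zero or more t, then t: only the first "t" of "this" is matched.
star_only=$(printf '%s\n' "$line" |
    awk '{ if (match($0, /t*t/)) print substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$greedy" "$star_only"
```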

Using awk with sets [ character(s) ]

The set [al1] matches every line in /etc/hosts that contains a, l, or 1.

# awk '/[al1]/{print}' /etc/hosts

The next pattern matches K or k followed by T:

# awk '/[Kk]T/{print}' /etc/hosts

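Specifying characters by range:

1. [0-9] a single digit

2. [a-z] a single lowercase letter

3. [A-Z] a single uppercase letter

4. [a-zA-Z] a single letter

5. [a-zA-Z0-9] a single letter or digit

Example: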

# awk '/[0-9]/{print}' /etc/hosts

This prints every line of /etc/hosts that contains at least one digit [0-9].
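A short sketch of sets and ranges on made-up lines. The sample values are assumptions, and LC_ALL=C is set as a precaution because the meaning of ranges such as [A-Z] can depend on the locale in some awk implementations.

```shell
# Use the C locale so that ranges follow plain ASCII order.
LC_ALL=C
export LC_ALL

# Hypothetical sample lines.
sample='localhost
Server1
ROUTER
42'

digits=$(printf '%s\n' "$sample" | awk '/[0-9]/ { print }')           # any digit
upper=$(printf '%s\n' "$sample" | awk '/[A-Z]/ { print }')            # any uppercase letter
letter_digit=$(printf '%s\n' "$sample" | awk '/[a-z][0-9]/ { print }') # lowercase letter, then digit

printf '%s\n' '--- /[0-9]/' "$digits" '--- /[A-Z]/' "$upper" '--- /[a-z][0-9]/' "$letter_digit"
```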

Using awk with the (^) metacharacter

It matches the lines that start with the given pattern:

# awk '/^fe/{print}' /etc/hosts

# awk '/^ff/{print}' /etc/hosts

Using awk with the ($) metacharacter

It matches the lines that end with the given pattern:

# awk '/ab$/{print}' /etc/hosts

# awk '/ost$/{print}' /etc/hosts

# awk '/rs$/{print}' /etc/hosts
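The effect of the two anchors can be checked on a few made-up hosts-style lines (the entries below are assumptions). The sketch also compares /ip6/ with /^ip6/ to show that an anchored pattern no longer matches text in the middle of a line.

```shell
# Hypothetical hosts-style input.
hosts='fe00::0 ip6-localnet
ff02::1 ip6-allnodes
127.0.0.1 localhost'

starts_fe=$(printf '%s\n' "$hosts" | awk '/^fe/ { print }')   # lines starting with fe
ends_ost=$(printf '%s\n' "$hosts" | awk '/ost$/ { print }')   # lines ending with ost
anywhere=$(printf '%s\n' "$hosts" | awk '/ip6/ { print }')    # ip6 anywhere in the line
at_start=$(printf '%s\n' "$hosts" | awk '/^ip6/ { print }')   # ip6 at the start: no such line

printf '%s\n' "$starts_fe" "$ends_ost" "$anywhere"
```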

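Using awk with the (\) escape character

It lets you take the character that follows it literally instead of as a metacharacter. Of the commands below, the first prints every line of deals.txt; the second is not expected to print the $25.00 line, because an unescaped $ means end of line; the third escapes the $ and prints the lines that contain $25.00.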

# awk '//{print}' deals.txt

# awk '/$25.00/{print}' deals.txt

# awk '/\$25.00/{print}' deals.txt
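The sketch below tries the escaped pattern on made-up deal lines (the contents of deals.txt are not shown in this article, so the data here is an assumption). It also escapes the dot, since an unescaped (.) still matches any character, such as the comma in the third line.

```shell
# Hypothetical deal lines; single quotes keep the shell from expanding $.
deals='1 Book $25.00
2 Pen $2.50
3 Bag $25,00'

# Escaped $, unescaped dot: the dot also matches the comma in "$25,00".
dollar_only=$(printf '%s\n' "$deals" | awk '/\$25.00/ { print }')

# Escaped $ and escaped dot: only the literal text "$25.00" matches.
both=$(printf '%s\n' "$deals" | awk '/\$25\.00/ { print }')

printf '%s\n' '--- /\$25.00/' "$dollar_only" '--- /\$25\.00/' "$both"
```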

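Using awk to print fields and columns of a file

Awk splits each input line into fields using its field separator variable, FS, which by default is any run of spaces and tabs (the IFS variable belongs to the shell, not to awk). The first field is $1, the second $2, the third $3, and so on up to the last field.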
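A small sketch of field splitting. The first command relies on the default separator and uses the built-in NF variable (the number of fields) to show that repeated spaces and tabs count as one separator. The second uses the standard -F option to set FS to a colon, which goes slightly beyond what this article covers; the colon-separated record is a made-up example.

```shell
# Default FS: three spaces and a tab each act as a single separator.
default_fs=$(printf 'TecMint.com   is\tthe fastest\n' |
    awk '{ print NF ": " $1 "|" $3 }')

# -F: sets FS to a colon for this run (hypothetical record).
colon_fs=$(printf 'root:x:0:0\n' | awk -F: '{ print $1, $3 }')

printf '%s\n' "$default_fs" "$colon_fs"
```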

Example 1:

# vi tecmintinfo.txt

# cat tecmintinfo.txt

The file contains:

TecMint.com is the fastest linux online

Print the first, second, and third fields of tecmintinfo.txt:

$ awk '//{print $1 $2 $3 }' tecmintinfo.txt

TecMint.comisthe

The fields are split according to FS and accessed as follows:

1. Field one which is “TecMint.com” is accessed using $1.

2. Field two which is “is” is accessed using $2.

3. Field three which is “the” is accessed using $3.

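Note that the printed fields are not separated; this is how print behaves by default when the expressions are simply written next to each other.

Use a comma (,) between the fields to print them separated by a space: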

$ awk '//{print $1, $2, $3; }' tecmintinfo.txt

TecMint.com is the
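The two forms of print can be compared side by side. The sketch below also sets the built-in OFS variable, which is the string print puts wherever a comma appears; using OFS is an addition for illustration and is not part of the original examples.

```shell
line='TecMint.com is the fastest linux online'

# No commas: the three values are concatenated.
joined=$(printf '%s\n' "$line" | awk '{ print $1 $2 $3 }')

# Commas: the values are separated by OFS, a single space by default.
spaced=$(printf '%s\n' "$line" | awk '{ print $1, $2, $3 }')

# The same command with OFS changed to a dash.
dashed=$(printf '%s\n' "$line" | awk -v OFS='-' '{ print $1, $2, $3 }')

printf '%s\n' "$joined" "$spaced" "$dashed"
```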

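Important: ($) in awk is different from ($) in shell scripting.

In a shell script ($) is used to read the value of a variable, whereas in awk ($) is used only to access the contents of a field.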
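The difference can be seen in one short sketch. Inside single quotes the shell leaves $2 alone, so awk reads it as field 2. To use a shell variable inside awk, one common approach is the standard -v option; the awk variable is then read without ($), and putting ($) in front of it selects the field with that number. The variable names col and n are illustrative.

```shell
col=2                      # a shell variable, read as "$col" by the shell
line='alpha beta gamma'

# Single quotes stop the shell from expanding $2; awk sees field 2.
field=$(printf '%s\n' "$line" | awk '{ print $2 }')

# -v copies the shell value into the awk variable n.
# In awk, n is the number 2 and $n is field number 2.
picked=$(printf '%s\n' "$line" | awk -v n="$col" '{ print n, $n }')

printf '%s\n' "$field" "$picked"
```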

Example 2: a file with multiple lines, called my_shopping.txt.

No Item_Name Unit_Price Quantity Price

1 Mouse #20,000 1 #20,000

2 Monitor #500,000 1 #500,000

3 RAM_Chips #150,000 2 #300,000

4 Ethernet_Cables #30,000 4 #120,000

To print only the Item_Name and Unit_Price of each item, use the following command:

$ awk '//{print $2, $3 }' my_shopping.txt

Item_Name Unit_Price

Mouse #20,000

Monitor #500,000

RAM_Chips #150,000

Ethernet_Cables #30,000

Awk also has a printf command for formatting the output:

$ awk '//{printf "%-10s %s\n",$2, $3 }' my_shopping.txt

Item_Name  Unit_Price
Mouse      #20,000
Monitor    #500,000
RAM_Chips  #150,000
Ethernet_Cables #30,000
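To see what the width specifiers do, the sketch below prints two columns with a visible | after each one. %-10s left-aligns a value in 10 characters and %10s right-aligns it; the | markers and the choice of right alignment for the price are assumptions made for the illustration.

```shell
# First rows of the shopping list used above.
list='No Item_Name Unit_Price Quantity Price
1 Mouse #20,000 1 #20,000
2 Monitor #500,000 1 #500,000'

# Column 2 left-aligned, column 3 right-aligned, both 10 characters wide.
table=$(printf '%s\n' "$list" | awk '{ printf "%-10s|%10s|\n", $2, $3 }')

printf '%s\n' "$table"
```

A value longer than the given width, such as Ethernet_Cables, is printed in full and pushes the following column to the right.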

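Key points: awk is useful for filtering text and strings and helps you pick specific columns out of the data. Remember that its ($) operator differs from the shell's.

Using awk to filter text and strings

Sometimes you want to filter certain lines of a file with a specific pattern match, and this is very easy with awk. Below is a price list of food items.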

$ cat food_prices.list 
No	Item_Name		Quantity	Price
1	Mangoes			   10		$2.45
2	Apples			   20		$1.50
3	Bananas			   5		$0.90
4	Pineapples		   10		$3.46
5	Oranges			   10		$0.78
6	Tomatoes		   5		$0.55
7	Onions			   5            $0.45

$ awk '/ *\$[2-9]\.[0-9][0-9] */ { print $1, $2, $3, $4, "*" ; } / *\$[0-1]\.[0-9][0-9] */ { print ; }' food_prices.list

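In the output of this command, the lines for pineapples and mangoes end with a (*) flag. If you check their prices, both are above $2.

1. / *\$[2-9]\.[0-9][0-9] */ selects the items priced from $2.00 to $9.99.

2. / *\$[0-1]\.[0-9][0-9] */ selects the items priced below $2.00.

The first action appends (*) as an extra field at the end of the line to flag it. The second action prints the remaining, cheaper items as they are.

The flagged lines are rebuilt from their fields, so they lose the original column spacing and the output is not clear enough. There are two ways to fix this: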

1. Use the printf command, which is long and tedious:

$ awk '/ *\$[2-9]\.[0-9][0-9] */ { printf "%-10s %-10s %-10s %-10s\n", $1, $2, $3, $4 "*" ; } / *\$[0-1]\.[0-9][0-9] */ { printf "%-10s %-10s %-10s %-10s\n", $1, $2, $3, $4; }' food_prices.list

2. Use the $0 field. Awk stores the whole input line in the variable $0, which solves the problem above simply and quickly:

$ awk '/ *\$[2-9]\.[0-9][0-9] */ { print $0 "*" ; } / *\$[0-1]\.[0-9][0-9] */ { print ; }' food_prices.list
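A reduced sketch of the two approaches on a few rows of the price list. The patterns drop the leading and trailing ` *`, which can match nothing and so does not change which lines are selected; the single row with extra spaces is a made-up input to make the spacing difference visible.

```shell
# One row with uneven spacing (single quotes keep $ literal).
row='1  Mangoes    10   $2.45'

# Rebuilding the line from fields collapses the spacing to single spaces.
rebuilt=$(printf '%s\n' "$row" |
    awk '/\$[2-9]\.[0-9][0-9]/ { print $1, $2, $3, $4, "*" }')

# $0 is the untouched input line, so the spacing survives.
kept=$(printf '%s\n' "$row" |
    awk '/\$[2-9]\.[0-9][0-9]/ { print $0 "*" }')

# The full two-pattern filter on three rows of the list.
list='1 Mangoes 10 $2.45
2 Apples 20 $1.50
4 Pineapples 10 $3.46'
flagged=$(printf '%s\n' "$list" |
    awk '/\$[2-9]\.[0-9][0-9]/ { print $0 "*" } /\$[0-1]\.[0-9][0-9]/ { print }')

printf '%s\n' "$rebuilt" "$kept" "$flagged"
```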

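Conclusion

Filtering text with awk is convenient and fast.

Next: Awk in Detail (Part 2): comparison operators, compound expressions, the next command, and stdin input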
