Shell Scripting Cookbook Reading Notes 13: Enumerating File Type Statistics

1. Requirement

Count the number of files of each type under a given path.

2. Code

The code is as follows:

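#!/bin/bash
# Filename: filestat.sh
# Count the files of each type under a base path.
if [ $# -ne 1 ];
then
  echo "Usage is $0 basepath" >&2
  exit 1
fi
path=$1

# Associative array: file type -> count (needs bash 4+)
declare -A statarray;
while IFS= read -r line;
do
  # Keep only the text before the first comma of file's brief output
  ftype=$(file -b "$line" | cut -d, -f1)
  let statarray["$ftype"]++;
done < <(find "$path" -type f -print)
echo "============ File types and counts ============="
for ftype in "${!statarray[@]}";
do
  echo "$ftype : ${statarray[$ftype]}"
done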

Note that checking the syntax of filestat.sh with sh -n filestat.sh reports an error:

[root@server4 shells]# sh -n filestat.sh
filestat.sh: line 18: syntax error near unexpected token `<'
filestat.sh: line 18: `done < <(find $path -type f -print)'

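When the script is actually run, however, this error does not occur. The rough reason is that sh should not be used to check a bash script. Do not assume that /bin/sh is bash everywhere: some Linux distributions use a simpler shell (such as dash or ksh) as /bin/sh, and even where /bin/sh is bash, invoking it as sh puts it in POSIX mode, in which older bash versions reject process substitution. To use bash-specific extensions such as the <(pipeline) syntax, use bash explicitly as the shell.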
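As a quick check of the point above, the following sketch writes a tiny script that uses process substitution to a temporary file and validates it with bash -n rather than sh -n. The temporary file and the variable names tmp_script and syntax_status are illustrative and not part of the original notes.

```shell
#!/usr/bin/env bash
# Create a throwaway script that uses the bash-only <(...) syntax.
tmp_script=$(mktemp)
cat > "$tmp_script" <<'EOF'
#!/bin/bash
while read -r line; do
  echo "$line"
done < <(printf 'a\nb\n')
EOF

# Check the syntax with bash itself instead of sh.
if bash -n "$tmp_script"; then
  syntax_status="ok"
else
  syntax_status="error"
fi
echo "bash -n: $syntax_status"
rm -f "$tmp_script"
```

Running the same check as bash -n filestat.sh should therefore pass where sh -n filestat.sh complained.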

3. Output

[root@server4 shells]# ./filestat.sh /root/shells/
===file type and counts=====
POSIX shell script:1
Bourne-Again shell script:15
ASCII text:13
empty:5
UTF-8 Unicode text:1
a /usr/bin/expect script:1

4. Analysis

  • Check a file's type: file filename
Options:
-b, --brief: Do not prepend filenames to output lines (brief mode).

So the following two invocations are equivalent, for example:

[root@server4 ~]# file --brief zookeeper-3.4.11.tar.gz 
gzip compressed data, from Unix, last modified: Sun Jul  8 06:25:43 2018

[root@server4 ~]# file -b zookeeper-3.4.11.tar.gz 
gzip compressed data, from Unix, last modified: Sun Jul  8 06:25:43 2018
  • Declare an associative array: declare -A
-A
          Each name is an associative array variable, that is, an array indexed by arbitrary strings rather than by integers. Associative arrays require bash 4.0 or later.
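To make the role of declare -A concrete, here is a minimal sketch of counting with an associative array. The array name counts and the sample values are hypothetical stand-ins for the file types collected by filestat.sh.

```shell
#!/usr/bin/env bash
# Requires bash 4.0 or later for associative arrays.
declare -A counts

# Hypothetical sample data standing in for file types.
for kind in "ASCII text" "empty" "ASCII text"; do
  # Treat a missing key as 0, then add one.
  counts["$kind"]=$(( ${counts[$kind]:-0} + 1 ))
done

ascii_count=${counts["ASCII text"]}
empty_count=${counts["empty"]}
echo "ASCII text: $ascii_count"
echo "empty: $empty_count"
echo "distinct keys: ${#counts[@]}"
```

Without declare -A, bash would treat counts as an indexed array and evaluate each key as an arithmetic expression, so string keys such as "ASCII text" would not work as intended.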

The output of file contains more detail than we need; only the basic type is of interest. The details are comma-separated, as in the reboot.sh example below.
Hence the cut command that follows.

  • The cut command
-d, --delimiter=DELIM
              use DELIM instead of TAB for field delimiter

-f, --fields=LIST
              select only these fields; also print any line that contains no delimiter character, unless the -s option is specified

For example, file reports two comma-separated details for reboot.sh:

[root@server4 shells]# file -b reboot.sh 
Bourne-Again shell script, ASCII text executable

Omitting the field list (-f) causes an error:

[root@server4 shells]# file -b reboot.sh | cut -d,
cut: you must specify a list of bytes, characters, or fields
Try 'cut --help' for more information.
[root@server4 shells]# file -b reboot.sh | cut --delimiter=, -f1
Bourne-Again shell script

[root@server4 shells]# file -b reboot.sh | cut --delimiter=, -f2
 ASCII text executable

[root@server4 shells]# file -b reboot.sh | cut --delimiter=, -f3
# (field 3 does not exist, so cut prints only an empty line)
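The cut behaviour described above can be reproduced without the file command. This sketch feeds cut the same line shown in the transcript; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sample line as printed by `file -b` in the transcript above.
desc='Bourne-Again shell script, ASCII text executable'

first=$(printf '%s\n' "$desc" | cut -d, -f1)    # text before the first comma
second=$(printf '%s\n' "$desc" | cut -d, -f2)   # keeps the leading space
# A line without any delimiter is passed through whole, unless -s is given.
plain=$(printf '%s\n' 'empty' | cut -d, -f1)
suppressed=$(printf '%s\n' 'empty' | cut -s -d, -f1)

printf '[%s]\n' "$first" "$second" "$plain" "$suppressed"
```

The pass-through of delimiter-free lines is what lets types such as "empty" or "ASCII text" survive the cut -d, -f1 step in filestat.sh.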

A while loop is used to iterate line by line through the find command's output.

[root@server4 shells]# find /root/shells/ -type f -print
/root/shells/checkUser.sh
/root/shells/checkZookeeper.sh
/root/shells/reboot.sh
/root/shells/jps.sh
/root/shells/addScalaHome.sh
/root/shells/adjustSysctlConf.sh
/root/shells/closeZookeeper.sh
/root/shells/hosts
/root/shells/startZookeeper.sh
/root/shells/jps.sha1
/root/shells/file1.txt
/root/shells/file2.txt
/root/shells/sorted.txt
/root/shells/isSorted.sh
/root/shells/data.txt
/root/shells/checkword.sh
/root/shells/interactive.sh
/root/shells/input.data
/root/shells/automate_expect.sh
/root/shells/data1.txt
/root/shells/data2.txt
/root/shells/log.txt
/root/shells/filestat.sh
/root/shells/remove_duplicates.sh
/root/shells/server.log
/root/shells/allTableInextractdb.sql
/root/shells/sorted.txt~
/root/shells/a.txt
/root/shells/createFile.sh
/root/shells/1.txt
/root/shells/2.txt
/root/shells/3.txt
/root/shells/file
/root/shells/file1.txt~
/root/shells/filestat.sh.bak
/root/shells/filestat.sh_20181006

The output of this find command serves as the input to the loop.

A typical while loop that reads from a file looks like this:

while read line;
do something
done < filename

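Instead of a filename, we used the output of find. <(find $path -type f -print) is equivalent to a filename: bash's process substitution replaces it with the name of a file (such as /dev/fd/63) from which the command's output can be read.
The first < is input redirection, and the second <, together with the parentheses, converts the subprocess output to a filename. There must be a space between the two < characters.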
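One practical reason to prefer done < <(...) over piping find into the loop, which the notes do not spell out, is that with bash's default settings a piped while loop runs in a subshell, so variables changed inside it are lost. The sketch below illustrates this with two hypothetical counters, piped and redirected.

```shell
#!/usr/bin/env bash
# Piping into the loop: the loop body runs in a subshell,
# so the counter changed there is lost afterwards.
piped=0
printf 'a\nb\nc\n' | while read -r line; do
  piped=$((piped + 1))
done

# Redirecting from process substitution: the loop runs in the
# current shell, so the counter survives.
redirected=0
while read -r line; do
  redirected=$((redirected + 1))
done < <(printf 'a\nb\nc\n')

echo "piped=$piped redirected=$redirected"
```

The same applies to statarray in filestat.sh: it must be filled in the current shell so that the final for loop can still read it.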

[root@server4 shells]# cat filestat.sh
#!/bin/bash
#Filename:filestat.sh
if [ $# -ne 1 ]; # if the number of arguments is not 1
then 
 echo "Usage is $0 basepath";
 exit
fi
path=$1 # assign the argument to path

declare -A statarray; # declare an associative array

while read line;
do
 #echo ++++++$line is ++++++
 ftype=`file -b "$line" | cut -d, -f1`  # determine the file type
 let statarray["$ftype"]++;       # use the file type as the key and increment its count
done < <(find $path -type f -print)  

echo ===file type and counts=====
for ftype in "${!statarray[@]}";
do
 echo $ftype:${statarray["$ftype"]}
done

${!statarray[@]} returns the list of array indices (the keys).
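Bash does not guarantee any particular order for the keys of an associative array. As a small sketch, assuming a sorted report is wanted, the keys can be piped through sort before the values are looked up. The two sample entries are taken from the output shown earlier; the variable report is illustrative.

```shell
#!/usr/bin/env bash
declare -A statarray=( ["ASCII text"]=13 ["empty"]=5 )

# "${!statarray[@]}" expands to the keys, one word per key,
# so keys that contain spaces stay intact when quoted.
report=$(
  printf '%s\n' "${!statarray[@]}" | sort | while IFS= read -r key; do
    echo "$key:${statarray[$key]}"
  done
)
echo "$report"
```

Quoting the expansion matters here: unquoted, a key such as "ASCII text" would be split into two words.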

After adding a line that prints the list of types (the array keys), the output becomes:

[root@server4 shells]# ./filestat.sh /root/shells/
===file type and counts=====
POSIX shell script Bourne-Again shell script ASCII text empty UTF-8 Unicode text a /usr/bin/expect script
POSIX shell script:1
Bourne-Again shell script:15
ASCII text:13
empty:5
UTF-8 Unicode text:1
a /usr/bin/expect script:1