A colleague of mine previously wrote a Perl script to analyze some data.

The data format is shown below; the file is several hundred MB in size.

zhangsan     80
lisi         81.5
wangwu       93
zhangsan     85
lisi         88
wangwu       97
zhangsan     90
lisi         92
wangwu       88
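
If you want to reproduce the timings without the original file, something like the following could generate test data in the same two-column format. This is only a sketch: the file name score_demo.txt, the N variable, the fixed seed, and the 60 to 100 score range are all my assumptions, not details of the real data set.

```shell
#!/bin/sh
# Hypothetical test-data generator (not part of the original benchmark).
# N is the number of lines to write; raise it to roughly 20 million
# to get a file of a few hundred MB.
N=${N:-100000}

awk -v n="$N" 'BEGIN {
    srand(42)                                   # fixed seed so runs are repeatable
    split("zhangsan lisi wangwu", names, " ")   # the three names from the sample
    for (i = 1; i <= n; i++) {
        # scores in 0.5 steps between 60 and 100 (assumed range)
        score = 60 + int(rand() * 81) / 2
        printf "%-12s %.1f\n", names[(i - 1) % 3 + 1], score
    }
}' > score_demo.txt
```

Run it as `N=20000000 sh gen.sh` (gen.sh being whatever you name the script) to get a larger file.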

The task is to compute each person's average score and total score.

cat test.pl

    #!/usr/bin/perl -w

    use strict;

    # sum() comes from the core List::Util module
    use List::Util qw(sum);

    # Maps each name to an array reference holding all of that person's scores
    my %name;

    open my $file,'<','score2.txt' or die "Cannot open score2.txt: $!\n";

    while ( <$file> ) {
           chomp;
           # Each line is "name score", separated by whitespace
           my @array = split /\s+/;
           $name{$array[0]} = [] unless exists $name{$array[0]};
           push @{$name{$array[0]}},$array[1];
    }

    close $file;

    print "name#######average#######total\n";

    # Print name, average and total, sorted by name
    for my $name ( sort keys %name ) {
        my @tmp_array = @{$name{$name}};
        my $total = sum(@tmp_array);
        print $name,"\t",$total/(scalar @tmp_array),"\t",$total,"\n";
    }


Here is how long the Perl script took to finish:

name#######average#######total
lisi    87.1666666666667        201101083.5
wangwu  92.6666666666667        213790062
zhangsan        85      196102395

real    0m16.099s
user    0m15.379s
sys     0m0.713s


And here is the run time of the awk version I wrote:

time awk 'BEGIN { printf "name\t\t avgscore\t\t\t sum \n" } { name[$1] += $2; ++num[$1] } END { for (i in name) printf "%s\t\t|%-10f|\t\t|%.2f|\n", i, name[i]/num[i], name[i] }' score2.txt

name             avgscore                        sum
zhangsan                |85.000000 |            |196102395.00|
lisi            |87.166667 |            |201101083.50|
wangwu          |92.666667 |            |213790062.00|

real    0m4.468s
user    0m4.402s
sys     0m0.063s
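
The one-liner is fairly dense, so here is a more readable sketch of the same running-total idea, run against the nine sample lines from the top of the post. The file names sample_scores.txt and summary.txt are hypothetical, and the `NF >= 2` guard and the final `sort` are my additions for illustration (awk's `for (i in name)` order is unspecified); neither is part of the command timed above.

```shell
#!/bin/sh
# Write the sample data from the post to a demo file (hypothetical name)
cat > sample_scores.txt <<'EOF'
zhangsan     80
lisi         81.5
wangwu       93
zhangsan     85
lisi         88
wangwu       97
zhangsan     90
lisi         92
wangwu       88
EOF

awk '
NF >= 2 {                  # skip blank or malformed lines
    total[$1] += $2        # running sum per name
    count[$1]++            # number of scores per name
}
END {
    # name, average, total (tab-separated)
    for (n in total)
        printf "%s\t%.2f\t%.2f\n", n, total[n] / count[n], total[n]
}' sample_scores.txt | sort > summary.txt

cat summary.txt
# lisi      87.17   261.50
# wangwu    92.67   278.00
# zhangsan  85.00   255.00
```

Only one sum and one counter are kept per name, so memory use does not grow with the number of lines in the file.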

awk is pretty impressive, isn't it!