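A colleague of mine previously wrote a Perl script to analyze some data.
The data format is shown below; the file is several hundred MB in size.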
zhangsan 80
lisi 81.5
wangwu 93
zhangsan 85
lisi 88
wangwu 97
zhangsan 90
lisi 92
wangwu 88
The task is to compute each person's average score and total score.
cat test.pl
#!/usr/bin/perl -w
use strict;
use List::Util qw(sum);    # core module; provides sum()
# Maps each name to an array ref holding all of that person's scores
my %name;
open my $file, '<', 'score2.txt' or die "$!\n";
while ( <$file> ) {
    chomp;
    my @array = split /\s+/;    # $array[0] = name, $array[1] = score
    $name{$array[0]} = [] unless exists $name{$array[0]};
    push @{ $name{$array[0]} }, $array[1];
}
close $file;
print "name#######average#######total\n";
for my $name ( sort keys %name ) {
    my @tmp_array = @{ $name{$name} };
    # average = sum / count, followed by the total
    print $name, "\t", sum(@tmp_array) / (scalar @tmp_array), "\t", sum(@tmp_array), "\n";
}
Here is the output of test.pl, followed by the time it took to finish:
name#######average#######total
lisi 87.1666666666667 201101083.5
wangwu 92.6666666666667 213790062
zhangsan 85 196102395
real 0m16.099s
user 0m15.379s
sys 0m0.713s
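One thing worth noting about test.pl: it keeps every score in a per-name array and calls sum twice for each name, so its memory use grows with the file. Below is a minimal sketch of a running-total variant, assuming the same two-column "name score" format. The function name perl_totals and the tab-separated output layout are made up for illustration, and this variant has not been timed against the original file, so no speedup is claimed.

```shell
# Hypothetical helper (not from the original post): per-name average and total
# computed with running sums, so no per-name score arrays are kept in memory.
perl_totals() {
    perl -lane '
        next unless @F == 2;          # skip blank or malformed lines
        $sum{$F[0]} += $F[1];         # running total per name
        $cnt{$F[0]}++;                # number of scores per name
        END {
            for my $n (sort keys %sum) {
                printf "%s\t%.4f\t%.2f\n", $n, $sum{$n} / $cnt{$n}, $sum{$n};
            }
        }
    ' "$1"
}
```

It would be called as perl_totals score2.txt and prints one line per name: name, average, total.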
Here is the awk version I wrote, with its output and run time:
time awk 'BEGIN { printf "name\t\t avgscore\t\t\t sum \n" } { sum[$1] += $2; num[$1]++ } END { for (i in sum) printf "%s\t\t|%-10f|\t\t|%.2f|\n", i, sum[i] / num[i], sum[i] }' score2.txt
name avgscore sum
zhangsan |85.000000 | |196102395.00|
lisi |87.166667 | |201101083.50|
wangwu |92.666667 | |213790062.00|
real 0m4.468s
user 0m4.402s
sys 0m0.063s
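awk is pretty impressive, isn't it! It finished in about 4.5 seconds of wall-clock time, versus about 16.1 seconds for the Perl script.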
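For readers who find the awk one-liner hard to follow, here is the same idea spelled out as a small shell function: keep a running total and a count per name, then divide at the end. It is a sketch that assumes each line holds exactly a name and a score. The function name awk_totals, the array names, and the plain tab-separated output are illustrative and not part of the original command. The result is piped through sort because the order in which awk's for (i in array) visits keys is unspecified.

```shell
# Hypothetical helper (not from the original post): annotated form of the
# running-sum approach, with sorted, tab-separated output.
awk_totals() {
    awk '
        NF == 2 {                     # skip blank or malformed lines
            sum[$1] += $2             # running total per name
            cnt[$1]++                 # number of scores per name
        }
        END {
            for (name in sum)
                printf "%s\t%.4f\t%.2f\n", name, sum[name] / cnt[name], sum[name]
        }
    ' "$1" | sort
}
```

Run on just the nine sample lines at the top of the post, this gives totals of 261.5 for lisi, 278 for wangwu and 255 for zhangsan, so the averages come out as 87.1667, 92.6667 and 85, which agree with the averages reported for the full file.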