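We need to count how much code our projects on SVN contain. How should we go about it?
Step 1: write a script that checks out the source code of every SVN project into a designated directory.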

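rm -rf /tmp/svntest
mkdir -p /tmp/svntest
i=1
while read -r line
do
    # each repository gets its own numbered directory: /tmp/svntest/1, /tmp/svntest/2, ...
    mkdir -p "/tmp/svntest/$i"
    cd "/tmp/svntest/$i" || exit 1
    svn checkout --non-interactive --username="XXXX" --password=XXXX "$line" > /dev/null

Step 2: write the part that counts the files and their lines.
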
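    repo=${line##*/}          # svn names the working copy after the last URL component
    cd "$repo" || exit 1
    echo "$line"
    # \( ... \) makes -type f apply to every -name test; blank and whitespace-only lines are dropped
    lines=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) -print0 | xargs -0 cat | grep -Ev '^[[:space:]]*$' | wc -l)
    files=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) | wc -l)
    echo "java_file_number: $files code_line: $lines"
    i=$((i+1))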
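The counting in step 2 can also be cross-checked in Python. The sketch below is a rough equivalent of the `find | xargs cat | grep | wc -l` pipeline, not the author's method: it walks one checked-out project, keeps regular files ending in `.java`, `.jsp`, `.html` or `.js`, and counts those files plus their non-blank lines. The helper name `count_code` and the choice to read files as UTF-8 while ignoring undecodable bytes are assumptions made for illustration.

```python
import os
from pathlib import Path

# Same extensions as the -name tests in the shell script (case-sensitive, like find -name).
EXTENSIONS = (".java", ".jsp", ".html", ".js")


def count_code(root):
    """Hypothetical helper: return (file_count, non_blank_line_count) under root."""
    files = 0
    lines = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(EXTENSIONS):
                continue
            path = Path(dirpath) / name
            # like find -type f: regular files only, symbolic links are skipped
            if path.is_symlink() or not path.is_file():
                continue
            files += 1
            # assumption: sources are UTF-8; undecodable bytes are ignored
            text = path.read_text(encoding="utf-8", errors="ignore")
            # like grep -Ev '^[[:space:]]*$': a line counts only if it has a non-space character
            lines += sum(1 for line in text.split("\n") if line.strip())
    return files, lines
```

For example, `count_code("/tmp/svntest/1/enroll")` returns a `(files, lines)` tuple that can be compared with the `java_file_number` / `code_line` pair the shell script prints for that project. One small difference: `cat` joins files end to end, so a file without a trailing newline merges its last line with the next file's first line, and the shell total can come out slightly lower.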

Here is the complete script:

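#!/bin/bash
# For every SVN URL in the here-document below: check it out into
# /tmp/svntest/<n>/<repo>, then print the URL followed by the number of
# .java/.jsp/.html/.js files and their non-blank lines.
rm -rf /tmp/svntest
mkdir -p /tmp/svntest
i=1
while read -r line
do
    mkdir -p "/tmp/svntest/$i"
    cd "/tmp/svntest/$i" || exit 1
    # --non-interactive stops svn from waiting at prompts, which would read the same stdin as the URL list
    svn checkout --non-interactive --username="XXXX" --password=XXXX "$line" > /dev/null
    repo=${line##*/}          # svn names the working copy after the last URL component
    cd "$repo" || exit 1
    echo "$line"
    # \( ... \) makes -type f apply to every -name test; blank and whitespace-only lines are dropped
    lines=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) -print0 | xargs -0 cat | grep -Ev '^[[:space:]]*$' | wc -l)
    files=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) | wc -l)
    # the label says "java" but the counts cover all four extensions;
    # the step 3 parser matches on this label, so it is kept as-is
    echo "java_file_number: $files code_line: $lines"
    i=$((i+1))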
done << Eof
http://XXXXX/geexek/enroll
http://XXXXX/geexek/mainsite/trunk/geexek
http://XXXXX/geexek/toolbox/integration
http://XXXXX/geexek/toolbox/rmiupload/src
http://XXXXX/geexek/cmptservice/trunk/orienteering
http://XXXXX/geexek/enroll/managex
http://XXXXX/geexek/enroll/branches/dev_1.0/managements
http://XXXXX/corun/Server/DEyesServer/branches/Geexek-branch/GeexekApp
http://XXXXX/corun/Server/DEyesServer/branches/Geexek-branch/gWeexServer
Eof
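To see in advance where each URL in the here-document will be checked out, the same bookkeeping can be sketched in Python. `repo_name` mirrors `${line##*/}` and `checkout_plan` numbers the projects the way the `i` counter does; both names are hypothetical. One deliberate difference: a trailing slash is stripped first, since `${line##*/}` turns a URL such as `http://XXXXX/geexek/enroll/` into an empty string. Unlike the shell loop, blank lines are also skipped.

```python
ROOT = "/tmp/svntest"  # same working root as the shell script


def repo_name(url):
    """Last path component of an SVN URL, i.e. the directory svn checkout creates.

    Like ${line##*/}, except that a trailing slash is ignored.
    """
    return url.rstrip("/").rsplit("/", 1)[-1]


def checkout_plan(urls, root=ROOT):
    """Hypothetical helper: list (url, working directory) pairs in script order.

    Project n (1-based, like i in the loop) lands in root/n/<repo>.
    Surrounding whitespace is stripped (as read does) and blank lines are skipped.
    """
    plan = []
    n = 1
    for url in urls:
        url = url.strip()
        if not url:
            continue
        plan.append((url, f"{root}/{n}/{repo_name(url)}"))
        n += 1
    return plan
```

With the list above, `http://XXXXX/geexek/toolbox/rmiupload/src` is planned as `/tmp/svntest/4/src`. The numbered parent directories are what keep two URLs with the same last component from colliding.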

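Step 3: save the script's output to a text file, tidy it up, and convert it to a CSV file with the following Python script so the numbers are easy to tabulate.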

import re

import pandas as pd

# Every run of digits in a result line, e.g. "java_file_number: 12 code_line: 3456"
rePattern = re.compile(r"\d+")

resDic = {
    'project': [],
    'javaFiles': [],
    'javaLines': [],
    'csFiles': [],
    'csLines': [],
    'jsFiles': [],
    'jsLines': [],
    'jspFiles': [],
    'jspLines': [],
    'htmlFiles': [],
    'htmlLines': []
}


def findResult(name, eachLine):
    """Append the file count (1st number) and line count (2nd number) for one file type."""
    res = rePattern.findall(eachLine)
    resDic[name + 'Files'].append(int(res[0]))
    resDic[name + 'Lines'].append(int(res[1]))


if __name__ == "__main__":

    # 代码统计.txt ("code statistics") holds the tidied output of the shell script
    with open('./代码统计.txt', 'r', encoding='utf-8') as f:
        for eachLine in f:
            if "==" in eachLine:      # separator lines
                continue
            if "http" in eachLine and "svn" not in eachLine:
                # a repository URL: its last path component is the project name
                resDic['project'].append(eachLine.strip().split("/")[-1])
                continue
            if "java" in eachLine:
                findResult('java', eachLine)
                continue
            if "cs" in eachLine:
                findResult('cs', eachLine)
                continue
            # "js" is a substring of "jsp", so exclude jsp lines here
            if "js" in eachLine and "jsp" not in eachLine:
                findResult('js', eachLine)
                continue
            if "jsp" in eachLine:
                findResult('jsp', eachLine)
                continue
            if "html" in eachLine:
                findResult('html', eachLine)

    # pd.DataFrame needs all lists to have the same length, so every
    # project must have exactly one result line per file type
    df = pd.DataFrame(resDic)
    df.to_csv("sum.csv")
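As written, `pd.DataFrame(resDic)` raises a `ValueError` as soon as one project is missing the line for some file type, because the lists in `resDic` then differ in length. A minimal alternative sketch, assuming the same input layout (a URL line followed by result lines whose first two numbers are the file and line counts), keeps one dict per project and writes it with the standard library's `csv.DictWriter`, so a missing count becomes an empty cell instead of an error. `parse_report` and `write_csv` are hypothetical names; the keyword tests copy the script above.

```python
import csv
import re

NUMBERS = re.compile(r"\d+")
# Keyword tests in an order equivalent to the script above; "jsp" is tested
# before "js" because "js" is a substring of "jsp".
MATCH_ORDER = ["java", "cs", "jsp", "js", "html"]
# CSV columns in the same order as resDic.
FIELDS = ["project"] + [t + kind for t in ["java", "cs", "js", "jsp", "html"]
                        for kind in ("Files", "Lines")]


def parse_report(lines):
    """Hypothetical helper: turn the tidied report into one dict per project."""
    rows = []
    for line in lines:
        if "==" in line:
            continue
        if "http" in line and "svn" not in line:
            rows.append({"project": line.strip().split("/")[-1]})
            continue
        if not rows:
            continue  # a result line before any URL has no project to belong to
        for t in MATCH_ORDER:
            if t in line:
                nums = NUMBERS.findall(line)
                if len(nums) >= 2:  # first number = files, second = lines
                    rows[-1][t + "Files"] = int(nums[0])
                    rows[-1][t + "Lines"] = int(nums[1])
                break
    return rows


def write_csv(rows, out):
    """Write one CSV row per project; counts that were never reported stay empty."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

Usage on the same input file: `with open("./代码统计.txt", encoding="utf-8") as f, open("sum.csv", "w", newline="", encoding="utf-8") as out: write_csv(parse_report(f), out)`. The `newline=""` argument is what the csv module documentation recommends for files passed to a csv writer.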