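I need to count how much code the projects on svn contain. How exactly should this be done?
Step 1: write a script that checks out the source code of every svn project into a designated directory.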
rm -rf /tmp/svntest
mkdir /tmp/svntest
i=1
while read line
do
    cd /tmp/svntest
    mkdir /tmp/svntest/$i
    cd /tmp/svntest/$i
    svn checkout --username="XXXX" --password=XXXX $line > /dev/null
Step 2: write the script that counts the lines in the files.
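    # svn names the working copy after the last path component of the URL.
    repo=${line##*/}
    cd "$repo"
    echo "$line"
    # Group the -name tests in \( \) so that -type f applies to every pattern, not only *.java.
    lines=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) -print0 | xargs -0 cat | grep -Ev '^[[:space:]]*$' | wc -l)
    # One path per line from find, so wc -l gives the number of files.
    file=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) | wc -l)
    echo "java_file_number: $file code_line: $lines"
    let i=$i+1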
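For comparison, the same count can be sketched in Python. This is only an illustration under a few assumptions: the function name `count_code` and the `EXTENSIONS` set are made up for this example, the files are read as UTF-8 with undecodable bytes replaced, and a line counts as code whenever it contains at least one non-whitespace character.

```python
from pathlib import Path

# Extensions taken from the find command above.
EXTENSIONS = {".java", ".jsp", ".html", ".js"}


def count_code(root):
    """Return (file_count, non_blank_line_count) for matching regular files under root."""
    files = 0
    lines = 0
    for path in Path(root).rglob("*"):
        # Skip directories (even ones named like "lib.js") and other extensions.
        if not path.is_file() or path.suffix not in EXTENSIONS:
            continue
        files += 1
        # errors="replace" keeps the count going if a file is not valid UTF-8.
        with path.open(encoding="utf-8", errors="replace") as handle:
            lines += sum(1 for line in handle if line.strip())
    return files, lines
```

Calling `count_code(".")` inside a working copy returns the two numbers as a tuple, which can then be printed in whatever format is convenient.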
The complete script is as follows:
#!/bin/bash
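rm -rf /tmp/svntest
mkdir /tmp/svntest
i=1
while read -r line
do
    # Each URL gets its own numbered directory so equal repo names cannot collide.
    mkdir "/tmp/svntest/$i"
    cd "/tmp/svntest/$i" || exit 1
    svn checkout --username="XXXX" --password=XXXX "$line" > /dev/null
    # svn names the working copy after the last path component of the URL.
    repo=${line##*/}
    cd "$repo" || exit 1
    echo "$line"
    # Group the -name tests in \( \) so that -type f applies to every pattern, not only *.java.
    lines=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) -print0 | xargs -0 cat | grep -Ev '^[[:space:]]*$' | wc -l)
    # One path per line from find, so wc -l gives the number of files.
    file=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) | wc -l)
    echo "java_file_number: $file code_line: $lines"
    i=$((i+1))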
done << Eof
http://XXXXX/geexek/enroll
http://XXXXX/geexek/mainsite/trunk/geexek
http://XXXXX/geexek/toolbox/integration
http://XXXXX/geexek/toolbox/rmiupload/src
http://XXXXX/geexek/cmptservice/trunk/orienteering
http://XXXXX/geexek/enroll/managex
http://XXXXX/geexek/enroll/branches/dev_1.0/managements
http://XXXXX/corun/Server/DEyesServer/branches/Geexek-branch/GeexekApp
http://XXXXX/corun/Server/DEyesServer/branches/Geexek-branch/gWeexServer
Eof
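The checkout loop could also be driven from Python, which makes it easy to keep the URL list in a file or a list. The following is a rough sketch, not a drop-in replacement: the helper names `repo_name`, `checkout_command` and `checkout_all` are invented for this example, and it passes the target directory to `svn checkout` explicitly instead of changing directories. Unlike `${line##*/}`, `repo_name` ignores a trailing slash.

```python
import subprocess
from pathlib import Path


def repo_name(url):
    """Last path component of the URL (a trailing slash is ignored)."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def checkout_command(url, target, username, password):
    """Build the argument list for a quiet, non-interactive svn checkout."""
    return [
        "svn", "checkout", "--quiet", "--non-interactive",
        "--username", username, "--password", password,
        url, str(target),
    ]


def checkout_all(urls, base, username, password):
    """Check out each URL into base/<index>/<repo name> and return the paths."""
    paths = []
    for index, url in enumerate(urls, start=1):
        target = Path(base) / str(index) / repo_name(url)
        target.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if svn exits with an error.
        subprocess.run(checkout_command(url, target, username, password), check=True)
        paths.append(target)
    return paths
```

Passing the arguments as a list avoids shell quoting problems with URLs or passwords that contain special characters.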
Step 3: use a script to convert the tidied-up txt file into a csv file, which is easier to work with for the statistics.
import re

import pandas as pd

# Matches each run of digits: the first is the file count, the second the line count.
rePattern = re.compile(r"\d+")
# One list per output column.
resDic = {
    'project': [],
    'javaFiles': [],
    'javaLines': [],
    'csFiles': [],
    'csLines': [],
    'jsFiles': [],
    'jsLines': [],
    'jspFiles': [],
    'jspLines': [],
    'htmlFiles': [],
    'htmlLines': []
}


def findResult(name, eachLine):
    """Append the file count and line count found in eachLine to the columns for name."""
    res = rePattern.findall(eachLine)
    resDic[name + 'Files'].append(res[0])
    resDic[name + 'Lines'].append(res[1])


if __name__ == "__main__":
    with open('./代码统计.txt', 'r') as f:
        for eachLine in f:
            # Skip separator lines.
            if "==" in eachLine:
                continue
            # A URL line names the project: use its last path component.
            if "http" in eachLine and "svn" not in eachLine:
                eachLine = eachLine.strip("\n")
                resDic['project'].append(eachLine.split("/")[-1])
                continue
            if "java" in eachLine:
                findResult('java', eachLine)
                continue
            if "cs" in eachLine:
                findResult('cs', eachLine)
                continue
            # "jsp" contains "js", so jsp lines must be excluded here.
            if "js" in eachLine and "jsp" not in eachLine:
                findResult('js', eachLine)
                continue
            if "jsp" in eachLine:
                findResult("jsp", eachLine)
                continue
            if "html" in eachLine:
                findResult("html", eachLine)
    # pd.DataFrame needs all columns to have the same length,
    # so every project must have one line for each language.
    df = pd.DataFrame(resDic)
    df.to_csv("sum.csv")
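The layout of the txt file is not shown here, so the following is only a sketch under an assumed format: one URL line per project, followed by lines such as `java_file_number: 12 code_line: 3400`, with `==` separator lines in between. The `sample` data and the names `parse_report`, `LANGS` and `NUMBERS` are made up for illustration. The variant builds one row per project and fills in 0 for any language that has no line, so the columns can never end up with different lengths.

```python
import re

# "jsp" is listed before "js" so that the longer prefix is tried first.
LANGS = ["java", "cs", "jsp", "js", "html"]
NUMBERS = re.compile(r"\d+")


def parse_report(lines):
    """Return one dict per project; languages without a line default to 0."""
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line or "==" in line:
            continue
        if line.startswith("http"):
            # A URL starts a new project row with zeroed counters.
            row = {"project": line.split("/")[-1]}
            for lang in LANGS:
                row[lang + "Files"] = 0
                row[lang + "Lines"] = 0
            rows.append(row)
            continue
        label = line.split(":", 1)[0]
        lang = next((name for name in LANGS if label.startswith(name)), None)
        nums = NUMBERS.findall(line)
        if lang is None or len(nums) < 2 or not rows:
            continue
        rows[-1][lang + "Files"] = int(nums[0])
        rows[-1][lang + "Lines"] = int(nums[1])
    return rows


# Hypothetical input, only to show the assumed format.
sample = [
    "http://XXXXX/geexek/enroll",
    "java_file_number: 12 code_line: 3400",
    "js_file_number: 5 code_line: 800",
    "==========",
    "http://XXXXX/geexek/toolbox/integration",
    "jsp_file_number: 3 code_line: 150",
]
rows = parse_report(sample)
```

The resulting list of dicts can be handed to `pd.DataFrame(rows)` and written out with `to_csv` in the same way as above.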