Counting the amount of Java code in projects on SVN

keyboard_sun 2022-10-26 10:03:10

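I needed to count how much code the projects on our SVN server contain. How can that be done?
Step 1: write a script that checks out the source of every SVN project into a designated directory.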

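rm -rf /tmp/svntest
mkdir /tmp/svntest
i=1
# Each input line is the URL of one SVN project
while read -r line
do
mkdir "/tmp/svntest/$i"
cd "/tmp/svntest/$i"
# Check the project out quietly into its own numbered directory
svn checkout --username="XXXX" --password=XXXX "$line" > /dev/null

Step 2: write the part of the script that counts files and lines.

# The working copy is named after the last path component of the URL
repo=${line##*/}
cd "$repo"
echo "$line"
# Count the non-blank lines in all .java, .jsp, .html and .js files
lines=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) -print0 | xargs -0 cat | grep -Ev '^[[:space:]]*$' | wc -l)
# Count the matching files themselves
files=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) | wc -l)
echo "java_file_number: $files code_line: $lines"
i=$((i+1))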
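
For readers who prefer to see the counting logic outside of a shell pipeline, here is a minimal Python sketch of the same idea: walk a checked-out directory, pick the files with the four extensions, and count the lines that are not blank. The function name `count_code` and the choice to skip `.svn` directories are my own assumptions, not part of the original script, and the totals may differ slightly from the `find | xargs cat | wc -l` pipeline in edge cases such as files without a trailing newline.

```python
import os

# Extensions counted by the shell script
EXTENSIONS = (".java", ".jsp", ".html", ".js")


def count_code(root, extensions=EXTENSIONS):
    """Return (file_count, non_blank_line_count) for matching files under root.

    Hypothetical helper for illustration only.
    """
    file_count = 0
    line_count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: Subversion metadata is not wanted in the totals
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            if not name.endswith(extensions):
                continue
            file_count += 1
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the count going on files in another encoding
            with open(path, "r", encoding="utf-8", errors="replace") as f:
                line_count += sum(1 for line in f if line.strip())
    return file_count, line_count
```

Calling `count_code("/tmp/svntest/1/enroll")` on a checked-out project would return the two numbers that the shell script prints on its summary line.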

The complete script is as follows:

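#!/bin/bash
rm -rf /tmp/svntest
mkdir /tmp/svntest
i=1
# Each line of the here-document below is the URL of one SVN project
while read -r line
do
mkdir "/tmp/svntest/$i"
cd "/tmp/svntest/$i"
# Check the project out quietly into its own numbered directory
svn checkout --username="XXXX" --password=XXXX "$line" > /dev/null
# The working copy is named after the last path component of the URL
repo=${line##*/}
cd "$repo"
echo "$line"
# Count the non-blank lines in all .java, .jsp, .html and .js files
lines=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) -print0 | xargs -0 cat | grep -Ev '^[[:space:]]*$' | wc -l)
# Count the matching files themselves
files=$(find . -type f \( -name "*.java" -o -name "*.jsp" -o -name "*.html" -o -name "*.js" \) | wc -l)
echo "java_file_number: $files code_line: $lines"
i=$((i+1))
done << Eof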
http://XXXXX/geexek/enroll
http://XXXXX/geexek/mainsite/trunk/geexek
http://XXXXX/geexek/toolbox/integration
http://XXXXX/geexek/toolbox/rmiupload/src
http://XXXXX/geexek/cmptservice/trunk/orienteering
http://XXXXX/geexek/enroll/managex
http://XXXXX/geexek/enroll/branches/dev_1.0/managements
http://XXXXX/corun/Server/DEyesServer/branches/Geexek-branch/GeexekApp
http://XXXXX/corun/Server/DEyesServer/branches/Geexek-branch/gWeexServer
Eof
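
To make the directory layout of the checkout loop easier to follow, here is a rough Python sketch of what the loop computes for each URL: the numbered target directory, the `svn checkout` command, and the working-copy path derived from the last URL component (the equivalent of `${line##*/}`). The names `repo_name`, `checkout_plan` and `run_plan` are hypothetical, and the sketch only mirrors the flags already used in the script.

```python
import os
import subprocess


def repo_name(url):
    """Last path component of the URL, like ${line##*/} in the shell script."""
    return url.rsplit("/", 1)[-1]


def checkout_plan(urls, base="/tmp/svntest", username="XXXX", password="XXXX"):
    """Return a list of (target_dir, command, working_copy) tuples, one per URL."""
    plan = []
    for i, url in enumerate(urls, start=1):
        target = f"{base}/{i}"
        command = ["svn", "checkout",
                   "--username", username, "--password", password, url]
        plan.append((target, command, f"{target}/{repo_name(url)}"))
    return plan


def run_plan(plan):
    """Create each target directory and run its checkout there (needs svn installed)."""
    for target, command, _working_copy in plan:
        os.makedirs(target, exist_ok=True)
        subprocess.run(command, cwd=target, check=True,
                       stdout=subprocess.DEVNULL)
```

Building the plan separately from running it makes it possible to print and review the commands before anything touches the server.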

Step 3: use a script to convert the tidied-up txt file into a csv file so that the totals are easier to work with.

import re

import pandas as pd

# Matches each run of digits in a summary line
rePattern = re.compile(r"\d+")

# One list per output column; every list needs the same length for the DataFrame
resDic = {
    'project':[],
    'javaFiles':[],
    'javaLines':[],
    'csFiles':[],
    'csLines':[],
    'jsFiles':[],
    'jsLines':[],
    'jspFiles':[],
    'jspLines':[],
    'htmlFiles':[],
    'htmlLines':[]
}

def findResult(name, eachLine):
    # The first number on the line is the file count, the second the line count
    res = re.findall(rePattern, str(eachLine))
    resDic[name+'Files'].append(res[0])
    resDic[name+'Lines'].append(res[1])


if __name__ == "__main__":

    with open('./代码统计.txt', 'r') as f:
        for eachLine in f:
            # Skip separator lines
            if "==" in eachLine:
                continue
            # A project URL: keep only its last path component as the project name
            if "http" in eachLine and "svn" not in eachLine:
                eachLine = eachLine.strip("\n")
                resDic['project'].append(eachLine.split("/")[-1])
                continue
            if "java" in eachLine:
                findResult('java', eachLine)
                continue
            if "cs" in eachLine:
                findResult('cs', eachLine)
                continue
            # "jsp" also contains "js", so exclude it here and handle it next
            if "js" in eachLine and "jsp" not in eachLine:
                findResult('js', eachLine)
                continue
            if "jsp" in eachLine:
                findResult("jsp", eachLine)
                continue
            if "html" in eachLine:
                findResult("html", eachLine)

    df = pd.DataFrame(resDic)
    df.to_csv("sum.csv")
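
The heart of the conversion is pulling the two numbers out of a summary line. Here is a small standalone sketch of that step, assuming a line in the shape that the shell script prints (`java_file_number: ... code_line: ...`). The helper name `extract_counts` and the sample values are made up for illustration. Unlike `findResult`, it returns `None` instead of raising an `IndexError` when a line has fewer than two numbers.

```python
import re

# Same idea as rePattern in the script: every run of digits on the line
NUMBER = re.compile(r"\d+")


def extract_counts(line):
    """Return (files, lines) as ints, or None if the line lacks two numbers.

    Hypothetical helper for illustration only.
    """
    numbers = NUMBER.findall(line)
    if len(numbers) < 2:
        return None
    return int(numbers[0]), int(numbers[1])


# Sample line with made-up values, in the format printed by the shell script
print(extract_counts("java_file_number: 12 code_line: 3456"))  # (12, 3456)
```

Converting to `int` at this point also means the csv columns can be summed directly later on.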