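During data migration we currently run 10 parallel processes, each responsible for part of the data import. But whenever I tried to track the import progress, I could never get to the point, because there was no report that showed the state at a glance.
During the periodic status checks, I was going on gut feeling and repeatedly reading the logs just to get the most basic status.
To free myself from this manual labor, today I wrote a status report that gives a clear view of the import status.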
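Suppose there are 100 tables, imported by 10 parallel processes. Some tables are large, for example table TEST, so we split it into 100 dumps. While TEST is being imported, we need to know how many of the 100 dumps have been imported so far. If 10 have been imported, we consider TEST 10% imported, with 90 dumps still to go. Ideally we would also know which parallel process is doing the import.
A table whose import has not started yet is in the pending state; a table whose import is complete is finished.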
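The status rule described above can be sketched as a small shell function. The name dump_status and its two arguments are hypothetical and not part of the original script; the sketch only illustrates how the two counts map to a status and a percentage.

```shell
#!/bin/sh
# Hypothetical helper: classify a table from two counts.
# $1 = number of dumps already imported, $2 = total number of dumps.
dump_status() {
    if [ "$1" -eq 0 ]; then
        echo "pending 0%"
    elif [ "$1" -ge "$2" ]; then
        echo "finished 100%"
    else
        # Integer percentage, rounded down.
        echo "processing $(( $1 * 100 / $2 ))%"
    fi
}

dump_status 10 100   # table TEST: 10 of its 100 dumps imported
dump_status 0 16     # import not started yet
dump_status 4 4      # every dump imported
```

For table TEST with 10 of 100 dumps imported, this prints "processing 10%".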
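The result I am aiming for looks like the format below. It shows clearly that DRESS_DATA has finished importing and that process 10 did the work.
ICAL_FILES has not started importing yet and is still pending.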

ICAL_FILES 0 of TOTAL 1 completed, |--pending... from
AC_SOURCE 0 of TOTAL 1 completed, |--pending... from
DRESS_DATA 4 of TOTAL 4 completed, |--finished... from split_par_10_appendata.log
A_NAME_LINK 3 of TOTAL 3 completed, |--finished... from split_par_9_appendata.log
AGREEMENT 1 of TOTAL 1 completed, |--finished... from split_par_8_appendata.log
AT_RESOURCE 4 of TOTAL 4 completed, |--finished... from split_par_7_appendata.log
R1_ACCOUNT 2 of TOTAL 2 completed, |--finished... from split_par_7_appendata.log
DRESS_NAME 3 of TOTAL 3 completed, |--finished... from split_par_6_appendata.log
AAL_BALANCE 0 of TOTAL 16 completed, |--pending... from
IRRANGEMENT 0 of TOTAL 1 completed, |--pending... from
ARGE_GROUP 0 of TOTAL 11 completed, |--pending... from
R1_CHARGES 24 of TOTAL 24 completed, |--finished... from split_par_6_appendata.log
R1_CONTROL 1 of TOTAL 1 completed, |--finished... from split_par_4_appendata.log
_DEBIT_LINK 0 of TOTAL 9 completed, |--pending... from
RMER_CREDIT 0 of TOTAL 1 completed, |--pending... from


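We assume that all generated logs are named following the pattern split_par*.log.
The DUMP directory holds the external-table dumps produced by the extraction. For example, if table TEST is split into 100 pieces, there are 100 dump files. We use this information to work out the import progress.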
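To make the layout concrete, here is a minimal sketch that builds a throwaway directory and counts dumps and log entries for one table. The directory names under the temporary folder, the split into 5 dumps, and the wording of the log lines are all made up for illustration; only the TEST_<n>.dmp and COPY_MIG.TEST_EXT_<n> naming follows the patterns used in this post.

```shell
#!/bin/sh
# Throwaway layout: DUMP/ holds the dump files, LOG/ the per-process logs.
work=$(mktemp -d)
mkdir "$work/DUMP" "$work/LOG"

# Assumption: table TEST was split into 5 dumps named TEST_<n>.dmp.
for n in 1 2 3 4 5; do
    : > "$work/DUMP/TEST_$n.dmp"
done

# Assumption: every imported dump leaves one log line naming COPY_MIG.TEST_EXT_<n>.
printf 'COPY_MIG.TEST_EXT_1 loaded\nCOPY_MIG.TEST_EXT_2 loaded\n' \
    > "$work/LOG/split_par_3_appendata.log"
: > "$work/LOG/split_par_4_appendata.log"

# Total = number of dump files; finished = number of matching log lines.
total_cnt=$(ls "$work"/DUMP/TEST_[0-9]*.dmp | wc -l | tr -d ' ')
fin_cnt=$(cat "$work"/LOG/split_par*.log | grep -c 'COPY_MIG\.TEST_EXT_[0-9]')
# grep -l prints the names of the files that contain a match,
# which tells us which parallel process handled the table.
par_from_file=$(cd "$work/LOG" && grep -l 'COPY_MIG\.TEST_EXT_[0-9]' split_par*.log)

echo "TEST $fin_cnt of TOTAL $total_cnt completed, from $par_from_file"
rm -rf "$work"
```

With this sample data, 2 of the 5 dumps count as imported and the work is attributed to split_par_3_appendata.log.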

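function check_tab
{
# total number of dump files extracted for this table
total_cnt=`ls ../DUMP/$1_[0-9]*.dmp 2>/dev/null|wc -l`
# number of dumps already imported, according to the logs of the parallel processes
# (the pattern is quoted so that the shell does not expand [0-9]* as a file glob)
fin_cnt=`grep "COPY_MIG.$1_EXT_[0-9]*" *par*.log|wc -l`
# log file, and therefore parallel process, of the last matching entry
par_from_file=`grep "COPY_MIG.$1_EXT_[0-9]*" *par*.log|tail -1|awk -F: '{print $1}'`
tmp_status=finished
if [ $fin_cnt -eq $total_cnt ]
then
tmp_status=finished
elif [ $fin_cnt -eq 0 ]
then
tmp_status=pending
elif [ $fin_cnt -lt $total_cnt ]
then
tmp_status=processing
fi
# append one report line per table to the work file
echo $1 $fin_cnt of TOTAL $total_cnt completed, "|--"$tmp_status... from $par_from_file >> tmp_check.lst
}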

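# start from an empty work file in case an earlier run was interrupted
> tmp_check.lst
# read the table list line by line; this works whether or not the shell
# expands {1..$total_tab_cnt} (bash does not)
while read tmp_tab_name
do
#echo $tmp_tab_name
check_tab $tmp_tab_name
done < ../parfile/tablst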

awk '
BEGIN{
print "############################################################"
}
{
printf "%30s %4d %2s %5s %5d %3s %-15s %4s %30s \n", $1,$2,$3,$4,$5,$6,$7,$8,$9
}' tmp_check.lst

rm tmp_check.lst

After running the script, the result looks like this, and the state of every table is clear at a glance.
ICAL_FILES 0 of TOTAL 1 completed, |--pending... from
AC_SOURCE 0 of TOTAL 1 completed, |--pending... from
DRESS_DATA 4 of TOTAL 4 completed, |--finished... from split_par_10_appendata.log
A_NAME_LINK 3 of TOTAL 3 completed, |--finished... from split_par_9_appendata.log
AGREEMENT 1 of TOTAL 1 completed, |--finished... from split_par_8_appendata.log
AT_RESOURCE 4 of TOTAL 4 completed, |--finished... from split_par_7_appendata.log
R1_ACCOUNT 2 of TOTAL 2 completed, |--finished... from split_par_7_appendata.log
DRESS_NAME 3 of TOTAL 3 completed, |--finished... from split_par_6_appendata.log
AAL_BALANCE 0 of TOTAL 16 completed, |--pending... from
IRRANGEMENT 0 of TOTAL 1 completed, |--pending... from
ARGE_GROUP 0 of TOTAL 11 completed, |--pending... from
R1_CHARGES 24 of TOTAL 24 completed, |--finished... from split_par_6_appendata.log
R1_CONTROL 1 of TOTAL 1 completed, |--finished... from split_par_4_appendata.log
_DEBIT_LINK 0 of TOTAL 9 completed, |--pending... from
RMER_CREDIT 0 of TOTAL 1 completed, |--pending... from
RIT_REQUEST 0 of TOTAL 1 completed, |--pending... from
RIT_REQUEST 0 of TOTAL 1 completed, |--pending... from
R1_DISPUTE 1 of TOTAL 1 completed, |--finished... from split_par_10_appendata.log
1E_ACTIVITY 1 of TOTAL 1 completed, |--finished... from split_par_9_appendata.log
XREFERENCES 1 of TOTAL 1 completed, |--finished... from split_par_8_appendata.log
ES_CONTROL 1 of TOTAL 1 completed, |--finished... from split_par_7_appendata.log
R1_INVOICE 1 of TOTAL 1 completed, |--finished... from split_par_10_appendata.log
AR1_MEMO 1 of TOTAL 1 completed, |--finished... from split_par_6_appendata.log
AY_CHANNEL 0 of TOTAL 1 completed, |--pending... from
R1_PAYMENT 1 of TOTAL 1 completed, |--finished... from split_par_9_appendata.log
1T_ACTIVITY 1 of TOTAL 1 completed, |--finished... from split_par_4_appendata.log
RNT_DETAILS 1 of TOTAL 1 completed, |--finished... from split_par_8_appendata.log
_ND_BALANCE 0 of TOTAL 1 completed, |--pending... from
AND_REQUEST 0 of TOTAL 1 completed, |--pending... from
1_TAX_ITEM 0 of TOTAL 34 completed, |--pending... from
CTION_LOG 21 of TOTAL 21 completed, |--finished... from split_par_6_appendata.log
ED_CREDIT 0 of TOTAL 2 completed, |--pending... from
WRITE_OFF 1 of TOTAL 1 completed, |--finished... from split_par_10_appendata.log
COUNT_EXT 1 of TOTAL 1 completed, |--finished... from split_par_9_appendata.log
T_COUNTER 1 of TOTAL 1 completed, |--finished... from split_par_8_appendata.log
S_CONTROL 1 of TOTAL 1 completed, |--finished... from split_par_7_appendata.log
BILLED_OC 1 of TOTAL 1 completed, |--finished... from split_par_6_appendata.log
Y_HISTORY 0 of TOTAL 32 completed, |--pending... from
_REQUESTS 1 of TOTAL 1 completed, |--finished... from split_par_4_appendata.log
ADD_COMPS 4 of TOTAL 4 completed, |--finished... from split_par_10_appendata.log
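
With this many tables, a per-status tally can complement the line-by-line report. The following is a minimal sketch, assuming the report lines are available in a file in the format shown above; the temporary file and the four sample lines are only stand-ins for a saved copy of the report.

```shell
#!/bin/sh
# Stand-in input: a few report lines in the format shown above.
report=$(mktemp)
cat > "$report" <<'EOF'
ICAL_FILES 0 of TOTAL 1 completed, |--pending... from
DRESS_DATA 4 of TOTAL 4 completed, |--finished... from split_par_10_appendata.log
AAL_BALANCE 0 of TOTAL 16 completed, |--pending... from
R1_CHARGES 24 of TOTAL 24 completed, |--finished... from split_par_6_appendata.log
EOF

# Field 2 = dumps imported, field 5 = total dumps, field 7 = "|--<status>...".
summary=$(awk '
{
    status = $7
    sub(/^\|--/, "", status)     # strip the leading "|--"
    sub(/\.\.\.$/, "", status)   # strip the trailing "..."
    tables[status]++             # count tables per status
    done_cnt += $2
    total_cnt += $5
}
END {
    printf "finished=%d pending=%d processing=%d dumps=%d/%d\n",
        tables["finished"], tables["pending"], tables["processing"],
        done_cnt, total_cnt
}' "$report")

echo "$summary"
rm -f "$report"
```

For the four sample lines this prints finished=2 pending=2 processing=0 dumps=28/45.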