Fast scanning of a text file: counting lines and returning the start offset of every line (Delphi, C#)

Reposted

mob60475705a319 2016-10-12 00:35:00

Tags: arrays, c#, javascript, text files, solutions. Category: Code Life

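A project required scanning a text file of 12 million lines. After some testing, and with pointers from other users, I found that the gap between C# and Delphi is not large. Without further ado, here is the code and the test results:

Here is the Delphi code: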


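// Scan the file for line breaks and return the byte offset at which each line starts.
// RowLeast (minimum line length in bytes) and perArray (array growth step) are
// constants declared elsewhere in the unit. The stride of 2 assumes CRLF line endings.
function ScanEnterFile(const FileName: string): TInt64Array;
var
  MyFile: TMemoryStream;  // in-memory copy of the file
  rArray: TInt64Array;    // resulting line-start offsets
  size, curIndex: Int64;  // file size, current position in the buffer
  enterCount: Int64;      // number of line breaks found
  pc: PAnsiChar;          // byte-wise view of the buffer (PChar is 2 bytes wide in Unicode Delphi)
  arrayCount: Int64;      // current capacity of the index array
  addStep: Integer;       // distance from the matched byte to the start of the next line
begin
  Result := nil;
  if (FileName = '') or not FileExists(FileName) then
    Exit;
  MyFile := TMemoryStream.Create;
  try
    MyFile.LoadFromFile(FileName);  // read the whole file into memory
    size := MyFile.Size;
    pc := MyFile.Memory;            // point at the start of the buffer
    curIndex := RowLeast;           // no line break can occur before RowLeast
    arrayCount := perArray;
    SetLength(rArray, arrayCount);
    enterCount := 0;
    rArray[0] := 0;                 // the first line starts at offset 0
    while curIndex < size do        // pc[size] is already past the end of the buffer
    begin
      addStep := 0;
      if Ord(pc[curIndex]) = 13 then
        addStep := 2                // landed on CR: skip CR and LF
      else if Ord(pc[curIndex]) = 10 then
        addStep := 1;               // landed on LF: skip LF only
      if addStep <> 0 then
      begin
        Application.ProcessMessages;  // keep the UI responsive
        Inc(enterCount);              // one more line
        // grow the array when it is full
        if enterCount mod perArray = 0 then
        begin
          arrayCount := arrayCount + perArray;
          SetLength(rArray, arrayCount);
        end;
        rArray[enterCount] := curIndex + addStep;
        curIndex := curIndex + addStep + RowLeast;
      end
      else
        curIndex := curIndex + 2;   // a CRLF pair is 2 bytes, so a stride of 2 always lands on CR or LF
    end;
    SetLength(rArray, enterCount + 1);  // trim unused slots so Length equals the line count
    Result := rArray;
  finally
    MyFile.Free;
  end;
end;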


Calling code:


procedure TMainForm.btn2Click(Sender: TObject);
var
  datasIndex: TInt64Array;  // line index of the data file
  t1: Cardinal;             // start time in milliseconds
begin
  t1 := GetTickCount;
  datasIndex := ScanEnterFile('R:\201201_dataFile.txt');
  Caption := Caption + '::' + IntToStr(GetTickCount - t1);
end;


Result: 16782 ms

Here is the C# code:


/// <summary>
/// Scans a text file, counts its lines and returns the start offset of each line
/// (on 12 million rows this is about 10 seconds faster than the array-based version).
/// </summary>
/// <param name="fileName">File name</param>
/// <param name="rowCount">Number of lines</param>
/// <param name="rowLeast">Minimum length of a line, in bytes</param>
/// <param name="progress">Callback invoked with the running line-break count</param>
/// <returns>List of line-start offsets (the first line, at offset 0, is not included)</returns>
public static IList<long> ScanEnterFile(string fileName, out int rowCount, int rowLeast, ThreadProgress progress)
{
    rowCount = 0;
    if (string.IsNullOrEmpty(fileName))
        return null;
    if (!System.IO.File.Exists(fileName))
        return null;
    IList<long> rList = new List<long>();
    int enterCount = 0; // number of line breaks found
    int checkValue;
    int addStep;
    // Open the file as a stream with an 8-byte internal buffer; nothing is preloaded into memory.
    using (FileStream myFile = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, 8))
    {
        myFile.Position = rowLeast;
        checkValue = myFile.ReadByte();
        while (checkValue != -1)
        {
            //Application.DoEvents();
            addStep = -1;
            // ReadByte has already advanced the position by one byte.
            // So on the first byte of a line break (CR) one more byte (LF) must be skipped,
            // while on the second byte (LF) no extra skip is needed.
            if (checkValue == 13)
                addStep = 1;
            else if (checkValue == 10)
                addStep = 0;
            if (addStep >= 0)
            {
                enterCount++;
                rList.Add(myFile.Position + addStep);
                myFile.Seek(rowLeast + addStep, SeekOrigin.Current);
                progress(enterCount);
            }
            else
            {
                // Seek 1, not 2: together with the byte just read this gives a stride of 2,
                // so a 2-byte CRLF pair cannot be stepped over.
                myFile.Seek(1, SeekOrigin.Current);
            }
            checkValue = myFile.ReadByte();
        }
    }
    rowCount = enterCount + 1;
    return rList;
}


Calling code:

Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
int rowCount;
FileHelper.ScanEnterFile(@"R:\201201_dataFile.txt", out rowCount, 35, outputProgress);
useTime = stopwatch.ElapsedMilliseconds;

Result: 124925 ms

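(After plenty of criticism and advice from other users: this method does not load the file into memory but reads it one byte at a time, so it is far slower than the Delphi approach, which reads the whole file directly into memory. It only suits old machines that are short on memory. Memory is cheap nowadays, so this method is obsolete. Following readers' suggestions, the version below uses ReadLine instead and takes roughly 6 seconds.)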


public static IList<long> ScanEnterFile(string fileName, ThreadProgress progress)
{
    if (string.IsNullOrEmpty(fileName))
        return null;
    if (!System.IO.File.Exists(fileName))
        return null;
    IList<long> rList = new List<long>();
    rList.Add(0); // the first line starts at offset 0
    using (StreamReader sr = File.OpenText(fileName))
    {
        string rStr = sr.ReadLine();
        while (null != rStr)
        {
            // next start = previous start + characters in the line + 2 for CRLF
            rList.Add(rList[rList.Count - 1] + rStr.Length + 2);
            rStr = sr.ReadLine();
            progress(rList.Count);
        }
    }
    return rList;
}


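Testing shows that when the file contains Chinese characters, the positions this method produces are wrong. I will update this post once I find a solution.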
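The positions drift because `rStr.Length` counts characters, while file offsets are measured in bytes, and a Chinese character takes more than one byte in encodings such as UTF-8 or GBK. One possible way around this, sketched here and not taken from the original post, is to scan the raw bytes in large chunks and record the position after every LF. The names `LineIndex` and `ScanLineOffsets` are made up for illustration. The sketch assumes an ASCII-compatible encoding (UTF-8, GBK), where the byte 0x0A can only be a line feed; it would not work for UTF-16, and it ignores files that use a lone CR as the line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineIndex
{
    // Returns the byte offset at which each line starts; offset 0 is always included.
    // Assumption: ASCII-compatible encoding, so byte 0x0A is always a line feed.
    public static List<long> ScanLineOffsets(Stream stream, int bufferSize = 1 << 20)
    {
        var offsets = new List<long> { 0 };
        var buffer = new byte[bufferSize];
        long basePos = 0; // file offset of buffer[0]
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                // Works for both LF and CRLF: the next line starts right after the LF.
                if (buffer[i] == (byte)'\n')
                    offsets.Add(basePos + i + 1);
            }
            basePos += read;
        }
        return offsets;
    }

    // Convenience overload that opens the file and scans it.
    public static List<long> ScanLineOffsets(string fileName)
    {
        using (FileStream fs = File.OpenRead(fileName))
            return ScanLineOffsets(fs);
    }
}
```

A call would look like `List<long> offsets = LineIndex.ScanLineOffsets(@"R:\201201_dataFile.txt");`. As in the ReadLine version, a file that ends with a line break produces one final entry equal to the file length. Because the scan works on bytes, it needs neither a minimum line length nor the assumption that every line ends in exactly two bytes.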

Testing also shows that in C#, using IList<T> is faster than using an array.
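One plausible explanation, stated here as an assumption and not something the measurements above establish, is the growth strategy. If the array version grows by a fixed step each time it fills up (as the Delphi code does with perArray), every reallocation may copy all existing elements, whereas the .NET `List<T>` doubles its capacity and so reallocates far less often. The hypothetical helper below only counts element copies under the two strategies; it does not measure real running time.

```csharp
using System;

public static class GrowthCost
{
    // Counts the element copies needed to append n items to an array that is
    // reallocated and copied whenever it is full.
    // step > 0: capacity grows by a fixed step; step == 0: capacity doubles.
    public static long CopiesNeeded(long n, long initialCapacity, long step)
    {
        long capacity = initialCapacity;
        long copies = 0;
        for (long count = 0; count < n; count++)
        {
            if (count == capacity)
            {
                copies += count; // every existing item moves to the new array
                capacity = step > 0 ? capacity + step : capacity * 2;
            }
        }
        return copies;
    }
}
```

For 1000 items, `GrowthCost.CopiesNeeded(1000, 100, 100)` returns 4500, while `GrowthCost.CopiesNeeded(1000, 4, 0)` returns 1020. The gap widens as n grows, because the fixed-step total grows quadratically and the doubling total grows linearly.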

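Conclusion: everything has its own value. Which one readers choose should depend on their own needs; I am not taking sides here. As long as it gets the job done, nothing else matters.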
