Software Engineering Assignment 01 (hopefully there will be an 02)

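Original

lqvc2011 2014-04-09 18:24:41

© Copyright belongs to the author: this is an original work by 51CTO blog author lqvc2011. Please contact the author for permission to repost; otherwise legal action will be pursued.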

Software Engineering Assignment 2: http://8409328.blog.51cto.com/8399328/1403615

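First, what I want to do is count the words in past years' college entrance exam (gaokao) papers and derive a high-frequency vocabulary list from the counts. Second, this is the third version I have made: I published the first one, was not satisfied with it and deleted it; the second one failed, so I never posted it; and now here is the third. Finally, I learned a lot while doing this assignment, and I will share it with you in this post.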

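Approach (I have done this three times, so the approach is fairly clear; I wrote this post when the work was nearly finished, and it is not well organized because I am not familiar with the conventions)

Step 1: Read the file

There are many ways to read a file. At first I used FileInputStream, the simplest one and the first one I learned. Later I found Channel; the comments said it reads faster, so I wanted to use it. The method is as follows (just an example, taken from a blog whose source I have forgotten). It should definitely run.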

package nio;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
/**
 * A Channel is similar to a stream: data can be read from a Channel into a Buffer,
 * and written from a Buffer to a Channel.
 * Channels still differ from streams: a stream is one-way (read or write only),
 * while a single channel can support both reading and writing.
 *
 * @author
 */
public class FileChannelTest {

    // Test file, about 110 MB
    private static String file = "D:\\Users\\lq\\Desktop\\fileTest\\cet07.txt";
    public static void main(String[] args) throws IOException {

        // Plain NIO read, 1024 bytes per read
        // readByChannelTest(1024); // 28151 ms
        // Plain NIO read, 1 byte per read; far too slow
        // readByChannelTest(1);

        // Read through a memory-mapped file:
        // get a MappedByteBuffer from the FileChannel and read the contents from it
        readByChannelTest3(1024);   // 61 ms, well under 100 ms

        // For a file of only 110 MB, this confirms that mapping the file through
        // FileChannel into a MappedByteBuffer greatly speeds up reading

        // Ordinary buffered-stream read
        // readByBufferdStream();   // 3922 ms
    }
    /**
     * Reads the file through a FileChannel and (optionally) prints it to the console
     *
     * @param allocate number of bytes to read per call
     * @throws IOException
     */
    public static void readByChannelTest(int allocate) throws IOException {
        long start = System.currentTimeMillis();
        FileInputStream fis = new FileInputStream(file);
        // 1. Get the FileChannel from the FileInputStream
        FileChannel channel = fis.getChannel();
        long size = channel.size();
        // 2. Read the file contents from the channel
        // Sized to match the buffer so get(bytes, 0, len) can never overflow
        byte[] bytes = new byte[allocate];
        ByteBuffer byteBuffer = ByteBuffer.allocate(allocate);
        // channel.read(ByteBuffer) is analogous to InputStream.read(byte[]);
        // each call reads up to allocate bytes into the ByteBuffer
        int len;
        while ((len = channel.read(byteBuffer)) != -1) {
            // Flip the Buffer first, then read the data out of it
            byteBuffer.flip();

            // There are several ways to consume the ByteBuffer
            // 1. Take the whole backing array at once
            // bytes = byteBuffer.array();
            // System.out.print(new String(bytes));

            // 2. Read like InputStream.read(byte[], offset, len)
            byteBuffer.get(bytes, 0, len);
            // System.out.print(new String(bytes, 0, len));
            // 3. Or iterate over the Buffer byte by byte;
            // printing one byte at a time is even slower
            // while (byteBuffer.hasRemaining()) {
            // System.out.print((char) byteBuffer.get());
            // }
            // Finally call clear() to reset the Buffer's position to 0
            byteBuffer.clear();
        }
        // Close the channel and the file stream
        channel.close();
        fis.close();
        long end = System.currentTimeMillis();
        System.out.println(String.format("\n===>File size: %s bytes", size));
        System.out.println(String.format("===>Time to read and print the file: %s ms", end - start));
    }
    /**
     * Again uses a FileChannel with a ByteBuffer and reads the contents from the ByteBuffer.
     * Reading through the channel is much slower than memory mapping,
     * and even slower than an ordinary buffered stream.
     *
     * @param allocate number of bytes to read per call
     * @throws IOException
     */
    public static void readByChannelTest2(int allocate) throws IOException {
        long start = System.currentTimeMillis();
        FileInputStream fis = new FileInputStream(file);
        // 1. Get the FileChannel from the FileInputStream
        FileChannel channel = fis.getChannel();
        long size = channel.size();
        // Reading allocate bytes each time, work out how many full reads are needed
        // (this assumes every read() fills the buffer, which is not guaranteed)
        long cycle = size / allocate;
        // Remainder left over when size is not a multiple of allocate
        int mode = (int) (size % allocate);
        // Read in a loop
        byte[] bytes;
        ByteBuffer byteBuffer = ByteBuffer.allocate(allocate);
        for (long i = 0; i < cycle; i++) {
            if (channel.read(byteBuffer) != -1) {
                byteBuffer.flip();
                bytes = byteBuffer.array();
                // System.out.print(new String(bytes));
                byteBuffer.clear();
            }
        }
        // Read the last mode bytes (the remainder)
        if (mode > 0) {
            byteBuffer = ByteBuffer.allocate(mode);
            if (channel.read(byteBuffer) != -1) {
                byteBuffer.flip();
                bytes = byteBuffer.array();
                // System.out.print(new String(bytes));
                byteBuffer.clear();
            }
        }
        // Close the channel and the file stream
        channel.close();
        fis.close();
        long end = System.currentTimeMillis();
        System.out.println(String.format("\n===>File size: %s bytes", size));
        System.out.println(String.format("===>Time to read and print the file: %s ms", end - start));
    }
    /**
     * Gets a MappedByteBuffer through FileChannel.map().
     * Reading through a memory-mapped file is much faster.
     *
     * @param allocate number of bytes to copy out per get() call
     * @throws IOException
     */
    public static void readByChannelTest3(int allocate) throws IOException {
        long start = System.currentTimeMillis();

        // Open read-only: the mapping below is READ_ONLY, so write access is not needed
        RandomAccessFile fis = new RandomAccessFile(new File(file), "r");
        FileChannel channel = fis.getChannel();
        long size = channel.size();

        // Build a read-only MappedByteBuffer
        MappedByteBuffer mappedByteBuffer = channel.map(MapMode.READ_ONLY, 0, size);

        // If the file is not large, it can be read into one array in a single call
        // byte[] all = new byte[(int)size];
        // mappedByteBuffer.get(all, 0, (int)size);
        // Print the file contents
        // System.out.println(new String(all));

        // If the file is large, read in a loop; work out how many reads are needed
        byte[] bytes = new byte[allocate];
        long cycle = size / allocate;
        int mode = (int)(size % allocate);
        for (long i = 0; i < cycle; i++) {
            // Read allocate bytes each time
            mappedByteBuffer.get(bytes);

            // Print the file contents; with printing turned off this is very fast
            // System.out.print(new String(bytes));
        }
        if(mode > 0) {
            bytes = new byte[mode];
            mappedByteBuffer.get(bytes);

            // Print the file contents; with printing turned off this is very fast
            // System.out.print(new String(bytes));
        }
        // Close the channel and the file
        channel.close();
        fis.close();
        long end = System.currentTimeMillis();
        System.out.println(String.format("\n===>File size: %s bytes", size));
        System.out.println(String.format("===>Time to read and print the file: %s ms", end - start));
    }

    /**
     * Ordinary Java IO buffered-stream read
     * @throws IOException
     */
    public static void readByBufferdStream() throws IOException {
        long start = System.currentTimeMillis();
        BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
        // available() is only an estimate limited to int range, so ask the file for its size
        long size = new File(file).length();

        int len = 0;
        int allocate = 1024;
        byte[] eachBytes = new byte[allocate];
        while((len = bis.read(eachBytes)) != -1) {
            // System.out.print(new String(eachBytes, 0, len));
        }

        bis.close();

        long end = System.currentTimeMillis();
        System.out.println(String.format("\n===>File size: %s bytes", size));
        System.out.println(String.format("===>Time to read and print the file: %s ms", end - start));
    }

}
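The benchmark above only copies raw bytes; the counting step that follows needs text. As a minimal sketch of how the memory-mapped read could hand back a String, assuming the input is UTF-8 and smaller than 2 GB (the class name MappedTextReader and the method readAll are made up for this illustration and are not part of my program):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedTextReader {

    /** Maps the whole file read-only and decodes it with the given charset. */
    public static String readAll(Path path, Charset charset) throws IOException {
        // try-with-resources closes the channel even if mapping or decoding fails
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // A single mapping cannot exceed Integer.MAX_VALUE bytes
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return charset.decode(mapped).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only for this demo
        Path sample = Files.createTempFile("sample", ".txt");
        sample.toFile().deleteOnExit();
        Files.write(sample, "the quick brown fox".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(sample, StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly avoids depending on the platform default, which can differ between machines.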

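Step 2: Count the words

Once the file contents are available as a StringBuffer, the next task is to pull out each word. The method I used is very simple, because doing this properly is too complicated: English words change form (inflection). My ability is limited, so I just did my best.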

public List<Map.Entry> filter(List<Map.Entry> list) {
    List<Map.Entry> result = new ArrayList<Map.Entry>();
    // Presumably loads the common words into filterWord (helper not shown in this post)
    strToList();
    for (Map.Entry mapEntryTemp : list) {
        String keyTemp = (String) mapEntryTemp.getKey();
        // Keep only the entries whose word is not in the filter set
        if (!filterWord.contains(keyTemp)) {
            result.add(mapEntryTemp);
        }
    }
    System.out.println(result);
    return result;
}

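Isn't that really simple? Here filterWord is a HashSet that stores the most common English words. Why a HashSet? Because lookups in it are fast (constant time on average), and Java already provides the data structure, so I do not have to write my own.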
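The filter above works on entries that were already counted, and the counting itself is not shown. A minimal sketch of how the counting and filtering could fit together, assuming words are runs of ASCII letters and ignoring inflection (the class WordFrequency and its methods are hypothetical names, not the code from my project):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class WordFrequency {

    /** Counts each lower-cased word in the text, skipping words in stopWords. */
    public static Map<String, Integer> count(String text, Set<String> stopWords) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not an ASCII letter
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            // A leading separator yields one empty token; skip it and the stop words
            if (token.isEmpty() || stopWords.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the entries sorted by descending count. */
    public static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return entries;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        Map<String, Integer> counts =
                count("The cat and the hat. They like the cat.", stopWords);
        System.out.println(sortByCount(counts));
    }
}
```

Filtering while counting keeps the stop words out of the map entirely, so no second pass over the list is needed.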

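Step 3: Save

Saving is also simple, but I still wanted to experiment, so I used a channel. In fact, much of the code I have written before was not entirely my own: when I had never written something and did not know how, I looked online for code with a similar function and adapted it to my needs, then bookmarked it to build up my own "mistake notebook". When I run into the same thing again I look it up again, and after enough look-ups I know it. Of course, sometimes nothing can be found. What then? The answer is in the API: find the method you want to use, then look at the related methods. Many methods are connected to each other through some data structure. Here is the code: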

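public void save(List<Map.Entry> list) throws IOException {
    File file = new File("D:\\Users\\lq\\Desktop\\fileTest\\savefile4.txt");
    // try-with-resources closes the stream and its channel even if a write fails
    try (FileOutputStream fos = new FileOutputStream(file);
         FileChannel fc = fos.getChannel()) {
        // Join all keys, each followed by a space
        StringBuilder sb = new StringBuilder();
        for (Map.Entry mapEntryTmp : list) {
            sb.append(mapEntryTmp.getKey()).append(' ');
        }
        byte[] data = sb.toString().getBytes();
        // Size the buffer to the data instead of reserving a 1 GB direct buffer up front
        ByteBuffer buffer = ByteBuffer.allocate(data.length);
        buffer.put(data);
        buffer.flip();
        // write() may write only part of the buffer, so loop until it is empty
        while (buffer.hasRemaining()) {
            fc.write(buffer);
        }
    }
}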

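I wrote this one myself by following the API documentation; probably nobody else would write it this way.

Here is how I got there. I remembered that String and byte are related, so I checked the API: ByteBuffer has a put(byte[] src) method, and String has a byte[] getBytes() method. After that I knew what to do. I tried it, and it worked. OK.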
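To make the link between String, byte[] and ByteBuffer concrete, here is a small round-trip sketch. It assumes UTF-8 encoding, and the class name StringBufferRoundTrip is invented for the illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class StringBufferRoundTrip {

    /** Encodes text as UTF-8, copies it into a ByteBuffer, and decodes it back. */
    public static String roundTrip(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(encoded.length);
        buffer.put(encoded);  // put(byte[] src) copies the bytes and advances the position
        buffer.flip();        // limit = position, position = 0: ready to be read
        return StandardCharsets.UTF_8.decode(buffer).toString();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("high frequency words"));
    }
}
```

Without the flip() call the buffer would have nothing left between position and limit, and the decode (or a channel write) would see no data.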

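At this point it should have been finished, but I ran into a new problem: my program drops words when processing some documents, so this is to be continued...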

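Sure enough, borrowing code wholesale gets you burned. While debugging over and over I found a problem: when checking the (extended) most-common-words document, "the" was detected 17 times. I then searched the document with Ctrl+F to check the result: any word that contains "the" (for example "they") was reported as a match by contains. I wrote a small test program and it was just as I suspected: merely containing those characters was enough. Then I looked up the API documentation: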

Returns true if this collection contains the specified element. More formally, returns true if and only if this collection contains at least one element e such that (o==null ? e==null : o.equals(e)).

I only half understood this. It looks like I will have to override the parent class's method.
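The quoted documentation describes Collection.contains, which compares whole elements with equals, so a HashSet of strings holding "the" should not match "they". One possible explanation, and this is only a guess because the offending code is not shown here, is that the check went through String.contains, which tests for a substring. A small sketch contrasting the two (ContainsDemo, inSet and inText are names made up for the example):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ContainsDemo {

    /** Exact membership test: true only if some element equals the word. */
    public static boolean inSet(Set<String> words, String word) {
        return words.contains(word);
    }

    /** Substring test: true if the word appears anywhere inside the text. */
    public static boolean inText(String text, String word) {
        return text.contains(word);
    }

    public static void main(String[] args) {
        Set<String> filterWord = new HashSet<>(Arrays.asList("the", "a", "of"));
        System.out.println(inSet(filterWord, "they"));  // false: "they" does not equal "the"
        System.out.println(inText("they", "the"));      // true: "the" is a substring of "they"
    }
}
```

If that guess is right, no method needs to be overridden: keeping the common words in a Set and testing whole words against it would be enough.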

I have put all the key code above. If you want the full source code, just send me a private message and I can email it to you.