PHP: Counting Word Frequency in a File or an Array

Reposted

mob60475704c528 2015-04-21 08:38:00

Tags: Linux commands, php, reversing array data. Category: PHP, backend development

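1: For a small file, you can read the whole file into memory at once and use PHP's convenient array counting functions to compute word frequencies (this assumes the file contains words separated by spaces):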

    <?php
    $str = file_get_contents("/path/to/file.txt"); // read the whole file into a string
    // Collect words into $r[0]; hyphenated words such as "well-known" count as one word
    preg_match_all("/\b\w+(?:-\w+)*\b/", $str, $r);
    // Lowercase every word, then count identical values (case-insensitive count)
    $words = array_count_values(array_map("strtolower", $r[0]));
    arsort($words); // sort from highest to lowest count, keeping the word keys
    print_r($words);
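
The same counting functions apply when the words are already in an array rather than in a file. The following is a minimal sketch of that case; the sample array `$words` is made up for illustration and is not from the original post.

```php
<?php
// Hypothetical input: words that have already been split into an array
$words = array("Apple", "banana", "apple", "Cherry", "banana", "apple");

// Normalize case first, then count identical values
$counts = array_count_values(array_map('strtolower', $words));

// Sort by count from high to low; arsort() keeps the word keys
arsort($counts);

print_r($counts);
// Prints apple => 3, banana => 2, cherry => 1
```

`array_count_values()` only accepts string and integer values, so an array holding other types would need filtering first.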

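2: For a large file, reading everything into memory is not appropriate. Instead, read the file incrementally and count words as you go: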

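    <?php
    $filename = "/path/to/file.txt";
    $handle = fopen($filename, "r");
    if ($handle === false) {
        exit("Cannot open $filename\n");
    }
    $results = array(); // word => count
    $word = "";
    // Read one character at a time so the whole file is never held in memory
    while (false !== ($letter = fgetc($handle))) {
        if (ctype_space($letter)) {
            // Whitespace ends a word; skip empty words caused by repeated separators
            if ($word !== "") {
                $results[$word] = isset($results[$word]) ? $results[$word] + 1 : 1;
                $word = "";
            }
        } else {
            $word .= $letter;
        }
    }
    // Count the last word if the file does not end with whitespace
    if ($word !== "") {
        $results[$word] = isset($results[$word]) ? $results[$word] + 1 : 1;
    }
    fclose($handle);
    print_r($results);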
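
As an alternative to reading single characters, the sketch below reads the file one line at a time with `fgets()`, which needs far fewer function calls. It is not part of the original post: the helper name `count_words_by_line` is hypothetical, the code assumes PHP 7 or later (for type declarations and the `??` operator), and it assumes that individual lines are short enough to fit in memory. A temporary file stands in for the large file.

```php
<?php
// Hypothetical helper: count words while reading one line at a time.
function count_words_by_line(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return array(); // could not open the file
    }
    $counts = array(); // word => count
    while (($line = fgets($handle)) !== false) {
        // Split on any run of whitespace and drop empty pieces
        $words = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            $word = strtolower($word);
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    fclose($handle);
    arsort($counts); // highest count first
    return $counts;
}

// Usage: a small temporary file stands in for a large one
$path = tempnam(sys_get_temp_dir(), 'wc');
file_put_contents($path, "to be or not\nto be\n");
$counts = count_words_by_line($path);
unlink($path);
print_r($counts);
```

Memory use here grows with the number of distinct words and the longest line, not with the file size.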