Sample content:


 <ul class="panel_body">    
                 <li>
                    <a href="/zhaoyangjian724/article/category/1756569" onclick="_gaq.push(['_trackEvent','function', 'onclick', 'blog_articles_wenzhangfenlei']); ">Oracle dump解析</a><span>(20)</span>
                </li>
                 <li>
                    <a href="/zhaoyangjian724/article/category/1756685" onclick="_gaq.push(['_trackEvent','function', 'onclick', 'blog_articles_wenzhangfenlei']); ">sql 查询优化</a><span>(159)</span>
                </li>



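Methods provided by Perl's HTML::Element module (every node of an HTML::TreeBuilder or HTML::TreeBuilder::XPath tree is an HTML::Element object):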

find_by_tag_name

  @elements = $h->find_by_tag_name('tag', ...);
  $first_match = $h->find_by_tag_name('tag', ...);


In list context, returns a list of the elements at or under $h that have any of the specified tag names.


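In scalar context, returns the first such element found (in pre-order traversal of the tree), or undef if there is none.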
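The following sketch shows both calling contexts. It is an illustrative example, not the author's script. It parses an ASCII-only copy of the sample list above, with the link texts translated and the onclick handlers dropped, using plain HTML::TreeBuilder. HTML::TreeBuilder::XPath builds the same tree of HTML::Element objects, so find_by_tag_name should behave the same way there.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Trimmed, ASCII-only copy of the sample category list (texts translated).
my $html = <<'HTML';
<ul class="panel_body">
<li><a href="/zhaoyangjian724/article/category/1756569">Oracle dump parsing</a><span>(20)</span></li>
<li><a href="/zhaoyangjian724/article/category/1756685">sql query optimization</a><span>(159)</span></li>
</ul>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# List context: every <a> element at or under $tree.
my @all = $tree->find_by_tag_name('a');

# Scalar context: only the first <a> in pre-order traversal.
my $first = $tree->find_by_tag_name('a');

# Several tag names: elements matching any of them, in document order.
my @a_and_span = $tree->find_by_tag_name('a', 'span');

# Collect the results before delete() wipes the element objects.
my $count      = scalar @all;
my $first_href = $first->attr('href');
my $tags       = join ' ', map { $_->tag } @a_and_span;

print "list context:   $count <a> elements\n";
print "scalar context: first href = $first_href\n";
print "a or span:      $tags\n";

$tree->delete;    # free the tree (parent/child links form cycles)
```

Passing several tag names, as in find_by_tag_name('a', 'span'), returns the interleaved matches in the order they appear in the document.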

node2:/root/pachong#cat test.pl 
use strict;
use warnings;
use LWP::UserAgent;              # set up for fetching pages later; not used yet
use HTML::TreeBuilder::XPath;
use Data::Dumper;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
$ua->agent("Mozilla/8.0");

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("csdn.html");

## Get the blog category URLs: find the <a> tags, then read their href attribute
my @Links = $tree->find_by_tag_name('a');
print $Links[0];                 # the first <a> element object itself
print "\n";


node2:/root/pachong#perl test.pl 
HTML::Element=HASH(0x24ad1f8)
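Here is a minimal sketch of what HTML::Element=HASH(0x24ad1f8) means, assuming a one-link test document. Perl prints a blessed reference as CLASS=REFTYPE(address). Scalar::Util's blessed and reftype extract those two parts.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Scalar::Util qw(blessed reftype);

my $tree = HTML::TreeBuilder->new_from_content(
    '<a href="/zhaoyangjian724/article/category/1756569">Oracle dump parsing</a>');
my $link = $tree->find_by_tag_name('a');

# Default stringification of a blessed reference: CLASS=REFTYPE(address)
my $class   = blessed($link);    # the class the object is blessed into
my $reftype = reftype($link);    # the underlying data structure
print "class=$class reftype=$reftype\n";
print "isa HTML::Element\n" if $link->isa('HTML::Element');

$tree->delete;
```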
----hash-----

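The return value is an object (a blessed hash reference), so you normally work with it by calling its methods. Dereferencing it as a hash with print %{$Links[0]}; dumps its raw key/value pairs: the HTML attributes (href, onclick) plus internal fields whose names start with an underscore (_tag, _content, _parent):
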
node2:/root/pachong#perl test.pl 
onclick_gaq.push(['_trackEvent','function', 'onclick', 'blog_articles_wenzhangfenlei']); _taga_contentARRAY(0x179c718)href/zhaoyangjian724/article/category/1756569_parentHTML::Element=HASH(0x1b2e2d0)
node2:/root/pachong#
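The sketch below separates the two kinds of keys seen in this dump. The one-link document and its onclick="track()" handler are hypothetical. It relies on HTML::Element's all_external_attr_names, which returns only the attribute names that do not start with an underscore.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(
    '<a href="/zhaoyangjian724/article/category/1756569" onclick="track()">Oracle dump parsing</a>');
my $link = $tree->find_by_tag_name('a');

# Internal bookkeeping fields are the hash keys starting with "_".
my @internal = grep { /^_/ } sort keys %$link;

# Real HTML attributes, via the documented method.
my @names    = $link->all_external_attr_names;
my @external = sort @names;

print "internal: @internal\n";
print "external: @external\n";

$tree->delete;
```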

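@Links = $tree->find_by_tag_name('a');  returns a list of the <a> elements themselves (one HTML::Element object per link), not the elements nested inside them.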
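This short sketch confirms that each returned item is an <a> element. It uses the translated sample list as input, which is an assumption, since the real script reads csdn.html.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '<ul class="panel_body">'
  . '<li><a href="/zhaoyangjian724/article/category/1756569">Oracle dump parsing</a><span>(20)</span></li>'
  . '<li><a href="/zhaoyangjian724/article/category/1756685">sql query optimization</a><span>(159)</span></li>'
  . '</ul>';
my $tree = HTML::TreeBuilder->new_from_content($html);

my @Links = $tree->find_by_tag_name('a');
my @summary;
for my $link (@Links) {
    # tag() is 'a' for every item; as_text() gathers the text beneath it.
    push @summary, sprintf '%s: %s', $link->tag, $link->as_text;
}
print "$_\n" for @summary;

$tree->delete;
```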

attr

  $value = $h->attr('attr');
  $old_value = $h->attr('attr', $new_value);

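Returns (and optionally sets) the value of the given attribute of $h. The attribute name (but not the value, if one is provided) is forced to lowercase.

If you try to read an attribute that this element does not have, the return value is undef. When you set a new value, the attribute's old value is returned.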
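The sketch below shows the three behaviours just described: names are lowercased, a missing attribute yields undef, and setting a value returns the old one. The replacement URL /Category/NEW is a made-up value for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(
    '<a href="/zhaoyangjian724/article/category/1756569">Oracle dump parsing</a>');
my $link = $tree->find_by_tag_name('a');

# Read: the name is lowercased, so 'HREF' finds the href attribute.
my $href = $link->attr('HREF');

# Missing attribute: attr() returns undef.
my $title = $link->attr('title');
print defined($title) ? "title: $title\n" : "title: (undef)\n";

# Set: returns the old value; the new value is stored as given.
my $old = $link->attr('href', '/Category/NEW');    # hypothetical URL
my $new = $link->attr('href');

print "href: $href\nold:  $old\nnew:  $new\n";
$tree->delete;
```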


If a method is provided for accessing an attribute (such as $h->tag for "_tag", or $h->content_list for "_content"), use that method instead of $h->attr, for both reading and setting.

  $href = $_->attr('href'); 
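Here is a sketch of that same loop over $_. It also reads the internal fields through their accessors (tag, content_list, parent) instead of attr('_tag') and similar calls. The single-link document is assumed for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(
    '<ul class="panel_body"><li><a href="/zhaoyangjian724/article/category/1756569">Oracle dump parsing</a></li></ul>');

my @rows;
for ($tree->find_by_tag_name('a')) {
    # A real HTML attribute: read it with attr().
    my $href = $_->attr('href');

    # Internal fields: use the dedicated accessors rather than attr('_...').
    my $tag      = $_->tag;            # instead of $_->attr('_tag')
    my @children = $_->content_list;   # instead of @{ $_->attr('_content') }
    my $parent   = $_->parent->tag;    # instead of $_->attr('_parent')->tag

    push @rows, sprintf '%s %s (parent <%s>, %d child node(s))',
        $tag, $href, $parent, scalar @children;
}
print "$_\n" for @rows;

$tree->delete;
```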



Dumping the first <a> element with print Dumper($Links[0]); shows which key each value is stored under:

$VAR1 = bless( {
                 'onclick' => '_gaq.push([\'_trackEvent\',\'function\', \'onclick\', \'blog_articles_wenzhangfenlei\']); ',
                 'href' => '/zhaoyangjian724/article/category/1756569',
                 '_content' => [
                                 'Oracle dump解析'
                               ],

Here '_content' is an array reference holding the element's children (text strings and child elements).
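This sketch shows what such an array can hold. The link is hypothetical: its text contains a <b> element. content_list returns the same children as @{ $h->{'_content'} }.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(
    '<a href="/zhaoyangjian724/article/category/1756569">Oracle <b>dump</b> parsing</a>');
my $link = $tree->find_by_tag_name('a');

# Children in order: text nodes are plain strings,
# child elements are HTML::Element objects.
my @kinds;
for my $child ($link->content_list) {
    push @kinds, ref($child) ? 'element <' . $child->tag . '>' : "text '$child'";
}
print "$_\n" for @kinds;

$tree->delete;
```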


node2:/root/pachong#cat test.pl 
use strict;
use warnings;
use LWP::UserAgent;              # set up for fetching pages later; not used yet
use HTML::TreeBuilder::XPath;
use Data::Dumper;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
$ua->agent("Mozilla/8.0");

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("csdn.html");

## Get the blog category URLs: find the <a> tags, then read their href attribute
my @Links = $tree->find_by_tag_name('a');
#print Dumper($Links[0]);
print "\n";
print "--------------------\n";
# '_content' holds an array reference; dereference it to print the link text
print @{ $Links[0]->{'_content'} };
#print "\n";
print "--------------------\n";
node2:/root/pachong#perl test.pl 

--------------------
Oracle dump解析--------------------

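Extracting the value stored under '_content'


The same value can be read with attr(): $content = $_->attr('_content'); returns the same array reference, which you then dereference (for reading children, content_list is the documented accessor):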



use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("csdn.html");
## Get the blog category URLs: find the <a> tags, then read their href attribute
my @Links = $tree->find_by_tag_name('a');
#print Dumper($Links[0]);
print "--------------------\n";
print $Links[0]->attr('_content');       # the array reference itself: ARRAY(0x...)
print "\n";
print @{ $Links[0]->attr('_content') };  # dereferenced: the link text
print "\n";
print "--------------------\n";
node2:/root/pachong#perl test.pl 
--------------------
ARRAY(0x20276b8)
Oracle dump解析
--------------------
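As the output shows, attr('_content') returns the array reference itself. The sketch below compares it with the dereferenced list and with as_text, which collects the text of all descendants and so also copes with nested tags. The one-link document is assumed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(
    '<a href="/zhaoyangjian724/article/category/1756569">Oracle dump parsing</a>');
my $link = $tree->find_by_tag_name('a');

my $ref    = $link->attr('_content');  # array reference; prints as ARRAY(0x...)
my $joined = join '', @$ref;           # fine here only because every child is a string
my $text   = $link->as_text;           # also works when the link contains child elements

print ref($ref), "\n", $joined, "\n", $text, "\n";
$tree->delete;
```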






Querying attribute values on the <a> tags
The href attribute of an <a> tag specifies the URL that the hyperlink points to.

Getting href:
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("csdn.html");
## Get the blog category URLs: find the <a> tags, then read their href attribute
my @Links = $tree->find_by_tag_name('a');
#print Dumper($Links[0]);
print "--------------------\n";
print $Links[0]->{'href'};       # direct hash access (works, but bypasses the API)
print "\n";
print $Links[0]->attr('href');   # preferred: the documented accessor
print "\n";
print "--------------------\n";
node2:/root/pachong#perl test.pl 
--------------------
/zhaoyangjian724/article/category/1756569
/zhaoyangjian724/article/category/1756569
--------------------
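Putting the pieces together, this sketch builds a category-name-to-URL map from the sample list. It is not the author's final crawler. It restricts the search to <ul class="panel_body"> with look_down and skips <a> tags that have no href. A real run would read csdn.html or fetch the page with LWP::UserAgent instead of using the inline, translated string.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = <<'HTML';
<ul class="panel_body">
<li><a href="/zhaoyangjian724/article/category/1756569">Oracle dump parsing</a><span>(20)</span></li>
<li><a href="/zhaoyangjian724/article/category/1756685">sql query optimization</a><span>(159)</span></li>
</ul>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Scalar context: the first <ul> whose class is exactly "panel_body".
my $list = $tree->look_down(_tag => 'ul', class => 'panel_body');

# List context: every <a> under it that actually has an href.
my @links = $list->look_down(_tag => 'a', sub { defined $_[0]->attr('href') });

my %category_url;
for my $link (@links) {
    $category_url{ $link->as_text } = $link->attr('href');
}
printf "%-25s %s\n", $_, $category_url{$_} for sort keys %category_url;

$tree->delete;
```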