Reading Currency Exchange Rate Data from a Web Page in Python: Extracting Web Page Links with Python

Reposted

IT剑客风云 2023-06-02 09:12:52

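I needed to scrape some pages from the web, and I also happened to want to learn Python. I started with the Python简明教程 (a concise Python tutorial), which does not cover much ground but gets you up and running quickly. I have always found learning by example to be the most effective approach, so working through a real task, scraping a web page, seemed like a better way to reinforce the language than reading alone.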

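For parsing the HTML, Python's standard library offers modules such as HTMLParser. This article uses sgmllib, but while reading up on the topic I found that the third-party library BeautifulSoup is widely regarded as the better choice, because it copes with poorly formed HTML. BeautifulSoup is therefore what I plan to study next.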
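
To illustrate the HTMLParser option mentioned above, here is a minimal sketch for Python 3, where that module is named html.parser (sgmllib itself is not available in Python 3). The class name HrefCollector and the sample markup are assumptions made for this illustration and do not come from the original script.

```python
# Python 3 sketch: Python 2's HTMLParser module is html.parser in Python 3.
from html.parser import HTMLParser


class HrefCollector(HTMLParser):  # hypothetical name, not from the original script
    """Collect href values from <a> tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag; tag and attribute names arrive lowercased,
        # and attrs is a list of (name, value) pairs.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


# Deliberately sloppy markup: mixed-case tag, mixed quoting, no closing tags.
sample = '<p>See <A HREF="http://example.com/a">first <a href=\'/b\'>second'

collector = HrefCollector()
collector.feed(sample)
collector.close()
print(collector.hrefs)
```

The parser is event driven: it never builds a tree, it just calls handle_starttag for each opening tag it recognizes, which is why unclosed tags do not stop it from reporting the links.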

(2) Script code

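# Python 2 only: urllib2 and sgmllib are not available in Python 3.
import urllib2
import sgmllib


class LinksParser(sgmllib.SGMLParser):
	"""Collect the unique absolute (http/https) links found in <a> tags."""

	def __init__(self):
		sgmllib.SGMLParser.__init__(self)
		# Per-instance list; a class-level list would be shared by every parser.
		self.urls = []

	def do_a(self, attrs):
		# Called by sgmllib for each <a> tag; attrs is a list of (name, value) pairs.
		for name, value in attrs:
			if name == 'href' and value.startswith('http') and value not in self.urls:
				self.urls.append(value)


if __name__ == "__main__":
	p = LinksParser()
	f = urllib2.urlopen('http://www.baidu.com')
	try:
		# Feed the whole page to the parser in one go.
		p.feed(f.read())
	finally:
		f.close()
	# close() flushes any data the parser is still buffering.
	p.close()

	# Print each collected link once.
	for url in p.urls:
		print url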
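
The script above keeps only href values that start with http, so relative links such as /more/ are silently dropped. If those are wanted as well, one option is to resolve every href against the page URL before filtering. The following is a minimal Python 3 sketch under that assumption; the helper name absolute_links and the sample hrefs are hypothetical and are not part of the original script.

```python
from urllib.parse import urljoin, urlparse


def absolute_links(page_url, hrefs):  # hypothetical helper, not from the original script
    """Resolve hrefs against page_url, keep http(s) links, drop duplicates."""
    seen = []
    for href in hrefs:
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        full = urljoin(page_url, href)
        # Skip schemes such as mailto: that do not point at a fetchable page.
        if urlparse(full).scheme not in ("http", "https"):
            continue
        if full not in seen:
            seen.append(full)
    return seen


links = absolute_links(
    "http://www.baidu.com/",
    ["/more/", "http://news.baidu.com", "mailto:webmaster@example.com", "/more/"],
)
for link in links:
    print(link)
```

Checking the parsed scheme rather than the string prefix keeps the filter correct after resolution, since a relative href only gains its http prefix once it has been joined with the page URL.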
