Installing the Crypto Library for Python 2 and the urllib2 Library for Python

Reposted

烟雨江南的秋 2023-09-16 19:54:11


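The Python standard library contains many useful utilities, but its documentation is often unclear about the details of using them. The HTTP client libraries urllib and urllib2 are one example. This article summarizes some usage details for urllib and urllib2.

Python's urllib library fetches page data from a given URL so that you can then parse it and extract the data you need.

I. Commonly used urllib functions:

1. urlopen(): creates a file-like object for reading from the specified URL.

You can view the function's description with help(urllib.urlopen).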

urlopen(url, data=None, proxies=None)

Create a file-like object for the specified URL to read from.

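urlopen returns a file-like object that provides the following methods:

read(), readline(), readlines(), fileno() and close(): these behave the same as on a file object;

info(): returns an httplib.HTTPMessage object representing the headers returned by the remote server.

getcode(): returns the HTTP status code. For an HTTP request, 200 means the request succeeded and 404 means the URL was not found.

geturl(): returns the URL of the retrieved resource, which can differ from the requested URL after a redirect.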
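
The methods listed above can be tried without a network connection. The following is a minimal sketch rather than the article's own example: it assumes a temporary local file opened through a file: URL stands in for a remote page, and it falls back to urllib.request on Python 3, where these functions now live. The helper names inspect_url and make_sample_file are made up for illustration.

```python
import os
import tempfile

try:
    from urllib import urlopen, pathname2url          # Python 2, as in this article
except ImportError:
    from urllib.request import urlopen, pathname2url  # Python 3 equivalent


def inspect_url(url):
    """Open url and collect what the returned file-like object exposes."""
    resp = urlopen(url)
    try:
        first_line = resp.readline()   # same call as on a local file object
        rest = resp.read()             # everything after the first line
        return {
            'first_line': first_line,
            'rest': rest,
            'length': resp.info()['Content-Length'],  # header lookup
            'code': resp.getcode(),    # None here: a file: URL has no HTTP status
            'url': resp.geturl(),
        }
    finally:
        resp.close()


def make_sample_file():
    """Write a small two-line file and return (path, file: URL)."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as f:
        f.write(b'first line\nsecond line\n')
    return path, 'file:' + pathname2url(path)


path, url = make_sample_file()
result = inspect_url(url)
os.remove(path)
print(result['first_line'])
print(result['length'])
```

With a real http:// address the same helper would report a numeric status from getcode() instead of None.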

Example:

>>> import urllib
>>> baidu = urllib.urlopen('http://www.baidu.com')
>>> baidu.read()
>>> print baidu.info()
Output:
Date: Fri, 24 Apr 2015 05:41:40 GMT
Server: Apache
Cache-Control: max-age=86400
Expires: Sat, 25 Apr 2015 05:41:40 GMT
Last-Modified: Tue, 12 Jan 2010 13:48:00 GMT
ETag: "51-4b4c7d90"
Accept-Ranges: bytes
Content-Length: 81
Connection: Close
Content-Type: text/html
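>>> baidu = urllib.urlopen('http://www.baidu.com')   # reopen: read() above already consumed the body
>>> for line in baidu:            # iterate line by line, as with a local file, printing the page data
...     print line,
...
>>> baidu.close()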
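
One detail of the transcript above is easy to miss: once read() has been called, the response is exhausted, so a later for loop over the same object yields nothing. The sketch below illustrates this under the assumption that a temporary local file served through a file: URL behaves like a remote page in this respect. The function names are hypothetical, and the import falls back to urllib.request on Python 3.

```python
import os
import tempfile

try:
    from urllib import urlopen, pathname2url          # Python 2
except ImportError:
    from urllib.request import urlopen, pathname2url  # Python 3


def read_then_iterate(url):
    """Call read() first, then iterate; return (body, leftover_lines)."""
    resp = urlopen(url)
    try:
        body = resp.read()          # consumes the whole response
        leftover = list(resp)       # nothing remains to iterate over
        return body, leftover
    finally:
        resp.close()


def iterate_lines(url):
    """Iterate over a freshly opened response line by line."""
    resp = urlopen(url)
    try:
        return [line for line in resp]
    finally:
        resp.close()


# Stand-in for a remote page: a temporary local file (assumption).
fd, sample_path = tempfile.mkstemp(suffix='.html')
with os.fdopen(fd, 'wb') as f:
    f.write(b'<html>\n<body>hello</body>\n</html>\n')
sample_url = 'file:' + pathname2url(sample_path)

body, leftover = read_then_iterate(sample_url)
lines = iterate_lines(sample_url)
os.remove(sample_path)

print(len(leftover))   # 0: read() already consumed everything
print(len(lines))      # 3: one entry per line of the sample file
```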

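Note:

The argument to urllib.urlopen has a specific requirement: it must follow a network protocol such as http or ftp, which means the address must begin with http:// or ftp://, for example:

urllib.urlopen('http://www.baidu.com')

urllib.urlopen('ftp://192.168.1.200')

To open a local file, prefix the path with the file: scheme, for example:

urllib.urlopen('file:nowangic.py')

urllib.urlopen(r'file:F:\test\helloworld.py')   # raw string, so that \t is not read as a tab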
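
Whether an address already carries a scheme can be checked before it is handed to urlopen. This is a small sketch, assuming the standard urlparse function is an acceptable way to detect the scheme. The helpers has_scheme and with_default_scheme are hypothetical names, and defaulting to http is an assumption, not something urllib does for you.

```python
try:
    from urlparse import urlparse          # Python 2
except ImportError:
    from urllib.parse import urlparse      # Python 3


def has_scheme(address):
    """Return True if address starts with a scheme such as http, ftp or file."""
    return urlparse(address).scheme != ''


def with_default_scheme(address, default='http'):
    """Prepend default:// when address has no scheme at all."""
    if has_scheme(address):
        return address
    return default + '://' + address


for address in ('www.baidu.com', 'http://www.baidu.com',
                'ftp://192.168.1.200', 'file:nowangic.py'):
    print('%s -> %s' % (address, with_default_scheme(address)))
```

Bare Windows paths such as F:\test are a poor fit for this check, because the drive letter itself looks like a scheme.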

2. urlretrieve(): downloads remote data directly to a local file.

You can view the function's description with help(urllib.urlretrieve):

Help on function urlretrieve in module urllib:

urlretrieve(url, filename=None, reporthook=None, data=None)

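The filename parameter specifies the local path to save to (if it is omitted, urllib generates a temporary file to hold the data).

The reporthook parameter is a callback that fires once when the connection to the server is established and again after each block of data has been transferred. It can be used to display the current download progress.

The data parameter is the data to POST to the server. The function returns a two-element tuple (filename, headers), where filename is the path the data was saved to and headers holds the server's response headers.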
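
The return value and the reporthook call pattern described above can be observed offline. This is a hedged sketch: it assumes a 20000-byte temporary file behind a file: URL stands in for a remote download, fetch_with_log is a made-up helper, and the import falls back to urllib.request on Python 3. The block size passed to the hook is chosen by urllib, not by the caller.

```python
import os
import shutil
import tempfile

try:
    from urllib import urlretrieve, pathname2url          # Python 2
except ImportError:
    from urllib.request import urlretrieve, pathname2url  # Python 3


def fetch_with_log(url, filename):
    """Download url to filename; return (saved_path, headers, hook_calls)."""
    calls = []

    def hook(block_count, block_size, total_size):
        # Record every invocation so the call pattern can be inspected.
        calls.append((block_count, block_size, total_size))

    saved_path, headers = urlretrieve(url, filename, hook)
    return saved_path, headers, calls


# Stand-in for a remote file: 20000 bytes in a temporary directory (assumption).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'source.bin')
target = os.path.join(workdir, 'copy.bin')
with open(source, 'wb') as f:
    f.write(b'x' * 20000)

saved_path, headers, calls = fetch_with_log('file:' + pathname2url(source), target)
copied_size = os.path.getsize(target)
shutil.rmtree(workdir)

for call in calls:
    print(call)
```

The first recorded call has a block count of 0, which corresponds to the callback firing once before any data has been transferred.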

Example 1:

>>> urllib.urlretrieve('http://www.soso.com', 'c://soso.html')

('c://soso.html', <httplib.HTTPMessage instance at 0x...>)

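Example 2: the following uses urlretrieve() to download a file and display the download progress.

# coding: utf-8
import urllib

def cbk(a, b, c):
    """Progress callback.
    @a: number of blocks downloaded so far
    @b: size of each block
    @c: size of the remote file
    """
    per = 100.0 * a * b / c
    if per > 100:
        per = 100
    print '#%d%%' % per

url = 'http://www.soso.com'
local = 'c://test//soso.html'
urllib.urlretrieve(url, local, cbk)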
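
The percentage arithmetic in the callback can be separated into a pure function, which makes it easy to check. The sketch below is a variation on the callback above, not a replacement for it: percent_done and progress_hook are hypothetical names, and returning None for a non-positive size is a design choice made here, since urlretrieve passes -1 when the server reports no size. The cap at 100 is needed because the last block is usually only partly filled.

```python
def percent_done(block_count, block_size, total_size):
    """Return download progress as 0-100, or None if the size is unknown or zero."""
    if total_size <= 0:
        # urlretrieve passes -1 when no Content-Length is available.
        return None
    percent = 100.0 * block_count * block_size / total_size
    return min(percent, 100.0)


def progress_hook(block_count, block_size, total_size):
    """A reporthook-compatible callback that prints the progress."""
    percent = percent_done(block_count, block_size, total_size)
    if percent is None:
        print('downloaded about %d bytes' % (block_count * block_size))
    else:
        print('#%d%%' % percent)


# Simulate the calls urlretrieve would make for a 20000-byte file.
for count in range(4):
    progress_hook(count, 8192, 20000)
```

A function with this signature can be passed as the third argument to urlretrieve in place of cbk.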

Example 3: a crawler exercise:

# -*- coding: utf-8 -*-
""" Crawler exercise
Date:06-15-2015
"""
import urllib
import re

# Fetch the content of the page at the given URL
def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

# Use a regular expression to find the matching images and download them
def getImg(html):
    reg = r'src="(.*?\.jpg)" pic_ext'
    regimg = re.compile(reg)
    imglist = re.findall(regimg, html)
    x = 0
    for img in imglist:
        urllib.urlretrieve(img, '%s.jpg' % x)
        x += 1
    return imglist

Html = getHtml('http://tieba.baidu.com/p/3825178610')
Img = getImg(Html)
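
To see what the regular expression in the crawler actually picks out, it can be run against a fixed string instead of a live page. The markup below is invented for this sketch and real pages will differ. plan_downloads is a hypothetical helper that only pairs each matched URL with the numbered file name the crawler would use, without downloading anything.

```python
import re

# Same pattern as in the crawler, written as a raw string.
IMG_PATTERN = re.compile(r'src="(.*?\.jpg)" pic_ext')

# Hypothetical markup, made up for this sketch; one tag per line.
SAMPLE_HTML = '\n'.join([
    '<img class="a" src="http://example.com/pic/one.jpg" pic_ext="jpeg">',
    '<img class="b" src="http://example.com/pic/logo.png" pic_ext="png">',
    '<img class="c" src="http://example.com/pic/two.jpg" pic_ext="jpeg">',
    '<img class="d" src="http://example.com/pic/three.jpg" width="10">',
])


def plan_downloads(html):
    """Return (image_url, local_name) pairs without fetching anything."""
    urls = IMG_PATTERN.findall(html)
    return [(url, '%s.jpg' % index) for index, url in enumerate(urls)]


for image_url, local_name in plan_downloads(SAMPLE_HTML):
    print('%s -> %s' % (image_url, local_name))
```

Only the .jpg sources that are immediately followed by a pic_ext attribute are matched, so the .png image and the tag without pic_ext are skipped. Because . does not match a newline, each match stays within one line of the sample.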
