Scraping the first page of Douban movie data with Python, and reading and writing files with with open() as

Reposted

mob60475705c8db 2021-11-02 11:26:00


# _*_ coding : utf-8 _*_
# @Time : 2021/11/2 9:58
# @Author : 秋泊酱
# @File : Fetch the first page of Douban movies
# @Project : Crawler examples


# GET request
# Fetch the first page of Douban movie data and save it locally

import urllib.request

url = 'https://movie.douban.com/j/chart/top_list?type=5&interval_id=100%3A90&action=&start=0&limit=20'


headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
}

# (1) Build a customized request object
request = urllib.request.Request(url=url, headers=headers)

# (2) Get the response data
response = urllib.request.urlopen(request)
content = response.read().decode('utf-8')

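# (3) Save the data locally
# By default, open() uses the platform's locale encoding (GBK on Chinese-language Windows).
# To save Chinese characters reliably, specify the encoding as utf-8 in the open() call:
# encoding = 'utf-8'
# fp = open('douban.json', 'w', encoding='utf-8')
# fp.write(content)

with open('douban1.json', 'w', encoding='utf-8') as fp:
    fp.write(content)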
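
To confirm that a file saved this way holds valid UTF-8 JSON, it can be read back and parsed with `json.load`. The sketch below runs offline: `sample` is a made-up stand-in for `content` (the real fields returned by Douban are not shown in this post), and the file name `douban_sample.json` is only for illustration.

```python
import json
import os
import tempfile

# Hypothetical stand-in for `content`; the real response fields are not shown here.
sample = '[{"title": "示例电影", "score": "9.0"}]'

# Write into a temporary directory so nothing in the working directory is touched.
path = os.path.join(tempfile.mkdtemp(), 'douban_sample.json')

# Write with an explicit encoding so Chinese characters survive on any platform.
with open(path, 'w', encoding='utf-8') as fp:
    fp.write(sample)

# Read the file back with the same encoding and parse it as JSON.
with open(path, 'r', encoding='utf-8') as fp:
    movies = json.load(fp)

print(movies[0]['title'])
```

Reading with the same encoding that was used for writing is the important part; mixing encodings is what usually produces garbled Chinese text or a decode error.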


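file_object.readline() reads one whole line from the file, including the trailing "\n" character. If a non-negative size argument is given, it returns at most that many characters (bytes for a file opened in binary mode), stopping early if it reaches the end of the line.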
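
A minimal sketch of this behavior, using `io.StringIO` as a stand-in for a text file opened for reading (the two-line sample text is made up):

```python
import io

# io.StringIO behaves like a text file opened for reading.
f = io.StringIO("first line\nsecond line\n")

whole = f.readline()    # reads through the end of the line, keeping "\n"
part = f.readline(3)    # reads at most 3 characters of the next line
rest = f.readline()     # continues from where the previous call stopped
end = f.readline()      # an empty string signals end of file

print(repr(whole), repr(part), repr(rest), repr(end))
```

Because each line keeps its "\n", passing it straight to `print()` produces a blank line after every line of output; `print(line, end='')` or `line.rstrip('\n')` avoids that.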

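Reading or writing a file can raise an IOError (an alias of OSError in Python 3). If that happens, the file.close() call that follows is never reached.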

file = open("test.txt", "r", encoding='UTF-8')
for line in file.readlines():
    print(line)
file.close()


So, to make sure the file is closed properly whether or not an error occurs, we can use try: ... except: ... finally: ... to catch and handle the exception.

# _*_ coding : utf-8 _*_
# @Time : 2021/11/2 10:58
# @Author : 秋泊酱
# @File : File cannot be closed
# @Project : Crawler examples

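file = open("../test.txt", "r", encoding='utf-8')
try:
    for line in file.readlines():
        print(line)
except OSError as e:
    # Catch I/O errors specifically instead of using a bare except
    print("error:", e)

# The finally clause always runs, whether or not an exception occurred
finally:
    file.close()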
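
One way to see that `finally` really runs is to raise an error on purpose while the file is open and then inspect `file.closed`. In this sketch the temporary file and the `ValueError` are made up for the demonstration; they are not part of the original example.

```python
import os
import tempfile

# Create a throwaway file to read from (illustrative only).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w', encoding='utf-8') as tmp:
    tmp.write('line 1\nline 2\n')

caught = False
file = open(path, 'r', encoding='utf-8')
try:
    for line in file.readlines():
        # Simulate a failure in the middle of processing.
        raise ValueError('simulated failure')
except ValueError:
    caught = True
finally:
    # Runs whether or not the exception was raised.
    file.close()

print(caught, file.closed)
```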


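Writing this out every time is tedious.

with open() as closes the file automatically when the block ends, even if an exception is raised inside it.

Syntax:

with open(filename, mode) as file_object:
    file_object.method()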

# _*_ coding : utf-8 _*_
# @Time : 2021/11/2 11:21
# @Author : 秋泊酱
# @File : with open() as
# @Project : Crawler examples

with open("../test.txt", "r", encoding='utf-8') as file:
    for line in file.readlines():
        print(line)
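
The claim that the file is closed even when an exception occurs can be checked directly. This is a small sketch under made-up conditions: the temporary file and the `RuntimeError` exist only for the demonstration.

```python
import os
import tempfile

# Create a throwaway file to read from (illustrative only).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w', encoding='utf-8') as tmp:
    tmp.write('line 1\nline 2\n')

try:
    with open(path, 'r', encoding='utf-8') as file:
        first = file.readline()
        # Simulate a failure while the file is still open.
        raise RuntimeError('simulated failure')
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the exception.
print(file.closed)
```

No explicit `file.close()` and no `finally` clause are needed; the with statement takes care of the cleanup.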