pyppeteer is similar to Selenium: it lets you drive the Chrome browser from Python.

Documentation: https://miyakogi.github.io/pyppeteer/index.html

GitHub: https://github.com/miyakogi/pyppeteer

Installation

Requirements:

Python 3.6+

pip install pyppeteer

Code example

# -*- coding: utf-8 -*-

import asyncio
from pyppeteer import launch
from pyquery import PyQuery as pq  # third-party HTML parser: pip install pyquery

# It is best to point at a browser that is already installed. If no path is
# given, pyppeteer downloads Chromium automatically, which is very slow.
executable_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


# Example 1: render a page
async def crawl_page():
    # Launch the browser
    browser = await launch(executablePath=executable_path)

    # Open a new tab
    page = await browser.newPage()

    # Navigate to the URL
    await page.goto('http://quotes.toscrape.com/js/')

    # Get the rendered HTML and parse it
    doc = pq(await page.content())
    print('Quotes:', doc('.quote').length)

    # Close the browser
    await browser.close()


# Example 2: take a screenshot, save a PDF, run JavaScript
async def save_pdf():
    browser = await launch(executablePath=executable_path)
    page = await browser.newPage()
    await page.goto('http://quotes.toscrape.com/js/')

    # Save a screenshot of the page
    await page.screenshot(path='example.png')

    # Export the page as a PDF (only works in headless mode, the default)
    await page.pdf(path='example.pdf')

    # Run JavaScript in the page and get the result back as a dict
    dimensions = await page.evaluate('''() => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio,
        }
    }''')

    print(dimensions)

    await browser.close()


if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(crawl_page())
    # asyncio.get_event_loop().run_until_complete(save_pdf())
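
The hard-coded executable_path in the example only fits a macOS machine with Chrome in the default location. As a rough sketch, the lookup could be made a bit more portable with the standard library. The candidate paths and command names below, and the helpers find_browser and launch_options, are assumptions made up for illustration; they are not part of pyppeteer.

```python
import os
import shutil
from typing import Optional, Sequence

# Hypothetical candidate locations; adjust them for your own machine.
CANDIDATE_PATHS = (
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",  # path used in the example
    "/usr/bin/google-chrome",
    "/usr/bin/chromium",
)
# Hypothetical command names to look up on PATH.
CANDIDATE_COMMANDS = ("google-chrome", "chromium", "chromium-browser")


def find_browser(paths: Sequence[str] = CANDIDATE_PATHS,
                 commands: Sequence[str] = CANDIDATE_COMMANDS) -> Optional[str]:
    """Return the first usable browser executable, or None if nothing is found."""
    for path in paths:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    for command in commands:
        found = shutil.which(command)
        if found:
            return found
    return None


def launch_options(path: Optional[str]) -> dict:
    """Build keyword arguments for launch(); empty when no local browser was found."""
    return {"executablePath": path} if path else {}
```

With these helpers, the launch call in the example could become `browser = await launch(**launch_options(find_browser()))`. When nothing is found, the options are empty and pyppeteer falls back to its own (slow) download.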

This is asynchronous programming: the async and await keywords show up so often that the code can be dizzying to read.
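
To make the two keywords less dizzying, here is a minimal sketch that uses only the standard library and no browser. The coroutine fake_fetch is a hypothetical stand-in for a slow call such as page.goto(); the names and delays are invented for illustration. `async def` defines a coroutine, and `await` pauses it until the awaited operation finishes, letting other coroutines run in the meantime.

```python
import asyncio


async def fake_fetch(name, delay):
    """Hypothetical stand-in for a slow browser call such as page.goto()."""
    # While this coroutine sleeps, the event loop is free to run other coroutines.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather() runs both coroutines concurrently and returns their
    # results as a list, in the order the coroutines were passed in.
    return await asyncio.gather(
        fake_fetch("page-1", 0.2),
        fake_fetch("page-2", 0.1),
    )


def run():
    # Create a dedicated event loop so the sketch also works on Python 3.6.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(main())
    finally:
        loop.close()


results = run()
print(results)  # ['page-1 done', 'page-2 done']
```

Both waits overlap, so the whole run takes about as long as the slowest call rather than the sum of the two. Every awaited pyppeteer call in the example above follows the same pattern.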


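Reference
"Don't just use Selenium: the new tool Pyppeteer makes getting past Taobao easier!" (original title: 别只用 Selenium,新神器 Pyppeteer 绕过淘宝更简单!)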