Python Web Scraping from 0 to 1: Basic Usage of Selenium

from selenium import webdriver

# Path to the ChromeDriver executable
path = 'chromedriver.exe'

# Launch a real Chrome browser controlled by the driver
browser = webdriver.Chrome(path)

url = "https://baidu.com"
browser.get(url)

# page_source returns the HTML source of the current page
content = browser.page_source
print(content)

(4) Locating Elements with Selenium

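Automation works by simulating mouse and keyboard actions on page elements, such as clicking and typing. Before you can operate on an element you first have to locate it, and webdriver provides many methods for doing so.
Six of them are shown below: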

1. Find an element by its id

button = browser.find_element_by_id('su')

2. Find an element by the value of its name attribute

button = browser.find_element_by_name('wd')

3. Find an element with an XPath expression

button = browser.find_element_by_xpath('//input[@id="su"]')

4. Find an element with a CSS selector (the same selector syntax used with bs4)

button = browser.find_element_by_css_selector('#su')

5. Find an element by its tag name

button = browser.find_element_by_tag_name('input')

6. Find an element by the link text shown on the current page

button = browser.find_element_by_link_text('新闻')
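The find_element_by_* methods above belong to the Selenium 3 API; they were deprecated in Selenium 4 and removed in 4.3 in favor of a single find_element(strategy, value) call. The sketch below shows how the six lookups map onto that call. The STRATEGIES table and the find helper are hypothetical names introduced only for illustration; the strings are the values of the constants on selenium.webdriver.common.by.By (for example By.ID is "id").

```python
# Locator strategy strings. These match the constants defined on
# selenium.webdriver.common.by.By (By.ID == "id", By.CSS_SELECTOR == "css selector", ...).
STRATEGIES = {
    "id": "id",
    "name": "name",
    "xpath": "xpath",
    "css_selector": "css selector",
    "tag_name": "tag name",
    "link_text": "link text",
}


def find(browser, how, value):
    """Hypothetical helper: locate one element through browser.find_element."""
    if how not in STRATEGIES:
        raise ValueError(f"unknown locator strategy: {how!r}")
    # Equivalent to browser.find_element(By.ID, 'su') when how == "id".
    return browser.find_element(STRATEGIES[how], value)
```

With a Selenium 4 driver, find(browser, "id", "su") would stand in for browser.find_element_by_id('su'), and find(browser, "css_selector", "#su") for browser.find_element_by_css_selector('#su').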

(5) Accessing Element Information with Selenium

Examples:

1. Get the value of an element's attribute

.get_attribute('')

button = browser.find_element_by_id('su')
# Get the value of the element's class attribute
content = button.get_attribute('class')
print(content)

2. Get the tag name

.tag_name

button = browser.find_element_by_id('su')
# Get the element's tag name
content = button.tag_name
print(content)

3. Get the element's text

.text

button = browser.find_element_by_link_text('新闻')
# Get the element's visible text
content = button.text
print(content)
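The three accessors in this section (get_attribute, tag_name, and text) can be gathered in one place when inspecting an element. The following is a minimal sketch; describe_element is a hypothetical helper, not part of Selenium, and it assumes only that the object passed in exposes those three members the way a Selenium element does.

```python
def describe_element(element, attributes=("id", "class")):
    """Hypothetical helper: collect tag name, text, and selected attributes."""
    info = {
        "tag_name": element.tag_name,  # e.g. the tag of the located element
        "text": element.text,          # visible text of the element
    }
    for name in attributes:
        # get_attribute returns None when the attribute is not present.
        info[name] = element.get_attribute(name)
    return info
```

For instance, describe_element(browser.find_element_by_id('su')) would return a dictionary with the button's tag name, text, id, and class in a single call.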

(6) Interacting with Pages in Selenium

1. Click

click()

2. Type text

send_keys()

3. Go back

browser.back()

4. Go forward

browser.forward()

5. Simulate scrolling with JavaScript

move = 'document.documentElement.scrollTop=100000'
# Execute the JavaScript stored in move
browser.execute_script(move)

6. Get the page source

page_source

7. Quit

browser.quit()
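If a script raises an exception before it reaches browser.quit(), the browser window and the driver process can be left running. One way to guard against that is to wrap the browser in a context manager. The sketch below is an illustration under that assumption; managed_browser and the make_browser parameter are hypothetical names, and the only Selenium call relied on is quit().

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(make_browser):
    """Hypothetical helper: create a browser and always quit it afterwards."""
    browser = make_browser()
    try:
        yield browser
    finally:
        # Runs even if the body of the with-block raised an exception.
        browser.quit()
```

It could be used as `with managed_browser(lambda: webdriver.Chrome(path)) as browser:` followed by the get, click, and send_keys calls shown in this section.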

Full example

from selenium import webdriver
import time

path = 'chromedriver.exe'

browser = webdriver.Chrome(path)

url = 'https://baidu.com'
browser.get(url)

# Type the query into the search box
search_box = browser.find_element_by_id('kw')
search_box.send_keys('钢铁是怎样炼成的')
time.sleep(3)

# Find the Baidu search button and click it
button = browser.find_element_by_id('su')
button.click()
time.sleep(2)

# Scroll to the bottom of the page
move = 'document.documentElement.scrollTop=100000'
browser.execute_script(move)
time.sleep(2)

# Find the next-page link and click it
next_link = browser.find_element_by_xpath('//a[@class="n"]')
next_link.click()

# Go back to the previous page
browser.back()

# Go forward again
browser.forward()

# Quit the browser
browser.quit()
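The fixed time.sleep() pauses in the example wait the full duration even when the page is ready sooner, and can still be too short on a slow connection. A common alternative is to poll for a condition with a timeout. The sketch below is a simplified, pure-Python stand-in for that idea (Selenium ships its own WebDriverWait for this purpose); wait_until and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

In the example above, `wait_until(lambda: browser.find_elements_by_xpath('//a[@class="n"]'))` could replace the sleep before the next-page click: the plural find_elements_by_xpath returns an empty list while nothing matches, so the loop keeps polling until the link appears.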