1. Installing Beautiful Soup
pip install beautifulsoup4

On Linux, use pip3 (plain pip may point to Python 2).
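
A quick way to confirm the installation worked is to import the package and print its version. This is only a minimal check; note that the package is published on PyPI as beautifulsoup4 but is imported under the name bs4.

```python
# Sanity check after installation: the import name is bs4, not beautifulsoup4.
import bs4

# __version__ is a string such as "4.x.y"; the exact value depends on your install.
print(bs4.__version__)
```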

2. Usage

The examples use this demo page: https://python123.io/ws/demo.html

# -*- coding: utf-8 -*-
"""
Created on Tue Jan 21 21:29:56 2020

@author: douzi
"""

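import requests
from bs4 import BeautifulSoup

# Fetch the demo page and print the raw HTML
r = requests.get("http://python123.io/ws/demo.html")
print(r.text)

demo = r.text

# Parse the HTML with Python's built-in parser and print an indented view
soup = BeautifulSoup(demo, "html.parser")
print(soup.prettify())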

3. Understanding the Beautiful Soup library

4. Getting HTML tags (Tag objects)

4.1 The title tag

import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
print(r.text, "\n")

demo = r.text

soup = BeautifulSoup(demo, "html.parser")
# soup.title returns the first <title> tag, markup included
print(soup.title)

4.2 The a tag (link tag)

# soup.a returns only the first <a> tag in the document
tag = soup.a
print(tag)
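
A Tag object also exposes its own name and its position in the tree. The sketch below illustrates this on a small inline HTML string, which is a hypothetical stand-in for the demo page so that it runs without network access; the URLs and ids in it are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the demo page (not its real content).
html = (
    "<html><body><p class='course'>"
    "<a href='http://www.example.com/a' id='link1'>First</a> and "
    "<a href='http://www.example.com/b' id='link2'>Second</a>"
    "</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

tag = soup.a                   # attribute access returns the first <a> only
print(tag.name)                # name of the tag itself
print(tag.parent.name)         # name of the enclosing tag
print(tag.parent.parent.name)  # one level further up the tree
```

Because soup.a stops at the first match, the second link in the string is never returned by this form of access.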

4.3 Getting tag attributes
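
As a minimal sketch of attribute access: a tag's attributes are available as a dictionary through .attrs, and individual values can be read with dictionary-style indexing or .get(). The HTML string below is a hypothetical example, not the content of the demo page.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML; the URL, class and id are made up for illustration.
html = '<p><a href="http://www.example.com/python" class="py1" id="link1">Basic Python</a></p>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.a
print(tag.attrs)         # all attributes as a dict
print(tag["href"])       # dictionary-style access; raises KeyError if missing
print(tag["class"])      # class is multi-valued, so this is a list
print(tag.get("title"))  # .get() returns None for a missing attribute
```

Using .get() is the safer choice when an attribute may be absent, since indexing a missing key raises KeyError.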

4.4 The string between tags (NavigableString)
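
The text between an opening and closing tag can be read with .string, which returns a NavigableString. The following is a small sketch on a hypothetical inline HTML string; it also shows that .string looks through a single nested child tag.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical inline HTML used only for illustration.
html = "<p><b>The demo page introduces several Python courses.</b></p>"
soup = BeautifulSoup(html, "html.parser")

text = soup.p.string      # <p> has a single child <b>, so .string reaches its text
print(text)
print(type(text))         # a NavigableString, which is a subclass of str
plain = str(text)         # convert to a plain str to detach it from the parse tree
```

If a tag contains more than one child, .string returns None; in that case get_text() is the usual alternative.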

4.5 Comments
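
HTML comments are parsed into Comment objects, a special kind of NavigableString. Printing one shows only the comment text without the <!-- --> markers, so it can be mistaken for ordinary text; checking the type tells them apart. The HTML below is a hypothetical example.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical inline HTML: one tag holds a comment, the other holds ordinary text.
html = "<b><!--This is a comment--></b><p>This is not a comment</p>"
soup = BeautifulSoup(html, "html.parser")

c = soup.b.string
t = soup.p.string
print(c, type(c))              # comment text, without the <!-- --> markers
print(t, type(t))              # ordinary text
print(isinstance(c, Comment))  # type check distinguishes comment from text
print(isinstance(t, Comment))
```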

5. Summary
