A Simple Python Web Crawler Example

By mb6469fb71850ad, 2023-05-21 19:14:40

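© Copyright belongs to the author. This is an original work by 51CTO blog author mb6469fb71850ad. Please contact the author for permission before reprinting; unauthorized reprints may be subject to legal action.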

Here is a simple Python web crawler example.

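import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/"

# Fetch the page; the timeout keeps the script from hanging on a stalled connection.
response = requests.get(url, timeout=10)
# Raise an exception on 4xx/5xx responses instead of parsing an error page.
response.raise_for_status()

# Pass the raw bytes so BeautifulSoup can work out the encoding itself.
soup = BeautifulSoup(response.content, "html.parser")

# Collect the href of every <a> tag that holds an absolute http(s) URL.
links = []
for link in soup.find_all("a"):
    href = link.get("href")
    if href and href.startswith("http"):
        links.append(href)

print(links)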

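This program uses the requests library to download a page and the BeautifulSoup library to parse the HTML and extract its links. First, requests.get() fetches the page and returns a response. Next, the response content is passed to a BeautifulSoup object, and find_all() with the "a" tag name locates every anchor element. Finally, each href value that starts with "http" is appended to a list, and the list is printed. Note that this filter keeps only absolute links; relative links such as "/about" are skipped.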
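Because the href.startswith("http") check discards relative links, one possible extension is to resolve every href against the page URL before filtering. The following is only a minimal sketch using the standard library's urllib.parse. The helper name absolutize and the sample href values are hypothetical and not part of the original program.

```python
from urllib.parse import urljoin, urlparse


def absolutize(hrefs, base_url):
    """Resolve raw href values against base_url, keeping only http(s) URLs.

    Hypothetical helper for illustration; not part of the original script.
    """
    resolved = []
    for href in hrefs:
        if not href:
            continue  # skip missing or empty href attributes
        absolute = urljoin(base_url, href.strip())
        # Drop non-web schemes such as mailto: or javascript:, and duplicates.
        if urlparse(absolute).scheme in ("http", "https") and absolute not in resolved:
            resolved.append(absolute)
    return resolved


# Made-up href values, shaped like what link.get("href") can return.
sample = ["/about", "https://www.example.com/", "mailto:me@example.com",
          None, "news/today.html", "/about"]
print(absolutize(sample, "https://www.example.com/blog/"))
```

In the crawler above, this could be applied by gathering link.get("href") for every anchor into a list and passing it in along with url, in place of the startswith("http") check.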

Be mindful of the applicable laws, and make sure your crawling is legal and follows ethical guidelines.