Fetching Proxies with Python: A Walkthrough

Step 1
Import the libraries:

import requests
import xml.etree.ElementTree as ET

Notes
requests: the HTTP library used to call the API URL
xml.etree.ElementTree: parses the return value when the API responds with XML
Step 2
Build the request parameters

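Arguments={
    "https":input("HTTPS support (0 = any, 1 = HTTPS proxies only): "),
    "type":input("Proxy type (0 = any, 1 = transparent, 2 = anonymous, 3 = high anonymity): "),
    "format":input("Return format (text, json or xml): "),
    "token":"YOUR_TOKEN"  # placeholder: replace with your own token
}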

Note
If you do not have a token yet, register an account at
proxy.newday.me
and then put that account's token value in the "token" entry above.
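input() accepts any string, and a token typed into the script is easy to leak. The following is a minimal sketch of one way to handle both, not part of the original script: build_arguments is a hypothetical helper, PROXY_TOKEN is a hypothetical environment variable name, and the allowed values are simply the ones listed in the prompts above.

```python
import os

# Allowed values, taken from the prompts in Step 2.
ALLOWED = {
    "https": {"0", "1"},
    "type": {"0", "1", "2", "3"},
    "format": {"text", "json", "xml"},
}

def build_arguments(https, type_, format_, token=None):
    """Validate the three user choices and return the request parameters.

    build_arguments and PROXY_TOKEN are hypothetical names used for
    illustration only.
    """
    values = {
        "https": https.strip(),
        "type": type_.strip(),
        "format": format_.strip().lower(),
    }
    for key, value in values.items():
        if value not in ALLOWED[key]:
            raise ValueError("invalid value for {}: {!r}".format(key, value))
    # Fall back to the environment so the token need not live in the source file.
    token = token or os.environ.get("PROXY_TOKEN")
    if not token:
        raise ValueError("no token given and PROXY_TOKEN is not set")
    values["token"] = token
    return values
```

It could be called as build_arguments(input(...), input(...), input(...)) in place of the literal dictionary.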
Step 3
Check which return format the user entered.
Start with json:

if Arguments["format"]=="json":

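Send the request:

Response=requests.get("http://api.newday.me/proxy/extract",params=Arguments).json()

Notes
Arguments: the request parameters built in Step 2, passed as the query string
json(): parses the response body as JSON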
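requests.get appends the params dictionary to the URL as a query string. To see what that URL looks like without sending anything, the standard library can build an equivalent string. This is only a sketch: build_url is a hypothetical helper and the token value is a placeholder.

```python
from urllib.parse import urlencode

API_URL = "http://api.newday.me/proxy/extract"

def build_url(arguments):
    """Return the API URL with the parameters encoded as a query string."""
    return API_URL + "?" + urlencode(arguments)

# Placeholder values; a real call would use your own token.
example = {"https": "0", "type": "3", "format": "json", "token": "YOUR_TOKEN"}
print(build_url(example))
# http://api.newday.me/proxy/extract?https=0&type=3&format=json&token=YOUR_TOKEN
```

When sending the real request, it is also worth considering the timeout argument of requests.get and the raise_for_status() method of the response, so that a hung or failed call does not go unnoticed.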
Step 4
Parse the return value (JSON data)

Data=Response["data"]["list"]
# Copy every entry of Data into a new list
Datum=list(Data)

Notes
Response: the parsed return value of the request above
Data: holds all of the proxy data
Datum: a list in which each entry is the detailed data for one proxy
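To make the nesting concrete, here is a small sketch that runs against a made-up payload instead of the live API. The field names follow the ones used in this article; the values, and the extract_proxies helper, are assumptions for illustration only.

```python
import json

# Hypothetical payload: field names come from the article, values are invented.
SAMPLE = '''
{"data": {"list": [
    {"ip": "192.0.2.10", "port": 8080, "type": 3, "https": 1,
     "duration": "2h", "percent": 95, "time": 1.2},
    {"ip": "192.0.2.11", "port": 3128, "type": 2, "https": 0,
     "duration": "30m", "percent": 80, "time": 2.5}
]}}
'''

def extract_proxies(response):
    """Return the list stored under data -> list, or [] if it is missing."""
    data = response.get("data") or {}
    return list(data.get("list") or [])

Response = json.loads(SAMPLE)
Datum = extract_proxies(Response)
print(len(Datum))        # 2
print(Datum[0]["ip"])    # 192.0.2.10
```

Using .get() rather than direct indexing means an error response without a "data" key yields an empty list instead of a KeyError.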
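Step 5
Write the data to files. Note that open() does not create the ./Proxys/ directory, so it must exist before this loop runs.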

for i,proxy in enumerate(Datum):
    # One file per proxy, named after its index
    with open("./Proxys/"+str(i)+".txt","w") as f:
        f.write("ID:"+str(i)+"\n")
        f.write("Address:"+str(proxy["ip"])+"\n")
        f.write("Port:"+str(proxy["port"])+"\n")
        f.write("Type:"+str(proxy["type"])+"\n")
        f.write("Https:"+str(proxy["https"])+"\n")
        f.write("Duration:"+str(proxy["duration"])+"\n")
        f.write("Percent:"+str(proxy["percent"])+"\n")
        f.write("Time:"+str(proxy["time"])+"\n")
    print("Record {} written.".format(i+1))
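The same write logic is needed for every return format, so it can be pulled into one function. The sketch below is an assumption about how that might look, not the author's code: write_proxy and FIELDS are hypothetical names, the record is made up, and the demo writes to a temporary folder so nothing is left on disk.

```python
import os
import tempfile

# (label written to the file, key in the proxy record)
FIELDS = [("Address", "ip"), ("Port", "port"), ("Type", "type"),
          ("Https", "https"), ("Duration", "duration"),
          ("Percent", "percent"), ("Time", "time")]

def write_proxy(directory, index, proxy):
    """Write one proxy record to <directory>/<index>.txt and return the path."""
    os.makedirs(directory, exist_ok=True)   # create the folder on first use
    path = os.path.join(directory, str(index) + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("ID:{}\n".format(index))
        for label, key in FIELDS:
            # format() handles numbers and strings alike; missing keys become ""
            f.write("{}:{}\n".format(label, proxy.get(key, "")))
    return path

# Made-up record for demonstration.
sample = {"ip": "192.0.2.10", "port": 8080, "type": 3, "https": 1,
          "duration": "2h", "percent": 95, "time": 1.2}
with tempfile.TemporaryDirectory() as tmp:
    path = write_proxy(os.path.join(tmp, "Proxys"), 0, sample)
    with open(path, encoding="utf-8") as f:
        print(f.read())
```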

Step 6
Check whether the user chose the text format

elif Arguments["format"]=="text":

Send the request:

Response=requests.get("http://api.newday.me/proxy/extract",params=Arguments).text

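Format the data (convert it to a list):

Datum=Response.split()

Notes
Response.split(): splits the response on any whitespace (spaces and line breaks)
Datum: the resulting list, where each entry is expected to be one ip:port string

Step 7
Save the data to files:

for i in range(len(Datum)):
    a=Datum[i]
    b=a.split(":")
    if len(b)!=2:
        continue  # skip entries that are not in ip:port form
    with open("./Proxys/"+str(i)+".txt","w") as f:
        f.write("Address:"+b[0]+"\n")
        f.write("Port:"+b[1]+"\n")
    print("Record {} written.".format(i+1))

Notes
i: the index of the current entry in Datum
a: the i-th entry of Datum
b: a split on the colon into [address, port]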
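The text parsing can be checked without calling the API. The sketch below assumes, as the code above does, that the text format is a whitespace-separated list of ip:port entries; parse_text_proxies is a hypothetical helper and the sample string is made up.

```python
def parse_text_proxies(text):
    """Split whitespace-separated ip:port entries into (ip, port) tuples."""
    proxies = []
    for entry in text.split():
        # rpartition splits on the last colon; entries without one are skipped
        ip, sep, port = entry.rpartition(":")
        if sep and ip and port.isdigit():
            proxies.append((ip, int(port)))
    return proxies

sample = "192.0.2.10:8080\r\n192.0.2.11:3128\nnot-a-proxy\n"
print(parse_text_proxies(sample))
# [('192.0.2.10', 8080), ('192.0.2.11', 3128)]
```

Validating each entry before writing avoids creating files for blank or malformed lines.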

Step 8, the main focus of this article
1. Check whether the user chose the xml format

elif Arguments["format"]=="xml":

2. Fetch the data

Response=requests.get("http://api.newday.me/proxy/extract",params=Arguments).text

3. Parse the XML text into an element tree

root=ET.fromstring(Response)

4. Define the variables

Data=[]
i=0

Notes
Data: the list that stores the proxy elements
i: the loop counter, used as the file ID

5. Collect the elements into a list

# The path is relative to the root element
for item in root.iterfind("data/list/item"):
    Data.append(item)
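The path given to iterfind is relative to the element it is called on, so the root tag itself is not part of it. A minimal sketch against a made-up document: the root element name "response" is an assumption, and only the data/list/item nesting comes from this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the root tag name is assumed for illustration.
SAMPLE = """<response>
  <data>
    <list>
      <item><ip>192.0.2.10</ip><port>8080</port></item>
      <item><ip>192.0.2.11</ip><port>3128</port></item>
    </list>
  </data>
</response>"""

root = ET.fromstring(SAMPLE)
print(root.tag)                                  # response
Data = list(root.iterfind("data/list/item"))     # path starts below the root
print(len(Data))                                 # 2
# Including the root tag in the path matches nothing:
print(len(list(root.iterfind("response/data/list/item"))))   # 0
```

If the loop in step 5 collects no elements, printing root.tag and the tags of its children is a quick way to check the actual structure of the response.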

6. Save the data

for item in Data:
    # findtext returns None for a missing tag, so fall back to ""
    ip=item.findtext("ip",default="")
    port=item.findtext("port",default="")
    type_=item.findtext("type",default="")  # trailing underscore avoids shadowing the built-in
    https=item.findtext("https",default="")
    duration=item.findtext("duration",default="")
    percent=item.findtext("percent",default="")
    time_=item.findtext("time",default="")
    with open("./Proxys/"+str(i)+".txt","w") as f:
        f.write("ID:"+str(i)+"\n")
        f.write("Address:"+ip+"\n")
        f.write("Port:"+port+"\n")
        f.write("Type:"+type_+"\n")
        f.write("Https:"+https+"\n")
        f.write("Duration:"+duration+"\n")
        f.write("Percent:"+percent+"\n")
        f.write("Time:"+time_+"\n")
    print("Record {} written.".format(i+1))
    i+=1
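Element.findtext returns None when the requested tag is absent, and concatenating None to a string raises a TypeError. The sketch below shows the default argument as one way around that. item_to_dict and FIELDS are hypothetical names, and the element is made up.

```python
import xml.etree.ElementTree as ET

FIELDS = ("ip", "port", "type", "https", "duration", "percent", "time")

def item_to_dict(item):
    """Turn one <item> element into a dict; missing tags become empty strings."""
    return {name: item.findtext(name, default="") for name in FIELDS}

item = ET.fromstring("<item><ip>192.0.2.10</ip><port>8080</port></item>")
print(item.findtext("duration"))        # None: the tag is absent
record = item_to_dict(item)
print(record["ip"], record["port"])     # 192.0.2.10 8080
print(repr(record["duration"]))         # ''
```

All values read this way are strings, so no str() conversion is needed before writing them.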

Full source code

import os
import requests
import xml.etree.ElementTree as ET

Arguments={
    "https":input("HTTPS support (0 = any, 1 = HTTPS proxies only): "),
    "type":input("Proxy type (0 = any, 1 = transparent, 2 = anonymous, 3 = high anonymity): "),
    "format":input("Return format (text, json or xml): "),
    "token":"YOUR_TOKEN"  # placeholder: replace with your own token
}
# open() does not create directories, so make sure the output folder exists
os.makedirs("./Proxys",exist_ok=True)
if Arguments["format"]=="json":
    Response=requests.get("http://api.newday.me/proxy/extract",params=Arguments).json()
    Data=Response["data"]["list"]
    Datum=list(Data)
    for i,proxy in enumerate(Datum):
        # One file per proxy, named after its index
        with open("./Proxys/"+str(i)+".txt","w") as f:
            f.write("ID:"+str(i)+"\n")
            f.write("Address:"+str(proxy["ip"])+"\n")
            f.write("Port:"+str(proxy["port"])+"\n")
            f.write("Type:"+str(proxy["type"])+"\n")
            f.write("Https:"+str(proxy["https"])+"\n")
            f.write("Duration:"+str(proxy["duration"])+"\n")
            f.write("Percent:"+str(proxy["percent"])+"\n")
            f.write("Time:"+str(proxy["time"])+"\n")
        print("Record {} written.".format(i+1))
elif Arguments["format"]=="text":
    Response=requests.get("http://api.newday.me/proxy/extract",params=Arguments).text
    # Split on any whitespace; each entry is expected to be ip:port
    Datum=Response.split()
    for i in range(len(Datum)):
        a=Datum[i]
        b=a.split(":")
        if len(b)!=2:
            continue  # skip entries that are not in ip:port form
        with open("./Proxys/"+str(i)+".txt","w") as f:
            f.write("Address:"+b[0]+"\n")
            f.write("Port:"+b[1]+"\n")
        print("Record {} written.".format(i+1))
elif Arguments["format"]=="xml":
    Response=requests.get("http://api.newday.me/proxy/extract",params=Arguments).text
    root=ET.fromstring(Response)
    Data=[]
    i=0
    # The path is relative to the root element
    for item in root.iterfind("data/list/item"):
        Data.append(item)
    for item in Data:
        # findtext returns None for a missing tag, so fall back to ""
        ip=item.findtext("ip",default="")
        port=item.findtext("port",default="")
        type_=item.findtext("type",default="")
        https=item.findtext("https",default="")
        duration=item.findtext("duration",default="")
        percent=item.findtext("percent",default="")
        time_=item.findtext("time",default="")
        with open("./Proxys/"+str(i)+".txt","w") as f:
            f.write("ID:"+str(i)+"\n")
            f.write("Address:"+ip+"\n")
            f.write("Port:"+port+"\n")
            f.write("Type:"+type_+"\n")
            f.write("Https:"+https+"\n")
            f.write("Duration:"+duration+"\n")
            f.write("Percent:"+percent+"\n")
            f.write("Time:"+time_+"\n")
        print("Record {} written.".format(i+1))
        i+=1