Python equivalent of a given wget command

I'm trying to create a Python function that does the same thing as this wget command:

wget -c --read-timeout=5 --tries=0 "$URL"

-c - Continue from where you left off if the download is interrupted.

--read-timeout=5 - If no new data comes in for over 5 seconds, give up and retry. Given -c, this means it will retry from where it left off.

--tries=0 - Retry forever.

Used together, these three options result in a download that cannot fail.

I want to duplicate those features in my Python script, but I don't know where to begin...

Soviero asked 2020-01-25T02:52:30Z

7 Solutions

76 votes

There is also a nice Python module named wget that is pretty easy to use. It can be found here.

This demonstrates how simple the design is:

>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532
>>> filename
'razorback.mp3'

Enjoy.

However, if the wget module doesn't work (I've had issues with certain PDF files), try this solution instead.

Edit: You can also use the out parameter to use a custom output directory instead of the current working directory.

>>> output_directory = 
>>> filename = wget.download(url, out=output_directory)
>>> filename
'razorback.mp3'
Blairg23 answered 2020-01-25T02:53:26Z
28 votes

urllib.request should work. Just set it up in a while (not done) loop, check whether a local file already exists, and if it does, send a GET with a RANGE header specifying how far you got in downloading the local file. Be sure to use read() to append to the local file until an error occurs.

This is also possibly a duplicate of Python urllib2 resume download doesn't work when network reconnects.
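
To make that concrete, here is a minimal sketch of the approach, assuming the server honors Range requests (status 206); the URL, local file name, and timeout value are illustrative and not part of the original answer:

import os
import urllib.request
import urllib.error

url = "http://example.com/bigfile.bin"    # assumed URL
local = "bigfile.bin"                     # assumed local file name

done = False
while not done:
    # Resume from however many bytes are already on disk.
    existing = os.path.getsize(local) if os.path.exists(local) else 0
    request = urllib.request.Request(url, headers={"Range": "bytes=%d-" % existing})
    try:
        with urllib.request.urlopen(request, timeout=5) as response, open(local, "ab") as f:
            while True:
                chunk = response.read(8192)
                if not chunk:
                    done = True
                    break
                f.write(chunk)
    except urllib.error.HTTPError as e:
        if e.code == 416:       # range not satisfiable: the file is already complete
            done = True
        else:
            raise
    except (urllib.error.URLError, OSError):
        pass                    # timeout or connection error: loop again and resume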

Eugene K answered 2020-01-25T02:52:49Z
15 votes
import urllib2

attempts = 0
while attempts < 3:
    try:
        response = urllib2.urlopen("http://example.com", timeout=5)
        content = response.read()
        f = open("local/index.html", 'w')
        f.write(content)
        f.close()
        break
    except urllib2.URLError as e:
        attempts += 1
        print type(e)
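
Note that urllib2 exists only on Python 2; a rough Python 3 equivalent of the same retry loop (my adaptation, using urllib.request and writing bytes) would be:

import urllib.request
import urllib.error

attempts = 0
while attempts < 3:
    try:
        response = urllib.request.urlopen("http://example.com", timeout=5)
        content = response.read()              # bytes in Python 3
        with open("local/index.html", 'wb') as f:
            f.write(content)
        break
    except urllib.error.URLError as e:
        attempts += 1
        print(type(e))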
Pujan Srivastava answered 2020-01-25T02:53:42Z
9 votes

I had to do something like this on a version of linux that didn't have the right options compiled into wget. This example is for downloading the memory analysis tool "guppy". I'm not sure whether it matters, but I kept the target file's name the same as the url target name...

Here's what I came up with:

python -c "import requests; r = requests.get('https://pypi.python.org/packages/source/g/guppy/guppy-0.1.10.tar.gz') ; open('guppy-0.1.10.tar.gz' , 'wb').write(r.content)"

That's the one-liner; here it is in a more readable format:

import requests
fname = 'guppy-0.1.10.tar.gz'
url = 'https://pypi.python.org/packages/source/g/guppy/' + fname
r = requests.get(url)
open(fname , 'wb').write(r.content)

This worked for downloading the tarball. I was able to extract the package and work with it after downloading.

Edit:

To address another issue, here is an implementation that prints a progress bar to STDOUT. There is probably a more portable way to do this without the clint package, but this was tested on my machine and works fine:

#!/usr/bin/env python

from clint.textui import progress
import requests

fname = 'guppy-0.1.10.tar.gz'
url = 'https://pypi.python.org/packages/source/g/guppy/' + fname

r = requests.get(url, stream=True)
with open(fname, 'wb') as f:
    total_length = int(r.headers.get('content-length'))
    for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length/1024) + 1):
        if chunk:
            f.write(chunk)
            f.flush()
Will Charlton answered 2020-01-25T02:54:24Z
6 votes

I often find that the simpler and more reliable solution is to just execute the terminal command from within python. In your case:

import os
url = 'https://www.someurl.com'
os.system(f"""wget -c --read-timeout=5 --tries=0 "{url}"""")
Yohan Obadia answered 2020-01-25T02:54:45Z
1 vote

Let me improve the example above with threading, in case you want to download many files.

import math
import random
import threading

import requests
from clint.textui import progress

# You must define a proxy list
# I suggest https://free-proxy-list.net/
proxies = {
    0: {'http': 'http://34.208.47.183:80'},
    1: {'http': 'http://40.69.191.149:3128'},
    2: {'http': 'http://104.154.205.214:1080'},
    3: {'http': 'http://52.11.190.64:3128'}
}

# You must define the list of files you want to download
videos = [
    "https://i.stack.imgur.com/g2BHi.jpg",
    "https://i.stack.imgur.com/NURaP.jpg"
]

downloaderses = list()


def downloaders(video, selected_proxy):
    print("Downloading file named {} by proxy {}...".format(video, selected_proxy))
    r = requests.get(video, stream=True, proxies=selected_proxy)
    nombre_video = video.split("/")[3]
    with open(nombre_video, 'wb') as f:
        total_length = int(r.headers.get('content-length'))
        for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length / 1024) + 1):
            if chunk:
                f.write(chunk)
                f.flush()


for video in videos:
    selected_proxy = proxies[math.floor(random.random() * len(proxies))]
    t = threading.Thread(target=downloaders, args=(video, selected_proxy))
    downloaderses.append(t)

for _downloaders in downloaderses:
    _downloaders.start()
Te ENe Te answered 2020-01-25T02:55:05Z
1 vote

As easy as py:

class Downloder():
    def download_manager(self, url, destination='Files/DownloderApp/', try_number="10", time_out="60"):
        #threading.Thread(target=self._wget_dl, args=(url, destination, try_number, time_out)).start()
        if self._wget_dl(url, destination, try_number, time_out) == 0:
            return True
        else:
            return False

    def _wget_dl(self, url, destination, try_number, time_out):
        # Shells out to wget: -c resume, -P destination directory, -t tries, -T timeout.
        import subprocess
        command = ["wget", "-c", "-P", destination, "-t", try_number, "-T", time_out, url]
        download_state = 1  # nonzero means failure
        try:
            download_state = subprocess.call(command)
        except Exception as e:
            print(e)
        # if download_state == 0 => successful download
        return download_state
pd shah answered 2020-01-25T02:55:24Z