What Does a Python Markdown Document Look Like? The Python markdown Library

Reposted

ganmaobuhaowan 2024-08-01 15:04:15

Tags: python, windows, miniconda, List, xml. Category: Python, backend development

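Background

Easier to look things up
~~I don't want to keep these notes only on my local disk, where they are a pain to find~~
Below are assorted odd but handy tricks, explained in plain terms
Updated irregularly

Websites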

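1.1 Removing the 8 built-in tiles (most-visited thumbnails) from Google Chrome's new tab page

The built-in tiles really do not look good

Procedure

Download the ChromePak V5 tool

Copy resources.pak from the Google Chrome version-number directory into the same directory as ChromePak V5

Open Windows PowerShell (bundled with Windows) and change to the ChromePak V5 directory

Run <path to pak_tools.exe> -c=unpack -f=<path to resources.pak> (mind the spaces between the arguments) and press Enter to unpack

Go to the unpacked directory resources\unknown, open file 297 (that is the file in my version, 66.0), search for most-visited, and comment out the content inside that tag

Run <path to pak_tools.exe> -c=repack -f=<path to resources.pak> (mind the spaces between the arguments) and press Enter to repack

Go back to the Google Chrome version-number directory and replace its resources.pak with the repacked file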
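The file number 297 applies to version 66.0 and may differ in other versions. A minimal sketch for locating the right file, assuming the unpack step produced plain files under resources\unknown; the function name `find_files_containing` is hypothetical and not part of ChromePak:

```python
from pathlib import Path


def find_files_containing(root, needle=b"most-visited"):
    """Return the sorted relative names of files under `root` whose bytes contain `needle`."""
    root = Path(root)
    if not root.is_dir():
        return []
    matches = []
    for path in root.rglob("*"):
        # Read as bytes so binary resources do not raise decoding errors
        if path.is_file() and needle in path.read_bytes():
            matches.append(path.relative_to(root).as_posix())
    return sorted(matches)


# Example call (assumed location of the unpacked files):
# print(find_files_containing(Path("resources") / "unknown"))
```

Each name it returns is a candidate file to open and search for most-visited by hand.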

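2.1 Packaging Python files

Being able to run your own script on another computer is satisfying

Procedure

Download the pyinstaller tool; it is available on GitHub

After extracting it, put the pyinstaller-develop folder in the Python\Scripts directory (where the small tools installed with pip live)

Download the pywin32 tool; its installer looks very Windows 7

In CMD, go to the Scripts directory and run python pywin32_postinstall.py -install

In the command prompt, run cd ./pyinstaller-develop, then run python setup.py install

Copy the finished .py script into the pyinstaller-develop directory, run python pyinstaller.py -F XXX.py in the command prompt, press Enter, and wait…

An XXX folder will be generated in the current working directory

In the dist directory of the XXX folder you will find XXX.exe, which is the packaged script
It also runs on other computers; the file size depends on how many libraries the script imports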
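The packaging step can also be driven from Python. The sketch below assumes PyInstaller is already installed for the current interpreter (for example with pip) rather than run from the pyinstaller-develop checkout described above; the helper names `build_pyinstaller_command` and `package` are hypothetical:

```python
import subprocess
import sys


def build_pyinstaller_command(script, onefile=True):
    """Build the argument list for `python -m PyInstaller [--onefile] <script>`."""
    cmd = [sys.executable, "-m", "PyInstaller"]
    if onefile:
        cmd.append("--onefile")  # long form of the -F flag used in the steps above
    cmd.append(script)
    return cmd


def package(script):
    """Run PyInstaller on `script`; raises CalledProcessError if the build fails."""
    subprocess.run(build_pyinstaller_command(script), check=True)


# Example call (XXX.py is the placeholder script name from the steps above):
# package("XXX.py")
```

Using `sys.executable` keeps the build tied to the interpreter that has the script's dependencies installed.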


2.2 Pip mirrors

Fast, and that is all that matters

Procedure

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple xxxx

where xxxx is the package you want to download

Tsinghua mirror

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --set show_channel_urls yes
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
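When a script needs to install a package through the mirror without changing the global pip configuration, the one-off `-i` form above can be built programmatically. A minimal sketch; the constant `TUNA_INDEX` and the function `pip_install_command` are hypothetical names:

```python
import sys

# Index URL taken from the commands above
TUNA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple"


def pip_install_command(package, index_url=TUNA_INDEX):
    """Build `python -m pip install -i <index_url> <package>` as an argument list."""
    return [sys.executable, "-m", "pip", "install", "-i", index_url, package]


# Example call (runs pip for the current interpreter):
# import subprocess
# subprocess.run(pip_install_command("requests"), check=True)
```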

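2.3 Crawler proxies

Spoof the request headers and fetch proxies, wrapped in a class

Procedure

def __init__(self): holds the target URL and the spoofed request headers
def getIpList(self): scrapes IP addresses and port numbers from a proxy listing site
def getRandomIp(self, ipList): picks one of the scraped proxies, with http:// or https:// prefixed to its address and port, and stores it in the proxies variable
def running(self): crawls the target site; the get call becomes requests.get(url=url, headers=self.headers, proxies=proxies)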
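The proxies value passed to requests.get is a dict keyed by lowercase URL scheme. A minimal sketch of building one entry from a scraped row, with a hypothetical helper name `build_proxies` and an example address from the documentation range:

```python
def build_proxies(scheme, ip, port):
    """Return a requests-style proxies dict for one scraped proxy entry."""
    scheme = scheme.strip().lower()
    if scheme not in ("http", "https"):
        raise ValueError("unsupported proxy scheme: %r" % scheme)
    proxy_url = "%s://%s:%s" % (scheme, ip.strip(), str(port).strip())
    # requests chooses the proxy by the scheme of the URL being fetched
    return {scheme: proxy_url}


# Example call with made-up values:
# build_proxies("HTTPS", "203.0.113.7", "8080")
```

Because requests selects the proxy by the scheme of the requested URL, a dict that only has an 'http' key is not used for an https:// target.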

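import os
import random
import time

import lxml  # not called directly; BeautifulSoup needs it installed for the 'lxml' parser
import requests
from bs4 import BeautifulSoup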

class ImageCrawling():
    def __init__(self):
        self.url = 'https://xkcd.in'

        # Spoofed request headers
        self.headers = {
            'Connection': 'Keep-Alive',
            'Accept': 'text/html, application/xhtml+xml, */*',
            'Accept-Language': 'zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
            'User-Agent':'Mozilla/5.0 (Linux; U; Android 6.0; zh-CN; MZ-m2 note Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/40.0.2214.89 MZBrowser/6.5.506 UWS/2.10.1.22 Mobile Safari/537.36'
        }
    
    def getIpList(self):
        print('Fetching proxy IPs')
        url = 'https://www.xicidaili.com/nn/'
        html = requests.get(url=url, headers=self.headers).text
        # Parse the response text with the lxml parser
        soup = BeautifulSoup(html, 'lxml')
        # Every tr under ip_list is one proxy entry
        ips = soup.find(id='ip_list').find_all('tr')
        ipList = []
        # Skip the first row, as the original loop does
        for row in ips[1:]:
            tbs = row.find_all('td')
            # Build scheme://XXX.XXX.XXX.XXX:YYYY
            ipList.append(tbs[5].text + '://' + tbs[1].text + ':' + tbs[2].text)
        print('Proxy IPs collected')
        return ipList
    
    def getRandomIp(self, ipList):
        print("Setting a random proxy...")
        proxyIp = random.choice(ipList)
        # The proxies dict is keyed by lowercase scheme ('http' or 'https')
        if proxyIp.lower().startswith('https'):
            proxies = {'https': proxyIp}
        else:
            proxies = {'http': proxyIp}
        print("Proxy set.")
        return proxies

    def running(self):
        print('Start!')
        outDir = 'xkcdchine'  # output folder for the downloaded images
        os.makedirs(outDir, exist_ok=True)
        startTime = time.time()
        url = self.url

        ipList = self.getIpList()
        proxies = self.getRandomIp(ipList)
        # Page counter
        i = 1
        while not url.endswith('#'):
            res = requests.get(url=url, headers=self.headers, proxies=proxies)
            res.raise_for_status()
            soup = BeautifulSoup(res.text, 'lxml')
            comicElem = soup.select('.comic-body img')

            if comicElem == []:
                print('could not find comic image.')
            else:
                comicUrl = 'https://xkcd.in' + comicElem[0].get('src')
                print('downloading image %s...' % (comicUrl))
                res = requests.get(comicUrl, headers=self.headers, proxies=proxies)
                res.raise_for_status()
                # Write the image to disk in chunks
                with open(os.path.join(outDir, os.path.basename(comicUrl)), 'wb') as imageFile:
                    for chunk in res.iter_content(1200000):
                        imageFile.write(chunk)

            # Follow the link inside the .nextLink element
            nextLink = soup.select('.nextLink a')[0]
            url = 'https://xkcd.in/' + nextLink.get('href')
            i += 1
            if i % 15 == 0:
                print("Sleeping for 10 s")
                time.sleep(10)
            if i % 60 == 0:
                print("Switching proxy IP")
                proxies = self.getRandomIp(ipList)
if __name__ == "__main__":
    comic = ImageCrawling()
    comic.running()
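The crawler builds absolute URLs by string concatenation, which can produce a doubled slash when an href already starts with one. A minimal sketch of the same step with the standard library's urljoin; the helper name `absolute_url` and the sample hrefs are hypothetical, not taken from the site:

```python
from urllib.parse import urljoin

BASE_URL = "https://xkcd.in"


def absolute_url(href, base=BASE_URL):
    """Resolve a scraped href or src against the site root."""
    # The trailing slash makes relative paths resolve under the root
    return urljoin(base + "/", href)


# Example calls with made-up paths:
# absolute_url("/comic?lg=cn&id=1")
# absolute_url("comic?lg=cn&id=1")
```

Since the loop above stops when the URL ends with '#', it is safer to test the raw href for that marker before joining than to rely on the joined result.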

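3.1 Using Markdown

Look it up when you need it

Procedure

Runoob (菜鸟教程) has a very detailed Markdown tutorial; look up whatever you are unsure about there
Markdown also accepts HTML tags, which can be used to change fonts and colors, insert line breaks, and so on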
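As an illustration of mixing HTML into Markdown, the sketch below assembles a small Markdown string with a colored span and a line break. The helper names `colored` and `build_document` are hypothetical, and whether inline styles survive depends on the Markdown renderer in use:

```python
from html import escape


def colored(text, color):
    """Wrap text in an inline HTML span that sets a CSS color."""
    return '<span style="color:%s">%s</span>' % (escape(color, quote=True), escape(text))


def build_document():
    """Return a short Markdown document that embeds HTML tags."""
    lines = [
        "# Notes",
        "",
        "Plain **Markdown** text with " + colored("red words", "red") + ".",
        "First line<br>second line after an HTML line break.",
    ]
    return "\n".join(lines)


# Example call:
# print(build_document())
```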

Appendix

2020/04/19 23:38: 1.1, 2.1, 2.2, 3.1
2020/05/06 22:14: 2.3

