Python: converting text to Markdown format and converting Markdown to HTML

Reposted

数据小探 2023-11-06 23:50:12

Tags: python, markdown, html, css | Category: Python, backend development

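Contents

Problem description
Solution
Adding more extensions
Adding math support
Web page to PDF
Putting it together
Extras

Line numbers in code
Progress bar

References
Result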

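Problem description

Convert Markdown to PDF.

The approach in this article is fairly involved; Pandoc is another option worth trying.

Download link for all the code and CSS used in this article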

Solution

Option A: use Typora
Option B: use the markdown and pdfkit libraries together with wkhtmltopdf

1. Install markdown and pdfkit

pip install markdown
pip install pdfkit

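2. Install wkhtmltopdf

wkhtmltopdf download page

Add its bin directory to the Path environment variable (alternatively, pass the absolute path to the executable in code).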


3. Code

For test.md, see:

the default Markdown template of 作业部落 (Cmd Markdown)

import pdfkit
from markdown import markdown

input_filename = 'test.md'
output_filename = 'test.pdf'

with open(input_filename, encoding='utf-8') as f:
    text = f.read()

html = markdown(text, output_format='html')  # Markdown to HTML
pdfkit.from_string(html, output_filename, options={'encoding': 'utf-8'})  # HTML to PDF

# wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'  # path to the wkhtmltopdf executable
# configuration = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf)
# pdfkit.from_string(html, output_filename, configuration=configuration, options={'encoding': 'utf-8'})  # HTML to PDF

If wkhtmltopdf is not on the Path, use the commented-out code instead.
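The choice between the Path lookup and the absolute path can also be made at run time. The following is a minimal sketch, not part of the original code: `resolve_wkhtmltopdf` and its `fallbacks` parameter are hypothetical names, and it only relies on the standard-library `shutil.which`.

```python
import shutil


def resolve_wkhtmltopdf(fallbacks=(), which=shutil.which):
    """Return a path to the wkhtmltopdf executable, or None if not found.

    The Path is searched first; `fallbacks` lists absolute paths to try next.
    `which` is injectable only so the lookup can be tested without the binary.
    """
    found = which('wkhtmltopdf')
    if found:
        return found
    for candidate in fallbacks:
        # shutil.which also accepts a full path and checks it is executable
        found = which(candidate)
        if found:
            return found
    return None
```

If the function returns a path, it can be handed to `pdfkit.configuration(wkhtmltopdf=path)` exactly as in the commented-out lines above; if it returns None, wkhtmltopdf still needs to be installed.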

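4. Result

With no extensions enabled, the output has:

No support for annotations
No support for tables
No support for LaTeX
No support for code blocks
No support for flowcharts, sequence diagrams, or Gantt charts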


Adding more extensions

test.md

|Item|Price|Quantity|
|---|---|---|
|Computer|$1600|5|
|Phone|$12|12|
|Pipe|$1|234|

With the default settings this table does not render correctly.


Enable the tables extension

import pdfkit
from markdown import markdown

text = '''|Item|Price|Quantity|
|---|---|---|
|Computer|$1600|5|
|Phone|$12|12|
|Pipe|$1|234|'''
html = markdown(text, output_format='html', extensions=['tables'])  # Markdown to HTML
pdfkit.from_string(html, 'test.pdf', options={'encoding': 'utf-8'})  # HTML to PDF


Further reading: Available Extensions
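One quick way to confirm that an extension is taking effect is to render the same text with and without it and look for the expected tag. This is only a sketch, assuming the markdown package installed earlier; `render` and `has_table` are hypothetical helper names, not library functions.

```python
def render(text, extensions=()):
    """Render Markdown text to an HTML fragment with the given extensions."""
    # Imported lazily so the helpers below load even without the package.
    from markdown import markdown
    return markdown(text, output_format='html', extensions=list(extensions))


def has_table(html):
    """Report whether the rendered fragment contains a table element."""
    return '<table' in html


SAMPLE = '|Item|Price|\n|---|---|\n|Phone|$12|'

# Typical use:
#   has_table(render(SAMPLE))              # without the extension
#   has_table(render(SAMPLE, ['tables']))  # with the extension
```

The same pattern works for other extensions by swapping the tag being checked.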

Adding math support

Load KaTeX in the HTML
Enable mdx_math in the markdown library

Install

pip install python-markdown-math

import pdfkit
from markdown import markdown

input_filename = 'test.md'
output_filename = 'test.pdf'
html = '<!DOCTYPE html><html><body><link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex/dist/katex.min.css" crossorigin="anonymous"><script src="https://cdn.jsdelivr.net/npm/katex/dist/katex.min.js" crossorigin="anonymous"></script><script src="https://cdn.jsdelivr.net/npm/katex/dist/contrib/mathtex-script-type.min.js" defer></script>{}</body></html>'
text = '$$E=mc^2$$'
text = markdown(text, output_format='html', extensions=['mdx_math'])  # Markdown to HTML
html = html.format(text)
pdfkit.from_string(html, output_filename, options={'encoding': 'utf-8'})  # HTML to PDF

Further reading: third-party extensions

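Web page to PDF

The HTML generated above does not render perfectly. An alternative is to use the HTML export feature of 作业部落 (Cmd Markdown) and convert the exported file to PDF.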

import pdfkit

pdfkit.from_file('test.html', 'test.pdf', options={'encoding': 'utf-8'})  # HTML to PDF
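When several exported HTML files need converting, the same call can be looped over a folder. A minimal sketch under stated assumptions: `pdf_targets` and `convert_all` are hypothetical names, and the output PDF is simply placed next to each HTML file.

```python
from pathlib import Path


def pdf_targets(html_dir):
    """Pair every .html file in html_dir with a .pdf path next to it."""
    return [(src, src.with_suffix('.pdf'))
            for src in sorted(Path(html_dir).glob('*.html'))]


def convert_all(html_dir):
    """Convert each exported HTML file in html_dir to PDF."""
    # Imported lazily so pdf_targets() can be used without pdfkit installed.
    import pdfkit
    for src, dst in pdf_targets(html_dir):
        pdfkit.from_file(str(src), str(dst), options={'encoding': 'utf-8'})
```

Keeping the path pairing separate from the conversion makes it easy to preview what would be written before wkhtmltopdf is run.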

Putting it together

This version uses the official extensions plus some third-party ones.

Install the libraries

pip install python-markdown-math
pip install pygments
pip install pymdown-extensions
pip install markdown-checklist

Download the CSS

github-markdown.css, renamed to markdown.css
codehilite.css, generated with the command: pygmentize -S default -f html -a .highlight > codehilite.css
linenum.css

[data-linenos]:before {
  content: attr(data-linenos);
}

tasklist.css

.markdown-body .task-list-item {
  list-style-type: none !important;
}

.markdown-body .task-list-item input[type="checkbox"] {
  margin: 0 4px 0.25em -20px;
  vertical-align: middle;
}
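The codehilite.css stylesheet can also be produced from Python rather than from the pygmentize command line. A sketch, assuming Pygments is installed as above; `codehilite_css` and `write_codehilite_css` are hypothetical names, while `HtmlFormatter.get_style_defs` is the Pygments call that returns the CSS rules for a style.

```python
def codehilite_css(selector='.highlight', style='default'):
    """Return the Pygments CSS rules for `style`, scoped under `selector`."""
    # Imported lazily so this module loads even when Pygments is missing.
    from pygments.formatters import HtmlFormatter
    return HtmlFormatter(style=style).get_style_defs(selector)


def write_codehilite_css(path='codehilite.css', **kwargs):
    """Write the generated rules to a stylesheet file."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(codehilite_css(**kwargs))
```

Calling `write_codehilite_css()` with no arguments is intended to give a file comparable to the one the pygmentize command above generates.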

Code

import os
import pdfkit
from markdown import markdown
from pymdownx import superfences


def markdown2pdf(input, output='test.pdf', encoding='utf-8', savehtml=False):
    html = '''
    <!DOCTYPE html>
    <html>
        <head>
            <meta charset="UTF-8">
            <meta name="viewport" content="width=device-width, initial-scale=1, minimal-ui">
            <title>{}</title>
            <link rel="stylesheet" href="linenum.css">
            <link rel="stylesheet" href="markdown.css">
            <link rel="stylesheet" href="tasklist.css">
            <link rel="stylesheet" href="codehilite.css">
            <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex/dist/katex.min.css" crossorigin="anonymous">
            <script src="https://unpkg.com/mermaid@8.7.0/dist/mermaid.min.js"></script>
            <script src="https://cdn.jsdelivr.net/npm/katex/dist/katex.min.js" crossorigin="anonymous"></script>
            <script src="https://cdn.jsdelivr.net/npm/katex/dist/contrib/mathtex-script-type.min.js" defer></script>
        </head>
        <body>
            <article class="markdown-body">
                {}
            </article>
        </body>
    </html>
    '''

    with open(input, encoding=encoding) as f:
        text = f.read()

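    extensions = [
        'toc',  # table of contents, [toc]
        'extra',  # abbreviations, attribute lists, definition lists, fenced code blocks, footnotes, Markdown in HTML, tables
    ]
    third_party_extensions = [
        'mdx_math',  # KaTeX math, $E=mc^2$ and $$E=mc^2$$
        'markdown_checklist.extension',  # checklist, - [ ] and - [x]
        'pymdownx.magiclink',  # turn bare URLs into links automatically
        'pymdownx.caret',  # superscript and insert
        'pymdownx.superfences',  # nestable fenced blocks, custom fences for diagrams
        'pymdownx.betterem',  # better handling of emphasis (bold and italic)
        'pymdownx.mark',  # highlighted (marked) text
        'pymdownx.highlight',  # code syntax highlighting
        'pymdownx.tasklist',  # task lists
        'pymdownx.tilde',  # strikethrough and subscript
    ]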
    extensions.extend(third_party_extensions)
    extension_configs = {
        'mdx_math': {
            'enable_dollar_delimiter': True  # allow single $ as an inline delimiter
        },
        'pymdownx.superfences': {
            "custom_fences": [
                {
                    'name': 'mermaid',  # enable flowcharts and other diagrams
                    'class': 'mermaid',
                    'format': superfences.fence_div_format
                }
            ]
        },
        'pymdownx.highlight': {
            'linenums': True,  # show line numbers
            'linenums_style': 'pymdownx-inline'  # keep line numbers separate from the code text
        },
        'pymdownx.tasklist': {
            'clickable_checkbox': True,  # make task list checkboxes clickable
        }
    }  # extension configuration
    title = os.path.splitext(os.path.basename(input))[0]  # file name without its extension
    text = markdown(text, output_format='html', extensions=extensions,
                    extension_configs=extension_configs)  # Markdown to HTML
    html = html.format(title, text)
    print(html)
    if savehtml:
        # replace only the final extension, not every '.md' in the path
        with open(os.path.splitext(input)[0] + '.html', 'w', encoding=encoding) as f:
            f.write(html)
    pdfkit.from_string(html, output, options={'encoding': encoding})  # HTML to PDF


if __name__ == '__main__':
    markdown2pdf('test.md', 'test.pdf', savehtml=True)
    print('Done')

See the end of the article for the result.
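The page template above is filled in with `str.format`, which treats every `{` and `}` as a placeholder, so inlining a CSS rule into the template would raise an error. One possible alternative, sketched here with the standard-library `string.Template` (the names `PAGE` and `build_page` are illustrative, not from the original code), uses `$name` placeholders and leaves CSS braces alone.

```python
from html import escape
from string import Template

# $title and $body are the only placeholders; CSS braces are left untouched.
PAGE = Template('''<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>$title</title>
<style>
[data-linenos]:before { content: attr(data-linenos); }
</style>
</head>
<body>
<article class="markdown-body">
$body
</article>
</body>
</html>
''')


def build_page(title, body):
    """Fill the template; the title is escaped, the body is inserted as HTML."""
    return PAGE.substitute(title=escape(title), body=body)
```

Because substituted values are not rescanned, a `$` inside the rendered Markdown (for example in a price or a formula) passes through unchanged.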

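PS:

If a feature is missing, look for a matching extension; if none exists, you can write your own.
The pipeline is Markdown to HTML to PDF. pdfkit's rendering is not very good, so the final quality is limited.
Adobe Acrobat Pro can also do the conversion, but flowcharts and similar diagrams still do not convert perfectly.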
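Part of the rendering gap may come from wkhtmltopdf capturing the page before KaTeX or mermaid scripts have finished, or from local stylesheets being blocked. The sketch below builds a pdfkit options dictionary for those two cases. It assumes the installed wkhtmltopdf supports the `--javascript-delay` and `--enable-local-file-access` flags (this varies by version), and `pdf_options` is a hypothetical helper name.

```python
def pdf_options(encoding='utf-8', js_delay_ms=2000, allow_local_files=True):
    """Build an options dict for pdfkit (passed through to wkhtmltopdf)."""
    options = {
        'encoding': encoding,
        # wait this many milliseconds so page scripts can finish rendering
        'javascript-delay': str(js_delay_ms),
    }
    if allow_local_files:
        # value-less flag: pdfkit expects an empty string for these
        options['enable-local-file-access'] = ''
    return options
```

The result would be passed as `options=pdf_options()` in place of the `{'encoding': 'utf-8'}` literal; whether it improves a given document has to be checked against the actual output.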

Extras

The features in this section are not standard Markdown.

Line numbers in code

linenum.md

```python
if __name__ == '__main__':
    print('Hello World!')
```

```python
import math

print(math.pi)  # pi
```

linenum.css

[data-linenos]:before {
  content: attr(data-linenos);
}

linenum.py

from markdown import markdown

filename = 'linenum.md'
html = '''
<!DOCTYPE html>
<html>
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1, minimal-ui">
        <title>linenum</title>
        <link rel="stylesheet" href="codehilite.css">
        <link rel="stylesheet" href="linenum.css">
    </head>
    <body>
        <article class="markdown-body">
            {}
        </article>
    </body>
</html>
'''
encoding = 'utf-8'
with open(filename, encoding=encoding) as f:
    text = f.read()

extensions = [
    'pymdownx.superfences',  # nestable fenced blocks, custom fences for diagrams
    'pymdownx.highlight'  # code syntax highlighting
]
extension_configs = {
    'pymdownx.highlight': {
        'linenums': True,  # show line numbers
        'linenums_style': 'pymdownx-inline'  # keep line numbers separate from the code text
    }
}  # extension configuration
text = markdown(text, output_format='html', extensions=extensions, extension_configs=extension_configs)  # Markdown to HTML
html = html.format(text)
print(html)
with open(filename.replace('.md', '.html'), 'w', encoding=encoding) as f:
    f.write(html)
# pdfkit.from_string(html, output, options={'encoding': 'utf-8'})  # HTML to PDF

print('Done')

Result


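With this setup, the line numbers are drawn by CSS from the data-linenos attribute, so copying the code directly does not pick them up.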
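The reason can be seen by extracting only the text nodes from the generated markup: the numbers live in an attribute, not in the text. The sketch below uses the standard-library `html.parser`; the `SAMPLE` markup is a simplified assumption of what the highlighter emits, and `TextOnly` and `text_nodes` are illustrative names.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes and ignore tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def text_nodes(html):
    """Return the concatenated text content of an HTML fragment."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)


# Simplified stand-in for the highlighter output (an assumption).
SAMPLE = (
    '<pre><code>'
    '<span data-linenos="1 "></span>import math\n'
    '<span data-linenos="2 "></span>print(math.pi)\n'
    '</code></pre>'
)
```

Running `text_nodes(SAMPLE)` yields just the two lines of code, which is roughly what a browser places on the clipboard.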

Progress bar

progressbar.css

.progress-label {
  position: absolute;
  text-align: center;
  font-weight: 700;
  width: 100%;
  margin: 0;
  line-height: 1.2rem;
  white-space: nowrap;
  overflow: hidden;
}

.progress-bar {
  height: 1.2rem;
  float: left;
  background-color: #2979ff;
}

.progress {
  display: block;
  width: 100%;
  margin: 0.5rem 0;
  height: 1.2rem;
  background-color: #eeeeee;
  position: relative;
}

.progress.thin {
  margin-top: 0.9rem;
  height: 0.4rem;
}

.progress.thin .progress-label {
  margin-top: -0.4rem;
}

.progress.thin .progress-bar {
  height: 0.4rem;
}

.progress-100plus .progress-bar {
  background-color: #00e676;
}

.progress-80plus .progress-bar {
  background-color: #fbc02d;
}

.progress-60plus .progress-bar {
  background-color: #ff9100;
}

.progress-40plus .progress-bar {
  background-color: #ff5252;
}

.progress-20plus .progress-bar {
  background-color: #ff1744;
}

.progress-0plus .progress-bar {
  background-color: #f50057;
}

progressbar.py

from markdown import markdown

filename = 'progressbar.md'
html = '''
<!DOCTYPE html>
<html>
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1, minimal-ui">
        <title>progressbar</title>
        <link rel="stylesheet" href="progressbar.css">
    </head>
    <body>
        {}
    </body>
</html>
'''
encoding = 'utf-8'
with open(filename, encoding=encoding) as f:
    text = f.read()

extensions = [
    'markdown.extensions.attr_list',
    'pymdownx.progressbar'
]
text = markdown(text, output_format='html', extensions=extensions)  # Markdown to HTML
html = html.format(text)
print(html)
with open(filename.replace('.md', '.html'), 'w', encoding=encoding) as f:
    f.write(html)
# pdfkit.from_string(html, output, options={'encoding': 'utf-8'})  # HTML to PDF
print('Done')

progressbar.md

[=0% "0%"]
[=5% "5%"]
[=25% "25%"]
[=45% "45%"]
[=65% "65%"]
[=85% "85%"]
[=100% "100%"]
[=85% "85%"]{: .candystripe}
[=100% "100%"]{: .candystripe .candystripe-animate}

[=0%]{: .thin}
[=5%]{: .thin}
[=25%]{: .thin}
[=45%]{: .thin}
[=65%]{: .thin}
[=85%]{: .thin}
[=100%]{: .thin}
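The colors in progressbar.css are keyed to class names such as progress-20plus and progress-80plus. As an illustration of how a percentage could map onto those classes, here is a small sketch; it assumes the classes are assigned in steps of 20, as the stylesheet names suggest, and `level_class` is a hypothetical helper rather than the extension's own code.

```python
def level_class(percent):
    """Map a percentage to the matching progress-NNplus class name."""
    # clamp to the 0..100 range covered by the stylesheet
    capped = max(0, min(100, int(percent)))
    # round down to the nearest multiple of 20
    return 'progress-{}plus'.format(capped // 20 * 20)
```

For the sample file above, 85% would fall under progress-80plus and 5% under progress-0plus, matching the color rules in progressbar.css.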

Result
