Handling HTML Entity Encoding in Python

Python 2

import HTMLParser

# "&#12345;" is the decimal numeric character reference for U+3039.
char = r"&#12345;"
http_parser = HTMLParser.HTMLParser()
uChar = http_parser.unescape(char)  # u'\u3039'

Python 3

from html import unescape

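# Example input whose special characters are written as HTML entities.
s = u'position.php?&amp;start=10#a&quot; id=&quot;next&quot;&gt;下一页&lt;/a&gt;'

print(s)

print(unescape(s))

"""
Output:
position.php?&amp;start=10#a&quot; id=&quot;next&quot;&gt;下一页&lt;/a&gt;
position.php?&start=10#a" id="next">下一页</a>
"""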


Reference: "Python处理HTML实体编码" (Handling HTML Entity Encoding in Python)