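lxml is a powerful Python library for processing XML; it makes parsing and generating XML documents straightforward. The content below is translated from part of the linked tutorial.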

 

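1. Creating an empty XML element



from lxml import etree

root = etree.Element("root")
# tostring() returns bytes; decode so the markup prints as plain text
print(etree.tostring(root, pretty_print=True).decode())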



<root/>
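
Since tostring() returns bytes by default, it can help to see the common serialization options side by side. The following is a minimal sketch; the variable names are illustrative only and not part of the original tutorial.

```python
from lxml import etree

root = etree.Element("root")

# Default: the serialized element comes back as bytes.
as_bytes = etree.tostring(root)

# encoding="unicode" asks for a str instead of bytes.
as_text = etree.tostring(root, encoding="unicode")

# An XML declaration can be requested together with a byte encoding.
with_decl = etree.tostring(root, xml_declaration=True, encoding="UTF-8")

print(as_bytes)
print(as_text)
print(with_decl)
```

Asking for encoding="unicode" is usually the simplest choice when the result is only going to be printed or logged.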


2. Adding child elements



from lxml import etree

root = etree.Element("root")
root.append(etree.Element("child1"))  # option 1: build the element, then append it
child2 = etree.SubElement(root, "child2")  # option 2: create and attach in one call
child3 = etree.SubElement(root, "child3")
print(etree.tostring(root, pretty_print=True).decode())



<root>
  <child1/>
  <child2/>
  <child3/>
</root>
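
Once children are attached, an element behaves much like a Python list of its children. The sketch below rebuilds the same three-child tree and shows length, indexing, iteration, and getparent(); the variable names are chosen here for illustration.

```python
from lxml import etree

root = etree.Element("root")
for name in ("child1", "child2", "child3"):
    etree.SubElement(root, name)

count = len(root)                      # number of direct children
first_tag = root[0].tag                # children can be indexed like a list
tags = [child.tag for child in root]   # iterating yields direct children
parent_tag = root[1].getparent().tag   # each child knows its parent

print(count, first_tag, tags, parent_tag)
```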


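3. Creating an element with text content



from lxml import etree

root = etree.Element("root")
root.text = "Hello World"
# decode the bytes returned by tostring() so the markup prints as text
print(etree.tostring(root, pretty_print=True).decode())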



<root>Hello World</root>
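
Text assigned to .text is stored as-is and escaped only when the element is serialized, so reserved characters need no manual escaping. A small sketch, with a sample string chosen here for illustration:

```python
from lxml import etree

root = etree.Element("root")
root.text = "a < b & c"   # plain text, no manual escaping needed

# Serialization escapes the reserved characters.
serialized = etree.tostring(root, encoding="unicode")

# Parsing the result gives back the original text.
round_trip = etree.fromstring(serialized).text

print(serialized)
print(round_trip)
```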


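4. Attributes

lxml exposes an element's attributes through a dict-like interface.

Setting attributes



from lxml import etree

root = etree.Element("root", interesting="totally")  # option 1: keyword arguments
root.set("hello", "huhu")  # option 2: set() on an existing element
root.text = "Hello World"
print(etree.tostring(root).decode())



<root interesting="totally" hello="huhu">Hello World</root>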


Getting attributes

Option 1:



print(root.get("interesting"))
print(root.get("hello"))



totally
huhu


Option 2:



attributes = root.attrib
print(attributes["interesting"])


Iterating over attributes



for name, value in sorted(root.items()):
    print('%s = %r' % (name, value))
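
The dict-like behaviour of attributes goes a little further than get() and items(). The sketch below, using the same attribute names as above, shows default values for missing attributes, taking a plain dict snapshot, and writing through root.attrib; the variable names are illustrative.

```python
from lxml import etree

root = etree.Element("root", interesting="totally")
root.set("hello", "huhu")

missing = root.get("nope")               # None when the attribute is absent
fallback = root.get("nope", "default")   # or a caller-supplied default

snapshot = dict(root.attrib)             # independent copy as a plain dict
keys = sorted(root.keys())

root.attrib["hello"] = "changed"         # attrib writes through to the element
del root.attrib["interesting"]           # and supports deletion

print(snapshot, keys, root.get("hello"))
```

Copying with dict() is worth doing when the attributes need to be kept around while the element itself is being modified.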


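5. Mixed content

In the XML below, the text is split by <br/>. The text that follows the <br/> element is stored in that element's .tail attribute.



<html><body>Hello<br/>World</body></html>



html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"  # text before the first child
br = etree.SubElement(body, "br")
br.tail = "World"  # text after <br/>, up to the next element or closing tag
print(etree.tostring(html).decode())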
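
Because tail text belongs to the element it follows, it is serialized together with that element unless told otherwise. The following sketch rebuilds the Hello/World tree and shows the with_tail and method="text" options of tostring(); the variable names are illustrative.

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"
br = etree.SubElement(body, "br")
br.tail = "World"

with_tail = etree.tostring(br)                      # tail is included by default
without_tail = etree.tostring(br, with_tail=False)  # only the element itself
text_only = etree.tostring(html, method="text")     # text and tails, no markup

print(with_tail, without_tail, text_only)
```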


6. Iteration

Iterating over all elements



for element in root.iter():
    print("%s - %s" % (element.tag, element.text))


To visit only elements with a given tag (at any depth), pass the tag name to iter()



for element in root.iter("child"):
    print("%s - %s" % (element.tag, element.text))
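
The two loops above assume an existing root. As a self-contained sketch, the tree below (tag names and texts invented here for illustration) shows that iter() starts with the element itself and then walks its descendants in document order:

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
etree.SubElement(root, "child").text = "Child 2"
etree.SubElement(root, "another").text = "Child 3"

# iter() yields the root first, then every descendant.
all_tags = [element.tag for element in root.iter()]

# Passing a tag name filters the traversal.
child_texts = [element.text for element in root.iter("child")]

print(all_tags)
print(child_texts)
```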


7. Extracting text with XPath



build_text_list = etree.XPath("//text()") # lxml.etree only!
print(build_text_list(html))
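
The same query can also be run through the element's xpath() method. The strings it returns remember where they came from, which helps tell .text from .tail. A minimal sketch on a freshly built Hello/World tree (variable names are illustrative):

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"
br = etree.SubElement(body, "br")
br.tail = "World"

texts = html.xpath("//text()")    # every text node, in document order
joined = html.xpath("string()")   # all text concatenated into one string

first_parent = texts[0].getparent().tag   # element that holds it as .text
second_parent = texts[1].getparent().tag  # element that holds it as .tail
second_is_tail = texts[1].is_tail

print(list(texts), joined, first_parent, second_parent, second_is_tail)
```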


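8. Finding elements

iterfind(): iterates over all elements that match the path expression

findall(): returns a list of all matching elements

find(): returns the first matching element

findtext(): returns the .text content of the first matching element

Consider the following XML: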



root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")


Finding direct children



>>> print(root.find("b"))
None
>>> print(root.find("a").tag)
a


Finding elements anywhere in the tree



>>> print(root.find(".//b").tag)
b
>>> [ b.tag for b in root.iterfind(".//b") ]
['b', 'b']


Finding elements that have a given attribute



>>> print(root.findall(".//a[@x]")[0].tag)
a
>>> print(root.findall(".//a[@y]"))
[]
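
findtext() is listed above but not demonstrated, so here is a short sketch on the same sample XML. It also shows a default value for a path that matches nothing and a predicate that tests an attribute's value; the variable names are illustrative.

```python
from lxml import etree

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

a_text = root.findtext("a")                  # .text of the first <a> child
missing = root.findtext("z", default="n/a")  # default when nothing matches

# Predicates can test an attribute's value, not just its presence.
by_value = [el.tag for el in root.findall(".//a[@x='123']")]
no_match = root.findall(".//a[@x='999']")

b_count = len(root.findall(".//b"))

print(a_text, missing, by_value, no_match, b_count)
```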


9. Parsing a string into XML



>>> some_xml_data = "<root>data</root>"

>>> root = etree.fromstring(some_xml_data)
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'
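
Two related cases come up often alongside fromstring(): parsing from a file-like object and handling malformed input. The sketch below uses an in-memory BytesIO as a stand-in for a real file and a deliberately broken string; both inputs are invented here for illustration.

```python
from io import BytesIO

from lxml import etree

# parse() accepts file-like objects and returns an ElementTree.
tree = etree.parse(BytesIO(b"<root>data</root>"))
root = tree.getroot()

# Malformed input raises XMLSyntaxError.
try:
    etree.fromstring("<root>data</wrong>")
    parsed_ok = True
except etree.XMLSyntaxError as exc:
    parsed_ok = False
    print("could not parse:", exc)

print(root.tag, root.text, parsed_ok)
```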


10. Building XML and HTML quickly with the E-factory



>>> from lxml import etree
>>> from lxml.builder import E

>>> def CLASS(*args):  # class is a reserved word in Python
...     return {"class": ' '.join(args)}

>>> html = page = (
...   E.html(       # create an Element called "html"
...     E.head(
...       E.title("This is a sample document")
...     ),
...     E.body(
...       E.h1("Hello!", CLASS("title")),
...       E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
...       E.p("This is another paragraph, with a", "\n ",
...           E.a("link", href="http://www.python.org"), "."),
...       E.p("Here are some reserved characters: <spam&egg>."),
...       etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
...     )
...   )
... )

>>> print(etree.tostring(page, pretty_print=True).decode())
<html>
  <head>
    <title>This is a sample document</title>
  </head>
  <body>
    <h1 class="title">Hello!</h1>
    <p>This is a paragraph with <b>bold</b> text in it!</p>
    <p>This is another paragraph, with a
 <a href="http://www.python.org">link</a>.</p>
    <p>Here are some reserved characters: &lt;spam&amp;egg&gt;.</p>
    <p>And finally an embedded XHTML fragment.</p>
  </body>
</html>
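
The rules the E-factory applies to its arguments are easier to see on a smaller tree: strings become text or tail, elements become children, and dicts or keyword arguments become attributes. The following is a minimal sketch; the tag names and values are invented here for illustration.

```python
from lxml import etree
from lxml.builder import E

doc = E.note(
    E.to("Alice"),                            # child element with text
    E.body("See you at ", E.b("noon"), "."),  # text, child, then tail text
    {"priority": "high"},                     # a dict sets attributes
)

serialized = etree.tostring(doc, encoding="unicode")
tail_after_b = doc.find("body/b").tail  # the "." lands in the tail of <b>

print(serialized)
```

A string that follows a child element ends up in that child's .tail, which is the same mechanism described in section 5.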