Resource Guides: File Conversion: Data Conversion with XML

XML Conversion Using Python

What is XML?

XML stands for Extensible Markup Language
Commonly used for distributing data over the Internet
Used to store data in the form of hierarchical elements

Example XML file:

In this case: <catalog> is called the root element, which is where the XML tree starts and from which child elements branch out from.

Resources

https://docs.python.org/3/library/xml.etree.elementtree.html

Python to XML

Python To XML (Writing to XML Files)

To write to an XML file, there are multiple methods we can use to create the hierarchy of elements. The .write() method writes the element tree to an XML file in the same folder as the current directory/folder you are running the python file in.

ET.Element(root): Creates an element for the element tree and provides the root element as the parameter.
ET.SubElement(parent, tag, attribute): Creates a subelement with parameters of the parent element of that subelement, the tag for the subelement, and attributes of the subelement.
ET.ElementTree(root): Creates the element tree with the root of the element tree as the parameter.

XML to Python

XML To Python (Reading in XML Files)

1. We use Python’s built-in module xml.etree.ElementTree to parse XML data. First, we need to import the module. It’s convention to import xml.etree.ElementTree as ET.

2. First, we have to create an ElementTree object like this:

If we print catalogTree, we get the ElementTree object:

3. Then, we have to get the root element, which in our example above is catalog. We can do this with the following line:

If we print the root, we get this:

Now, we can search for child tags and return element objects in a list (or specific element objects) using the following methods.
1. root.find(‘tag’): Returns the first Element object with the name given for the tag or None if the tag name is not found.
2. root.findall(‘tag’): Returns a list of all Element objects with the given tag name or an empty list if the name is not found

For example, if we wanted to find the first “book” tag in our XML example from above, we could use this line of code:

Printing catalogElement gives us this:

If we wanted to get a list of all the book tags, we could use this line:

Printing catalogList gives us a list of the “book” tags in the XML file:

5. Notice how each book element doesn’t just print by itself: it prints with a lot of other text. We can get rid of this “extra” text by iterating through each element in catalogList and using .text: