Skip to main content

Command Palette

Search for a command to run...

Convert XML to Excel or PDF in Python Like a Pro

Published
5 min read

Ever stared at an XML file full of nested tags and thought, “How on earth am I supposed to make sense of this?” You’re not alone. Luckily, with Python, turning that messy XML into a neat Excel spreadsheet or a well-organized PDF report is easier than you think. In this article, I’ll walk you through practical ways to convert your XML data like a pro — so you can focus on analyzing the data instead of wrestling with raw XML.

Step 1: Understand and parse the XML

First, understanding what an XML file is will greatly help in the subsequent conversion process. XML (Extensible Markup Language) is a standard text format used for storing and transmitting structured data. Its main characteristics are human readability, clear hierarchical structure, and the ability to define custom tags.

An XML file typically contains elements, attributes, text values, and nested elements. Elements represent data items, attributes provide additional information for elements, text values store the actual data, and nested elements allow for complex hierarchical relationships. Understanding these structures is essential for lossless XML conversion, as each element and attribute may need to be mapped to rows and columns in Excel or to tables in a PDF document.

To convert XML files to other formats without losing data, we need to parse the XML structure in Python to extract each record and its fields. Python offers multiple ways to read and manipulate XML, with the most commonly used being the standard library xml.etree.ElementTree. The following example demonstrates how to load an XML file, access its root node, iterate through data elements, and read each field:

import xml.etree.ElementTree as ET

# Load the XML file
tree = ET.parse('/salesreport.xml')
root = tree.getroot()

# Iterate through <Record> elements
for record in root.find('Records').findall('Record'):
    record_id = record.attrib.get('id')  # <Record id="R001">
    product = record.find('Product').text
    region = record.find('Region').text
    month = record.find('Month').text
    units_sold = record.find('UnitsSold').text
    unit_price = record.find('UnitPrice').text
    revenue = record.find('Revenue').text
    notes = record.find('Notes').text if record.find('Notes') is not None else ""

    print(record_id, product, region, month, units_sold, unit_price, revenue, notes)

This code illustrates the basic process of parsing XML and extracting each record. Once you understand this workflow, you can map these data fields into Excel sheets or PDF documents, achieving a lossless conversion.

Step 2: Convert XML to Excel in Python Easily

After gaining a clear understanding of the XML structure, let's get to the main topic: converting an XML file into Excel. In this section, we use Spire.XLS for Python, a professional and powerful third-party library. With Spire.XLS, you can create a spreadsheet, insert data from the XML file into the cells, adjust cell settings as needed, and finally save the workbook as an Excel file.

The following code demonstrates how to convert the XML file we parsed earlier into Excel:

from spire.xls import Workbook, HorizontalAlignType
import xml.etree.ElementTree as ET

# Create a Workbook and Worksheet
workbook = Workbook()
workbook.Worksheets.Clear()
worksheet = workbook.Worksheets.Add("SalesReport")

# Read the XML file and get the root element
tree = ET.parse("E:/Administrator/Python1/input/salesreport.xml" )
root = tree.getroot()

# Get all <Record> elements
records = root.find('Records').findall('Record')

# Get the header from the first record
first_record = records[0]
header = ['RecordID'] + [child.tag for child in first_record]

# Write headers
for col_index, header_text in enumerate(header, start=1):
    worksheet.SetValue(1, col_index, header_text)

# Write data rows
for row_index, record in enumerate(records, start=2):
    # Write the RecordID to the first column
    worksheet.SetValue(row_index, 1, record.attrib.get('id'))
    # Write child node text to corresponding columns
    for col_index, child in enumerate(record, start=2):
        worksheet.SetValue(row_index, col_index, child.text if child.text else "")

# Set cell styles
worksheet.AllocatedRange.AutoFitColumns()
worksheet.AllocatedRange.HorizontalAlignment = HorizontalAlignType.Left
worksheet.Range[f"A1:{chr(64+len(header))}1"].Style.Font.IsBold = True

# Save the data to Excel
workbook.SaveToFile("/SalesReport_Q1_2025.xlsx")

workbook.Dispose()

print(f"XML has been converted to Excel successfully: {output_file}")

How to Convert XML to Excel in Python

Steps explained:

  • Use the xml.etree.ElementTree library to retrieve data from an XML file.

  • Create an object of the Workbook class and add a new worksheet to it.

  • Write data that extracted from the XML file to cells using the Worksheet.SetValue() method.

  • Set cell styles if you need.

  • Save the workbook to an Excel file through the Workbook.SaveToFile() method.

Alternative Output: Convert XML to PDF in Python

If you want to convert an XML file into a PDF, Spire.XLS can handle that as well. The implementation logic is very similar to the Excel conversion: extract data from the XML file, insert it into a worksheet, and finally save the result as a PDF using the Worksheet.SaveToPdf() method. Along the way, you can also apply page setup options to customize the layout of the output PDF.

The following code demonstrates how to convert an XML file to PDF:

from spire.xls import Workbook, HorizontalAlignType
import xml.etree.ElementTree as ET

# Create a new workbook and add a worksheet
workbook = Workbook()
workbook.Worksheets.Clear()
worksheet = workbook.Worksheets.Add("SalesReport")

# Read the XML file
tree = ET.parse("/salesreport.xml")
root = tree.getroot()

# Get all <Record> elements
records = root.find('Records').findall('Record')

# Get the table header
first_record = records[0]
header = ['RecordID'] + [child.tag for child in first_record]

# Write header row
for col_index, header_text in enumerate(header, start=1):
    worksheet.SetValue(1, col_index, header_text)

# Write data rows
for row_index, record in enumerate(records, start=2):
    worksheet.SetValue(row_index, 1, record.attrib.get('id'))
    for col_index, child in enumerate(record, start=2):
        worksheet.SetValue(row_index, col_index, child.text if child.text else "")

# Set cell styles
worksheet.AllocatedRange.AutoFitColumns()
worksheet.AllocatedRange.HorizontalAlignment = HorizontalAlignType.Left
worksheet.Range[f"A1:{chr(64+len(header))}1"].Style.Font.IsBold = True

# Adjust the page settings
workbook.ConverterSetting.SheetFitToPage = True
pageSetup = worksheet.PageSetup
pageSetup.TopMargin = 0.3
pageSetup.BottomMargin = 0.3
pageSetup.LeftMargin = 0.3
pageSetup.RightMargin = 0.3
pageSetup.IsPrintGridlines = True

# Save it as a PDF
output_pdf = "/SalesReport_Q1_2025.pdf"
worksheet.SaveToPdf(output_pdf)

workbook.Dispose()

print(f"XML has been converted to PDF successfully: {output_pdf}")

Convert an XML File to PDF in Python

The Conclusion

Converting XML files into Excel or PDF with Python doesn’t have to be complicated. By first understanding the structure of XML and then applying the right tools like Spire.XLS, you can easily transform raw data into well-structured reports that are ready for analysis or sharing. Whether you need spreadsheets for data processing or PDFs for distribution, the workflow remains straightforward and highly effective.