XML and Python

There are many Python modules for creating an XML tree and saving it to a file, loading an XML file into memory, traversing it, extracting parts of it, and so on. For example, the xml.etree.ElementTree module supports the standard data-manipulation operations (the CRUD operations: create, read, update, delete).

Never generate or parse XML by hand (do not reinvent the wheel). Spend the time instead studying what the existing modules do, and choose the one that fits best.
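To illustrate why hand-written XML is risky, here is a minimal sketch. The sample text and the variable names are assumptions made for this example only. It compares naive string concatenation with serialization through xml.etree.ElementTree when the text contains characters that are special in XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical value containing characters that are special in XML.
raw_text = 'Tom & Jerry <3 "quotes"'

# Naive string concatenation yields a document that is not well-formed.
naive = "<title>" + raw_text + "</title>"
try:
    ET.fromstring(naive)
    naive_is_valid = True
except ET.ParseError:
    naive_is_valid = False

# Letting the library serialize the element escapes the text for us.
title = ET.Element("title")
title.text = raw_text
safe = ET.tostring(title, encoding="unicode")

# Round trip: parsing the serialized form gives back the original text.
round_trip = ET.fromstring(safe).text

print(naive_is_valid)          # False
print(safe)
print(round_trip == raw_text)  # True
```

The library takes care of escaping on output and unescaping on input, which is exactly the kind of detail that is easy to get wrong by hand.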

Importing and parsing data (cRud)

Consider the following document (data.xml):

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Importing from a file

import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()

Importing from a string

import xml.etree.ElementTree as ET
root = ET.fromstring(string_data)

root is of type xml.etree.ElementTree.Element. Every object of this type has the attributes tag, text, and attrib:

print(root.tag)       # data
print(root.attrib)    # {}

Every Element also has a get(attrib_name) method that returns the value of the specified attribute (None if the attribute is absent).
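The following sketch shows these attributes and get on a shortened excerpt of the sample document. The excerpt and the "population" attribute name are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the sample document (assumption for this example).
string_data = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
    </country>
</data>"""

root = ET.fromstring(string_data)
country = root[0]            # children can be accessed by index
rank = country.find("rank")  # first direct child named "rank"

print(country.tag)                           # country
print(country.attrib)                        # {'name': 'Liechtenstein'}
print(country.get("name"))                   # Liechtenstein
print(country.get("population"))             # None (attribute is missing)
print(country.get("population", "unknown"))  # unknown (explicit default)
print(rank.text)                             # 1 (text is always a string)
```

Note that text is a string even when it looks numeric, so it has to be converted explicitly, for example with int(rank.text).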

You can iterate over the direct child elements:

for child in root:
    print(child.tag, child.attrib)

You can also iterate over descendant elements at any depth that match a given tag name:

for neighbor in root.iter('neighbor'):
    print(neighbor.get('name'))
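As a small sketch of the difference between the two kinds of iteration, the example below uses a shortened excerpt of the sample document (an assumption for this example). It compares looping over an element, which yields direct children only, with iter, which walks the whole subtree.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the sample document (assumption for this example).
string_data = """<data>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""
root = ET.fromstring(string_data)

# Looping over an element yields its direct children only.
direct_tags = [child.tag for child in root]

# iter() without argument yields the element itself and all descendants,
# in document order.
all_tags = [elem.tag for elem in root.iter()]

# iter(tag) keeps only the elements with that tag, at any depth.
neighbor_names = [n.get("name") for n in root.iter("neighbor")]

print(direct_tags)     # ['country']
print(all_tags)        # ['data', 'country', 'rank', 'neighbor']
print(neighbor_names)  # ['Malaysia']
```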

Searching

The findall method applies a simple XPath expression (this module supports only a subset of XPath):

myself = root.findall(".")  # "." selects the current element
print(myself[0].tag) # data

neighbors = root.findall("./country/neighbor")
for n in neighbors:
    print(n.get('name'), end=" ") # Austria Switzerland Malaysia Costa Rica Colombia

nodes = root.findall(".//neighbor[@name='Colombia']/..")
for n in nodes:
    print(n.get('name')) # Panama
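Besides findall, the module also provides find and findtext. The sketch below shows how the three relate, on a condensed version of the sample data. The "continent" tag and the "Atlantis" name are hypothetical and serve only to show what happens when nothing matches.

```python
import xml.etree.ElementTree as ET

# Condensed version of the sample data (assumption for this example).
string_data = """<data>
    <country name="Liechtenstein"><rank>1</rank><year>2008</year></country>
    <country name="Singapore"><rank>4</rank><year>2011</year></country>
    <country name="Panama"><rank>68</rank><year>2011</year></country>
</data>"""
root = ET.fromstring(string_data)

# findall returns a list of every match, in document order.
# The predicate [year='2011'] keeps countries whose <year> child has that text.
names_2011 = [c.get("name") for c in root.findall("./country[year='2011']")]

# find returns only the first match, or None when nothing matches.
first = root.find("./country")
missing = root.find("./continent")

# findtext returns the text of the first match, or a default value.
panama_rank = root.findtext("./country[@name='Panama']/rank")
no_rank = root.findtext("./country[@name='Atlantis']/rank", default="n/a")

print(names_2011)         # ['Singapore', 'Panama']
print(first.get("name"))  # Liechtenstein
print(missing)            # None
print(panama_rank)        # 68
print(no_rank)            # n/a
```

Because find can return None, checking the result before accessing .text or .get avoids an AttributeError on documents that lack the expected element.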

Modifying data (crUd)

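for rank in root.iter('rank'):
    new_rank = int(rank.text) + 1   # text is a string: convert before adding
    rank.text = str(new_rank)       # and convert back before storing
    rank.set('updated', 'yes')      # add (or overwrite) an attribute
tree.write('output.xml')            # save the modified tree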
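To check a modification without writing a file, the tree can be serialized to a string with ET.tostring. This is a minimal sketch on a one-country document, which is an assumption made to keep the output short.

```python
import xml.etree.ElementTree as ET

# One-country document (assumption for this example).
string_data = "<data><country name='Panama'><rank>68</rank></country></data>"
root = ET.fromstring(string_data)

for rank in root.iter("rank"):
    rank.text = str(int(rank.text) + 1)  # 68 -> 69
    rank.set("updated", "yes")

# encoding="unicode" returns a str instead of bytes.
result = ET.tostring(root, encoding="unicode")
print(result)
```

When writing to a file instead, tree.write accepts the optional arguments encoding and xml_declaration, for example tree.write('output.xml', encoding='utf-8', xml_declaration=True).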

Deleting an element (cruD)

# findall returns a list, so removing elements inside the loop is safe
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
tree.write('output.xml')
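The loop above iterates over the list returned by findall rather than over root itself. The sketch below suggests why this matters: removing children while iterating directly over the parent can skip elements, much like modifying a Python list during iteration. The country names A, B, C and their ranks are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up data: A and B should both be removed (rank > 50).
string_data = """<data>
    <country name="A"><rank>60</rank></country>
    <country name="B"><rank>70</rank></country>
    <country name="C"><rank>5</rank></country>
</data>"""

# Risky: removing while iterating over the parent itself.
root_unsafe = ET.fromstring(string_data)
for country in root_unsafe:
    if int(country.find("rank").text) > 50:
        root_unsafe.remove(country)
left_unsafe = [c.get("name") for c in root_unsafe]

# Safe: findall builds a separate list before any removal happens.
root_safe = ET.fromstring(string_data)
for country in root_safe.findall("country"):
    if int(country.find("rank").text) > 50:
        root_safe.remove(country)
left_safe = [c.get("name") for c in root_safe]

print(left_unsafe)  # B survives even though its rank is 70
print(left_safe)    # ['C']
```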

Adding elements (Crud)

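new = ET.Element('new_element', attrib={"key1": "val1", "key2": "val2"})
new.text = "my new text"

for country in root.findall("country[@name='Singapore']"):
    print(country.get('name'))  # Singapore
    country.insert(1, new)      # insert at index 1, i.e. as the second child
    country.append(new)         # append as the last child
    # Note: the same object is now a child twice; use one call or the other,
    # or create a second Element, if a single copy is wanted.
tree.write('/tmp/output.xml')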
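A tree can also be built from scratch with ET.SubElement, which creates an element and attaches it to its parent in one call. The sketch below is only illustrative: the country "Utopia", its rank, and its neighbor "Arcadia" are invented values.

```python
import xml.etree.ElementTree as ET

# Build a small tree from scratch (all values are invented).
root = ET.Element("data")
country = ET.SubElement(root, "country", name="Utopia")
ET.SubElement(country, "rank").text = "2"
ET.SubElement(country, "neighbor", name="Arcadia", direction="N")

# Serialize to a string; ET.ElementTree(root).write(path) would save to a file.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Parse the result again to confirm that the structure is as intended.
check = ET.fromstring(xml_text)
print(check.find("country").get("name"))           # Utopia
print(check.findtext("country/rank"))              # 2
print(check.find("country/neighbor").get("name"))  # Arcadia
```

Compared with ET.Element followed by append, SubElement avoids keeping a separate variable for each new node, which helps when many children are created in sequence.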