Writing and Parsing XML

Study group dedicated to learning how to code in the Python language.

Jza
Posts: 466
Joined: Sun Oct 30, 2005 7:01 pm
Location: Mexico

Post by Jza » Wed Jun 06, 2007 2:59 pm

So I have been doing some Python with XML, and after trying SAX and giving up on it, I decided to give minidom a try and found it to be easier.

One of my first goals was to write an XML tree. I found this quite simple, and even though I didn't use any functions (I wrote the nodes one by one), I'm guessing I can improve it as time goes by.

So here is the RSS feed from Keith and the Girl:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<!-- TalkCast(TM) feed generated by TalkShow(TM) -->
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
    <channel>
        <!-- begin RSS 2.0 tags -->
        <title>Keith and The Girl</title>
        <link>http://www.talkshoe.com/talkshoe/web/tscmd/tc/27183</link>
        <language>en-us</language>
        <copyright>This work is licensed under a Creative Commons License - Attribution-NonCommercial-ShareAlike - http://creativecommons.org/licenses/by-nc-sa/2.0/</copyright>
        <category>Arts &amp; Entertainment</category>
        <description>Keith and his girlfriend talk shit. Also visit KeithandTheGirl.com.
This Podcast was created using www.talkshoe.com</description>
        <ttl>720</ttl>
        <image>
            <url>http://www.talkshoe.com/custom/images/icons/TC-27183-MainIcon.jpg</url>
            <title>Keith and The Girl</title>
            <link>http://www.talkshoe.com/talkshoe/web/tscmd/tc/27183</link>
        </image>
        <!-- end RSS 2.0 tags -->
        <item>
            <!-- begin RSS 2.0 tags -->
            <title>506: The PaleSkin and FreckleFace Show</title>
            <guid>http://recordings.talkshoe.com/TC-27183/TS-25680.mp3</guid>
            <pubDate>Wed, 30 May 2007 19:00:00 -0400</pubDate>
            <author>info@keithandthegirl.com</author>
            <link>http://recordings.talkshoe.com/TC-27183/TS-25680.mp3</link>
            <enclosure url="http://recordings.talkshoe.com/TC-27183/TS-25680.mp3" length="68395646" type="audio/mpeg" />
            <comments>http://recordings.talkshoe.com/TC-27183/TS-25680.mp3</comments>
            <description>That&apos;s what you do when a Dee is right there.</description>
            <category>Arts &amp; Entertainment</category>
            <!-- end RSS 2.0 tags -->
        </item>
        <item>
            <!-- begin RSS 2.0 tags -->
            <title>505: Girls are Stupid</title>
            <guid>http://recordings.talkshoe.com/TC-27183/TS-25435.mp3</guid>
            <pubDate>Mon, 28 May 2007 23:59:00 -0400</pubDate>
            <author>info@keithandthegirl.com</author>
            <link>http://recordings.talkshoe.com/TC-27183/TS-25435.mp3</link>
            <enclosure url="http://recordings.talkshoe.com/TC-27183/TS-25435.mp3" length="93006712" type="audio/mpeg" />
            <comments>http://recordings.talkshoe.com/TC-27183/TS-25435.mp3</comments>
            <description>Can&apos;t wait for your spin-off show, Dummy.</description>
            <category>Arts &amp; Entertainment</category>
            <!-- end RSS 2.0 tags -->
        </item>
    </channel>
</rss>
So that's their RSS XML, and here is what I need to do to create it. First I will generate a simple node:

Code:

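from xml.dom.minidom import Document

x = Document()

# Root <rss> element: RSS 2.0 puts the version in a "version" attribute,
# and the iTunes namespace is declared with "xmlns:itunes"
rss = x.createElement("rss")
rss.setAttribute("version", "2.0")
rss.setAttribute("xmlns:itunes", "http://www.itunes.com/dtds/podcast-1.0.dtd")
x.appendChild(rss)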
So I just use three methods: createElement(), setAttribute(), and appendChild().
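Those three calls only build the tree in memory. To actually look at the XML or save it, minidom can serialize the Document with toprettyxml() (returns a string) or writexml() (writes to an open file). A minimal sketch, where the file name feed.xml is just an example:

```python
from xml.dom.minidom import Document

doc = Document()

# Same root element as above
rss = doc.createElement("rss")
rss.setAttribute("version", "2.0")
rss.setAttribute("xmlns:itunes", "http://www.itunes.com/dtds/podcast-1.0.dtd")
doc.appendChild(rss)

# toprettyxml() returns the whole tree as an indented string,
# starting with an <?xml ...?> declaration
print(doc.toprettyxml(indent="    "))

# writexml() writes the same tree to a file; the encoding argument only
# controls what goes into the <?xml ...?> declaration
with open("feed.xml", "w", encoding="utf-8") as f:
    doc.writexml(f, indent="", addindent="    ", newl="\n", encoding="UTF-8")
```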

If I want to build a tree of nested tags like this:

Code:

<channel>
    <title>Keith and The Girl</title>
</channel>
The Python code to generate it looks like this:

Code:

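# <channel> goes directly under <rss>
channel = x.createElement("channel")
rss.appendChild(channel)

# <title> is an element; its text is a separate text node inside it
title = x.createElement("title")
titleText = x.createTextNode("Keith and The Girl")
channel.appendChild(title)
title.appendChild(titleText)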
This is actually pretty simple. I guess adding some for loops and functions here and there would simplify the script and make it more object oriented.
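A rough sketch of that idea: one small helper plus a couple of for loops instead of creating every node by hand. The helper add_text_element() is hypothetical, the episode data is copied from the two items in the feed above, and only a few of the feed's fields are covered:

```python
from xml.dom.minidom import Document


def add_text_element(doc, parent, tag, text):
    """Hypothetical helper: create <tag>text</tag> under parent and return it."""
    element = doc.createElement(tag)
    element.appendChild(doc.createTextNode(text))
    parent.appendChild(element)
    return element


doc = Document()
rss = doc.createElement("rss")
rss.setAttribute("version", "2.0")
doc.appendChild(rss)
channel = doc.createElement("channel")
rss.appendChild(channel)

# Simple channel fields, written in order by one loop
channel_fields = [
    ("title", "Keith and The Girl"),
    ("link", "http://www.talkshoe.com/talkshoe/web/tscmd/tc/27183"),
    ("language", "en-us"),
    ("category", "Arts & Entertainment"),  # minidom escapes & as &amp; on output
    ("ttl", "720"),
]
for tag, text in channel_fields:
    add_text_element(doc, channel, tag, text)

# (title, mp3 URL, length in bytes) for each episode
episodes = [
    ("506: The PaleSkin and FreckleFace Show",
     "http://recordings.talkshoe.com/TC-27183/TS-25680.mp3", "68395646"),
    ("505: Girls are Stupid",
     "http://recordings.talkshoe.com/TC-27183/TS-25435.mp3", "93006712"),
]
for title, url, length in episodes:
    item = doc.createElement("item")
    channel.appendChild(item)
    add_text_element(doc, item, "title", title)
    add_text_element(doc, item, "guid", url)
    # <enclosure> carries its data in attributes, not text
    enclosure = doc.createElement("enclosure")
    enclosure.setAttribute("url", url)
    enclosure.setAttribute("length", length)
    enclosure.setAttribute("type", "audio/mpeg")
    item.appendChild(enclosure)

print(doc.toprettyxml(indent="    "))
```

Adding another episode then only means adding one tuple to the list.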

Here is the complete code.
Alexandro COLORADO

Post by Jza » Wed Jun 06, 2007 3:33 pm

Parsing has been a bit more complex, or maybe it's just that I know less about it, so I will stick to some lighter XML.

Code:

<body>
   <h1>title</h1>
   <p>paragraph</p>
</body>
To parse it with Python's minidom we will use the parse() function and the getElementsByTagName() method. The first thing we will do is load the document and then do a single print.

Code:

from xml.dom.minidom import parse

# Load the file into a DOM tree, then print every <p> element found
a = parse('file.xml')
print(a.getElementsByTagName('p'))
Unfortunately the output is more cumbersome than we expected, because printing a minidom object shows its instance representation with a hex address instead of the XML:

Code:

<xml.dom.minidom.Document instance at 0xb785c16c>
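To get the actual node content as opposed to the instance, we will use the toxml() method. Since getElementsByTagName() returns a list of elements (a NodeList), toxml() has to be called on each element, so we will have to do something like the following: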

Code:

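from xml.dom.minidom import parse

a = parse('file.xml')
# getElementsByTagName() returns a NodeList (a list of elements), which has
# no toxml() of its own, so call toxml() on each element in it
for b in a.getElementsByTagName('p'):
    print(b.toxml())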
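For a quick test without creating file.xml, parseString() from the same module parses XML held in a string. This sketch uses the small document above and shows the difference between toxml(), which returns the markup, and reading the text node inside the element:

```python
from xml.dom.minidom import parseString

# The small document from above, parsed from a string instead of a file
xml_source = """<body>
   <h1>title</h1>
   <p>paragraph</p>
</body>"""

dom = parseString(xml_source)

for p in dom.getElementsByTagName("p"):
    print(p.toxml())          # markup including the tags: <p>paragraph</p>
    print(p.firstChild.data)  # only the text node inside: paragraph

print(dom.documentElement.tagName)  # the root element: body
```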
Alexandro COLORADO

riddlebox
Posts: 86
Joined: Mon Jul 03, 2006 2:09 pm

Post by riddlebox » Sat Jun 09, 2007 8:31 am

How would you just parse all the text, and maybe put it into a .txt file? I have a Perl script that goes to a weather site, grabs the XML file for your area, then parses the text (if that's the right term) to a file, then uses Festival to create a WAV file from it, and then I play that WAV file from Asterisk. Anyway, for some reason I cannot understand Perl code just by reading the file... the beauty of Python. I would like to get rid of this script and replace it with Python.
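One possible approach with minidom, sketched under a few assumptions: walk the tree recursively, keep the data of every non-blank text node, and write one piece per line to a text file. minidom has no built-in "get all the text" call, so collect_text() is a hypothetical helper written for this example. The input here is a fragment of the Keith and the Girl feed from the first post (a weather feed's tag names would differ), and output.txt is just an example file name:

```python
from xml.dom.minidom import Node, parseString


def collect_text(node):
    """Hypothetical helper: gather the data of every non-blank text node under node."""
    pieces = []
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            text = child.data.strip()
            if text:  # skip whitespace-only nodes left by indentation
                pieces.append(text)
        elif child.nodeType == Node.ELEMENT_NODE:
            pieces.extend(collect_text(child))  # descend into nested elements
    return pieces


xml_source = """<channel>
    <title>Keith and The Girl</title>
    <item>
        <title>506: The PaleSkin and FreckleFace Show</title>
        <description>That&apos;s what you do when a Dee is right there.</description>
    </item>
</channel>"""

# For a downloaded file, parse("filename.xml") would replace parseString()
dom = parseString(xml_source)
lines = collect_text(dom.documentElement)

# One piece of text per line in a plain text file
with open("output.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(lines) + "\n")

print(lines)
```

The resulting text file could then be handed to Festival the same way the Perl script does now.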
