
Parsing KML file using pyKML


I'm learning how to parse KML files in Python using the pyKML module. The file I'm using is reproduced at the bottom of this post; I have saved it on my computer as test.kml.

After some research, I managed to extract a specific portion of the test.kml file and save the result to a DataFrame. Here's my code:

from pykml import parser
import pandas as pd

filename = 'test.kml'
with open(filename) as fobj:
    folder = parser.parse(fobj).getroot().Document

plnm = []

for pm in folder.Placemark:
    plnm1 = pm.name
    plnm.append(plnm1.text)

df = pd.DataFrame()
df['name'] = plnm

print(df)
          name
0   Club house
1  By the lake

I would like to add a new column to my DataFrame containing the value of "holeNumber". I tried adding the following lines to my for loop, but without success.

for pm in folder.Placemark:
    plnm1 = pm.name
    val1 = pm.ExtendedData.holeNumber.value
    plnm.append(plnm1.text)
    val.append(val1.text)

I'm not sure how to access the value from that specific node. The resulting DataFrame I'm looking for is the following:

| name        | holeNumber |
|-------------|------------|
| Club house  | 1          |
| By the lake | 5          |

Any help would be appreciated.

<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <name>My Golf Course Example</name>
  <Placemark>
    <name>Club house</name>
    <ExtendedData>
      <Data name="holeNumber">
        <value>1</value>
      </Data>
      <Data name="holeYardage">
        <value>234</value>
      </Data>
      <Data name="holePar">
        <value>4</value>
      </Data>
    </ExtendedData>
    <Point>
      <coordinates>-111.956,33.5043</coordinates>
    </Point>
  </Placemark>
  <Placemark>
    <name>By the lake</name>
    <ExtendedData>
      <Data name="holeNumber">
        <value>5</value>
      </Data>
      <Data name="holeYardage">
        <value>523</value>
      </Data>
      <Data name="holePar">
        <value>5</value>
      </Data>
    </ExtendedData>
    <Point>
      <coordinates>-111.95,33.5024</coordinates>
    </Point>
  </Placemark>
</Document>
</kml>

Solution

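  • Your attempt fails because holeNumber is not a child element of ExtendedData; it is the value of the name attribute on a Data element. The Data elements can be indexed by position, and in this file holeNumber is the first one, so Data[0] works. Here's a quick way to parse the KML: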

    plnm = []
    holeNumber = []
    for pm in folder.Placemark:
        plnm1 = pm.name
        val1 = pm.ExtendedData.Data[0].value
        plnm.append(plnm1.text)
        holeNumber.append(val1.text)
    
    df = pd.DataFrame()
    df['name'] = plnm
    df['holeNumber'] = holeNumber
    
    print(df)
    

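    Or, collecting the rows in a list first (DataFrame.append was removed in pandas 2.0, so the frame is built in one step at the end):

    rows = []
    for pm in folder.Placemark:
        name = pm.name.text
        # Data[0] is the first <Data> element, which is holeNumber in this file
        value = pm.ExtendedData.Data[0].value.text
        rows.append({'name': name, 'holeNumber': value})
    df = pd.DataFrame(rows, columns=['name', 'holeNumber'])
    print(df)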
    

    Output:

              name holeNumber
    0   Club house          1
    1  By the lake          5
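
Note that Data[0] depends on holeNumber being the first Data element in every Placemark. If the order can vary, it may be safer to select the element by its name attribute instead. Below is a minimal sketch of that idea using the standard library's xml.etree.ElementTree rather than pyKML, so it is a swapped-in technique and not the method used above. The names SAMPLE_KML and extract_rows are made up for illustration, and the sample is a trimmed copy of the file from the question with holePar deliberately placed first.

```python
import xml.etree.ElementTree as ET

# KML 2.2 namespace, taken from the xmlns attribute of the file in the question.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Trimmed, reordered copy of the sample document (hypothetical test data).
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Placemark>
    <name>Club house</name>
    <ExtendedData>
      <Data name="holePar"><value>4</value></Data>
      <Data name="holeNumber"><value>1</value></Data>
    </ExtendedData>
  </Placemark>
  <Placemark>
    <name>By the lake</name>
    <ExtendedData>
      <Data name="holePar"><value>5</value></Data>
      <Data name="holeNumber"><value>5</value></Data>
    </ExtendedData>
  </Placemark>
</Document>
</kml>"""


def extract_rows(kml_text, field="holeNumber"):
    """Return one dict per Placemark with its name and the requested Data value."""
    root = ET.fromstring(kml_text)
    rows = []
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        # Select the <Data> element by its name attribute, not by position.
        value = pm.findtext(
            f"kml:ExtendedData/kml:Data[@name='{field}']/kml:value",
            default=None,
            namespaces=KML_NS,
        )
        rows.append({"name": name, field: value})
    return rows


rows = extract_rows(SAMPLE_KML)
print(rows)
```

The resulting list of dicts can be passed straight to pd.DataFrame(rows) to get the same two columns as above. With pyKML objects the same idea should carry over by looping over pm.ExtendedData.Data and checking data.get('name'), since those are lxml elements.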