xml bash shell sed

Extract XML Value in bash script


I'm trying to extract a value from an XML document that has been read into my script as a variable. The original variable, $data, is:

<item> 
  <title>15:54:57 - George:</title>
  <description>Diane DeConn? You saw Diane DeConn!</description> 
</item> 
<item> 
  <title>15:55:17 - Jerry:</title> 
  <description>Something huh?</description>
</item> 

and I wish to extract the first title value, so

15:54:57 - George:

I've been using the sed command:

title=$(sed -n -e 's/.*<title>\(.*\)<\/title>.*/\1/p' <<< $data)

but this only outputs the second title value:

15:55:17 - Jerry:

Does anyone know what I have done wrong? Thanks!


Solution

  • As Charles Duffey has stated, XML is best parsed with a proper XML parsing tool. For a one-time job, the following should work.

    grep -oPm1 "(?<=<title>)[^<]+"
    

    Test:

    $ echo "$data"
    <item> 
      <title>15:54:57 - George:</title>
      <description>Diane DeConn? You saw Diane DeConn!</description> 
    </item> 
    <item> 
      <title>15:55:17 - Jerry:</title> 
      <description>Something huh?</description>
    </item> 
    $ title=$(grep -oPm1 "(?<=<title>)[^<]+" <<< "$data")
    $ echo "$title"
    15:54:57 - George:
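
As for why the original sed command printed the second title: a likely explanation (an assumption, since the bash version is not stated) is that the unquoted `<<< $data` was word-split and joined onto a single line, which older bash releases did with here-strings. On one long line, the greedy leading `.*` runs ahead to the last `<title>`. The sketch below simulates that collapse with `tr` so it behaves the same on any bash version, then shows a sed variant that quotes the variable and quits after the first match. The variable names `one_line`, `collapsed`, and `first` are only illustrative.

```bash
#!/usr/bin/env bash

data='<item>
  <title>15:54:57 - George:</title>
  <description>Diane DeConn? You saw Diane DeConn!</description>
</item>
<item>
  <title>15:55:17 - Jerry:</title>
  <description>Something huh?</description>
</item>'

# Simulate the collapsed input: join every line of $data into one line.
one_line=$(tr '\n' ' ' <<< "$data")

# On a single line, the greedy leading .* skips to the LAST <title>.
collapsed=$(sed -n -e 's/.*<title>\(.*\)<\/title>.*/\1/p' <<< "$one_line")

# Quoting "$data" keeps the newlines, so sed sees one element per line.
# The q command stops sed right after the first matching line is printed.
first=$(sed -n -e '/<title>/{' \
               -e 's/.*<title>\(.*\)<\/title>.*/\1/p' \
               -e 'q' \
               -e '}' <<< "$data")

echo "collapsed: $collapsed"
echo "first:     $first"
```

Like the grep one-liner, this relies on each `<title>` element sitting on its own line, so it is only suitable for quick jobs on input with a known layout.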