tcl

Tcl: search in a text file and output part of the next line


I have an XML file which includes these lines:

    <Integer Name="XMLVersionVal">
        <Value>16</Value>
    </Integer>

I would like to create a Tcl script to find the value of the parameter XMLVersionVal, in this case "16".

I've made this script (see below), which works fine, but I find it ugly and too lengthy. I would like to know if there is a better way to do what I want. I basically look at every line of the file, find the one with Name="XMLVersionVal", then remove the <Value> and </Value> tags from the next line.

Thanks for your help

    set filePath "C:/tclWork/xml_file.xml"
    if {[file exist $filePath]} {
        set xmlFile [open $filePath r]
        set file_content [read $xmlFile]
        close $xmlFile
        set xmlVerFound 0
        foreach {line} $file_content {
            if {[regexp {1} $xmlVerFound]} {
                regsub {<Value>} $line {} line
                regsub {</Value>} $line {} line
                set xmlVer $line
            }
            if {[string first {Name="XMLVersionVal"} $line] != -1} {
                set xmlVerFound 1
            } else {
                set xmlVerFound 0
            }
        }
    }

Solution

  • For simplicity I'd recommend the following:

    ## just for testing
    set data {Name="XMLVersionVal"> <Value>16</Value>}
    regexp -nocase {Name[\r\n\f\t\s]*=[\r\n\f\t\s]*"XMLVersionVal"[\r\n\f\t\s]*>[\r\n\f\t\s]*<Value>([^<>]*)</Value>} $data dummy result
    puts $result ;# prints 16
    

    The bracket expression [\r\n\f\t\s] matches any whitespace character: carriage return, newline, form feed, tab, or space. (\s on its own already covers all of these.)

    regexp itself returns 1 if a match is found and 0 otherwise.
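
    Because of that, the call can sit directly in a condition, so the captured value is only used when the pattern actually matched. A minimal sketch, assuming a shortened pattern that uses \s* instead of the longer bracket expression and a made-up sample string in place of the file content:

    ```tcl
    # Sample input standing in for the file content (assumption for illustration)
    set data "<Integer Name=\"XMLVersionVal\">\n    <Value>16</Value>\n</Integer>"

    # regexp returns 1 on a match and 0 otherwise; "->" is just a throwaway
    # variable name that receives the full match
    if {[regexp {Name\s*=\s*"XMLVersionVal"\s*>\s*<Value>([^<>]*)</Value>} $data -> result]} {
        puts "XMLVersionVal = $result"
    } else {
        puts "XMLVersionVal not found"
    }
    ```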

    If one knows what kind of content to expect, one could also restrict the capture to digits, which would improve reliability:

    regexp -nocase {Name[\r\n\f\t\s]*=[\r\n\f\t\s]*"XMLVersionVal"[\r\n\f\t\s]*>[\r\n\f\t\s]*<Value>([0-9]*)</Value>} $data dummy result
    

    Note:

    The capture group was changed from (.*) to ([^<>]*), which matches any character except < and >.
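
    Put together with the file handling from the question, the whole task might look roughly like this. It is only a sketch: the proc names getXmlValue and slurpFile are hypothetical, the pattern is the shortened \s* variant, and it assumes the parameter name contains no regexp metacharacters.

    ```tcl
    # Hypothetical helper: return the <Value> text of the element whose Name
    # attribute equals $name, or an empty string if no such element is found.
    proc getXmlValue {content name} {
        # Insert the name into the pattern; \s* allows any whitespace,
        # including line breaks, between the tokens.
        # Assumption: $name contains no regexp metacharacters.
        set pattern [format {Name\s*=\s*"%s"\s*>\s*<Value>([^<>]*)</Value>} $name]
        if {[regexp -- $pattern $content -> value]} {
            return [string trim $value]
        }
        return ""
    }

    # Hypothetical helper: read a whole file into one string.
    proc slurpFile {path} {
        set chan [open $path r]
        set content [read $chan]
        close $chan
        return $content
    }

    # Usage with the path from the question (only runs if the file exists)
    set filePath "C:/tclWork/xml_file.xml"
    if {[file exists $filePath]} {
        puts [getXmlValue [slurpFile $filePath] XMLVersionVal]
    }
    ```

    Matching against the whole file content in one go avoids tracking a "found on the previous line" flag, since the pattern itself spans the line break.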