python xml-parsing

How to correctly use escape characters to write a mathematical string to an XML file using Python?


I have an XML file which I am trying to edit using Python. This is an example of the content of the XML file:

<?xml version="1.0" encoding="utf-8"?>
<NEKTAR>
    <CONDITIONS>
        <FUNCTION NAME="Baseflow">
            <E VAR="u0" VALUE="(x>3) * (y <= 3) * 4 * x"/>
        </FUNCTION>
    </CONDITIONS>
</NEKTAR>

In the original file, the value of u0 is set to 0. Using Python, I want to change this value to the mathematical expression shown in the code block above. I have tried two approaches, both unsuccessfully:

  1. Writing the inequalities with escape sequences such as &lt; and &gt;
  2. Using CDATA

What would be the right approach to do this using xml.etree.ElementTree?


Solution

  • Edit

    The OP needs a version that leaves the equation as is, without escaping. This can be done with string replacement using regular expressions (re). Beware that the resulting string is not well-formed XML, because a raw < is not allowed inside an attribute value, so as far as I know no XML package will be able to parse it.

    import re
    
    ORIGINAL = """<?xml version="1.0" encoding="utf-8"?>
    <NEKTAR>
        <CONDITIONS>
            <FUNCTION NAME="Baseflow">
                <E VAR="u0" VALUE="0"/>
            </FUNCTION>
        </CONDITIONS>
    </NEKTAR>"""
    
    TARGET = '(x>3) * (y <= 3) * 4 * x'
    
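    # Locate the contents of the first VALUE="..." attribute
    replaced_match = re.search(r'VALUE="(.+?)"', ORIGINAL)
    result = ORIGINAL  # fall back to the unchanged text if nothing matches
    if replaced_match:
        # Splice TARGET in place of the matched attribute value
        result = ORIGINAL[:replaced_match.start(1)]
        result += TARGET
        result += ORIGINAL[replaced_match.end(1):]
    print(result)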
    
    """
    <?xml version="1.0" encoding="utf-8"?>
    <NEKTAR>
        <CONDITIONS>
            <FUNCTION NAME="Baseflow">
                <E VAR="u0" VALUE="(x>3) * (y <= 3) * 4 * x"/>
            </FUNCTION>
        </CONDITIONS>
    </NEKTAR>
    """
    

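The warning above can be checked directly. The following is a minimal sketch, assuming a helper named is_well_formed (hypothetical, introduced only for this illustration) that reports whether xml.etree.ElementTree accepts a string. It suggests that the raw < is what breaks parsing, while a raw > inside an attribute value is tolerated.

```python
from xml.etree import ElementTree

# Attribute value with a raw "<", as produced by the string replacement above.
UNESCAPED = '<E VAR="u0" VALUE="(x>3) * (y <= 3) * 4 * x"/>'

# Same idea, but containing only ">" and no "<".
ONLY_GT = '<E VAR="u0" VALUE="(x>3) * 4 * x"/>'


def is_well_formed(xml_text):
    """Hypothetical helper: True if ElementTree can parse xml_text."""
    try:
        ElementTree.fromstring(xml_text)
    except ElementTree.ParseError:
        return False
    return True


print(is_well_formed(UNESCAPED))  # False: raw "<" in an attribute value
print(is_well_formed(ONLY_GT))    # True: raw ">" is allowed there
```

So if the program that consumes the file uses a conforming XML parser, the unescaped version will most likely be rejected.
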
    The xml package does the escaping for you automatically. Does this satisfy your need? (The code below replaces the VALUE attribute of each E tag with the given equation, escaping it as needed.)

    from xml.etree import ElementTree
    
    ORIGINAL = """<?xml version="1.0" encoding="utf-8"?>
    <NEKTAR>
        <CONDITIONS>
            <FUNCTION NAME="Baseflow">
                <E VAR="u0" VALUE="0"/>
            </FUNCTION>
        </CONDITIONS>
    </NEKTAR>"""
    
    TARGET = '(x>3) * (y <= 3) * 4 * x'
    
    x = ElementTree.fromstring(ORIGINAL)
    
    # find all E tags
    els = x.findall('.//E')
    
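    for el in els:
        # set() stores the raw string; escaping happens on serialization
        el.set('VALUE', TARGET)
    # tostring() returns bytes by default, so decode to get a str
    x_mod = ElementTree.tostring(x).decode()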
    
    print(x_mod)
    

    result:

    <NEKTAR>
        <CONDITIONS>
            <FUNCTION NAME="Baseflow">
                <E VAR="u0" VALUE="(x&gt;3) * (y &lt;= 3) * 4 * x" />
            </FUNCTION>
        </CONDITIONS>
    </NEKTAR>
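
The escaped form looks different on disk, but any XML parser should hand back the original expression when it reads the attribute. Below is a minimal sketch of that round trip, assuming a shortened document and an in-memory io.BytesIO buffer standing in for the real file (both are choices made for this illustration, not part of the question). It also keeps the XML declaration, which tostring() drops in the result above.

```python
import io
from xml.etree import ElementTree

TARGET = '(x>3) * (y <= 3) * 4 * x'

# Shortened stand-in for the real file contents (assumption for this sketch).
root = ElementTree.fromstring('<NEKTAR><E VAR="u0" VALUE="0"/></NEKTAR>')
root.find('.//E').set('VALUE', TARGET)

# Write the tree with an XML declaration; a BytesIO buffer stands in for a
# real file opened in binary mode.
buffer = io.BytesIO()
ElementTree.ElementTree(root).write(
    buffer, encoding='utf-8', xml_declaration=True
)
serialized = buffer.getvalue().decode('utf-8')
print(serialized)

# Parse the serialized bytes again: the parser un-escapes &lt; and &gt;.
reparsed = ElementTree.fromstring(buffer.getvalue())
recovered = reparsed.find('.//E').get('VALUE')
print(recovered == TARGET)  # True
```

To write to an actual file, the same write() call accepts a file name in place of the buffer.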