xmlpostgresqlxpathxpath-1.0

Postgresql - XML- Processing xpath with date comparission


I have ingested XML file in Postgresql DB and I trying to perform XPATH operation on the field with date comparison but it is not working. From Postgresql official site I found out that Postgresql uses Xpath1 and Xpath1 does not support date comparison. I tried changing my Xpath query to use string based comparison but still it is not matching.

Input Xpath:

//val:key[@modifiedDate>"2024-06-06"]

Since Direct date comparison is not available in Xpath1 I tried String based comparison.

Converted Xpath:

//val:key[substring(@modifiedDate, 1, 10) > "2024-06-06"]

But this XPath is also not matching even though there is a matching Xpath. How should I frame my Xpath to match the field with date compaission.

The sample structure of XML ingested in table.

<root xmlns:val="http://some.uri">
    <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key>
    <val:key modifiedDate="2024-06-05T08:21:00">Key2</val:key>
    <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>
    <val:key modifiedDate="2024-06-06T09:00:00">Key4</val:key>
</root>

The expected result is we should be able to match this XML becuase it xpaths <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key> and <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>, as their modifiedDate attributes start with dates later than "2024-06-06".

SQL Query for reproducing the problem:

CREATE TABLE xml_data (
    id SERIAL PRIMARY KEY,
    document XML
);

INSERT INTO xml_data (document)
VALUES ('<root xmlns:val="http://some.uri">
            <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key>
            <val:key modifiedDate="2024-06-05T08:21:00">Key2</val:key>
            <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>
            <val:key modifiedDate="2024-06-06T09:00:00">Key4</val:key>
        </root>');

Query to filter data:

SELECT 
    id, document
FROM 
    xml_data
WHERE 
    xpath_exists('//val:key[substring(@modifiedDate, 1, 10) > "2024-06-06"]', document);

Solution

  • As Pepe N O points out, using the > operator in your XPath expression will convert the date operands from strings into numbers, but since both operands contain non-numeric characters, the conversion will produce NaN in both cases.

    However, you could modify your XPath expression to use the XPath translate function to convert your dates into purely numerical values.

    For example the expression translate('2024-06-07T12:34:56', '-T:', '') equals "20240607123456".