
How to print all lines from a file that match a regex?


Could you please tell me how I can print the lines of a file that match a regex? In the question "Extract string matching regex from string" I only see how to print the exact match, and when I apply that to a file:

    ./xidel -e 'analyze-string(unparsed-text-lines("repl.k"), "uniq")//fn:match/text()'

I again get only the exact matches, and when I remove //fn:match/text() it dumps the whole file.

BTW: unparsed-text-lines() probably stores all the lines in memory. What should I do so that the file is read one line at a time?

Could you please advise? Thank you.


Solution

  • I'm not sure exactly what you mean by the question.

    Also I'm not sure why you want to use a regex to match "uniq" when a simple contains() would suffice.

    To get all the lines that contain "uniq" as a substring:

    unparsed-text-lines("repl.k")[contains(., "uniq")]
    

    To get all the lines equal to "uniq":

    unparsed-text-lines("repl.k")[. = "uniq"]
    

    To get all the lines that contain a substring that matches the regex "u*niq":

    unparsed-text-lines("repl.k")[matches(., "u*niq")]
    

    To get all the lines that match the regex "u*niq" in their entirety:

    unparsed-text-lines("repl.k")[matches(., "^u*niq$")]
    

    (I suspect that you are relying too heavily on StackOverflow for your information about XQuery. You stumbled on a question where analyze-string() was the answer, and used it for a problem that doesn't need anything that powerful or complicated.)

    As for whether all the lines are stored in memory, that depends entirely on the XQuery implementation, and I don't know what Xidel does. Saxon, certainly, will process the file one line at a time where the query allows it, as it does here. In fact, unparsed-text-lines() was added to the function library explicitly to make this easier to achieve.
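To make the difference between the four filters concrete, and to show what "one line at a time" looks like in practice, here is a minimal sketch in Python rather than XQuery. It is not how Xidel or Saxon work internally. It iterates over a text stream, which reads one line at a time, and applies rough Python counterparts of the four XPath predicates above. The names `matching_lines` and `FILTERS` are hypothetical. Python's `re` syntax only approximates the XPath regex dialect: `re.search` finds the pattern anywhere in the line, as `matches()` does, and `re.fullmatch` stands in for the `^...$` form, since without the `m` flag XPath's `^` and `$` anchor the whole string.

```python
import io
import re
from typing import Callable, Dict, Iterator, TextIO


def matching_lines(stream: TextIO, keep: Callable[[str], bool]) -> Iterator[str]:
    """Yield each line of `stream` (terminator removed) for which `keep` is true.

    Iterating over a text stream reads it one line at a time, so only the
    current line needs to be held in memory, not the whole file.
    """
    for raw in stream:
        line = raw.rstrip("\r\n")  # drop the line terminator, as unparsed-text-lines() does
        if keep(line):
            yield line


# Rough Python counterparts of the four XPath filters, keyed by the XPath text.
# re.search finds the pattern anywhere in the line (like fn:matches without anchors);
# re.fullmatch requires the whole line to match (like the "^...$" version).
FILTERS: Dict[str, Callable[[str], bool]] = {
    'contains(., "uniq")': lambda line: "uniq" in line,
    '. = "uniq"': lambda line: line == "uniq",
    'matches(., "u*niq")': lambda line: re.search(r"u*niq", line) is not None,
    'matches(., "^u*niq$")': lambda line: re.fullmatch(r"u*niq", line) is not None,
}


if __name__ == "__main__":
    # Hypothetical sample text standing in for the contents of "repl.k".
    sample = "uniq\nniq\nnot unique\nuuuniq\nhello\n"
    for xpath, keep in FILTERS.items():
        # A fresh stream per filter, since iterating consumes it.
        print(xpath, "->", list(matching_lines(io.StringIO(sample), keep)))
```

With the sample above, the substring regex also keeps "niq" and "not unique", while the anchored version drops "not unique". For the real file, pass an open file object such as `open("repl.k", encoding="utf-8")` in place of the `io.StringIO` sample; the UTF-8 encoding is an assumption about the file.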