Could you please tell me how I can print the lines of a file that match a regex?
In the topic:
Extract string matching regex from string
I only see how to print the exact match, and when I apply it to a file:
./xidel -e 'analyze-string(unparsed-text-lines("repl.k"), "uniq")//fn:match/text()'
I still see only the exact matches, and when I remove //fn:match/text() it dumps the whole file.
BTW: unparsed-text-lines probably stores all the lines in memory. What should I do so that the file is read one line at a time?
Could you please advise? Thank you.
I'm not sure exactly what you mean by the question.
Also I'm not sure why you want to use a regex to match "uniq" when a simple contains() would suffice.
To get all the lines that contain "uniq" as a substring:
unparsed-text-lines("repl.k")[contains(., "uniq")]
To get all the lines equal to "uniq":
unparsed-text-lines("repl.k")[. = "uniq"]
To get all the lines that contain a substring that matches the regex "u*niq":
unparsed-text-lines("repl.k")[matches(., "u*niq")]
To get all the lines that in their entirety match the regex "u*niq":
unparsed-text-lines("repl.k")[matches(., "^u*niq$")]
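If it helps to see the difference between the four tests outside XQuery, here is a rough Python analogy. It is only a sketch: Python's re module is not the XPath regex dialect (the two happen to agree for a simple pattern like "u*niq"), and the function name classify and the sample inputs are made up for illustration.

```python
import re


def classify(line: str) -> dict:
    """Apply rough Python equivalents of the four XPath predicates to one line."""
    return {
        # contains(., "uniq"): plain substring test, no regex involved
        "contains": "uniq" in line,
        # . = "uniq": the whole line is exactly this string
        "equals": line == "uniq",
        # matches(., "u*niq"): unanchored, so any substring may match
        "matches_substring": re.search(r"u*niq", line) is not None,
        # matches(., "^u*niq$"): anchored, so the entire line must match
        "matches_whole": re.fullmatch(r"u*niq", line) is not None,
    }


if __name__ == "__main__":
    for sample in ["sort | uniq -c", "uniq", "niq"]:
        print(repr(sample), classify(sample))
```

Note that "niq" passes both regex tests but fails contains(), because u* also matches zero occurrences of "u".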
(I suspect that you are relying too heavily on StackOverflow for your information about XQuery. You stumbled on a question where analyze-string() was the answer, and used it for a problem that doesn't need anything that powerful or complicated.)
As for the question of whether all the lines are stored in memory, that depends entirely on the XQuery implementation, and I don't know what Xidel does. Certainly Saxon will process the file one line at a time where (as here) the query allows it. In fact, the reason unparsed-text-lines() was added to the function library was explicitly to make this easier to achieve.
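To make "one line at a time" concrete, here is a minimal Python sketch of what a streaming filter looks like in general. It says nothing about how Xidel or Saxon is implemented internally; the function name matching_lines is hypothetical, the file is assumed to be UTF-8, and Python's regex dialect is not identical to XPath's.

```python
import re
from typing import Iterator


def matching_lines(path: str, pattern: str) -> Iterator[str]:
    """Lazily yield each line of the file that contains a match for pattern."""
    regex = re.compile(pattern)
    # Iterating over the file object reads one line at a time,
    # so the whole file is never held in memory at once.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")  # drop the line terminator before testing
            if regex.search(line):    # unanchored, like matches(., pattern)
                yield line


if __name__ == "__main__":
    import sys

    # Usage (script name is arbitrary): python filter_lines.py repl.k "u*niq"
    for hit in matching_lines(sys.argv[1], sys.argv[2]):
        print(hit)
```

Because the function is a generator, a line is only read when the caller asks for the next result, which is the same property a streaming XQuery processor can exploit when the query is a simple filter over unparsed-text-lines().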