https://www.iana.org/domains/arpa
I can get the following output using the XPath '//table[@id="arpa-table"]/tbody/tr/join((td[1], normalize-space(td[2])), x:cps(9))' with xidel. But I want to put things like RFC 3172 in a 3rd column and /go/rfc3172 in a 4th column. Could anybody tell me how to do it?
arpa▸ Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board RFC 3172¬
as112.arpa▸ For sinking DNS traffic for reverse IP address lookups and other applications RFC 7535¬
e164.arpa▸ For mapping E.164 numbers to Internet URIs RFC 6116¬
home.arpa▸ For non-unique use in residential home networks RFC 8375¬
in-addr-servers.arpa▸ For hosting authoritative name servers for the in-addr.arpa domain RFC 5855¬
in-addr.arpa▸ For mapping IPv4 addresses to Internet domain names RFC 1035¬
ip6-servers.arpa▸ For hosting authoritative name servers for the ip6.arpa domain RFC 5855¬
ip6.arpa▸ For mapping IPv6 addresses to Internet domain names RFC 3152¬
ipv4only.arpa▸ For detecting the presence of DNS64 and for learning the IPv6 prefix used for protocol translation RFC 7050¬
iris.arpa▸ For locating Internet Registry Information Services RFC 4698¬
uri.arpa▸ For resolving Uniform Resource Identifiers according to the Dynamic Delegation Discovery System RFC 3405 RFC 8958¬
urn.arpa▸ For resolving Uniform Resource Names according to the Dynamic Delegation Discovery System RFC 3405¬
The first row should look something like this:
arpa▸ Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board▸ RFC 3172¬
By default xidel prints a node's string-value (string()). That is "the concatenation of the string-values of all its descendant text nodes", as E. Lenz puts it:
$ xidel -s https://www.iana.org/domains/arpa -e '
//table[@id="arpa-table"]/tbody/tr[1]/td[2] ! (position(),.)
'
#or
$ xidel -s https://www.iana.org/domains/arpa -e '
//table[@id="arpa-table"]/tbody/tr[1]/td[2]/string() ! (position(),.)
'
1
Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board
RFC 3172
As you can see, that's just 1 item (one node). That's why normalize-space(td[2]) returns "Reserved exclusively [...] RFC 3172".
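To make the string-value rule concrete outside of xidel, here is a small sketch using only Python's standard xml.etree.ElementTree. It illustrates the XPath rule and is not how xidel works internally. The cell markup is a hypothetical reconstruction (text, a <br/>, then the RFC link), chosen only because it yields the same text nodes as the xidel output shown here; string_value() and normalize_space() are helper names made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup for the first row's td[2] (an assumption, not copied
# from the live IANA page): text, <br/>, whitespace, the RFC link, whitespace.
TD = ('<td>Reserved exclusively to support operationally-critical infrastructural '
      'identifier spaces as advised by the Internet Architecture Board<br/>\n'
      '  <a href="/go/rfc3172">RFC 3172</a>\n</td>')

def string_value(elem):
    """Like XPath string(): concatenate every descendant text node in order."""
    return "".join(elem.itertext())

def normalize_space(s):
    """Like XPath normalize-space(): collapse XML whitespace runs, trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

td = ET.fromstring(TD)
print(string_value(td))                    # one string, RFC 3172 on its own line
print(normalize_space(string_value(td)))   # one line, still ending in RFC 3172
```

Because the string-value already contains the link text, normalizing it can only squeeze the whitespace; it cannot separate "RFC 3172" into its own column.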
With text(), on the other hand, you get the element's direct child text nodes:
$ xidel -s https://www.iana.org/domains/arpa -e '
//table[@id="arpa-table"]/tbody/tr[1]/td[2]/text() ! (position(),.)
'
1
Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board
2
3
Or all of its descendant text nodes:
$ xidel -s https://www.iana.org/domains/arpa -e '
//table[@id="arpa-table"]/tbody/tr[1]/td[2]//text() ! (position(),.)
'
1
Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board
2
3
RFC 3172
4
As you can see, that's 3 and 4 separate items (text nodes), respectively.
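The same split between direct and descendant text nodes can be sketched with Python's standard ElementTree, again on a hypothetical reconstruction of the cell (an assumption consistent with the counts above, not the live page's markup). One detail of ElementTree's model matters here: text that follows a child element is stored on that child's .tail, so a cell's own text nodes are its .text plus the tails of its children. The helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical td[2] markup (assumption): text, <br/>, whitespace, link, whitespace.
TD = ('<td>Reserved exclusively to support operationally-critical infrastructural '
      'identifier spaces as advised by the Internet Architecture Board<br/>\n'
      '  <a href="/go/rfc3172">RFC 3172</a>\n</td>')

def direct_text_nodes(elem):
    """Rough analogue of td/text(): only the element's own child text nodes."""
    nodes = [elem.text] if elem.text else []
    # Text sitting between or after child elements lives on each child's .tail.
    nodes += [child.tail for child in elem if child.tail]
    return nodes

def descendant_text_nodes(elem):
    """Rough analogue of td//text(): every text node below the element, in order."""
    return list(elem.itertext())

td = ET.fromstring(TD)
for label, nodes in (("td/text()", direct_text_nodes(td)),
                     ("td//text()", descendant_text_nodes(td))):
    print(label, len(nodes), nodes)
```

The two whitespace-only items correspond to the positions that print as empty in the xidel output, and only the descendant version reaches the "RFC 3172" text inside the link.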
To get just the 1st text node, td[2]/text()[1] is enough, but normalize-space(td[2]/text()) and even normalize-space(td[2]//text()) would work too.
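Applied to the question's four columns, an untested guess in xidel, reusing the question's own join() and x:cps(9) pattern, would be '//table[@id="arpa-table"]/tbody/tr/join((td[1], normalize-space(td[2]/text()[1]), td[2]/a, td[2]/a/@href), x:cps(9))'. This assumes each RFC reference is an <a> element whose href is the /go/rfc... path, as the question implies. Since join() flattens its sequence, a row with two RFCs (uri.arpa) would get extra columns. The Python sketch below models the same logic with the standard ElementTree on a hypothetical, simplified copy of two rows, and joins multiple RFCs with a space so every row keeps exactly four tab-separated columns. The markup and helper names are assumptions, not taken from the live page.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified copy of two table rows (assumed markup, not verified).
TABLE = """<table id="arpa-table"><tbody>
<tr><td>arpa</td><td>Reserved exclusively to support operationally-critical infrastructural identifier spaces as advised by the Internet Architecture Board<br/>
  <a href="/go/rfc3172">RFC 3172</a>
</td></tr>
<tr><td>uri.arpa</td><td>For resolving Uniform Resource Identifiers according to the Dynamic Delegation Discovery System<br/>
  <a href="/go/rfc3405">RFC 3405</a> <a href="/go/rfc8958">RFC 8958</a>
</td></tr>
</tbody></table>"""

def normalize_space(s):
    """Like XPath normalize-space(): collapse XML whitespace runs, trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def row_columns(tr):
    """Return [domain, description, RFC labels, RFC hrefs] for one table row."""
    td1, td2 = tr.findall("td")
    # The cell starts with text here, so .text is its first text node (td[2]/text()[1]).
    description = normalize_space(td2.text or "")
    links = td2.findall("a")                         # direct <a> children, like td[2]/a
    rfcs = " ".join(a.text for a in links)           # e.g. "RFC 3172"
    hrefs = " ".join(a.get("href") for a in links)   # e.g. "/go/rfc3172"
    return [normalize_space(td1.text or ""), description, rfcs, hrefs]

table = ET.fromstring(TABLE)
for tr in table.iterfind("tbody/tr"):
    print("\t".join(row_columns(tr)))
```

Joining multiple RFCs inside one column is just one design choice; emitting one output line per RFC link would be an equally reasonable alternative if a strict one-value-per-column layout is needed.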