I need only the URLs from the dmoz/ODP dump, but the file is in RDF. How do I extract just the URLs from the ODP file into a text file?
Does anyone know of a script that parses only the URLs out of an RDF file?
Maybe something like this then?
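```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = "kt-content.rdf.u8";
my @urls;
my %seen;   # a URL can appear both as <link r:resource> and as <ExternalPage about>

open(my $fh, "<", $file) or die "Unable to open $file: $!\n";
while (my $line = <$fh>) {
    # Capture the URL from <ExternalPage about="..."> or <link r:resource="..."/>
    if ($line =~ m/<(?:ExternalPage about|link r:resource)="([^"]+)"\/?>/) {
        push @urls, $1 unless $seen{$1}++;   # keep only the first occurrence
    }
}
close $fh;
```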
And then print the contents of @urls to a text file.
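A minimal sketch of that last step, assuming you want one URL per line. The helper name `write_urls` and the output file name `urls.txt` are illustrative choices, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: writes each URL on its own line to $path,
# overwriting the file if it already exists.
sub write_urls {
    my ($path, @urls) = @_;
    open(my $out, ">", $path) or die "Unable to open $path: $!\n";
    print {$out} "$_\n" for @urls;
    close $out or die "Unable to close $path: $!\n";
}

# Usage after the loop above (the file name is an assumption):
# write_urls("urls.txt", @urls);
```

Checking the return value of `close` matters here because write errors on a buffered filehandle are often only reported when the buffer is flushed.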