On the documentation page for MediaWiki::DumpFile, the following code is present:
use MediaWiki::DumpFile;

$mw = MediaWiki::DumpFile->new;

$sql = $mw->sql($filename);
$sql = $mw->sql(\*FH);

$pages = $mw->pages($filename);
$pages = $mw->pages(\*FH);

$fastpages = $mw->fastpages($filename);
$fastpages = $mw->fastpages(\*FH);

use MediaWiki::DumpFile::Compat;

$pmwd = Parse::MediaWikiDump->new;
I'm completely new to Perl and don't know what to do with $fastpages to save all the HTML pages (or plain text, it doesn't matter) from an XML dump. Can you help me? And what is *FH?
I haven't used it, but the documentation for MediaWiki::DumpFile::FastPages has the following example for printing the title and text of each article in a dump file:
use MediaWiki::DumpFile::FastPages;
$pages = MediaWiki::DumpFile::FastPages->new($file);
$pages = MediaWiki::DumpFile::FastPages->new(\*FH);
while (($title, $text) = $pages->next) {
    print "Title: $title\n";
    print "Text: $text\n";
}
This will write everything to stdout. When you create the MediaWiki::DumpFile::FastPages object, you can pass either a file name, e.g.
$file = "/path/to/dump/file";
$pages = MediaWiki::DumpFile::FastPages->new($file);
or a reference to a file handle, e.g.
open FH, "<", "/path/to/dump/file" or die "Failed to open file: $!";
$pages = MediaWiki::DumpFile::FastPages->new(\*FH);
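As for your *FH question: FH in the open line above is a package (bareword) filehandle, *FH is the typeglob that contains it, and \*FH is a reference to that glob, which is the traditional way to pass such a filehandle to a function. A lexical filehandle (open my $fh, "<", $file) should also work, since $fh is itself a reference to a filehandle, though I haven't tried that with this module.

To actually save every page rather than print it, you can write each page's text to its own file as you iterate. Here is a minimal sketch along those lines; the $dump_file and $out_dir paths are placeholders you'd replace with your own, and the title sanitizing is just one simple choice:

use strict;
use warnings;
use MediaWiki::DumpFile::FastPages;

my $dump_file = "/path/to/dump/file";   # placeholder: your XML dump
my $out_dir   = "pages";                # placeholder: output directory

mkdir $out_dir unless -d $out_dir;

my $pages = MediaWiki::DumpFile::FastPages->new($dump_file);

while (my ($title, $text) = $pages->next) {
    # Titles may contain characters that are not valid in file names,
    # so replace anything outside a safe set with an underscore.
    (my $name = $title) =~ s{[^A-Za-z0-9_.-]}{_}g;

    open my $out, ">", "$out_dir/$name.txt"
        or die "Failed to open $out_dir/$name.txt: $!";
    print $out $text;
    close $out;
}

Note that two different titles can sanitize to the same file name, so for a large dump you may want a smarter naming scheme.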