rdf freebase large-data

Iterate through entities in the Freebase RDF data dump with a streaming parser


How can I iterate through the Freebase RDF data dump with a streaming parser and print the title of each entity and its type (type/object/type) in PHP?

For example with expat: https://www.php.net/manual/en/book.xml.php

or the new XML reader functions: https://www.php.net/manual/en/book.xmlreader.php

or anything else that is a streaming parser that will parse the Freebase RDF data dump.


Solution

  • You don't need a streaming XML parser. The Freebase RDF data dumps are not XML; they are N-Triples, specially formatted so that the fields on each line are separated by tabs. All you need to do is open the file, read it one line at a time, and split each line on tabs.
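
A minimal PHP sketch of that approach follows. It rests on assumptions that are not stated above: that names are stored under the predicate `<http://rdf.freebase.com/ns/type.object.name>` and types under `<http://rdf.freebase.com/ns/type.object.type>`, that English names are literals ending in `@en`, and that all lines for one subject are adjacent in the file. The function names `parseTripleLine` and `iterateEntities` are made up for this example.

```php
<?php
// Assumed namespace prefix for Freebase predicates in the dump.
const FREEBASE_NS = 'http://rdf.freebase.com/ns/';

/**
 * Split one dump line on tabs into [subject, predicate, object].
 * Returns null for blank or malformed lines.
 */
function parseTripleLine(string $line): ?array
{
    $parts = explode("\t", rtrim($line, "\r\n"));
    if (count($parts) < 3) {
        return null;
    }
    return [$parts[0], $parts[1], $parts[2]];
}

/**
 * Read an open stream line by line and yield one record per entity:
 * ['id' => string, 'name' => ?string, 'types' => string[]].
 * Assumes every line for a given subject is adjacent to the others.
 */
function iterateEntities($handle): Generator
{
    $namePredicate = '<' . FREEBASE_NS . 'type.object.name>';
    $typePredicate = '<' . FREEBASE_NS . 'type.object.type>';
    $current = null;
    $name = null;
    $types = [];

    while (($line = fgets($handle)) !== false) {
        $triple = parseTripleLine($line);
        if ($triple === null) {
            continue;
        }
        [$subject, $predicate, $object] = $triple;

        // A new subject means the previous entity is complete.
        if ($subject !== $current) {
            if ($current !== null) {
                yield ['id' => $current, 'name' => $name, 'types' => $types];
            }
            $current = $subject;
            $name = null;
            $types = [];
        }

        if ($predicate === $namePredicate) {
            // Keep the first English name; strip the quotes and the @en tag.
            // Escape sequences inside the literal are left untouched.
            if ($name === null && substr($object, -3) === '@en') {
                $name = substr($object, 1, -4);
            }
        } elseif ($predicate === $typePredicate) {
            $types[] = $object;
        }
    }

    // Emit the last entity in the stream.
    if ($current !== null) {
        yield ['id' => $current, 'name' => $name, 'types' => $types];
    }
}

// Run only when a path is given on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $handle = fopen($argv[1], 'r');
    if ($handle === false) {
        fwrite(STDERR, "Cannot open {$argv[1]}\n");
        exit(1);
    }
    foreach (iterateEntities($handle) as $entity) {
        echo $entity['name'] ?? '(no English name)', "\t",
            implode(', ', $entity['types']), "\n";
    }
    fclose($handle);
}
```

If the dump is gzip-compressed and PHP has the zlib extension, passing a path such as `compress.zlib://freebase-rdf-latest.gz` (the file name is an assumption) should let `fopen` decompress it on the fly. Memory use stays flat because only one entity is held at a time.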