I need to extract some data from a 1 GB XML file into <key,value> tables using ets and dets. I have searched the web and this site, but I did not find any simple example of how to handle big XML files.
To begin with, I just want to understand how to read the file without loading all of it into memory.
What you need is a SAX XML parser such as Erlsom. For small files, it's possible to load the whole file into memory and then parse it, as in the answer I gave to this question. For files as big as yours, you need the SAX method. The SAX examples are here.
A SAX parser does not build a tree of the whole document. It hands you each token (start tag, text, end tag) as it reads it, and you decide what to keep, so if you feed it the file in chunks it never needs the whole file in memory. You will need to be comfortable with tail recursion, pattern matching, and carrying state from one callback to the next.
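To make that concrete, here is a minimal sketch that streams a file through Erlsom into an ETS table. It assumes Erlsom's parse_sax/4 with the {continuation_function, Fun, State} option and Erlsom's default string output for element names and text; check the docs in Erlsom's doc/ folder for your version. The document shape, the module name xml_to_ets and the 64 KB chunk size are hypothetical.

```erlang
-module(xml_to_ets).
-export([load/2, handle_event/2]).

%% Bytes read from disk per step (an arbitrary choice for this sketch).
-define(CHUNK, 65536).

%% Hypothetical document shape assumed by this sketch:
%%   <items><item><key>K</key><value>V</value></item> ... </items>
%% Each <item> becomes one {Key, Value} row in the ETS table Tab.
%% Handler state: {Tab, PendingKey, ReversedTextPieces}.

load(File, Tab) ->
    {ok, Fd} = file:open(File, [read, binary, raw, read_ahead]),
    try
        %% Hand Erlsom the first chunk; it calls continue/2 for more
        %% whenever it runs out of input. (Badmatches on an empty file.)
        {ok, First} = file:read(Fd, ?CHUNK),
        {ok, _FinalState, _Rest} =
            erlsom:parse_sax(First, {Tab, undefined, []}, fun handle_event/2,
                             [{continuation_function, fun continue/2, Fd}]),
        ok
    after
        file:close(Fd)
    end.

%% Continuation: append the next chunk to the unparsed tail,
%% or return the tail unchanged at end of file.
continue(Tail, Fd) ->
    case file:read(Fd, ?CHUNK) of
        {ok, Data} -> {<<Tail/binary, Data/binary>>, Fd};
        eof -> {Tail, Fd}
    end.

%% SAX callback: one clause per event of interest; all others are ignored.
handle_event({startElement, _Uri, Name, _Prefix, _Attrs}, {Tab, Key, _})
  when Name =:= "key"; Name =:= "value" ->
    {Tab, Key, []};                            % start collecting text afresh
handle_event({characters, Chars}, {Tab, Key, Acc}) ->
    {Tab, Key, [Chars | Acc]};                 % text may arrive in pieces
handle_event({endElement, _Uri, "key", _Prefix}, {Tab, _, Acc}) ->
    {Tab, text(Acc), []};                      % remember the key
handle_event({endElement, _Uri, "value", _Prefix}, {Tab, Key, Acc}) ->
    true = ets:insert(Tab, {Key, text(Acc)}),  % one row per <item>
    {Tab, undefined, []};
handle_event(_Event, State) ->
    State.

%% Join the collected text pieces back into one string, in document order.
text(Pieces) -> lists:append(lists:reverse(Pieces)).
```

Usage, assuming a hypothetical file big.xml:

    Tab = ets:new(items, [set, public]),
    ok = xml_to_ets:load("big.xml", Tab).

The callback is a plain function from (Event, State) to a new state, so each clause only has to pattern-match one kind of event. ETS lives in RAM; if the extracted pairs are too large for that, the same callback could write to a dets table opened with dets:open_file/2 instead, keeping in mind that dets:insert/2 returns ok rather than true.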
EDIT
Extract the Erlsom archive into the lib folder of your Erlang installation, the location where all built-in applications are located, and rename the extracted folder to erlsom-1.0.
Create a file called Emakefile in the erlsom-1.0 folder. Put this inside that file and save it:

    {"src/*", [verbose, report, warn_obsolete_guard, {outdir, "ebin"}]}.

The erlsom-1.0 folder should then look like this:

    erlsom-1.0
    |-doc/
    |-ebin/
    |-examples/
    |-include/
    |-src/
    |-Emakefile

The rest of the files do not matter. Now open an Erlang shell whose pwd() is the erlsom-1.0 folder and run make:all(), like this:

    Eshell V5.9  (abort with ^G)
    1> make:all().
    Recompile: src/ucs
    Recompile: src/erlsom_writeHrl
    Recompile: src/erlsom_write
    Recompile: src/erlsom_ucs
    Recompile: src/erlsom_simple_form
    Recompile: src/erlsom_sax_utf8
    Recompile: src/erlsom_sax_utf16le
    Recompile: src/erlsom_sax_utf16be
    Recompile: src/erlsom_sax_list
    Recompile: src/erlsom_sax_lib
    Recompile: src/erlsom_sax_latin1
    Recompile: src/erlsom_sax
    Recompile: src/erlsom_pass2
    Recompile: src/erlsom_parseXsd
    Recompile: src/erlsom_parse
    Recompile: src/erlsom_lib
    Recompile: src/erlsom_compile
    Recompile: src/erlsom_add
    Recompile: src/erlsom
    up_to_date
    2>

That's it. Because the erlsom-1.0 folder is in your Erlang lib folder, you can call the erlsom functions from any Erlang shell, whatever its pwd().
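To confirm that the installation worked, a small check like the following can be compiled and run from any directory. It is a sketch: the module name erlsom_check is made up, and it assumes erlsom:parse_sax/3 returns {ok, State, Tail}. If erlsom is not on the code path, it reports where the erlsom-1.0 folder is expected, using code:lib_dir/0.

```erlang
-module(erlsom_check).
-export([status/0]).

%% Check whether the erlsom module is reachable from this node's code path.
status() ->
    case code:which(erlsom) of
        non_existing ->
            %% Not on the code path: report where erlsom-1.0 should be placed.
            {not_installed, filename:join(code:lib_dir(), "erlsom-1.0")};
        _Path ->
            %% Installed: run a tiny SAX parse and keep only each event's tag.
            Collect = fun(Event, Acc) -> [tag(Event) | Acc] end,
            {ok, Tags, _Tail} = erlsom:parse_sax(<<"<a>hi</a>">>, [], Collect),
            {ok, lists:reverse(Tags)}
    end.

%% Reduce an event to its first element, or return an atom event as is.
tag(Event) when is_tuple(Event) -> element(1, Event);
tag(Event) -> Event.
```

Calling erlsom_check:status() from a shell started in some unrelated directory should give an {ok, Tags} result once the steps above have succeeded.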