xml file erlang

Convert 1 GB XML file to ets and dets in Erlang


I need to extract some data from a 1 GB XML file into <key,value> tables using ets and dets. I have searched the web and this site, but I could not find a simple example of how to handle big XML files.

To begin with, I just want to understand how to read the file without loading all of it into memory.


Solution

  • What you need is a SAX XML parser, and Erlsom provides one. For small files, it is possible to load the whole document into memory and then parse it, as in the answer I gave to this question. For files this big, however, you need the SAX method. The SAX examples are here.

    With SAX you do not have to load the whole file into memory to parse it. The parser hands each event (start of an element, character data, end of an element) to your callback as soon as it encounters it. You will need to be comfortable with tail recursion, pattern matching and stateful programming.
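
    A minimal sketch of that approach, assuming Erlsom is already compiled and on the code path. The module name xml_stream, the chunk size, and the choice to store each element's text under the element name are all illustrative assumptions, not something the question specifies; erlsom:parse_sax/4 with the continuation_function option is used the way Erlsom's own continuation example uses it.

    ```erlang
    -module(xml_stream).
    -export([load/2, next_chunk/2, handle_event/2]).

    -define(CHUNK, 65536).  % bytes read per refill; an arbitrary choice

    %% Stream File through the SAX parser, one chunk at a time, and store
    %% the text of every leaf element in the ets table Tab as {Name, Text}.
    load(File, Tab) ->
        {ok, Fd} = file:open(File, [read, raw, binary]),
        try
            Opts = [{continuation_function, fun next_chunk/2, {Fd, 0, ?CHUNK}}],
            {ok, _State, _Rest} =
                erlsom:parse_sax(<<>>, {Tab, []}, fun handle_event/2, Opts),
            ok
        after
            file:close(Fd)
        end.

    %% Called by the parser when it runs out of input. Tail is the part it
    %% has not consumed yet; the new data is appended to it.
    next_chunk(Tail, {Fd, Offset, Size}) ->
        case file:pread(Fd, Offset, Size) of
            {ok, Data} ->
                {<<Tail/binary, Data/binary>>,
                 {Fd, Offset + byte_size(Data), Size}};
            eof ->
                {Tail, {Fd, Offset, Size}}
        end.

    %% SAX callback: the state is {Tab, TextCollectedSoFar}.
    handle_event({startElement, _Uri, _Name, _Prefix, _Attrs}, {Tab, _Text}) ->
        {Tab, []};
    handle_event({characters, Chars}, {Tab, Text}) ->
        %% text may arrive in several pieces, so accumulate it
        {Tab, [Text, Chars]};
    handle_event({endElement, _Uri, Name, _Prefix}, {Tab, Text}) ->
        case lists:flatten(Text) of
            []    -> ok;
            Value -> ets:insert(Tab, {Name, Value})
        end,
        {Tab, []};
    handle_event(_Other, State) ->
        State.
    ```

    It could be called as Tab = ets:new(kv, [bag]), xml_stream:load("big.xml", Tab). (the file name is a placeholder). A bag table is used here because element names repeat; a real extraction would pick keys that suit the data. Once the ets table is filled, ets:to_dets/2 can copy it into an open dets table of the same type, keeping in mind that a dets file cannot grow beyond 2 GB.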

    EDIT


    Now, download Erlsom and extract it into your Erlang lib directory (the one returned by code:lib_dir()), where all the built-in applications are located. Rename the extracted folder to erlsom-1.0. Create a file called Emakefile in the erlsom-1.0 folder, put the following line in it, and save:
    {"src/*", [verbose,report,warn_obsolete_guard,{outdir, "ebin"}]}.
    
    The erlsom-1.0 folder should look like this:
    erlsom-1.0
    |-doc/
    |-ebin/
    |-examples/
    |-include/
    |-src/
    |-Emakefile
    The other files do not matter. Now open an Erlang shell whose working directory (pwd()) is the erlsom-1.0 folder, and run make:all(). like this:
    Eshell V5.9  (abort with ^G)
    1> make:all().
    Recompile: src/ucs
    Recompile: src/erlsom_writeHrl
    Recompile: src/erlsom_write
    Recompile: src/erlsom_ucs
    Recompile: src/erlsom_simple_form
    Recompile: src/erlsom_sax_utf8
    Recompile: src/erlsom_sax_utf16le
    Recompile: src/erlsom_sax_utf16be
    Recompile: src/erlsom_sax_list
    Recompile: src/erlsom_sax_lib
    Recompile: src/erlsom_sax_latin1
    Recompile: src/erlsom_sax
    Recompile: src/erlsom_pass2
    Recompile: src/erlsom_parseXsd
    Recompile: src/erlsom_parse
    Recompile: src/erlsom_lib
    Recompile: src/erlsom_compile
    Recompile: src/erlsom_add
    Recompile: src/erlsom
    up_to_date
    2>
    
    That's it. As long as the erlsom-1.0 folder is in your Erlang lib directory, you can call the erlsom functions from any Erlang shell, whatever its pwd() is.
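
    One way to confirm that the shell can actually find the compiled module is to ask the code server. This is only a sketch for checking the setup; the path in the second expression is a placeholder for wherever the ebin folder ended up.

    ```erlang
    %% Returns the path of erlsom.beam, or the atom non_existing
    %% if the module is not on the code path.
    code:which(erlsom).

    %% If it is not found, the ebin directory can be added by hand.
    %% "/path/to/erlsom-1.0/ebin" is a placeholder path.
    code:add_patha("/path/to/erlsom-1.0/ebin").
    ```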