
What is the proper way to back up and restore a Mnesia database?


WARNING: the background info is pretty long. Skip to the bottom if you think you need the question before the background info. Appreciate the time this is gonna take!

I've been all over the web (read: Google) and I have not found a good answer. Yes, there are plenty of links and references to the Mnesia documentation on the erlang.org site, but even those links suffer from version-itis.

So in the simplest case, where the node() you are currently connected to is the same as the owner of the table set, the backup/restore is going to work. For example:

$ erl -sname mydatabase

> mnesia:create_schema(...).
> mnesia:start().
> mnesia:create_table(...).
> mnesia:backup("/tmp/backup.bup").
> mnesia:restore("/tmp/backup.bup", [{default_op, recreate_tables}]).

Hey this works great!

However, if the database is actually running on a remote node(), or on a remote node() on a remote machine, then you have to initiate the backup this way:

$ erl -sname mydbadmin

> rpc:call(mydatabase@host, mnesia, backup, ["/tmp/backup.bup"]).
> rpc:call(mydatabase@host, mnesia, restore, ["/tmp/backup.bup", [{default_op, recreate_tables}]]).
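A small wrapper can make the remote call easier to script. This is only a minimal sketch: the module name `remote_backup` and the function `backup/2` are made up for illustration. It relies on `rpc:call/4` returning `{badrpc, Reason}` when the call itself fails. Keep in mind that the backup file is written on the filesystem of the node where `mnesia:backup/1` runs, not on the admin node.

```erlang
-module(remote_backup).
-export([backup/2]).

%% Hypothetical helper: run mnesia:backup/1 on Node and normalize the result,
%% so that Mnesia errors and RPC failures can be told apart.
backup(Node, File) ->
    case rpc:call(Node, mnesia, backup, [File]) of
        ok -> ok;
        {error, Reason} -> {error, {mnesia, Reason}};
        {badrpc, Reason} -> {error, {rpc, Reason}}
    end.
```

From the admin shell this would be called as `remote_backup:backup(mydatabase@host, "/tmp/backup.bup").`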

Of course, this was simple too. But here is where things get complicated.

Acquaintances of mine who are Erlang and Mnesia experts suggest that Mnesia's replication is severely flawed and that you should not use it. (There are currently no alternatives that I know of, and the chances that you are going to implement a better version yourself are slim.)

So you have two nodes that are replicating RAM- and disc-based tables. You have been maintaining a policy of backing up the database regularly with the standard backup, using the default BackupMod. And one day a manager asks you to verify the backups. But when you attempt to restore the database, you get:

{atomic,[]}

And according to the documentation this means that there were no errors... and yet no tables were restored.
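Since `mnesia:restore/2` returns `{atomic, RestoredTabs}` on success, a verification script can treat an empty table list as a failure instead of trusting the `atomic` tag alone. The following is a rough sketch; `restore_check` and `check/1` are hypothetical names, not part of Mnesia.

```erlang
-module(restore_check).
-export([check/1]).

%% Hypothetical helper: interpret the result of mnesia:restore/2.
%% An empty table list is reported as an error, because nothing was restored.
check({atomic, []}) -> {error, no_tables_restored};
check({atomic, Tabs}) when is_list(Tabs) -> {ok, Tabs};
check({aborted, Reason}) -> {error, Reason}.
```

It would be used as `restore_check:check(mnesia:restore("/tmp/backup.bup", [{default_op, recreate_tables}])).`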

Not wanting to run the change_node procedure, you remember that the node() and hostname must match, so you change the hostname and the -sname param to match the machine where the data was backed up. This time, however, you get a strange error:

{aborted,{'EXIT',{aborted,{bad_commit,{missing_lock,mydatabase@otherhost}}}}}

Still not wanting to run the change_node procedure, I quickly clone my restore server so that I have two similar machines. I name them appropriately to match the production servers, and I begin the restore process. Eureka! I now have real working data on the restore servers.

I'd like to say that this was the end of the road, but I have not asked a question yet, and that is the point of SO. So here it is:

QUESTION: if I want to restore a backup which was taken from a cluster of replicated mnesia nodes, how do I modify the file (similar to the change_node procedure) so that the other nodes are either ignored or removed from the backup?

Asked slightly differently: How do I restore a replicated-multi-node() mnesia database on a single node()?


Solution

  • I think that this problem falls in the broader category of Mnesia questions that are related to a simple one:

    How do I rename a Mnesia node?

    The first and simplest solution, if your DB is not huge, is to use the mnesia:traverse_backup function (see the Mnesia User's Guide). The following example is from that guide:

    change_node_name(Mod, From, To, Source, Target) ->
        Switch =
            fun(Node) when Node == From -> To;
               (Node) when Node == To -> throw({error, already_exists});
               (Node) -> Node
            end,
        Convert =
            fun({schema, db_nodes, Nodes}, Acc) ->
                    {[{schema, db_nodes, lists:map(Switch,Nodes)}], Acc};
               ({schema, version, Version}, Acc) ->
                    {[{schema, version, Version}], Acc};
               ({schema, cookie, Cookie}, Acc) ->
                    {[{schema, cookie, Cookie}], Acc};
               ({schema, Tab, CreateList}, Acc) ->
                    Keys = [ram_copies, disc_copies, disc_only_copies],
                    OptSwitch =
                        fun({Key, Val}) ->
                                case lists:member(Key, Keys) of
                                    true -> {Key, lists:map(Switch, Val)};
                                    false-> {Key, Val}
                                end
                        end,
                    {[{schema, Tab, lists:map(OptSwitch, CreateList)}], Acc};
               (Other, Acc) ->
                    {[Other], Acc}
            end,
        mnesia:traverse_backup(Source, Mod, Target, Mod, Convert, switched).
    
    view(Source, Mod) ->
        View = fun(Item, Acc) ->
                       io:format("~p.~n",[Item]),
                       {[Item], Acc + 1}
               end,
        mnesia:traverse_backup(Source, Mod, dummy, read_only, View, 0).
    

    The most important part here is the manipulation of the {schema, db_nodes, Nodes} tuple, which lets you rename or replace the DB nodes.
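    For the single-node case in the question, the same traversal could drop the other nodes instead of just renaming one. This is a minimal, untested sketch under some assumptions: the backup terms have the shape used in the guide example, `single_node_backup`, `convert/5` and `only_node/3` are made-up names, and every table you care about has a replica on the `From` node. A table that only lived on a dropped node would end up with no replica at all.

    ```erlang
    -module(single_node_backup).
    -export([convert/5, only_node/3]).

    %% Keep only From in a node list, renamed to To; every other node is dropped.
    only_node(From, To, Nodes) ->
        [To || N <- Nodes, N =:= From].

    %% Hypothetical variant of change_node_name/5 from the guide:
    %% rewrites Source into Target so that To is the only node left.
    convert(Mod, From, To, Source, Target) ->
        Keys = [ram_copies, disc_copies, disc_only_copies],
        Fix =
            fun({Key, Val}) ->
                    case lists:member(Key, Keys) of
                        true -> {Key, only_node(From, To, Val)};
                        false -> {Key, Val}
                    end;
               (Other) ->
                    Other
            end,
        Convert =
            fun({schema, db_nodes, Nodes}, Acc) ->
                    {[{schema, db_nodes, only_node(From, To, Nodes)}], Acc};
               ({schema, version, _} = Item, Acc) ->
                    {[Item], Acc};
               ({schema, cookie, _} = Item, Acc) ->
                    {[Item], Acc};
               ({schema, Tab, CreateList}, Acc) ->
                    {[{schema, Tab, lists:map(Fix, CreateList)}], Acc};
               (Other, Acc) ->
                    {[Other], Acc}
            end,
        mnesia:traverse_backup(Source, Mod, Target, Mod, Convert, single_node).
    ```

    A call could look like `single_node_backup:convert(mnesia_backup, 'mydatabase@host', 'mydatabase@newhost', "/tmp/backup.bup", "/tmp/single.bup").`, where `mydatabase@newhost` stands for whatever the restore node is called. `From` and `To` may also be the same name if you only want to drop the other nodes.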

    BTW, I've used that function in the past, and one thing I noticed is that the format of the backup terms changes between Mnesia versions, but maybe it was simply me writing bad code. Just print a backup log for a small Mnesia database to check the backup term format, if you wanna be sure.
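    Besides printing everything with `view/2`, it can be handy to list just the node names a backup refers to before attempting a restore. A rough sketch, assuming the term shapes from the guide example hold for your Mnesia version; `backup_nodes`, `list/2` and `item_nodes/1` are hypothetical names.

    ```erlang
    -module(backup_nodes).
    -export([list/2, item_nodes/1]).

    %% Node names mentioned by a single backup item (empty for ordinary records).
    item_nodes({schema, db_nodes, Nodes}) when is_list(Nodes) ->
        Nodes;
    item_nodes({schema, _Tab, CreateList}) when is_list(CreateList) ->
        Keys = [ram_copies, disc_copies, disc_only_copies],
        lists:append([Val || {Key, Val} <- CreateList,
                             is_list(Val),
                             lists:member(Key, Keys)]);
    item_nodes(_Other) ->
        [].

    %% Read-only traversal that accumulates a sorted, duplicate-free node list.
    list(Source, Mod) ->
        Collect =
            fun(Item, Acc) ->
                    {[Item], lists:usort(item_nodes(Item) ++ Acc)}
            end,
        mnesia:traverse_backup(Source, Mod, dummy, read_only, Collect, []).
    ```

    For a backup made with the default module this would be called as `backup_nodes:list("/tmp/backup.bup", mnesia_backup).`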

    Hope this helps!