Efficient Go serialization of struct to disk


I've been tasked with porting C++ code to Go, and I'm quite new to the Go APIs. I am using gob to encode hundreds of key/value entries into disk pages, but the gob encoding carries a lot of overhead that I don't need.

package main

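import (
    "bytes"
    "encoding/gob"
    "fmt"
    "log"
)

// Entry is a single key/value pair to be written to disk.
type Entry struct {
    Key string
    Val string
}

func main() {
    var buf bytes.Buffer
    enc := gob.NewEncoder(&buf)
    e := Entry{"k1", "v1"}
    // Encode can fail; check the error instead of discarding it.
    if err := enc.Encode(e); err != nil {
        log.Fatal("encode error: ", err)
    }
    fmt.Println(buf.Bytes())
}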

This produces a lot of bloat that I don't need:

[35 255 129 3 1 1 5 69 110 116 114 121 1 255 130 0 1 2 1 3 75 101 121 1 12 0 1 3 86 97 108 1 12 0 0 0 11 255 130 1 2 107 49 1 2 118 49 0] 

I want to serialize each string's length followed by its raw bytes, like this:

[0 0 0 2 107 49 0 0 0 2 118 49]

I am saving millions of entries, so the extra overhead in the encoding increases the file size roughly tenfold.

How can I serialize to the latter format without hand-coding the encoder?


Solution

  • Use protobuf to efficiently encode your data.

    https://github.com/golang/protobuf

    Your main would look like this:

    package main
    
    import (
        "fmt"
        "log"
    
        "github.com/golang/protobuf/proto"
    )
    
    func main() {
        e := &Entry{
            Key: proto.String("k1"),
            Val: proto.String("v1"),
        }
        data, err := proto.Marshal(e)
        if err != nil {
            log.Fatal("marshaling error: ", err)
        }
        fmt.Println(data)
    }
    

    Create a file, example.proto, like this:

    syntax = "proto2";

    package main;
    
    message Entry {
        required string Key = 1;
        required string Val = 2;
    }
    

    Generate the Go code from the proto file by running:

    $ protoc --go_out=. *.proto
    

    You can examine the generated file if you wish.

    Then run the program to see the encoded output:

    $ go run *.go
    [10 2 107 49 18 2 118 49]