jsongoencodingdeep-copygob

Quicker way to deepcopy objects in golang, JSON vs gob


I am using go 1.9. And I want to deepcopy value of object into another object. I try to do it with encoding/gob and encoding/json. But it takes more time for gob encoding than json encoding. I see some other questions like this and they suggest that gob encoding should be quicker. But I see exact opposite behaviour. Can someone tell me if I am doing something wrong? Or any better and quicker way to deepcopy than these two? My object's struct is complex and nested.

The test code:

package main

import (
    "bytes"
    "encoding/gob"
    "encoding/json"
    "log"
    "time"

    "strconv"
)

// Test ...
type Test struct {
    Prop1 int
    Prop2 string
}

// Clone deep-copies a to b
func Clone(a, b interface{}) {

    buff := new(bytes.Buffer)
    enc := gob.NewEncoder(buff)
    dec := gob.NewDecoder(buff)
    enc.Encode(a)
    dec.Decode(b)
}

// DeepCopy deepcopies a to b using json marshaling
func DeepCopy(a, b interface{}) {
    byt, _ := json.Marshal(a)
    json.Unmarshal(byt, b)
}

func main() {
    i := 0
    tClone := time.Duration(0)
    tCopy := time.Duration(0)
    end := 3000
    for {
        if i == end {
            break
        }

        r := Test{Prop1: i, Prop2: strconv.Itoa(i)}
        var rNew Test
        t0 := time.Now()
        Clone(r, &rNew)
        t2 := time.Now().Sub(t0)
        tClone += t2

        r2 := Test{Prop1: i, Prop2: strconv.Itoa(i)}
        var rNew2 Test
        t0 = time.Now()
        DeepCopy(&r2, &rNew2)
        t2 = time.Now().Sub(t0)
        tCopy += t2

        i++
    }
    log.Printf("Total items %+v, Clone avg. %+v, DeepCopy avg. %+v, Total Difference %+v\n", i, tClone/3000, tCopy/3000, (tClone - tCopy))
}

I get following output:

Total items 3000, Clone avg. 30.883µs, DeepCopy avg. 6.747µs, Total Difference 72.409084ms

Solution

  • JSON vs gob difference

    The encoding/gob package needs to transmit type definitions:

    The implementation compiles a custom codec for each data type in the stream and is most efficient when a single Encoder is used to transmit a stream of values, amortizing the cost of compilation.

    When you "first" serialize a value of a type, the definition of the type also has to be included / transmitted, so the decoder can properly interpret and decode the stream:

    A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.

    This is explained in great details here: Efficient Go serialization of struct to disk

    So while in your case it's necessary to create a new gob encoder and decoder each time, it is still the "bottleneck", the part that makes it slow. Encoding to / decoding from JSON format, type description is not included in the representation.

    To prove it, make this simple change:

    type Test struct {
        Prop1 [1000]int
        Prop2 [1000]string
    }
    

    What we did here is made the types of fields arrays, "multiplying" the values a thousand times, while the type information is effectively remained the same (all elements in the arrays have the same type). Creating values of them like this:

    r := Test{Prop1: [1000]int{}, Prop2: [1000]string{}}
    

    Now running your test program, the output on my machine:

    Original:

    2017/10/17 14:55:53 Total items 3000, Clone avg. 33.63µs, DeepCopy avg. 2.326µs, Total Difference 93.910918ms

    Modified version:

    2017/10/17 14:56:38 Total items 3000, Clone avg. 119.899µs, DeepCopy avg. 462.608µs, Total Difference -1.02812648s

    As you can see, in the original version JSON is faster, but in the modified version gob became faster, as the cost of transmitting type info amortized.

    Testing / benching method

    Now on to your testing method. This way of measuring performance is bad and can yield quite inaccurate results. Instead you should use Go's built-in testing and benchmark tools. For details, read Order of the code and performance.

    Caveats of these cloning

    These methods work with reflection and thus can only "clone" fields that are accessible via reflection, that is: exported. Also they often don't manage pointer equality. By this I mean if you have 2 pointer fields in a struct, both pointing to the same object (pointers being equal), after marshaling and unmarshaling, you'll get 2 different pointers pointing to 2 different values. This may even cause problems in certain situations. They also don't handle self-referencing structures, which at best returns an error, or in wrose case causes an infinite loop or goroutine stack exceeding.

    The "proper" way of cloning

    Considering the caveats mentioned above, often the proper way of cloning needs help from the "inside". That is, cloning a specific type is often only possible if that type (or the package of that type) provides this functionality.

    Yes, providing a "manual" cloning functionality is not convenient, but on the other side it will outperform the above methods (maybe even by orders of magnitude), and will require the least amount of "working" memory required for the cloning process.