gojournal

Making a journal file in golang


I have a small Go project that receives text lines over TCP and processes them. To make it robust, I want to create some sort of journal so that nothing is lost in case of power failure (e.g. a frame of data has been received by my app but not yet processed).

I have googled for any guides on how a journal file should be implemented, but the search results are heavily polluted by Oracle RDBMS documentation and such.

My thought was something like this: immediately after receiving a line, write it to a file with a "not processed" flag. After processing, update the file to clear the flag, which frees the record to be overwritten. When the flag is cleared, send a "processed ack" to the data sender. Perhaps it's easiest to use fixed-size "slots" in the journal and maintain a "free list" of unused slots, so that I can reuse freed slots rather than having an ever-increasing file.

Is there any "best practice" for implementing such files in custom code, e.g. with regard to file structure, padding and locking? Are there any concerns about doing this in Go, given that it is cross-platform, rather than using native file-system APIs?


Solution

  • You shouldn't rewrite a journal. Just append the operations to it so that you can recreate them, and then control the strictness level you want.

    The logic should simply be:

    1. receive message

    2. write it to journal

    3. optionally do an fsync on the journal now - depending on your consistency requirements.

    4. optionally then send a "received ack" - depends on your needs.

    5. process the message.

    6. optionally write another "processed" record to the file with the id of the original record. You don't always need that, but this is the point where you append instead of rewriting the old record. Alternatively, you can keep a separate file with the "top transaction id" you've processed, so you automatically know where to resume processing after a failure. This also reduces the journal size.

    7. send a "processed ack" or "processing failure" - again, depends on what you want.

    Databases usually let you control the fsync behavior (on every write, every N seconds, or whenever the OS decides); it's a matter of speed vs. durability.

    A good read on the subject might be this post on redis persistence: http://oldblog.antirez.com/post/redis-persistence-demystified.html

    [EDIT] Another great read on the subject: http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

    As for the Go aspect of it: there are a few ways to write to files, from a low-level file handle (os.File) to a buffered writer (bufio.Writer). A plain file handle keeps you most in control of what's going on under the hood. I'm not sure how much caching a normal file writer in Go does behind the scenes, so I'd suggest reading the code if you intend to use one.