java hadoop serialization java-io writable

Why use Writable when we can use DataInput and DataOutput directly?


Probably because implementing Writable gives us a serializable object. I know DataInput and DataOutput deal directly with byte streams, but I see no harm in reading values straight off them and storing them in primitive types.

That being said, the readFields() and write() methods seem futile, useful only from a modularity standpoint. Creating DataInput and DataOutput objects and reading or writing the instance variables through them directly (using them like a Scanner utility class) seems quite simple. Defining an interface for this and implementing those obvious methods (whether in the predefined box classes or in our own custom classes) looks like syntactic sugar as far as I can see.

Help me see through it if there's something to be seen.

UPDATE: DataInput and DataOutput classes produce serialized objects! :o


Solution

  • DataOutput and DataInput serialize/deserialize only the most basic types, that is, the primitive types (plus strings, via writeUTF() and readUTF()), not custom or complex objects.

    This is why we implement Writable and, in turn, its methods readFields(DataInput in) and write(DataOutput out): they let us serialize the members/instance variables of our own class, field by field, to and from those streams. And since a Writable is written for one specific class, its serialized form is compact: it does not have to store metadata about the class type with the data (nor the 5-byte reference handles that Java serialization uses). That gives higher performance and makes the objects easier to stream over a distributed network than with Java's Serializable.
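
    To make the point above concrete, here is a minimal sketch of a custom type that serializes its own fields. It rests on some assumptions: the Writable interface below is a local stand-in that declares the same two methods as org.apache.hadoop.io.Writable, so the example compiles without Hadoop on the classpath, and PointWritable and WritableDemo (with its toBytes/fromBytes helpers) are hypothetical names made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in (assumption): same two methods as org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical custom type that knows how to serialize its own fields.
class PointWritable implements Writable {
    int id;
    double score;
    String label;

    // A no-arg constructor lets a framework create an empty instance
    // and then fill it in through readFields().
    PointWritable() {}

    PointWritable(int id, double score, String label) {
        this.id = id;
        this.score = score;
        this.label = label;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Only the raw field values are written, with no class metadata.
        out.writeInt(id);
        out.writeDouble(score);
        out.writeUTF(label);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read back in exactly the order they were written.
        id = in.readInt();
        score = in.readDouble();
        label = in.readUTF();
    }
}

class WritableDemo {
    // Serialize any Writable into a byte array.
    static byte[] toBytes(Writable w) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            w.write(out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild a PointWritable from bytes produced by toBytes().
    static PointWritable fromBytes(byte[] bytes) {
        try {
            PointWritable copy = new PointWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new PointWritable(7, 0.5, "a"));
        PointWritable copy = fromBytes(bytes);
        System.out.println(bytes.length + " bytes: "
                + copy.id + ", " + copy.score + ", " + copy.label);
    }
}
```

    Here the record takes 15 bytes: 4 for the int, 8 for the double, and 3 for the one-character string (writeUTF prefixes it with a 2-byte length). The caller never needs to know the field layout; it only hands the object a DataOutput or DataInput, which is what lets a framework move arbitrary key and value types around without special-casing each one.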