I'm self-studying databases in my spare time, trying to learn by implementing one from the ground up.
One of the first things you have to implement is the underlying data format and storage mechanism.
In databases, a common structure for this is the "slotted page", which looks like this:
+-----------------------------------------------------------+
| +----------------------+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+  |
| |        HEADER        | | | | | | | | | | | | | | | | |  |
| |                      | | | | | | | | | | | | | | | | |  |
| +----------------------+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+  |
|                                    SLOT ARRAY             |
|                                                           |
|                                                           |
|                                                           |
|                 +--------------------+ +----------------+ |
|                 |      TUPLE #4      | |    TUPLE #3    | |
|                 |                    | |                | |
|                 +--------------------+ +----------------+ |
|         +--------------------------+ +------------------+ |
|         |         TUPLE #2         | |     TUPLE #1     | |
|         |                          | |                  | |
|         +--------------------------+ +------------------+ |
+-----------------------------------------------------------+
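To make the diagram concrete, here is a minimal Java sketch of filling such a page. It assumes a hypothetical 8-byte header that holds the slot count and the start of the tuple area. The names SlottedPage, HEADER_SIZE and SLOT_SIZE are illustrative and not taken from any particular database. The slot array grows forward from the header, tuple bytes are packed backward from the end of the page, and the page is full when the two regions would overlap.

```java
import java.nio.ByteBuffer;

// Hypothetical page layout (an assumption for illustration):
//   [0..4)  int slotCount
//   [4..8)  int freeEnd   (tuples occupy [freeEnd, pageSize))
//   [8..)   slot array, 8 bytes per slot: int offset, int length
final class SlottedPage {
    static final int HEADER_SIZE = 8;
    static final int SLOT_SIZE = 8;

    private final ByteBuffer page;

    private SlottedPage(ByteBuffer page) {
        this.page = page;
        page.putInt(0, 0);               // no slots yet
        page.putInt(4, page.capacity()); // tuple area starts empty at the end of the page
    }

    static SlottedPage allocate(int pageSize) {
        return new SlottedPage(ByteBuffer.allocate(pageSize));
    }

    int slotCount() {
        return page.getInt(0);
    }

    /** Appends a tuple and returns its slot index, or -1 if it does not fit. */
    int insert(byte[] tuple) {
        int count = slotCount();
        int freeEnd = page.getInt(4);
        int slotArrayEnd = HEADER_SIZE + (count + 1) * SLOT_SIZE; // end of slot array after adding one slot
        int tupleStart = freeEnd - tuple.length;
        if (tupleStart < slotArrayEnd) {
            return -1; // slot array and tuple area would overlap
        }
        page.put(tupleStart, tuple);            // absolute bulk put (Java 16+)
        int slotPos = HEADER_SIZE + count * SLOT_SIZE;
        page.putInt(slotPos, tupleStart);       // slot.offset
        page.putInt(slotPos + 4, tuple.length); // slot.length
        page.putInt(0, count + 1);
        page.putInt(4, tupleStart);
        return count;
    }

    /** Copies out the tuple referenced by the given slot. */
    byte[] get(int slot) {
        int slotPos = HEADER_SIZE + slot * SLOT_SIZE;
        int offset = page.getInt(slotPos);
        int length = page.getInt(slotPos + 4);
        byte[] out = new byte[length];
        page.get(offset, out);                  // absolute bulk get (Java 13+)
        return out;
    }
}
```

Because slots store offsets, tuples can later be moved during compaction by rewriting only their slot entries. Slot indices, which are what other structures point to, stay stable.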
The page is stored in a file using binary serialization. The slots are the simplest part; their definition might look something like this:
#include <cstdint>

struct Slot {
    uint32_t offset;
    uint32_t length;
};
In C++, reading and writing a slot might be a pair of std::memcpy calls:
#include <cstdint>
#include <cstring>

// Header-size offset omitted below for simplicity
void write_to_buffer(char *buffer, const Slot &slot, uint32_t slot_idx) {
    char *dst = buffer + sizeof(Slot) * slot_idx;
    std::memcpy(dst, &slot.offset, sizeof(uint32_t));
    std::memcpy(dst + sizeof(uint32_t), &slot.length, sizeof(uint32_t));
}
void read_from_buffer(const char *buffer, Slot &slot, uint32_t slot_idx) {
    const char *src = buffer + sizeof(Slot) * slot_idx;
    std::memcpy(&slot.offset, src, sizeof(uint32_t));
    // length sits right after offset: skip sizeof(uint32_t), not sizeof(Slot)
    std::memcpy(&slot.length, src + sizeof(uint32_t), sizeof(uint32_t));
}
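Byte order matters if the Java side ever reads pages written by code like this. memcpy stores the integers in the host's native byte order, which is little-endian on x86-64 and typical AArch64 systems. A Java ByteBuffer, by contrast, defaults to big-endian. ValueLayout.JAVA_INT in the MemorySegment API uses the native order. The helper names ByteOrderDemo, encodeOne and readNative below are hypothetical and only illustrate the difference.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    /** Encodes the int 1 into 4 bytes using the given byte order. */
    static byte[] encodeOne(ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES).order(order);
        buf.putInt(0, 1);
        return buf.array();
    }

    /** Reads an int written in native order (e.g. by memcpy on the same machine). */
    static int readNative(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.nativeOrder()).getInt(0);
    }
}
```

Reading the little-endian bytes {1, 0, 0, 0} through a default (big-endian) ByteBuffer yields 16777216 instead of 1. Setting the order explicitly avoids this: use ByteOrder.nativeOrder() to match memcpy, or pick one fixed order for a portable file format.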
In Java, as far as I know, there are two options. The first uses a java.nio.ByteBuffer:
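import java.nio.ByteBuffer;

record Slot(int offset, int length) {
    // Relative put: writes at the buffer's current position and advances it by 8 bytes
    void write(ByteBuffer buffer) {
        buffer.putInt(offset).putInt(length);
    }
    static Slot read(ByteBuffer buffer) {
        // Arguments are evaluated left to right, so offset is read before length
        return new Slot(buffer.getInt(), buffer.getInt());
    }
}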
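The relative getInt/putInt calls above depend on the buffer's internal position and also change it. The caller therefore has to position the buffer at the right slot before each call. A variant of the ByteBuffer approach could compute each slot's byte offset and use the absolute accessors, which leave position() untouched and allow slots to be accessed in any order. The index parameter, position helper and HEADER_SIZE constant are assumptions for illustration.

```java
import java.nio.ByteBuffer;

record Slot(int offset, int length) {
    static final int BYTES = 2 * Integer.BYTES; // 8 bytes per slot
    static final int HEADER_SIZE = 8;           // hypothetical page header size

    /** Byte position of slot `index` within the page. */
    static int position(int index) {
        return HEADER_SIZE + index * BYTES;
    }

    // Absolute putInt/getInt do not read or modify buffer.position()
    void write(ByteBuffer buffer, int index) {
        int pos = position(index);
        buffer.putInt(pos, offset);
        buffer.putInt(pos + Integer.BYTES, length);
    }

    static Slot read(ByteBuffer buffer, int index) {
        int pos = position(index);
        return new Slot(buffer.getInt(pos), buffer.getInt(pos + Integer.BYTES));
    }
}
```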
The second uses MemorySegment from the Foreign Function & Memory API (java.lang.foreign):
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

record Slot(int offset, int length) {
    public static final MemoryLayout LAYOUT = MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("offset"),
        ValueLayout.JAVA_INT.withName("length"));
    public static Slot from(MemorySegment memory) {
        return new Slot(
            memory.get(ValueLayout.JAVA_INT, 0),
            memory.get(ValueLayout.JAVA_INT, Integer.BYTES));
    }
    public void to(MemorySegment memory) {
        memory.set(ValueLayout.JAVA_INT, 0, offset);
        memory.set(ValueLayout.JAVA_INT, Integer.BYTES, length);
    }
}
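One caveat applies when trying the MemorySegment version against a byte[]. ValueLayout.JAVA_INT requires 4-byte alignment, but a segment from MemorySegment.ofArray(byte[]) only guarantees 1-byte alignment, so accessing it with JAVA_INT can fail with an exception. Native segments from an Arena, or the JAVA_INT_UNALIGNED layout, avoid that. The sketch below assumes Java 22 or later, where java.lang.foreign is final (it is a preview API in Java 21). It uses the unaligned layout so the same code works on heap and native segments, and addresses slots by index behind a hypothetical 8-byte header.

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Requires Java 22+ (java.lang.foreign is final there)
record Slot(int offset, int length) {
    // Unaligned int layout: valid on byte[]-backed heap segments as well as native ones
    static final ValueLayout.OfInt INT = ValueLayout.JAVA_INT_UNALIGNED;
    static final MemoryLayout LAYOUT = MemoryLayout.structLayout(
            INT.withName("offset"),
            INT.withName("length"));
    static final long HEADER_SIZE = 8; // hypothetical page header size

    static Slot read(MemorySegment page, int index) {
        long pos = HEADER_SIZE + index * LAYOUT.byteSize(); // LAYOUT.byteSize() == 8
        return new Slot(page.get(INT, pos), page.get(INT, pos + Integer.BYTES));
    }

    void write(MemorySegment page, int index) {
        long pos = HEADER_SIZE + index * LAYOUT.byteSize();
        page.set(INT, pos, offset);
        page.set(INT, pos + Integer.BYTES, length);
    }
}
```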
What is the performance difference between these two approaches?
If it's negligible, I'd prefer the ByteBuffer API.
Answering with the response from Paul Sandoz on the panama-dev mailing list:
Hi Gavin,
Using MemorySegment will give you far more control over the description (layout) and management (freeing and pooling) than ByteBuffer. Also, if it’s an issue, you will also not be constrained by ByteBuffer’s size limitation. Performance wise using MemorySegment should be as good as or better than ByteBuffer.
In many respects MemorySegment is a better API to interact with native memory. ByteBuffer was introduced in Java 1.4 with NIO and had additional design constraints in mind that are less relevant today (such as an internal mutable index).
Paul.
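The point about management (freeing) can be sketched as follows, assuming Java 22 or later. Page memory is allocated from a confined Arena and freed deterministically when the try block closes the arena. With ByteBuffer.allocateDirect, the memory is only released after the buffer is garbage collected. MemorySegment.asByteBuffer() still lets the page go through the existing NIO FileChannel API. PageFile, PAGE_SIZE, writePage, readPage and roundTrip are hypothetical names, and pages are assumed to live at pageId * PAGE_SIZE in the file.

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class PageFile {
    static final int PAGE_SIZE = 4096; // hypothetical page size

    /** Writes one page at its fixed position in the file. */
    static void writePage(FileChannel ch, long pageId, MemorySegment page) throws IOException {
        var buf = page.asByteBuffer(); // view over the same memory, no copy
        long pos = pageId * PAGE_SIZE;
        while (buf.hasRemaining()) {
            pos += ch.write(buf, pos); // positional write does not move ch.position()
        }
    }

    /** Reads one page from the file into the given segment. */
    static void readPage(FileChannel ch, long pageId, MemorySegment page) throws IOException {
        var buf = page.asByteBuffer();
        long pos = pageId * PAGE_SIZE;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0) throw new IOException("page " + pageId + " is past end of file");
            pos += n;
        }
    }

    /** Round-trips one int through page 3 of the file. */
    static int roundTrip(Path file, int value) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                     StandardOpenOption.READ, StandardOpenOption.WRITE);
             Arena arena = Arena.ofConfined()) {
            MemorySegment out = arena.allocate(PAGE_SIZE);
            out.set(ValueLayout.JAVA_INT_UNALIGNED, 0, value);
            writePage(ch, 3, out);
            MemorySegment in = arena.allocate(PAGE_SIZE);
            readPage(ch, 3, in);
            return in.get(ValueLayout.JAVA_INT_UNALIGNED, 0);
        } // closing the arena frees both native segments here
    }
}
```

Pooling could be layered on top of this by keeping a longer-lived arena for a fixed set of page frames. That is a design choice, not something the API does for you.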