I'm working on a legacy Ada project with significant RAM constraints.
To free memory for additional features, I would like to analyze all record definitions in order to find where padding bytes are lost and reclaim them.
Please note that I'm not (yet) trying to recover lost bits (no pragma Pack needed here, nor a rep. clause for strictly contiguous records as in this question). Only bytes for now.
Simplified example (real-world records are far more complex and may have discriminants or tagged types):
type My_Record is record
   field1 : Foo;   -- enum with 3 values
   field2 : Bar;   -- some other record
   field3 : Float; -- 32 bits
   field4 : Flex;  -- enum with 12 values
end record;
Its -gnatR2s output would look like this (in a 32-bit world):
for My_Record'Alignment use 4;
for My_Record use record
   field1 at  0 range 0 .. 7;
   field2 at  4 range 0 .. 47; -- 3 bytes lost after field1
   field3 at 12 range 0 .. 31; -- 2 bytes lost after field2
   field4 at 16 range 0 .. 7;  -- another 3 bytes lost (trailing padding)
end record;
What I would like to do (a memory-usage-optimized record):
-- rewrite the record; not the preferred way, since the declaration
-- order may carry some human-readable context
type My_Record is record
   field2 : Bar;   -- 6 bytes used, 2 bytes left in the word
   field1 : Foo;   -- 1 byte left
   field4 : Flex;  -- 0 bytes left
   field3 : Float; -- 4 bytes used
   -- 0 wasted bytes
end record;
or:
-- preferred way: keep the record declaration, but force a rep. clause
type My_Record is record
   field1 : Foo;
   field2 : Bar;
   field3 : Float;
   field4 : Flex;
end record;
-- the optimization is here
for My_Record'Alignment use 4;
for My_Record use record
   field2 at 0 range 0 .. 47;
   field1 at 6 range 0 .. 7;  -- exploit the unused bytes after Bar
   field4 at 7 range 0 .. 7;  -- exploit the unused bytes after Bar
   field3 at 8 range 0 .. 31;
end record;
(Apologies for any mistake in the example; I hope you get the point.)
How can I do this? Run -gnatR2s on all compilation units and write a .rep parser in Python? For context on rep. clauses and the small differences between the Ada Reference Manual and GNAT, one may read this link.
I think you have already stated all the options: use the implementation-specific pragmas you mention, optimize manually, and/or write a custom optimizer that analyzes the representation clauses and optimizes them using some (tunable) cost function that trades off space, access time, etc. Writing a custom optimizer will almost certainly be the most costly in terms of time and effort. I'm not aware of any other "hidden" options (and if they are hidden, it might be for a reason).
I would start by formulating a space budget (if that's possible), then analyze how many padding bytes each record type has, in order to better understand how they are distributed and to estimate the maximum number of bytes that could be recovered by removing all padding (note that instance counts matter: a small type that is instantiated a lot might have a larger impact on the memory footprint than a large type that is instantiated only once); the sketch below illustrates this kind of accounting. Only then would I choose a strategy to reduce the padding: use a pragma for all types or only some, optimize some records by hand, or conclude that you really need a custom optimizer. Of course, time and cost matter here: I wouldn't recommend an extensive analysis if the job must be finished by tomorrow.
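To make that concrete, here is a minimal Python sketch of the accounting I have in mind. Everything in it is a hypothetical input: the byte spans would come from parsing the -gnatR2 output, the alignment from the 'Alignment line, and the instance counts from your own knowledge of the application.

# Hypothetical inputs: component byte spans per record type (as they
# would be recovered from -gnatR2 output), the type's alignment, and a
# rough count of live objects of each type.
layouts = {
    "My_Record": [("field1", 0, 0), ("field2", 4, 9),
                  ("field3", 12, 15), ("field4", 16, 16)],
}
alignments = {"My_Record": 4}
instances = {"My_Record": 250}   # made-up count

def padding(spans, alignment):
    """Bytes inside the aligned record size not covered by any component."""
    used = {b for _, first, last in spans for b in range(first, last + 1)}
    size = max(used) + 1
    size += -size % alignment    # round the size up to the alignment
    return size - len(used)

total = sum(padding(spans, alignments[t]) * instances[t]
            for t, spans in layouts.items())
print(f"at most {total} bytes recoverable")   # 8 bytes * 250 = 2000

For the example record this reports 8 padding bytes (3 + 2 + 3, matching the question), weighted by the instance count to give an upper bound on what a re-layout could recover.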
For the analysis I would just parse the output of -gnatR2. From what I know (but you might want to check this), libadalang focuses on source code (which may not state an explicit representation clause). Regarding ASIS, I think it might work, but I doubt it's worth the effort. Parsing the output of -gnatR2 with some scripting language will probably be more time-efficient, for example along the lines sketched below.
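A minimal sketch of such a parser, assuming the .rep file contains component clauses in the textual form shown in the question (field at N range F .. L). Real -gnatR2 output carries more information (Object_Size, Value_Size, variant parts, ...) that a production parser would have to handle, and the file name below is only an example.

import re
from collections import defaultdict

RECORD_START = re.compile(r"for (\w+) use record")
COMPONENT = re.compile(r"(\w+) at\s+(\d+) range\s+(\d+)\s*\.\.\s*(\d+);")

def parse_rep(path):
    """Return {type_name: [(component, first_byte, last_byte)]}."""
    layouts = defaultdict(list)
    current = None
    with open(path) as f:
        for line in f:
            if m := RECORD_START.search(line):
                current = m.group(1)
            elif "end record;" in line:
                current = None
            elif current and (m := COMPONENT.search(line)):
                name = m.group(1)
                pos, first, last = map(int, m.group(2, 3, 4))
                # Convert the position/bit range to an inclusive byte span.
                layouts[current].append(
                    (name, pos + first // 8, pos + last // 8))
    return layouts

# Example: print(parse_rep("my_unit.rep"))

The byte spans it returns feed directly into the padding estimate above; from there you can rank record types by wasted bytes (weighted by instance count) and decide which ones deserve a hand-written rep. clause.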