In RFC 8285, which deals with RTP header extensions, the structure of a one-byte header extension is shown below (Section 4.2):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     0xBE      |     0xDE      |           length=3            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID   |  L=0  |     data      |  ID   |  L=1  |    data...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ...data   |    0 (pad)    |    0 (pad)    |  ID   |  L=3  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
I understand the 0xBEDE pattern, which is explained in the RFC. Then comes the "length=3" field, followed by the actual extension elements. Each element consists of an ID followed by a length. A similar structure is defined for two-byte header extensions.
In both types of headers, I do not understand the "length=3" field. Is it just padding to reach a 32-bit boundary? If so, what purpose does it serve? Ease of parsing? Why not start the extension elements immediately after 0xBEDE? That would certainly have been more space-efficient. Maybe I am missing something basic.
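The "length=3" field is not padding. It dates back to RFC 3550 (Section 5.3.1), which defines the generic RTP header extension as a 16-bit profile-defined field followed by a 16-bit length. That length counts the 32-bit words of extension data that follow, excluding the four-octet extension header itself. In the diagram, length=3 means that three 32-bit words (12 bytes) of extension elements and padding follow. Specifying the length explicitly like this allows clients that do not understand extensions to skip them easily. Also note that until RFC 5285 (since obsoleted by RFC 8285) there could only be a single extension per packet, so what you see is a backward-compatibility hack: the individual elements are packed inside that one RFC 3550 extension, with 0xBEDE occupying the profile-defined field.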
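To make the role of the length field concrete, here is a minimal Python sketch. The function names `split_header_extension` and `parse_one_byte_elements` are made up for illustration, and the sample bytes are invented to match the layout in the diagram. It assumes the buffer starts at the extension header itself; in a real packet the extension follows the fixed RTP header and any CSRC entries, and is present only when the X bit is set.

```python
import struct

def split_header_extension(buf: bytes, offset: int = 0):
    """Read the RFC 3550 extension header at `offset`.

    Returns (profile, extension_data, next_offset). A receiver that does not
    understand the extension only needs next_offset to skip over it.
    """
    # 16-bit profile-defined field, then 16-bit length in 32-bit words.
    profile, length_words = struct.unpack_from("!HH", buf, offset)
    start = offset + 4                 # the 4-byte header is not counted
    end = start + 4 * length_words
    if end > len(buf):
        raise ValueError("truncated header extension")
    return profile, buf[start:end], end

def parse_one_byte_elements(data: bytes):
    """Parse RFC 8285 one-byte-header elements into (id, payload) pairs."""
    elements = []
    i = 0
    while i < len(data):
        first = data[i]
        if first == 0:                 # padding byte, skip it
            i += 1
            continue
        ext_id = first >> 4
        if ext_id == 15:               # reserved ID: stop processing
            break
        size = (first & 0x0F) + 1      # the L field stores length minus one
        payload = data[i + 1 : i + 1 + size]
        if len(payload) < size:
            raise ValueError("truncated extension element")
        elements.append((ext_id, payload))
        i += 1 + size
    return elements

# Invented sample matching the diagram: 0xBEDE, length=3, three elements.
sample = bytes.fromhex("bede0003" "10aa21bb" "cc000033" "01020304")
profile, ext_data, next_offset = split_header_extension(sample)
if profile == 0xBEDE:
    elements = parse_one_byte_elements(ext_data)
```

With this sample, `next_offset` is 16 (4 header bytes plus 3 × 4 bytes of data), which is where the RTP payload would begin. A receiver that knows nothing about 0xBEDE can stop after `split_header_extension` and jump straight there, which is the skipping behaviour the length field exists for.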