I am trying to use objcopy to get a binary dump of an ELF file that has not yet gone through the link stage. It's actually an RP2040 object file cross-compiled by gcc version 6.3.1 (the latest version available for 32-bit ARM in the repositories for Ubuntu Bionic).
readelf -S shows the following:
$ readelf -S pico-sdk/src/rp2_common/pico_stdio/stdio.c.obj
There are 69 section headers, starting at offset 0x912c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 00000000 000034 000000 00 AX 0 0 2
[ 2] .data PROGBITS 00000000 000034 000000 00 WA 0 0 1
[ 3] .bss NOBITS 00000000 000034 000000 00 WA 0 0 1
[ 4] .text.stdio_out_c PROGBITS 00000000 000034 000010 00 AX 0 0 4
[ 5] .text.stdio_out_c PROGBITS 00000000 000044 0000bc 00 AX 0 0 4
[ 6] .rel.text.stdio_o REL 00000000 006550 000008 08 I 67 5 4
[ 7] .text.stdio_buffe PROGBITS 00000000 000100 000064 00 AX 0 0 4
[ 8] .rel.text.stdio_b REL 00000000 006558 000018 08 I 67 7 4
[ 9] .text.stdout_seri PROGBITS 00000000 000164 00002c 00 AX 0 0 4
[10] .rel.text.stdout_ REL 00000000 006570 000018 08 I 67 9 4
[11] .text.stdout_seri PROGBITS 00000000 000190 000010 00 AX 0 0 4
[12] .rel.text.stdout_ REL 00000000 006588 000010 08 I 67 11 4
[13] .text.stdio_put_s PROGBITS 00000000 0001a0 0000f8 00 AX 0 0 4
[14] .rel.text.stdio_p REL 00000000 006598 000048 08 I 67 13 4
[15] .text.stdio_get_u PROGBITS 00000000 000298 000084 00 AX 0 0 4
[16] .rel.text.stdio_g REL 00000000 0065e0 000018 08 I 67 15 4
[17] .text.stdio_putch PROGBITS 00000000 00031c 000094 00 AX 0 0 4
[18] .rel.text.stdio_p REL 00000000 0065f8 000038 08 I 67 17 4
[19] .text.stdio_puts_ PROGBITS 00000000 0003b0 000034 00 AX 0 0 4
[20] .rel.text.stdio_p REL 00000000 006630 000018 08 I 67 19 4
[21] .text.stdio_set_d PROGBITS 00000000 0003e4 000030 00 AX 0 0 4
[22] .rel.text.stdio_s REL 00000000 006648 000008 08 I 67 21 4
[23] .text.stdio_flush PROGBITS 00000000 000414 000020 00 AX 0 0 4
[24] .rel.text.stdio_f REL 00000000 006650 000008 08 I 67 23 4
[25] .text.stdio_init_ PROGBITS 00000000 000434 00000c 00 AX 0 0 4
[26] .rel.text.stdio_i REL 00000000 006658 000008 08 I 67 25 4
[27] .text.stdio_deini PROGBITS 00000000 000440 000024 00 AX 0 0 4
[28] .rel.text.stdio_d REL 00000000 006660 000010 08 I 67 27 4
[29] .text.stdio_getch PROGBITS 00000000 000464 000094 00 AX 0 0 4
[30] .rel.text.stdio_g REL 00000000 006670 000020 08 I 67 29 4
[31] .text.stdio_filte PROGBITS 00000000 0004f8 00000c 00 AX 0 0 4
[32] .rel.text.stdio_f REL 00000000 006690 000008 08 I 67 31 4
[33] .text.stdio_set_t PROGBITS 00000000 000504 000010 00 AX 0 0 4
[34] .text.stdio_set_c PROGBITS 00000000 000514 000028 00 AX 0 0 4
[35] .rel.text.stdio_s REL 00000000 006698 000008 08 I 67 34 4
[36] .text.__wrap_getc PROGBITS 00000000 00053c 000074 00 AX 0 0 4
[37] .rel.text.__wrap_ REL 00000000 0066a0 000020 08 I 67 36 4
[38] .text.__wrap_putc PROGBITS 00000000 0005b0 000094 00 AX 0 0 4
[39] .rel.text.__wrap_ REL 00000000 0066c0 000038 08 I 67 38 4
[40] .text.__wrap_puts PROGBITS 00000000 000644 000034 00 AX 0 0 4
[41] .rel.text.__wrap_ REL 00000000 0066f8 000018 08 I 67 40 4
[42] .text.__wrap_vpri PROGBITS 00000000 000678 0000cc 00 AX 0 0 4
[43] .rel.text.__wrap_ REL 00000000 006710 000048 08 I 67 42 4
[44] .text.__wrap_prin PROGBITS 00000000 000744 000018 00 AX 0 0 4
[45] .rel.text.__wrap_ REL 00000000 006758 000008 08 I 67 44 4
[46] .bss.drivers NOBITS 00000000 00075c 000004 00 WA 0 0 4
[47] .bss.filter NOBITS 00000000 00075c 000004 00 WA 0 0 4
[48] .mutex_array PROGBITS 00000000 00075c 000008 00 WA 0 0 4
[49] .rodata.crlf_str. PROGBITS 00000000 000764 000002 00 A 0 0 4
[50] .debug_info PROGBITS 00000000 000766 0020b7 00 0 0 1
[51] .rel.debug_info REL 00000000 006760 001380 08 I 67 50 4
[52] .debug_abbrev PROGBITS 00000000 00281d 00057f 00 0 0 1
[53] .debug_loc PROGBITS 00000000 002d9c 000ba5 00 0 0 1
[54] .rel.debug_loc REL 00000000 007ae0 000bf8 08 I 67 53 4
[55] .debug_aranges PROGBITS 00000000 003941 0000c8 00 0 0 1
[56] .rel.debug_arange REL 00000000 0086d8 0000b8 08 I 67 55 4
[57] .debug_ranges PROGBITS 00000000 003a09 0002a0 00 0 0 1
[58] .rel.debug_ranges REL 00000000 008790 000410 08 I 67 57 4
[59] .debug_line PROGBITS 00000000 003ca9 0009a6 00 0 0 1
[60] .rel.debug_line REL 00000000 008ba0 0000b0 08 I 67 59 4
[61] .debug_str PROGBITS 00000000 00464f 00115c 01 MS 0 0 1
[62] .comment PROGBITS 00000000 0057ab 000032 01 MS 0 0 1
[63] .debug_frame PROGBITS 00000000 0057e0 0002a4 00 0 0 4
[64] .rel.debug_frame REL 00000000 008c50 000160 08 I 67 63 4
[65] .ARM.attributes ARM_ATTRIBUTES 00000000 005a84 000032 00 0 0 1
[66] .shstrtab STRTAB 00000000 008db0 00037a 00 0 0 1
[67] .symtab SYMTAB 00000000 005ab8 0007f0 10 68 93 4
[68] .strtab STRTAB 00000000 0062a8 0002a7 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
y (purecode), p (processor specific)
What I expect is that
arm-none-eabi-objcopy -Obinary pico-sdk/src/rp2_common/pico_stdio/stdio.c.obj bin.out
will include every section that is PROGBITS and has an 'A' flag. This corresponds to 26 sections with a total size of 1842 bytes. (This is from my own ELF parser which I think makes it easier to read.)
Text Sections:
Offset Length Name
---------------------------------------------
0x00000034 0x00000000 ".text"
0x00000034 0x00000000 ".data"
0x00000034 0x00000010 ".text.stdio_out_chars_no_crlf"
0x00000044 0x000000bc ".text.stdio_out_chars_crlf"
0x00000100 0x00000064 ".text.stdio_buffered_printer"
0x00000164 0x0000002c ".text.stdout_serialize_begin"
0x00000190 0x00000010 ".text.stdout_serialize_end"
0x000001a0 0x000000f8 ".text.stdio_put_string"
0x00000298 0x00000084 ".text.stdio_get_until"
0x0000031c 0x00000094 ".text.stdio_putchar_raw"
0x000003b0 0x00000034 ".text.stdio_puts_raw"
0x000003e4 0x00000030 ".text.stdio_set_driver_enabled"
0x00000414 0x00000020 ".text.stdio_flush"
0x00000434 0x0000000c ".text.stdio_init_all"
0x00000440 0x00000024 ".text.stdio_deinit_all"
0x00000464 0x00000094 ".text.stdio_getchar_timeout_us"
0x000004f8 0x0000000c ".text.stdio_filter_driver"
0x00000504 0x00000010 ".text.stdio_set_translate_crlf"
0x00000514 0x00000028 ".text.stdio_set_chars_available_callback"
0x0000053c 0x00000074 ".text.__wrap_getchar"
0x000005b0 0x00000094 ".text.__wrap_putchar"
0x00000644 0x00000034 ".text.__wrap_puts"
0x00000678 0x000000cc ".text.__wrap_vprintf"
0x00000744 0x00000018 ".text.__wrap_printf"
0x0000075c 0x00000008 ".mutex_array"
0x00000764 0x00000002 ".rodata.crlf_str.6304"
Data Sections:
Offset Length Name
---------------------------------------------
0x00000034 0x00000000 ".bss"
0x0000075c 0x00000004 ".bss.drivers"
0x0000075c 0x00000004 ".bss.filter"
Total code size = 1842 in 26 sections.
Total data size = 8 in 3 sections.
However, what I get from objcopy is actually a bin.out file that is only 248 bytes long.
$ ls -l bin.out
-rw-rw-r-- 1 devel devel 248 Oct 6 03:53 bin.out
After racking my brain for several days, I realized that what objcopy is generating is only the longest TEXT section (".text.stdio_put_string") out of the 26. It isn't actually appending each text section to the one before it.
I can't find an option in objcopy that does what I want. I've tried options like --gap-fill=0. Everything just results in the same 248 byte file. Does anyone know if there is a way to resolve this issue? I would really like to come up with a way to generate binary files from this that use standard tools.
Thank you for any advice.
(In case anyone is curious, this is for tracking digital signatures of precursor files that get compiled into a binary.)
After racking my brain for several days, I realized that what objcopy is generating is only the longest TEXT section (".text.stdio_put_string") out of the 26. It isn't actually appending each text section to the one before it.
You're close, but not quite right.
By default objcopy copies sections that have non-zero size, have the ALLOC flag, and a type not equal to NOBITS. Call such a section an image-section. Note that an image-section is not necessarily a PROGBITS section, e.g. a NOTE section may be an image-section.
Your objcopy -Obinary command outputs N bytes, where N is the size of the largest image-section (which in your case happens to be the largest .text* section), but these N bytes are not that section. They are the garbage that results from outputting each eligible section, in their ELF section order, each on top of the last, i.e. all of them aligned to the start of the raw binary.
A demo of that. If you don't need convincing you can skip to "The objcopy solution".
From man objcopy, we read:
-j sectionpattern --only-section=sectionpattern
Copy only the indicated sections from the input file to the output file. This option may be given more than once. Note that using this option inappropriately may make the output file unusable. Wildcard characters are accepted in sectionpattern.
If the first character of sectionpattern is the exclamation point (!) then matching sections will not be copied, even if earlier use of --only-section on the same command line would otherwise copy it. For example:
--only-section=.text.* --only-section=!.text.foo
will copy all sections matching '.text.*' but not the section '.text.foo'.
This would suggest that, e.g.
$ objcopy -Obinary -j .text* -j .data* -j .eh_frame file.o out.bin
would concatenate all the sections with names matching .text*, .data* or .eh_frame from file.o into out.bin.
Let's see:
$ cat file.c
int rr[3] = {2,5,7};
int vv[4] = {11,13,17,19};
int aa(int a, int b) {
return a + b;
}
void bb(int * a, int * b, unsigned sz) {
for (unsigned i = 0; i < sz; ++i) {
b[i] = a[i];
}
}
unsigned cc(unsigned a) {
unsigned b = 0;
for ( ; a; b+=a, --a){};
return b;
}
Compile file.o with fine-grained sections:
$ gcc -c -ffunction-sections -fdata-sections file.c
$ readelf -WS file.o
There are 17 section headers, starting at offset 0x3a0:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 1
[ 2] .data PROGBITS 0000000000000000 000040 000000 00 WA 0 0 1
[ 3] .bss NOBITS 0000000000000000 000040 000000 00 WA 0 0 1
[ 4] .data.rr PROGBITS 0000000000000000 000040 00000c 00 WA 0 0 8
[ 5] .data.vv PROGBITS 0000000000000000 000050 000010 00 WA 0 0 16
[ 6] .text.aa PROGBITS 0000000000000000 000060 000018 00 AX 0 0 1
[ 7] .text.bb PROGBITS 0000000000000000 000078 000054 00 AX 0 0 1
[ 8] .text.cc PROGBITS 0000000000000000 0000cc 000029 00 AX 0 0 1
[ 9] .comment PROGBITS 0000000000000000 0000f5 000027 01 MS 0 0 1
[10] .note.GNU-stack PROGBITS 0000000000000000 00011c 000000 00 0 0 1
[11] .note.gnu.property NOTE 0000000000000000 000120 000020 00 A 0 0 8
[12] .eh_frame PROGBITS 0000000000000000 000140 000078 00 A 0 0 8
[13] .rela.eh_frame RELA 0000000000000000 0002c0 000048 18 I 14 12 8
[14] .symtab SYMTAB 0000000000000000 0001b8 0000f0 18 15 5 8
[15] .strtab STRTAB 0000000000000000 0002a8 000017 00 0 0 1
[16] .shstrtab STRTAB 0000000000000000 000308 000094 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
The aggregate size of the image-sections (.text* + .data* + .eh_frame + .note.gnu.property) is:
0xc + 0x10 + 0x18 + 0x54 + 0x29 + 0x20 + 0x78 = 0x149 = 329 bytes.
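The sum can be double-checked with shell arithmetic. This is only a convenience sketch; the sizes are the hex values from the Size column of the readelf listing above, and the variable name total is arbitrary.

```shell
# Sum the sizes (hex, from the readelf Size column) of the image-sections in file.o
total=$(( 0xc + 0x10 + 0x18 + 0x54 + 0x29 + 0x20 + 0x78 ))
printf '0x%x = %d bytes\n' "$total" "$total"    # prints: 0x149 = 329 bytes
```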
However:
$ objcopy -Obinary -j .text* -j .data* -j .eh_frame -j .note.gnu.property file.o image.bin
$ stat -c "%s" image.bin
120
Only 120 = 0x78 bytes have been output: the size of the largest matching section, .eh_frame. And the output of:
$ objcopy -Obinary -j .eh_frame file.o eh_frame.bin
is identical to image.bin:
$ cmp image.bin eh_frame.bin; echo Done
Done
But in this case the largest image-section is also the last image-section in the file. So prima facie it might constitute the whole contents of image.bin either because the largest section was chosen, or because the last section overlaid at the start of the output happened to be the largest one.
We can decide between these cases by eliminating .eh_frame from the output, so that the largest image-section is not the last. In fact I'll eliminate all but the .text* and .data* sections, which will shorten matters a bit and have the same effect.
$ objcopy -Obinary -j .text* -j .data* file.o text+data.bin
In text+data.bin, the largest section will be .text.bb, size 0x54 = 84 bytes. The aggregate size of the .text* + .data* sections is:
0xc + 0x10 + 0x18 + 0x54 + 0x29 = 0xB1 = 177 bytes
The size of text+data.bin, as we now expect, isn't that; it is:
$ stat -c "%s" text+data.bin
84
Let's have .text.bb in a file by itself:
$ objcopy -Obinary -j .text.bb file.o bb.bin
It's the same size as text+data.bin:
$ stat -c "%s" bb.bin
84
But it is not identical:
$ cmp text+data.bin bb.bin
text+data.bin bb.bin differ: byte 9, line 1
That refutes the theory that the largest section is chosen.
.text.bb is not the last section selected for text+data.bin: that is .text.cc, size 0x29 = 41 bytes.
So let's have .text.cc in a file by itself:
$ objcopy -Obinary -j .text.cc file.o cc.bin
$ stat -c "%s" cc.bin
41
And see that:
$ cmp -n41 text+data.bin cc.bin; echo Done
Done
The first 41 bytes of text+data.bin are the section .text.cc. .text.cc is the 2nd largest selected section. So let's see if the remaining 84-41 = 43 bytes of text+data.bin are the last 43 bytes of .text.bb:
$ dd if=text+data.bin of=tail-text+data.bin bs=41 skip=1
1+1 records in
1+1 records out
43 bytes copied, 8.6186e-05 s, 499 kB/s
$ dd if=bb.bin of=tail-bb.bin bs=41 skip=1
1+1 records in
1+1 records out
43 bytes copied, 0.000129071 s, 333 kB/s
$ cmp tail-text+data.bin tail-bb.bin; echo done
done
And so they are. For objcopy -Obinary, the selected sections are overlaid, in ELF order, at the start of the output file.
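The same overlay effect can be mimicked with dd alone, which may make the result above easier to picture. This is only a simulation, not what objcopy does internally: the three "sections" are made-up strings, and overlay_demo and overlay.bin are hypothetical names. It assumes GNU dd (for status=none).

```shell
#!/bin/bash
# Simulate writing several "sections" that all have address 0:
# each one is written at offset 0 of the same file, later ones on top of earlier ones.
overlay_demo() {
    local out=$1
    : > "$out"                       # start with an empty output file
    local section
    for section in "AAAA" "BBBBBBBBBB" "CCCCCC"; do
        # conv=notrunc keeps whatever lies beyond the end of a shorter, later write
        printf '%s' "$section" | dd of="$out" bs=1 seek=0 conv=notrunc status=none
    done
}

overlay_demo overlay.bin
cat overlay.bin; echo                # prints: CCCCCCBBBB
```

The output is as long as the largest "section" (10 bytes), begins with the whole of the last one written, and ends with the tail of the largest, which is the same pattern seen in text+data.bin.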
The objcopy solution
In the light of that finding, we can read between the lines of the paragraph that man objcopy devotes to explaining -O binary:
objcopy can be used to generate a raw binary file by using an output target of binary (e.g., use -O binary). When objcopy generates a raw binary file, it will essentially produce a memory dump of the contents of the input object file. All symbols and relocation information will be discarded. The memory dump will start at the load address of the lowest section copied into the output file.
[my emphasis]
The emphasised clause implies that the section load addresses found in the object file will be used as the file offsets of the respective sections in the output file. In file.o, an unlinked file, those addresses are of course all 0. So all the sections are output at start-of-file.
A remedy for this would be to change the output section addresses so as to lay out the sections consecutively. objcopy has an option for that purpose:
--change-section-address sectionpattern{=,+,-}val --adjust-section-vma sectionpattern{=,+,-}val
Set or change both the VMA address and the LMA address of any section matching sectionpattern. If = is used, the section address is set to val. Otherwise, val is added to or subtracted from the section address. See the comments under --change-addresses, above. If sectionpattern does not match any sections in the input file, a warning will be issued, unless --no-change-warnings is used.
So, to the first selected section we can assign address 0x0, then to each subsequent selected section assign an address equal to that of the previous section + its size. That would be like:
$ objcopy -O binary -j .data* -j .text* --change-section-address .data.rr=0x0 \
--change-section-address .data.vv=0xc --change-section-address .text.aa=0x1C \
--change-section-address .text.bb=0x34 --change-section-address .text.cc=0x88 \
file.o text+data-redux.bin
With which:
$ stat -c "%s" text+data-redux.bin
177
is exactly the noted size of the .text* + .data* sections.
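The addresses passed to --change-section-address above are just a running sum of the section sizes. As a sketch of that bookkeeping (layout_options is a hypothetical helper, not part of binutils; the sizes are the hex values readelf reported for file.o):

```shell
#!/bin/bash
# layout_options NAME SIZE_HEX [NAME SIZE_HEX ...]
# Print one --change-section-address option per section, giving each section an
# address equal to the previous section's address plus the previous section's size.
layout_options() {
    local addr=0
    while (( $# >= 2 )); do
        printf -- '--change-section-address %s=0x%x\n' "$1" "$addr"
        addr=$(( addr + 16#$2 ))     # sizes are given in hex without a 0x prefix
        shift 2
    done
}

layout_options .data.rr c .data.vv 10 .text.aa 18 .text.bb 54 .text.cc 29
```

For these five sections it prints the addresses 0x0, 0xc, 0x1c, 0x34 and 0x88, the same values used in the command above.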
We can prove that the selected sections are consecutive and correct. First we need the rest of the single-section binaries:
$ objcopy -Obinary -j .data.rr file.o rr.bin
$ objcopy -Obinary -j .data.vv file.o vv.bin
$ objcopy -Obinary -j .text.aa file.o aa.bin
.data.rr = 1st section copied, length 12:
$ cmp text+data-redux.bin rr.bin
cmp: EOF on rr.bin after byte 12, in line 1
Matched first 12 bytes. Discard them:
$ dd if=text+data-redux.bin of=tail-rr.bin bs=12 skip=1
13+1 records in
13+1 records out
165 bytes copied, 0.000108776 s, 1.5 MB/s
.data.vv = 2nd section copied, length 16:
$ cmp tail-rr.bin vv.bin
cmp: EOF on vv.bin after byte 16, in line 1
Matched next 16 bytes. Discard them:
$ dd if=tail-rr.bin of=tail-vv.bin bs=16 skip=1
9+1 records in
9+1 records out
149 bytes copied, 0.000169584 s, 879 kB/s
.text.aa = 3rd section copied, length 24:
$ cmp tail-vv.bin aa.bin
cmp: EOF on aa.bin after byte 24, in line 1
Matched next 24 bytes. Discard them:
$ dd if=tail-vv.bin of=tail-aa.bin bs=24 skip=1
5+1 records in
5+1 records out
125 bytes copied, 8.5904e-05 s, 1.5 MB/s
.text.bb = 4th section copied, length 84:
$ cmp tail-aa.bin bb.bin
cmp: EOF on bb.bin after byte 84, in line 1
Matched next 84 bytes. Discard them:
$ dd if=tail-aa.bin of=tail-bb.bin bs=84 skip=1
0+1 records in
0+1 records out
41 bytes copied, 0.00014661 s, 280 kB/s
.text.cc = last section copied, length 41:
$ cmp tail-bb.bin cc.bin; echo Done
Done
Matched last 41 bytes
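The step-by-step dd/cmp comparison can also be collapsed into a single check by concatenating the single-section binaries and comparing the result with the combined file. A minimal sketch, in which check_concat is a hypothetical helper name:

```shell
#!/bin/bash
# check_concat COMBINED PART...
# Succeed only if COMBINED is byte-for-byte the concatenation of the PART files,
# in the order given. cmp reads the concatenation from standard input ("-").
check_concat() {
    local combined=$1
    shift
    cat -- "$@" | cmp -s - "$combined"
}

# With the files produced above, this would be called as:
#   check_concat text+data-redux.bin rr.bin vv.bin aa.bin bb.bin cc.bin && echo Match
```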
Automation
Unaided by automation, this solution grows rapidly unwieldy with the number of sections to be copied, which with -ffunction-sections/-fdata-sections compilations can be arbitrarily large.
You say that you want a solution using "standard tools". If that means at most a pipe of stock commands I think you're out of luck, but at this point I expect you'd consider a bash script. Here's one that harnesses objcopy -O binary --change-section-address ... to write the image-sections of an input ELF file to an output file consecutively without gaps [1].
$ cat cat_elf_image_sections.sh
#!/bin/bash
# cat_elf_image_sections.sh
# Concatenate the sections of an ELF file that are non-0 size, type != `NOBITS`
# and flags inc. `A` into an output file.
# $1 = input file
# $2 = output file
rm -f "$2"
tot_sz=0
section_addr=0x0
# Build the command as an array so that file names containing spaces survive intact.
objcopy_cmd=(objcopy -O binary)
# readelf prints "[ 1]" (two fields) for single-digit section numbers and "[11]"
# (one field) otherwise, hence the two awk rules with shifted column numbers.
stats=($(readelf -WS "$1" | awk '$3 != "NOBITS" && $8 ~ "A" { print $2 " " $6}; $4 != "NOBITS" && $9 ~ "A" { print $3 " " $7 }'))
for ((idx = 0; idx < ${#stats[@]}; idx += 2)); do
    section=${stats[idx]}
    sz_str=${stats[idx + 1]}
    (( section_sz = 16#$sz_str ))    # readelf reports sizes in hex
    printf "section %s : size %u bytes" "$section" "$section_sz"
    if [[ $section_sz -gt 0 ]]; then
        (( tot_sz += section_sz ))
        hex_section_addr=$(printf "0x%x" "$section_addr")
        printf " : output offset %u\n" "$section_addr"
        # Place this section immediately after the previous one.
        objcopy_cmd+=(--change-section-address "$section=$hex_section_addr")
        (( section_addr += section_sz ))
    else
        printf "\n"
    fi
done
echo "Total size " $tot_sz " bytes"
"${objcopy_cmd[@]}" "$1" "$2"
Trying it:
$ ./cat_elf_image_sections.sh file.o file1.bin
section .text : size 0 bytes
section .data : size 0 bytes
section .data.rr : size 12 bytes : output offset 0
section .data.vv : size 16 bytes : output offset 12
section .text.aa : size 24 bytes : output offset 28
section .text.bb : size 84 bytes : output offset 52
section .text.cc : size 41 bytes : output offset 136
section .note.gnu.property : size 32 bytes : output offset 177
section .eh_frame : size 120 bytes : output offset 209
Total size 329 bytes
$ stat -c "%s" file1.bin
329
We calculated earlier that the aggregate size of the image-sections in file.o is 329 bytes.
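The script selects sections with two awk rules because readelf prints single-digit section numbers as "[ 1]" (two whitespace-separated fields) and larger ones as "[11]" (one field). One possible alternative, offered only as a sketch, is to strip the number column first so that a single rule suffices. image_sections is a hypothetical helper name, and the field positions assume the readelf -WS layout shown earlier.

```shell
#!/bin/bash
# image_sections: read `readelf -WS` output on stdin and print "name size_hex"
# for every section that has the A flag, is not NOBITS and has non-zero size.
image_sections() {
    # Keep only section rows, with the leading "[Nr]" column removed.
    sed -n 's/^ *\[ *[0-9][0-9]*\] *//p' |
        # Remaining fields: name type addr off size ES Flg Lk Inf Al (10 when flags are set)
        awk 'NF == 10 && $2 != "NOBITS" && $7 ~ /A/ && $5 !~ /^0+$/ { print $1, $5 }'
}

# Typical use (not run here):
#   readelf -WS file.o | image_sections
```

Rows without any flags have only nine fields after stripping, so they are skipped by the NF == 10 test rather than being misread.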
[1] That is, without the alignment padding a linker would insert. The aggregate size of the .data* sections in file.o is 28 bytes, but the linker on my system will merge them into a 32 byte output .data section with 4 bytes of alignment padding.