I want to analyze a stream object in a PDF file which is encoded using /FlateDecode.
Are there any tools that can decode the stream filters used in PDFs (ASCII85Decode, LZWDecode, RunLengthDecode, etc.)?
The stream content is most likely a PE file structure, which the PDF probably will use later in the exploit.
Also, there are two xref tables in the PDF. That is all right, but there are also two %%EOF markers, each following an xref table.
Is the presence of these all right? (Note: the second xref points to the first xref using the /Prev name.)
This xref refers to the second xref:
xref
5 6
0000000618 00000 n
0000000658 00000 n
0000000701 00000 n
0000000798 00000 n
0000045112 00000 n
0000045219 00000 n
1 1
0000045753 00000 n
3 1
0000045838 00000 n
trailer >
startxref
46090
%%EOF
The second xref:
xref
0 5
0000000000 65535 f
0000000010 00000 n
0000000067 00000 n
0000000136 00000 n
0000000373 00000 n
trailer >
startxref
429
%%EOF
"Two xref tables and two %%EOF"?
This alone is not an indication of a malicious PDF file. There can be two or even more instances of each if the file was generated via the "incremental update" feature. (Every digitally signed PDF file is like that, and so is every file that was changed in Acrobat and saved with the 'Save' button/menu instead of the 'Save as...' button/menu.)
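To see how many revisions a file contains, one rough approach is to list every startxref offset that is followed by a %%EOF marker. The following is only a sketch: the function name `find_revisions` is made up for illustration, and the regular expression assumes the common layout where the offset sits directly between `startxref` and `%%EOF`.

```python
import re
import sys

# Matches "startxref <offset> %%EOF", the usual end of each revision.
REVISION_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def find_revisions(data: bytes) -> list[int]:
    """Return the startxref offsets in file order, one per revision found."""
    return [int(m.group(1)) for m in REVISION_RE.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        offsets = find_revisions(fh.read())
    print(f"{len(offsets)} revision(s), startxref offsets: {offsets}")
```

A file that was saved once and then incrementally updated once would normally report two offsets, matching the two xref tables and two %%EOF markers in the question.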
"How to decode a compressed PDF stream from a specific object"?
Have a look at Didier Stevens' Python script pdf-parser.py. With this command-line tool, you can dump the decoded stream of any PDF object into a file. Example command to dump the stream of PDF object number 13:
pdf-parser.py -o 13 -f -d obj13.dump my.pdf
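If you would rather decode a /FlateDecode stream by hand, Python's standard zlib module can inflate it, since FlateDecode data is zlib/deflate-compressed. The following is a minimal sketch and not how pdf-parser.py works internally: the helper name `inflate_streams` is hypothetical, and it assumes each stream uses only /FlateDecode, with no predictor, filter chain, or encryption. It finds streams with a naive regular expression rather than a real PDF parser.

```python
import re
import zlib

# Naive match of everything between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def inflate_streams(data: bytes) -> list[bytes]:
    """Return the inflated body of every stream that zlib can decompress."""
    decoded = []
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes after the zlib data.
            decoded.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate-compressed (or damaged); skip it in this sketch.
            continue
    return decoded
```

Since a PE file begins with the two bytes `MZ`, checking `body.startswith(b"MZ")` on each decoded result is a quick first test for an embedded executable.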