For the purpose of this post, I consider a PDF document malformed when it doesn't respect the basic structure of a PDF document.
I've seen a few malicious, malformed PDF documents. The most recent was a malicious swine flu PDF document that contains another, benign, PDF document with information about the swine flu (obtained from the CDC site). This second PDF document is displayed to mislead the user while the exploit runs.

This second PDF document is XOR-encoded and appended to the end of the malicious PDF document, making the malicious PDF document malformed (FYI: the PDF file format supports embedded files, but this wasn't used here). A PDF reader like Adobe or Foxit has no problem opening this malformed PDF, because it scans a PDF document for the trailer (%%EOF) starting from the end of the document. Everything that follows this trailer and doesn't adhere to the PDF syntax is simply ignored.

I've added some new features to my PDF tools to handle malformed PDF documents.
PDFiD
The new version of PDFiD has an extra option. As its name implies, use it to add extra analysis data to the PDFiD report. The extra option adds entropy calculations to the report:

For a normal PDF file, expect the total entropy and the entropy of bytes inside stream objects to be close to the maximum value of 8.0. This means that the distribution of byte values is roughly random, which is characteristic of compressed and encrypted data.
Outside stream objects, the data looks much less random, and the entropy is much lower, usually around 4.0 or 5.0.
However, for malformed PDF documents, where data is added without using stream objects, the entropy outside stream objects is much higher. Here is the report for the malicious swine flu PDF:

Another piece of data added to the report by the extra option concerns the end-of-file marker %%EOF.
The "%%EOF" line gives the number of times %%EOF appears in the document (more than once usually indicates incremental updates).
"After last %%EOF" counts the number of bytes after the last %%EOF. This value will not be zero when data has been appended.
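The two %%EOF figures described above can be approximated in a few lines of Python. This is only a sketch of the idea, not PDFiD's own code: the function name `eof_report` and the sample byte strings are made up for illustration, and any end-of-line characters following the last marker are counted as trailing bytes here.

```python
from typing import Tuple


def eof_report(data: bytes) -> Tuple[int, int]:
    """Return (number of %%EOF markers, number of bytes after the last one).

    Hypothetical helper for illustration; returns (0, 0) when no marker exists.
    """
    marker = b"%%EOF"
    count = data.count(marker)
    if count == 0:
        return 0, 0
    last = data.rfind(marker)
    # Everything past the end of the last marker counts as trailing data.
    after = len(data) - (last + len(marker))
    return count, after


# Tiny stand-ins for a well-formed file and one with 4096 appended bytes.
clean = b"%PDF-1.1\n1 0 obj\n<<>>\nendobj\ntrailer\n<<>>\n%%EOF"
padded = clean + bytes(4096)

print(eof_report(clean))   # (1, 0)
print(eof_report(padded))  # (1, 4096)
```

A non-zero second value is the signal to look for: it means something follows the last trailer.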
pdf-parser
The old versions of pdf-parser output a lot of "todo 10" data (an indication of malformed PDF data) when they parse a malformed PDF document. I've suppressed this behavior; you'll need to use the verbose option to enable it from now on, should you need it. Since I first use PDFiD to check a PDF document before using pdf-parser, I don't consider the "todo" output relevant anymore, as PDFiD's entropy and %%EOF report will tell me if a PDF document is malformed.

But the other new option in pdf-parser, --extract, is more important. Example:
pdf-parser.py --extract payload.bin malformed.pdf
This option will extract all malformed data from malformed.pdf and write it to the file payload.bin, giving you easy access to the embedded payload.
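Once the payload is in a file of its own, an XOR-encoded PDF like the swine flu decoy mentioned earlier could be recovered along these lines. This is a hedged sketch resting on assumptions that are not stated above: that the encoding uses a single-byte key and that the decoded data starts with the `%PDF` header. The function names and the sample data are hypothetical.

```python
from typing import Optional


def xor_decode(payload: bytes, key: int) -> bytes:
    """XOR every byte of payload with a single-byte key."""
    return bytes(b ^ key for b in payload)


def find_single_byte_xor_key(payload: bytes, magic: bytes = b"%PDF") -> Optional[int]:
    """Try all 256 keys and return the one that makes payload start with magic.

    Assumption: single-byte key, payload begins with the PDF header.
    Returns None when no key matches.
    """
    head = payload[:len(magic)]
    for key in range(256):
        if xor_decode(head, key) == magic:
            return key
    return None


# Made-up stand-in for the contents of payload.bin.
hidden = b"%PDF-1.4 decoy document"
encoded = xor_decode(hidden, 0x5A)

key = find_single_byte_xor_key(encoded)
if key is not None:
    print(hex(key), xor_decode(encoded, key))
```

If the key is longer than one byte, or the payload doesn't start at the header, this search returns None and a different approach is needed.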
Samples
You can download a normal and malformed Hello World PDF file here
to get acquainted with my updated tools. 4096 random bytes have been appended to the end of the PDF document to make it malformed.
Here is a last example where the entropy calculation can be handy even if the payload is stored inside a stream object:

The reason the total entropy and the entropy of bytes inside stream objects are very low here is that this malicious PDF document contains a payload with a very long, uncompressed NOP-sled (more than one million times 0x90).
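To see why a NOP-sled drags the entropy down, here is a minimal sketch of a byte-level Shannon entropy calculation, in bits per byte. It is an illustration under my own assumptions and not necessarily how PDFiD computes its figures; the function name `byte_entropy` and the sample inputs are made up.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values in data, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


nop_sled = b"\x90" * 1_000_000    # one repeated byte value
uniform = bytes(range(256)) * 16  # every byte value equally often

print(byte_entropy(nop_sled))  # 0.0
print(byte_entropy(uniform))   # 8.0
```

A single repeated byte carries no information, so a million of them inside a stream pull the stream's entropy far below the value expected for compressed data.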