pdf, tags, abcpdf

How to remove tags from a PDF


I have a PDF that contains tags like these:

62 0 obj
<< /Type /StructElem /S /DokumentNavn /P 56 0 R /K 2 /Pg 58 0 R >>
endobj
60 0 obj
<< /Type /StructElem /S /Bundtekst /P 56 0 R /K 0 /Pg 58 0 R >>
endobj
61 0 obj
<< /Type /StructElem /S /ReferenceLinjer /P 56 0 R /Lang (da) /K 1 /Pg 58 0 R >>
endobj
68 0 obj
<< /Type /StructElem /S /Fritekst /P 56 0 R /K 6 /Pg 58 0 R >>
endobj

I have "removed" them by overwriting them with %. However, the tool that checks the PDF against a whitelist still complains, so I suspect the tags are also used in the binary sections of the PDF. Can ABCpdf remove tags, or is there another solution?


Solution

  • The Docotic.Pdf library can remove structure information from PDF documents.

    Below is sample code for the task:

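    // Requires: using BitMiracle.Docotic.Pdf;
    public static void saveWithoutStructureInformation(string input, string output)
    {
        using (PdfDocument document = new PdfDocument(input))
        {
            // Drop the structure tree (the tags) from the document.
            document.RemoveStructureInformation();

            // Discard objects that are no longer referenced when saving.
            document.SaveOptions.RemoveUnusedObjects = true;
            document.Save(output);
        }
    }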
    

    Disclaimer: I work for the vendor of the library.
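To check whether a saved file still carries obvious structure data before running the whitelist tool again, a rough byte-level scan can help. The sketch below is not part of Docotic.Pdf or ABCpdf. The class name `StructureMarkerScan`, its methods, and the list of names it searches for are assumptions made for illustration. It only sees uncompressed parts of the file, so names stored inside compressed object streams or content streams will not be found, and an empty result is a hint rather than proof.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: looks for structure-related PDF names in raw file bytes.
public static class StructureMarkerScan
{
    // Names that a tagged PDF commonly contains in its dictionaries.
    private static readonly string[] Markers =
    {
        "/StructTreeRoot", "/StructElem", "/MarkInfo", "/RoleMap"
    };

    // Returns the markers that occur anywhere in the given bytes.
    public static List<string> FindMarkers(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary
        // sections are decoded without loss and can be searched as text.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        var found = new List<string>();
        foreach (string marker in Markers)
        {
            if (text.IndexOf(marker, StringComparison.Ordinal) >= 0)
            {
                found.Add(marker);
            }
        }
        return found;
    }

    // Convenience overload that reads the file from disk first.
    public static List<string> FindMarkersInFile(string path)
    {
        return FindMarkers(File.ReadAllBytes(path));
    }
}
```

Calling `StructureMarkerScan.FindMarkersInFile(output)` on the saved file and printing the returned list shows which of these names, if any, are still visible. Note that objects that were only commented out with % still contain the text `/StructElem`, so this scan would report them as well.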