I need to scan a PDF document, extract some metadata from its annotations, get an SVG representation of each annotation, and save it to a database. I am using PDFTron and .NET for PDF processing.
During my research, I have found two ways to do it:
The first way:
1. Extract the fdf data from the initial document. Let's name the initial document in_pdf.
2. Create a new PDF from the fdf doc, so I get a PDF containing only the annotations. Let's name it temp_pdf.
3. Convert temp_pdf to svg.
4. Iterate over the annotations in in_pdf, and try to find the corresponding svg tag for every annotation. But I do not know how to find the corresponding tag.

The second way:
1. Extract the fdf data from the initial document for every annotation. In fact, make a separate fdf for every annotation.
2. Create a temp_pdf from each fdf. In fact, make a separate pdf for every annotation.
3. Convert each temp_pdf to svg.

This way gives me a mapping between each annotation and its svg string, but it creates many temporary documents.

All of this would be much simpler if I had a tool to convert each annotation to svg directly, not the whole document. Is there a way to do it using PDFTron?
You can export the appearance of annotations to a PDF page, and then you can convert that page to SVG.
This forum post shows how to render a specific annotation to an image. https://groups.google.com/d/msg/pdfnet-sdk/s8eeLmyNuGc/b_0gA02He3IJ
To adapt that code to your use case, you can do the following for SVG generation.
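// Assumes doc is the open PDFDoc, annot is the annotation to export,
// and svg_options holds your SVG output options.
Page temp_page = doc.PageCreate();
temp_page.AnnotPushBack(annot);
annot.Flatten(temp_page); // move annotation content stream into page content stream, and remove the annotation
temp_page.SetMediaBox(temp_page.GetVisibleContentBox()); // crop the page to the flattened content
Convert.ToSvg(temp_page, "out_path", svg_options);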
From here you can use standard XML tools to merge this SVG content into your target SVG file.
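As a rough illustration of the XML merge step, here is a minimal sketch using System.Xml.Linq from the .NET base library. The class name SvgMerge, the method AppendAsGroup, and the groupId parameter are hypothetical names made up for this example and are not part of PDFTron. It assumes both SVG strings have an svg root element in the standard SVG namespace.

```csharp
using System.Xml.Linq;

public static class SvgMerge
{
    // Standard SVG namespace; assumed to be the default namespace of both inputs.
    private static readonly XNamespace Svg = "http://www.w3.org/2000/svg";

    // Hypothetical helper: copies the children of the annotation SVG root into
    // a <g id="..."> element and appends that group to the target SVG root.
    public static string AppendAsGroup(string targetSvg, string annotSvg, string groupId)
    {
        XDocument target = XDocument.Parse(targetSvg);
        XDocument annot = XDocument.Parse(annotSvg);

        // Elements that already have a parent are cloned when added elsewhere,
        // so the annotation document is left untouched.
        var group = new XElement(
            Svg + "g",
            new XAttribute("id", groupId),
            annot.Root.Elements());

        target.Root.Add(group);
        return target.ToString(SaveOptions.DisableFormatting);
    }
}
```

Giving each group an id that you control (for example, one derived from your database key) keeps the mapping between annotation and SVG content explicit, which is the part that was hard to recover from a whole-document conversion.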
To position and size the annotation, call
annot.GetRect()
The x1, y1 values give you the bottom-left corner, and x2, y2 give you the top-right corner.
The generated SVG output has the same scale as the PDF, so you can use the values as is.
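To make the arithmetic concrete, here is a small sketch that turns the four rectangle values into a position and size. The names AnnotPlacement, ToSvgBox, and pageHeight are hypothetical and only serve the example. It assumes that your target SVG measures y downward from the top-left corner (the SVG default), while the PDF rectangle is measured from the bottom-left, so the top edge is computed from the page height. If your target SVG already applies a flip transform, use y2 directly instead.

```csharp
public static class AnnotPlacement
{
    // Hypothetical helper: converts PDF rectangle corners (bottom-left x1, y1 and
    // top-right x2, y2) into an x, y, width, height box for the target SVG.
    public static (double X, double Y, double Width, double Height) ToSvgBox(
        double x1, double y1, double x2, double y2, double pageHeight)
    {
        // Same scale as the PDF, so width and height are plain differences.
        double width = x2 - x1;
        double height = y2 - y1;

        // Assumption: the target SVG has a top-left origin with y growing downward,
        // so the top edge is the distance from the page top to y2.
        double top = pageHeight - y2;

        return (x1, top, width, height);
    }
}
```

For example, a rectangle with x1 = 100, y1 = 200, x2 = 150, y2 = 260 on a page 792 units high gives x = 100, y = 532, width = 50, height = 60 under that assumption.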