When I convert text documents to PDFs in Python using the fpdf2 library, URLs that span multiple lines get broken apart by an inserted space or newline (I'm not sure which).
I use the following code to convert each text file to a PDF file with fpdf2:
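import glob
from fpdf import FPDF

# Collect the .txt files; `path` is the input folder, defined elsewhere
txt_files = glob.glob(path + r'\*.txt')
for txt_file in txt_files:
    print(txt_file)
    # Read the whole text file into a single string
    with open(txt_file, 'r', encoding='utf-8') as infile:
        doc = infile.read()
    # Build a one-document PDF using a Unicode-capable font
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font("dejavu-sans", style="", fname="DejaVuSans.ttf")
    pdf.set_font(family="dejavu-sans", style="", size=12)
    # write() flows the text and wraps it at the right margin
    pdf.write(5, doc)
    # Replace the ".txt" suffix with ".pdf"
    pdf.output(txt_file[:-4] + '.pdf')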
My input text file looks like:
My generated pdf looks like this:
I use the refextract Python library and get this extracted reference:
At first I thought this was an issue with the refextract library, but when I select the hyperlink in the PDF file it looks like the problem is with fpdf2 breaking apart the URL (hovering over the URL in Adobe also shows only the partial address):
Does anyone know how to overcome this so that nothing is inserted midway through a url when converting text files to pdfs using FPDF2?
P.S. Sorry, I don't have enough reputation to post the images within the post (i.e. not via links - it's not that I don't know how).
If you want to render text without performing any wrapping, you can use FPDF.text():
from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
pdf.set_font("Helvetica", size=14)
pdf.text(pdf.x, pdf.y, "https://stackoverflow.com/questions/77930540/fpdf2-breaking-url-links-that-span-multiple-lines")
pdf.output("fpdf2-breaking-url-links-that-span-multiple-lines.pdf")
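
Since FPDF.text() does not advance the cursor, it is awkward to use for a whole document. A possible middle ground, shown here only as a rough sketch, is to keep using write() for ordinary text and emit each URL as a single non-wrapping cell(). The helper names split_urls and write_keeping_urls, the URL regex, and the output file name are assumptions made up for this illustration and are not part of fpdf2. The regex is deliberately simplistic and will swallow punctuation that directly follows a URL.

```python
import re

# Simplistic URL pattern (an assumption): "http(s)://" followed by non-whitespace.
URL_RE = re.compile(r"https?://\S+")


def split_urls(text):
    """Split text into (chunk, is_url) pairs, preserving order and content."""
    parts = []
    pos = 0
    for match in URL_RE.finditer(text):
        if match.start() > pos:
            parts.append((text[pos:match.start()], False))
        parts.append((match.group(), True))
        pos = match.end()
    if pos < len(text):
        parts.append((text[pos:], False))
    return parts


def write_keeping_urls(pdf, text, line_height=5):
    """Flow text with pdf.write(), but print every URL as one unbroken cell."""
    for chunk, is_url in split_urls(text):
        if not is_url:
            pdf.write(line_height, chunk)
            continue
        width = pdf.get_string_width(chunk) + 2 * pdf.c_margin
        # Start a fresh line if the URL does not fit in the space that is left.
        if pdf.x > pdf.l_margin and pdf.x + width > pdf.w - pdf.r_margin:
            pdf.ln(line_height)
        # cell() never wraps; a URL wider than the page will overflow the margin.
        pdf.cell(width, line_height, chunk, link=chunk)


def demo():
    """Hypothetical usage; requires fpdf2 to be installed."""
    from fpdf import FPDF

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=12)
    write_keeping_urls(
        pdf,
        "See https://stackoverflow.com/questions/77930540/"
        "fpdf2-breaking-url-links-that-span-multiple-lines for details.",
    )
    pdf.output("urls-kept-whole.pdf")
```

Calling demo() should produce a PDF in which the link is a single clickable unit. The trade-off is that a URL longer than the printable width runs past the right margin instead of being wrapped, so a smaller font size may be needed for very long links.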