The following code gives me a nearly black image:
```python
import fitz
import cv2  # OpenCV for preprocessing of the image
import numpy as np

filename = r'sample_pdf\muscle model\Invoice of muscle model from HealthLink.pdf'
doc = fitz.Document(filename)  # same as fitz.open
page1 = doc[0]
pix = page1.get_pixmap(colorspace='RGB', annots=False)
bytes = np.frombuffer(pix.samples, dtype=np.int8)
img_rgb = bytes.reshape(pix.height, pix.width, pix.n)
img_bgr = img_rgb[:, :, ::-1]  # convert from RGB to BGR for cv2.imwrite
cv2.imwrite('test.png', img_bgr)
doc.close()
```
I know there are other functions that can turn a PDF page into an image, but I still wonder why the code above fails.
As Dan Mašek pointed out in the comments, the cause of the issue is the wrong data type passed to `numpy.frombuffer`: it should be `uint8` instead of `int8`.
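A minimal sketch of the corrected conversion, assuming the pixmap is RGB without alpha (`pix.n == 3`) and has no row padding (`pix.stride == pix.width * pix.n`). `samples_to_bgr` is a hypothetical helper name, not part of PyMuPDF or OpenCV; it takes the raw bytes so it can be checked without a PDF.

```python
import numpy as np


def samples_to_bgr(samples: bytes, height: int, width: int, n: int) -> np.ndarray:
    """Turn raw interleaved RGB bytes (e.g. a pixmap's `samples`) into a BGR array.

    Hypothetical helper; assumes no alpha channel and no row padding.
    """
    # uint8, not int8: each byte is an unsigned intensity in 0..255
    img_rgb = np.frombuffer(samples, dtype=np.uint8).reshape(height, width, n)
    # Reverse the channel axis (RGB -> BGR, the order cv2.imwrite expects) and
    # copy into a contiguous buffer, since the reversed view has a negative stride
    return np.ascontiguousarray(img_rgb[:, :, ::-1])
```

With the question's pixmap it would be called as `img_bgr = samples_to_bgr(pix.samples, pix.height, pix.width, pix.n)`, followed by `cv2.imwrite('test.png', img_bgr)` as before.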
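Pixel intensities are normally unsigned 8-bit values from 0 to 255. With `int8`, positivity is not guaranteed: every byte above 127 wraps around to a negative number (white, 255, becomes -1), so the data are centred around zero and span -128 to +127. The bright pixels that make up most of the page are therefore misread, which is why the result comes out nearly black.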
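The wrap-around can be seen by reading the same four bytes both ways; the values below are illustrative pixel intensities, not taken from the PDF.

```python
import numpy as np

# Illustrative intensities: black, just below 128, just above 128, and white
raw = bytes([0, 127, 128, 255])

as_unsigned = np.frombuffer(raw, dtype=np.uint8)
as_signed = np.frombuffer(raw, dtype=np.int8)

print(as_unsigned.tolist())  # [0, 127, 128, 255]
print(as_signed.tolist())    # [0, 127, -128, -1]

# The representable range of each type
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)    # -128 127

# The bytes themselves are unchanged; viewing them as uint8 recovers the intensities
print(as_signed.view(np.uint8).tolist())  # [0, 127, 128, 255]
```

Only the interpretation of the buffer differs, which is why changing the `dtype` argument alone fixes the image.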