Tags: pdf.js, response-headers, pdfjs-dist

PDF.js: unable to get range requests/streaming of PDFs working


We have an application that uses PDFs, but because of the size of the PDFs our customers use, we're running into caching issues. We decided to look into streaming, i.e. using range requests to download the PDFs as we go.

Here are the response headers I'm seeing:

accept-ranges: bytes
access-control-allow-credentials: true
access-control-allow-headers: Authorization, Content-Type, body, Content-Length, Accept-Ranges, Range
access-control-allow-methods: GET,POST,PUT,DELETE
access-control-allow-origin: http://example.test
access-control-max-age: 1000
cache-control: max-age=31536000
content-length: 185124353
content-type: application/pdf
date: Thu, 05 Dec 2019 14:03:42 GMT
etag: "some-etag-that-works-nicely"

There are a lot of CORS headers because I'm running this locally for now, before I even consider pushing it to the dev environment. I think we've added all the headers PDF.js needs to detect that we support range requests, but it doesn't seem to work properly.
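One explanation worth checking for the CORS case: a cross-origin script (pdf.js included) can only read the CORS-safelisted response headers plus whatever the server lists in Access-Control-Expose-Headers. Access-Control-Allow-Headers, which the response above does set, only governs which request headers the page may send (such as Range). The sketch below models that filtering; scriptVisibleHeaders is a hypothetical helper, not a browser or pdf.js API, and it does not model the "*" wildcard.

```javascript
// Response-header names browsers expose to cross-origin script without any
// opt-in (the Fetch standard's CORS-safelisted response-header names).
// Content-Length joined this list relatively late, so older browsers may
// hide it as well.
const SAFELISTED = new Set([
  "cache-control",
  "content-language",
  "content-length",
  "content-type",
  "expires",
  "last-modified",
  "pragma",
]);

// Hypothetical helper: given raw response headers (name -> value), return the
// subset a cross-origin script would be allowed to read. The "*" wildcard in
// Access-Control-Expose-Headers is not modelled here.
function scriptVisibleHeaders(rawHeaders) {
  // Header names are case-insensitive, so normalise everything to lowercase.
  const headers = new Map(
    Object.entries(rawHeaders).map(([name, value]) => [name.toLowerCase(), value])
  );
  const exposed = new Set(
    (headers.get("access-control-expose-headers") || "")
      .split(",")
      .map((name) => name.trim().toLowerCase())
      .filter((name) => name.length > 0)
  );
  const visible = {};
  for (const [name, value] of headers) {
    if (SAFELISTED.has(name) || exposed.has(name)) {
      visible[name] = value;
    }
  }
  return visible;
}

// Abridged headers from the question: no Access-Control-Expose-Headers.
const observed = {
  "accept-ranges": "bytes",
  "access-control-allow-origin": "http://example.test",
  "access-control-allow-headers":
    "Authorization, Content-Type, body, Content-Length, Accept-Ranges, Range",
  "content-length": "185124353",
  "content-type": "application/pdf",
};
console.log("accept-ranges" in scriptVisibleHeaders(observed)); // false

// Same response, but with the range-related headers exposed.
const withExpose = {
  ...observed,
  "access-control-expose-headers":
    "Accept-Ranges, Content-Range, Content-Length, Content-Encoding",
};
console.log(scriptVisibleHeaders(withExpose)["accept-ranges"]); // bytes
```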

When I dig into pdfjs-dist/build/pdf.js (v2.3.200), on line 23744 I see this:

if (getResponseHeader('Accept-Ranges') !== 'bytes') {
  return returnValues;
}
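That check is one step of a larger capability test. Below is a simplified approximation of it; in the pdf.js sources the logic sits in a function named validateRangeRequestCapabilities, and the exact checks may differ between versions. checkRangeSupport is a hypothetical name, and getResponseHeader is assumed to return null for a missing or unreadable header, as XMLHttpRequest does. The point it illustrates is that a header the browser hides looks exactly like a header the server never sent.

```javascript
// Simplified, hypothetical approximation of the pdf.js range-capability check.
// getResponseHeader(name) must return the header value or null, like
// XMLHttpRequest.prototype.getResponseHeader.
function checkRangeSupport(getResponseHeader, rangeChunkSize = 65536) {
  const result = { allowRangeRequests: false, suggestedLength: undefined };

  // A missing or hidden Content-Length parses to NaN, so we stop early.
  const length = parseInt(getResponseHeader("Content-Length"), 10);
  if (!Number.isInteger(length)) {
    return result;
  }
  result.suggestedLength = length;

  // Small files are simply downloaded in one go.
  if (length <= 2 * rangeChunkSize) {
    return result;
  }

  // The line from the question: null (header not readable) fails here too.
  if (getResponseHeader("Accept-Ranges") !== "bytes") {
    return result;
  }

  // With an encoded body, Content-Length describes the encoded size, so byte
  // ranges would not line up with the decoded PDF.
  const encoding = getResponseHeader("Content-Encoding") || "identity";
  if (encoding !== "identity") {
    return result;
  }

  result.allowRangeRequests = true;
  return result;
}

// Accept-Ranges hidden from script (e.g. not listed in Expose-Headers).
const hidden = { "content-length": "185124353" };
const lookupHidden = (name) => hidden[name.toLowerCase()] ?? null;
console.log(checkRangeSupport(lookupHidden).allowRangeRequests); // false

// Accept-Ranges readable.
const visibleHeaders = { ...hidden, "accept-ranges": "bytes" };
const lookupVisible = (name) => visibleHeaders[name.toLowerCase()] ?? null;
console.log(checkRangeSupport(lookupVisible).allowRangeRequests); // true
```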

That made me think: maybe getResponseHeader() is case-sensitive, and for some reason I can't get the API to return its headers in the usual mixed case. So I decided to hack it a bit and make it return allowRangeRequests = true in returnValues.
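For what it's worth, header case alone shouldn't break this check: HTTP header names are case-insensitive, and both XMLHttpRequest.getResponseHeader() and the Fetch API's Headers.get() match names case-insensitively, so a lowercase accept-ranges sent over HTTP/2 still answers getResponseHeader('Accept-Ranges'). The sketch below mimics that lookup with a hypothetical HeaderBag class (not a real API); if the real call returns null, the header is more likely missing or hidden than mis-cased.

```javascript
// Hypothetical header container mimicking how browsers look up response
// headers: names compare case-insensitively, and a missing header yields
// null (as XMLHttpRequest.getResponseHeader does).
class HeaderBag {
  constructor(pairs) {
    this.map = new Map();
    for (const [name, value] of pairs) {
      const key = name.toLowerCase();
      // Repeated headers are combined with ", ", as browsers do.
      const previous = this.map.get(key);
      this.map.set(key, previous === undefined ? value : `${previous}, ${value}`);
    }
  }

  getResponseHeader(name) {
    const value = this.map.get(name.toLowerCase());
    return value === undefined ? null : value;
  }
}

// Headers as they arrive over HTTP/2: every name is lowercase.
const http2Headers = new HeaderBag([
  ["accept-ranges", "bytes"],
  ["content-length", "185124353"],
]);

console.log(http2Headers.getResponseHeader("Accept-Ranges")); // bytes
console.log(http2Headers.getResponseHeader("Content-Range")); // null
```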

This sort of works: I then see a 200 OK with the same headers as above (after the OPTIONS preflight when working locally), which should be cancelled but isn't, followed by a series of 206 Partial Content requests with incrementing Range headers (range: bytes=0-65535 and so on) that look like this:

REQUEST
range: bytes=0-65535
//...and other headers of course, omitted for brevity.
RESPONSE
accept-ranges: bytes
access-control-allow-credentials: true
access-control-allow-headers: Authorization, Content-Type, body, Content-Length, Accept-Ranges, Range
access-control-allow-methods: GET,POST,PUT,DELETE
access-control-max-age: 1000
cache-control: max-age=31536000
content-length: 65536
content-type: application/pdf

And so on. This also gives me an actual working PDF (or at least a few pages) in the viewer, which suggests it at least partially works.
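On the server side, each 206 response should also say which slice it carries via a Content-Range header (e.g. bytes 0-65535/185124353), which isn't among the response headers listed above. As far as I can tell, pdf.js's XHR-based loader reads Content-Range from 206 responses, so under CORS it would need to be exposed as well. The sketch below assumes a single bytes=start-end range per request; buildRangeResponse and chunkRanges are hypothetical helpers, and chunkRanges assumes one request per 64 KiB chunk (pdf.js's default rangeChunkSize is 65536 bytes, though it may merge adjacent chunks into one request).

```javascript
// Hypothetical helper: map a single "bytes=start-end" or "bytes=start-"
// Range header onto the status and headers of the matching response.
// Suffix ranges ("bytes=-500") and multi-range requests are not handled.
function buildRangeResponse(rangeHeader, totalSize) {
  // Whole-file response, used when there is no usable Range header.
  const full = {
    status: 200,
    headers: { "Accept-Ranges": "bytes", "Content-Length": String(totalSize) },
  };

  const match = /^bytes=(\d+)-(\d*)$/.exec(rangeHeader || "");
  if (!match) {
    return full;
  }

  const start = Number(match[1]);
  const explicitEnd = match[2] === "" ? null : Number(match[2]);
  if (explicitEnd !== null && explicitEnd < start) {
    return full; // Invalid range (last byte before first): ignore the header.
  }
  if (start >= totalSize) {
    // 416 Range Not Satisfiable, reporting the real size.
    return { status: 416, headers: { "Content-Range": `bytes */${totalSize}` } };
  }

  // Open-ended ("bytes=100-") and overlong ranges stop at the last byte.
  const end = explicitEnd === null ? totalSize - 1 : Math.min(explicitEnd, totalSize - 1);
  return {
    status: 206,
    start,
    end,
    headers: {
      "Accept-Ranges": "bytes",
      "Content-Range": `bytes ${start}-${end}/${totalSize}`,
      "Content-Length": String(end - start + 1),
    },
  };
}

// Range headers for fixed-size chunks, assuming one request per chunk.
function chunkRanges(totalSize, chunkSize = 65536) {
  const ranges = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalSize) - 1;
    ranges.push(`bytes=${start}-${end}`);
  }
  return ranges;
}

const size = 185124353; // content-length from the question
const first = buildRangeResponse("bytes=0-65535", size);
console.log(first.status, first.headers["Content-Range"]); // 206 bytes 0-65535/185124353

const chunks = chunkRanges(size);
console.log(chunks.length, chunks[chunks.length - 1]); // 2825 bytes=185073664-185124352
```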

So why do I need to "hack" this? Which headers am I missing for PDF.js to detect that we do support ranges, given that range support itself seems to be implemented correctly? And is another part of this "range support detection" also the reason it won't cancel the initial request (the one without a range: bytes=0-65535 header)?


Solution

  • We've gotten it to work. It seems the PDF.js internal implementation is quite picky about headers: when you use it together with CORS (so there is an OPTIONS preflight call first), it doesn't seem to pick up the correct headers at all. This may be a bug, but I haven't taken the time to investigate and confirm whether it's something we should report.

    Secondly, HTTP/2 (like SPDY before it) sends all header names in lowercase, and the internal implementation PDF.js relies on seemed to be picky about header case. When we disabled HTTP/2 (SPDY) and tried again without CORS, it worked with no issues.
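If the CORS problem described above comes down to header visibility, the sketch below shows headers a server could send. It assumes (not verified here) that exposing the range-related headers is enough: the OPTIONS preflight response allows the Range request header, and the actual GET/206 responses list Accept-Ranges, Content-Range, Content-Length and Content-Encoding in Access-Control-Expose-Headers. corsHeaders and ALLOWED_ORIGINS are hypothetical names; adapt them to whatever framework serves the PDFs.

```javascript
// Hypothetical allow-list; the origin matches the one in the question.
const ALLOWED_ORIGINS = new Set(["http://example.test"]);

// Build CORS headers for one response. isPreflight is true for the OPTIONS
// request the browser sends before the cross-origin GET.
function corsHeaders(origin, isPreflight) {
  if (!ALLOWED_ORIGINS.has(origin)) {
    return {}; // No CORS headers: the browser will block the response.
  }

  const headers = {
    // Credentialed requests need the exact origin echoed back, not "*".
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Credentials": "true",
    // The response varies by Origin, so caches must key on it.
    "Vary": "Origin",
  };

  if (isPreflight) {
    // Request headers the page may send: Range for the chunk requests,
    // Authorization if the API needs it.
    headers["Access-Control-Allow-Methods"] = "GET";
    headers["Access-Control-Allow-Headers"] = "Authorization, Range";
    headers["Access-Control-Max-Age"] = "1000";
  } else {
    // Response headers that script (and therefore pdf.js) may read.
    headers["Access-Control-Expose-Headers"] =
      "Accept-Ranges, Content-Range, Content-Length, Content-Encoding";
  }
  return headers;
}

const preflight = corsHeaders("http://example.test", true);
const actual = corsHeaders("http://example.test", false);
console.log(preflight["Access-Control-Allow-Headers"]); // Authorization, Range
console.log(actual["Access-Control-Expose-Headers"]);
```

Keeping Allow-Headers and Expose-Headers on separate responses mirrors their roles: the first answers the preflight's question about request headers, the second controls what the page can read from the real response.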