Tags: perl, curl, www-mechanize, net-http, transfer-encoding

Bad chunk-size in HTTP response: Net/HTTP/Methods.pm line 542


Questions that pose a similar problem:

Issues with LWP when using HTTP/1.1: bad chunk-size, truncated responses.


I am using the Perl module WWW::Mechanize to scrape websites. As far as I understand, WWW::Mechanize is a subclass of LWP::UserAgent, which uses the Net::HTTP module (through LWP::Protocol::http) to implement the HTTP protocol.

Here is the issue:

use strict;
use warnings;
use WWW::Mechanize;

my $url = 'https://somewebsite.com/a/b/c?skey=svalue';
my $browser = WWW::Mechanize->new();
$browser->get($url);

When I execute the above snippet, I get empty response content, and the response object returned by WWW::Mechanize contains the following error in its response headers:

'x-died' = "Bad chunk-size in HTTP response: { at path/to/perl/vendor/lib/Net/HTTP/Methods.pm line 542."

Notice the '{' in the exception message. I then tried to debug the Methods.pm module to see what was going on, and it looks like the exception is thrown inside the read_entity_body subroutine.
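For context on that '{': in a chunked body, every chunk begins with a line holding the chunk size in hexadecimal, optionally followed by `;`-separated chunk extensions. The Python sketch below shows that check in isolation. `parse_chunk_size` is a hypothetical helper, not Net::HTTP's code, though its error text imitates the message above. Fed the first line of an unchunked JSON body, which is just `{`, it fails the same way.

```python
import re

# Hex digits, optionally followed by whitespace and chunk extensions (";name=value").
_CHUNK_SIZE_RE = re.compile(r"^([0-9A-Fa-f]+)[ \t]*(?:;.*)?$")


def parse_chunk_size(line: str) -> int:
    """Return the size encoded in a chunk-size line, or raise ValueError.

    Hypothetical helper: it mirrors the kind of validation a chunked decoder
    performs before reading each chunk, not any specific library's code.
    """
    line = line.rstrip("\r\n")
    match = _CHUNK_SIZE_RE.match(line)
    if match is None:
        raise ValueError(f"Bad chunk-size in HTTP response: {line}")
    return int(match.group(1), 16)


if __name__ == "__main__":
    print(parse_chunk_size("45c\r\n"))  # prints 1116
    try:
        parse_chunk_size("{\r\n")  # first line of an unchunked JSON body
    except ValueError as exc:
        print(exc)  # prints: Bad chunk-size in HTTP response: {
```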

I also fetched the URL with curl and got the following response headers:

< HTTP/1.1 200 OK
< Set-Cookie: JSESSIONID=C61B57BA5DD0A05912C98CE1CFBAD435; Path=/; HttpOnly
< X-Frame-Options: DENY
< Transfer-Encoding: chunked
< Strict-Transport-Security: max-age=31536000 ; includeSubDomains
< Server: Apache-Coyote/1.1
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< X-Content-Type-Options: nosniff
< Content-Disposition: attachment;filename=f.txt
< Pragma: no-cache
< Expires: 0
< X-XSS-Protection: 1; mode=block
< Date: Thu, 21 Sep 2017 18:31:27 GMT
< Content-Type: application/json;charset=UTF-8
< Transfer-Encoding: chunked

and the following content:

{
  "total" : 1,
  "page" : 1,
  "records" : 1,
  "rows" : [ {
    "infoPostRptId" : 2,
    "mngPplId" : 1,
    "infoPostRptXsdId" : 1,
    "rptFmtCode" : "XML",
    "createUserId" : 5183202,
    "updateUserId" : 1,
    "statusId" : 309403,
    "seqNbr" : 0,
    "urlAnchor" : null,
  } ],
  "errors" : null
}
* Connection #0 to host xxxxxxx left intact

If I am not mistaken, the content that came back from the website is not actually chunk-encoded, even though the headers declare Transfer-Encoding: chunked.

More information regarding the Methods.pm module:

From what I understand, the read_entity_body subroutine decodes the chunks and combines them to form the response content.

I think the problem is that the response headers say Transfer-Encoding: chunked, but the content is in fact not chunk-encoded.

Any help is highly appreciated. Thanks.

EDIT 1:

Versions:

WWW::Mechanize 1.83, LWP::UserAgent 6.15 and Net::HTTP 6.12

EDIT 2:

Output of curl -s --raw -D - "https://....":

HTTP/1.1 200 OK
Set-Cookie: JSESSIONID=A29B1E0F561F1E4FBAF12583C0C2DE08; Path=/; HttpOnly
X-Frame-Options: DENY
Transfer-Encoding: chunked
Strict-Transport-Security: max-age=31536000 ; includeSubDomains
Server: Apache-Coyote/1.1
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
X-Content-Type-Options: nosniff
Content-Disposition: attachment;filename=f.txt
Pragma: no-cache
Expires: 0
X-XSS-Protection: 1; mode=block
Date: Fri, 22 Sep 2017 02:36:51 GMT
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked

45c
{
  "total" : 1,
  "page" : 1,
  "records" : 1,
  "rows" : [ {
        "infoPostRptId" : 2,
        "mngPplId" : 1,
        "infoPostRptXsdId" : 1,
        "rptFmtCode" : "XML",
        "createUserId" : 5183202,
        "updateUserId" : 1,
        "statusId" : 309403,
        "seqNbr" : 0,
        "urlAnchor" : null,
  } ],
  "errors" : null
}
0

As with the previous JSON content, I have removed or altered some values to anonymize the data.
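In this raw output, `45c` is the size of the chunk that follows, in hexadecimal (0x45c = 1116 bytes), and the final `0` line is the last chunk that ends the body. The Python sketch below decodes such a body. `decode_chunked` is a hypothetical helper written for illustration; it skips chunk extensions and ignores any trailer fields rather than interpreting them.

```python
import re


def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body (sketch, not production code).

    Each chunk is: <hex size>[;extensions] CRLF <size bytes> CRLF.
    A chunk of size 0 ends the body; trailer fields after it are ignored.
    """
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("truncated chunk-size line")
        # Drop chunk extensions such as ";name=value" before parsing the size.
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        if not re.fullmatch(rb"[0-9A-Fa-f]+", size_field):
            raise ValueError(f"Bad chunk-size: {size_field!r}")
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)  # last chunk reached
        chunk = raw[pos:pos + size]
        # The chunk data must be complete and followed by CRLF.
        if len(chunk) != size or raw[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk data")
        body += chunk
        pos += size + 2


if __name__ == "__main__":
    print(decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"))  # prints b'hello world'
```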

EDIT 3: This is what I get when I execute the following command:

 perl -MLWP::UserAgent -e'print LWP::UserAgent->new->get($ARGV[0])->as_string' 'https://......'

  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store, max-age=0, must-revalidate
  Connection: close
  Date: Fri, 22 Sep 2017 04:15:06 GMT
  Pragma: no-cache
  Server: Apache-Coyote/1.1
  Content-Type: application/json;charset=UTF-8
  Expires: 0
  Client-Aborted: die
  Client-Date: Fri, 22 Sep 2017 04:15:06 GMT
  Client-Peer: 67.221.172.5:443
  Client-Response-Num: 1
  Client-SSL-Cert-Issuer: /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
  Client-SSL-Cert-Subject: /OU=Domain Control Validated/CN=*.trellisenergy.com
  Client-SSL-Cipher: ECDHE-RSA-AES128-SHA256
  Client-SSL-Socket-Class: IO::Socket::SSL
  Client-Transfer-Encoding: chunked
  Content-Disposition: attachment;filename=f.txt
  Set-Cookie: JSESSIONID=5CAC35648DBBE25E3229DE9BF21C3794; Path=/; HttpOnly
  Strict-Transport-Security: max-age=31536000 ; includeSubDomains
  X-Content-Type-Options: nosniff
  X-Died: Bad chunk-size in HTTP response: { at /usr/local/share/perl5/Net/HTTP/Methods.pm line 544.
  X-Frame-Options: DENY
  X-XSS-Protection: 1; mode=block

EDIT 4: TCP Dump:

I ran the following command in one terminal window:

perl -MLWP::UserAgent -e'print LWP::UserAgent->new->get($ARGV[0])->as_string' 'https://vgs.trellisenergy.com/ptms/public/infopost/getInfoPostRpts.do?tspId=1&proxyTspId=1&rptId=2&downloadInd=0&searchInd=0&showLatestInd=0&cycleId=10303&startDate=09/20/2017&endDate=09/20/2017&_search=false&nd=1505846852955&rows=10&page=1&sidx=&sord=asc&_=1505846826289'

And the following in another:

tcpdump -w tcpdump.pcap -A -s0 -e -n -vvv -i eth0 host vgs.trellisenergy.com

Then I pretty-printed the capture using:

tcpick -C -yP -r tcpdump.pcap

TCP Dump:

Starting tcpick 0.2.1 at 2017-09-22 10:24 MDT
Timeout for connections is 600
tcpick: reading from tcpdump.pcap
1      SYN-SENT       10.1.1.10:24876 > 67.221.172.5:https
1      SYN-RECEIVED   10.1.1.10:24876 > 67.221.172.5:https
1      ESTABLISHED    10.1.1.10:24876 > 67.221.172.5:https
[Binary TLS handshake and encrypted application data omitted. The only readable strings come from the server's certificate chain (*.trellisenergy.com, Go Daddy Secure Certificate Authority - G2, Go Daddy Root Certificate Authority - G2); the HTTP exchange itself is encrypted.]
1      FIN-WAIT-1     10.1.1.10:24876 > 67.221.172.5:https
1      TIME-WAIT      10.1.1.10:24876 > 67.221.172.5:https
1      CLOSED         10.1.1.10:24876 > 67.221.172.5:https
tcpick: done reading from tcpdump.pcap

22 packets captured
1 tcp sessions detected
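Because this capture is of an HTTPS connection (port 443), the HTTP request and response travel inside TLS and cannot be read from it. One way to see the raw, undecoded response is to write the HTTP request over a TLS socket by hand. The Python sketch below uses only the standard `socket` and `ssl` modules; `build_request` and `fetch_raw` are hypothetical helper names, and the host and path in the usage comment are placeholders. The optional `connection` argument makes it easy to compare requests with and without a `Connection` header, and `version` allows an HTTP/1.0 request.

```python
import socket
import ssl
from typing import Optional


def build_request(host: str, path: str, version: str = "1.1",
                  connection: Optional[str] = None) -> bytes:
    """Build a minimal GET request; optionally add a Connection header."""
    lines = [f"GET {path} HTTP/{version}", f"Host: {host}"]
    if connection is not None:
        lines.append(f"Connection: {connection}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch_raw(host: str, request: bytes, port: int = 443,
              timeout: float = 10.0) -> bytes:
    """Send `request` over TLS and return the raw response bytes.

    Nothing is decoded: headers and any chunk-size lines stay in the output.
    Reads until the server closes the connection or the read times out.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(request)
            parts = []
            while True:
                try:
                    data = tls.recv(65536)
                except socket.timeout:
                    break  # a keep-alive server may leave the socket open
                if not data:
                    break
                parts.append(data)
    return b"".join(parts)


# Example (placeholder host and path):
# req = build_request("example.com", "/some-path", connection="close")
# print(fetch_raw("example.com", req).decode("latin-1"))
```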

Solution

  • That's a bug in the server or (more likely) a bug in the application running on the server. If one sends the following request:

    GET /some-path HTTP/1.1
    Host: some-host
    

    The server responds with a correctly chunked response. Interestingly, the Transfer-Encoding: chunked header is sent twice: once at the beginning of the HTTP header and once at the end:

    HTTP/1.1 200 OK
    Set-Cookie: ...
    X-Frame-Options: DENY
    Transfer-Encoding: chunked
    ...
    Content-Type: application/json;charset=UTF-8
    Transfer-Encoding: chunked
    
    45c
    {
    

    Now, when sending a slightly modified request with an added Connection: close header, the response looks different:

    GET /some-path HTTP/1.1
    Host: some-host
    Connection: close
    
    ----
    
    HTTP/1.1 200 OK
    Set-Cookie: ...
    X-Frame-Options: DENY
    Transfer-Encoding: chunked
    ...
    Content-Type: application/json;charset=UTF-8
    
    {
    

    The leading Transfer-Encoding: chunked header is still there, but the trailing one is gone. And the response body is no longer chunked, even though the response header still contains Transfer-Encoding: chunked!

    This is what happens with LWP but not with curl: LWP sends a Connection: TE, close header, while curl sends no Connection header at all. This means LWP gets the broken response and complains correctly, while curl does not get the broken response and thus has no reason to complain. But if you explicitly add a Connection: close header to curl, it runs into the same problem:

     $ curl -H 'Connection:close' https://...
     curl: (56) Illegal or missing hexadecimal sequence in chunked-encoding
    

    Further tests show that the leading Transfer-Encoding: chunked header is also sent if the client makes an HTTP/1.0 request! This should not happen at all, because chunked transfer coding is only defined for HTTP/1.1.

    This suggests that the first Transfer-Encoding: chunked header is issued by some part of the web application running on the server, not by the web server itself. Thus, if you have access to the application or to its developer, the problem should be fixed there.
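The server-side symptoms described in this answer can be checked mechanically. The Python sketch below uses a hypothetical `diagnose` helper (not part of any library) that takes a raw response, such as the output of `curl -s --raw -D -` with the relevant request headers, plus the HTTP version of the request. It flags a repeated Transfer-Encoding header, chunked encoding sent in reply to an HTTP/1.0 request, and a body that is declared chunked but does not start with a hexadecimal chunk-size line.

```python
import re
from typing import List


def diagnose(raw: bytes, request_version: str = "1.1") -> List[str]:
    """Return a list of framing problems found in a raw HTTP response (sketch).

    `raw` is the full response: status line, headers, blank line, body.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return ["no end of header block (CRLF CRLF) found"]
    header_lines = head.decode("latin-1").split("\r\n")[1:]  # skip status line
    te_values = [line.split(":", 1)[1].strip().lower()
                 for line in header_lines
                 if line.lower().startswith("transfer-encoding:")]
    problems = []
    if len(te_values) > 1:
        problems.append(f"Transfer-Encoding header sent {len(te_values)} times")
    chunked = any("chunked" in value for value in te_values)
    if chunked and request_version == "1.0":
        problems.append("chunked encoding sent in reply to an HTTP/1.0 request")
    if chunked:
        # A chunked body must begin with a hex chunk-size line.
        first_line = body.split(b"\r\n", 1)[0]
        size_field = first_line.split(b";", 1)[0].strip()
        if not re.fullmatch(rb"[0-9A-Fa-f]+", size_field):
            problems.append(f"body is not chunk-framed: starts with {first_line[:20]!r}")
    return problems
```

Applied to the Connection: close response shown above, it would be expected to report only the non-chunk-framed body, since that response carries a single Transfer-Encoding header.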