I'm creating a download link to an item that is stored on AWS / S3.
While building this link, I've confirmed that the filename is encoded in UTF-8, but when the user goes to download the file, they are hit with this error any time the filename contains non-ASCII characters:
<Error>
<Code>InvalidArgument</Code>
<Message>Header value cannot be represented using ISO-8859-1.</Message>
<ArgumentName>response-content-disposition</ArgumentName>
<ArgumentValue>attachment;filename="今日も僕は 用もないのに.mp3"</ArgumentValue>
<RequestId>12345</RequestId>
<HostId>12345</HostId>
</Error>
//first attempt - whatever encoding it IS, convert it to utf-8
$encoded_file = mb_convert_encoding($original_filename, "UTF-8", mb_detect_encoding($original_filename));
//second attempt - force filename to use html entities
$encoded_file = mb_convert_encoding($original_filename,'HTML-ENTITIES','UTF-8');
$obj_data['ResponseContentDisposition'] = 'attachment;filename="' . $encoded_file . '"';
$cmd = $s3->getCommand('GetObject', $obj_data);
$presign_url_request = $s3->createPresignedRequest($cmd, AWS_PRESIGNED_URL_EXPIRATION);
Forcing the attachment;filename to use HTML entities works, but it's really ugly. If I am converting the filename to UTF-8, why am I getting this error from AWS saying that the header value cannot be represented using ISO-8859-1?
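HTTP header values may not contain anything outside ISO-8859-1, so a string in any other encoding has to be encoded according to an established spec. UTF-8 is a different encoding from ISO-8859-1, and the Japanese characters in your filename have no ISO-8859-1 representation at all, which is why converting to UTF-8 does not make the error go away.
In this case the spec is RFC 6266, which takes the form of its filename* parameter from RFC 5987: the charset name, two single quotes, then the percent-encoded bytes of the value.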
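To see in advance which filenames would trip the ISO-8859-1 limit described above, a check along these lines can help. This is only a sketch: `fits_iso_8859_1()` is a hypothetical helper name, and it assumes the input is valid UTF-8. It relies on the fact that ISO-8859-1 covers exactly the code points U+0000 through U+00FF.

```php
<?php
// Hypothetical helper: true when every character of a UTF-8 string
// also exists in ISO-8859-1 (code points U+0000 through U+00FF).
function fits_iso_8859_1(string $utf8): bool
{
    // The /u flag makes PCRE match whole code points rather than bytes.
    // preg_match() returns false for invalid UTF-8, which also yields false here.
    return preg_match('/^[\x{00}-\x{FF}]*$/u', $utf8) === 1;
}

var_dump(fits_iso_8859_1('report.mp3'));      // plain ASCII
var_dump(fits_iso_8859_1('今日も僕は.mp3'));   // Japanese characters
```

A filename that fails this check is one that needs the filename* form rather than a plain quoted filename.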
function rfc6266_encode($string, $encoding) {
    // rawurlencode() percent-encodes every byte except A-Z a-z 0-9 - _ . ~,
    // all of which are allowed unescaped in an RFC 5987 extended value.
    // $encoding must name the encoding that the bytes of $string are actually in.
    return sprintf("%s''%s", $encoding, rawurlencode($string));
}
var_dump(rfc6266_encode('今日も僕は 用もないのに.mp3', 'utf-8'));
Output:
string(113) "utf-8''%E4%BB%8A%E6%97%A5%E3%82%82%E5%83%95%E3%81%AF%20%E7%94%A8%E3%82%82%E3%81%AA%E3%81%84%E3%81%AE%E3%81%AB.mp3"
And you would use it in your code like this (note filename* with an asterisk, and no surrounding quotes):
$obj_data['ResponseContentDisposition'] = 'attachment; filename*=' . rfc6266_encode($original_filename, $original_filename_encoding);
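RFC 6266 also allows sending a plain ASCII filename next to filename*, so that clients which do not understand the extended form still get a usable name. The following is a minimal sketch of that idea, assuming the filename is already valid UTF-8; `content_disposition_attachment()` is a hypothetical helper name, not part of the AWS SDK, and replacing characters with underscores is just one possible fallback policy.

```php
<?php
// Hypothetical helper: builds a Content-Disposition value with an ASCII
// fallback (filename) and the full UTF-8 name (filename*).
function content_disposition_attachment(string $utf8_filename): string
{
    // Fallback: replace every character that is not printable ASCII,
    // plus the double quote (0x22) and backslash (0x5C), with "_".
    $fallback = preg_replace('/[^\x20-\x21\x23-\x5B\x5D-\x7E]/u', '_', $utf8_filename);

    // Extended form: charset, two single quotes, percent-encoded bytes.
    return sprintf(
        "attachment; filename=\"%s\"; filename*=UTF-8''%s",
        $fallback,
        rawurlencode($utf8_filename)
    );
}

echo content_disposition_attachment('今日も僕は 用もないのに.mp3'), "\n";
```

The result contains only ASCII characters, so it can be assigned to `$obj_data['ResponseContentDisposition']` without hitting the ISO-8859-1 check.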
That said, do not rely on mb_detect_encoding(), as it only makes a guess as to what the encoding might be. String encoding is metadata that must be captured alongside the data itself and preserved.
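If the encoding has been stored with the filename, it can be verified rather than guessed. A rough sketch, assuming the mbstring extension is available; `require_encoding()` is a hypothetical helper name and the choice to throw an exception is only one way to handle a mismatch:

```php
<?php
// Hypothetical helper: returns $value unchanged if its bytes are valid in
// the encoding recorded for it, and refuses to continue otherwise.
function require_encoding(string $value, string $declared_encoding): string
{
    // mb_check_encoding() validates against a known encoding; unlike
    // mb_detect_encoding(), it does not try to guess one.
    if (!mb_check_encoding($value, $declared_encoding)) {
        throw new InvalidArgumentException(
            "Filename is not valid $declared_encoding"
        );
    }
    return $value;
}

$original_filename = require_encoding('今日も僕は 用もないのに.mp3', 'UTF-8');
```

This does not prove that the declared encoding is the right one, only that the bytes are valid in it, which is why the encoding still has to be recorded when the filename is first stored.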