javascript ascii filereader

FileReader.readAsArrayBuffer handling of non-ASCII including £ (pound sterling)


I am using FileReader.readAsArrayBuffer(file) and converting the result into a Uint8Array.

If the text file input contains a pound sterling sign (£), then this single character results in two byte codes, one for Â and one for £. I understand that this is because £ is in the extended-ASCII set.

Is there a way to prevent this extra character? If not, will it always be an Â? If so, I can strip them out.


Solution

  • You didn't post your JavaScript, but this looks like a mismatch between the character encoding of the text file and how your code interprets the bytes. Assuming the file is saved as UTF-8, £ (U+00A3) is stored as the two bytes 0xC2 0xA3, and 0xC2 shows up as Â when each byte is treated as a separate character. The bytes themselves are correct, so rather than stripping anything out, decode the Uint8Array as UTF-8. Here is a small example for reference; I hope it helps you solve the problem.

    function checkFile(file){
        const fileReader = new FileReader();
        fileReader.onload = function(event) {
            const uint8Array = new Uint8Array(event.target.result);
            
            // Use TextDecoder to convert Uint8Array into string 
            const textDecoder = new TextDecoder('utf-8', { fatal: true });
            try{
                const result = textDecoder.decode(uint8Array);
                console.log(result); // Should display £ correctly, without a leading Â.
            }catch(error){
                // With { fatal: true }, decode() throws a TypeError on bytes that are not valid UTF-8.
                console.error('Decoding failed:', error);
            }
        };
        fileReader.readAsArrayBuffer(file);
    }
    
    // Take the event as a parameter instead of relying on the deprecated global `event`.
    function uploadFile(event){
        const file = event.target.files[0];
        if(file){
           checkFile(file);
        }
    }
    <input type="file" onchange="uploadFile(event)" />
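To see where the extra byte comes from, and why stripping Â is not a general fix, here is a minimal sketch that assumes the file is UTF-8. The helper name `utf8Hex` and the variable names are made up for illustration; `TextEncoder` and `TextDecoder` are the standard browser and Node.js APIs.

```javascript
// Hypothetical helper: return the UTF-8 bytes of a string as hex strings.
function utf8Hex(text) {
    return Array.from(new TextEncoder().encode(text), (b) =>
        b.toString(16).padStart(2, '0'));
}

// "£" is U+00A3; in UTF-8 it takes two bytes.
const poundBytes = new TextEncoder().encode('\u00A3');
const poundHex = utf8Hex('\u00A3');

// Treating each byte as its own character (Latin-1 style) gives "Â£".
const misread = Array.from(poundBytes, (b) => String.fromCharCode(b)).join('');

// Decoding the two bytes together as UTF-8 gives the single character "£".
const decoded = new TextDecoder('utf-8').decode(poundBytes);

// Other characters use different lead bytes, e.g. "€" (U+20AC) takes three bytes,
// so removing every Â would not repair them.
const euroHex = utf8Hex('\u20AC');

console.log(poundHex, misread, decoded, euroHex);
```

So the lead byte is only Â for characters in the U+0080 to U+00BF range; decoding the whole buffer as UTF-8 handles every case.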