node.js github octokit-js

Encoding issue with reading markdown files on GitHub


When I load a markdown file from GitHub I am running into a lot of errors. I think I am not using the right encoding for GitHub files through Octokit. Any suggestions on how to fix my Buffer code in Node.js?

Is decoding from base64 and then converting to ascii correct for GitHub content? It works fine when I load the file directly in my project without GitHub. I have a feeling GitHub stores its files in a different format, but I can't find docs on it.

const repos = await octokit.repos.getContents({
    owner: 'owner-hidden',
    repo: 'repo-hidden',
    path: '/dinner.md'
});

// repo loads with data.content just fine
const bufferedData = Buffer.from(repos.data.content, 'base64').toString('ascii');
const ymlData = YAML.parse(bufferedData); // issue with reading this

The error is below, but the error itself doesn't necessarily matter, because there are no errors when I load the file directly in my project.

YAMLException: the stream contains non-printable characters at line 36, column 126:
       ... auteLed spinach and ratatouille
                                           ^

Loading the markdown file directly in my project produced no errors:

const fs = require('fs');
const path2 = require('path');
const file = path2.resolve(__dirname, '/dinner.md');
const content = fs.readFileSync(file);

const bufferedData = Buffer.from(content).toString('ascii');
console.log({bufferedData});

Solution

  • As one of the Octokit members replied on my GitHub issue, I don't need to decode with ascii; I should be using utf8, as shown here:

        - const bufferedData = Buffer.from(repos.data.content, 'base64').toString('ascii')
        + const bufferedData = Buffer.from(repos.data.content, 'base64').toString()
    

    buffer.toString() defaults to utf8, which is what I want.
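To see why the encoding matters, here is a minimal sketch that does not call GitHub at all. The sample string and the `decodeContent` helper are hypothetical, made up for illustration; the base64 value only simulates what `repos.data.content` holds, assuming the file is stored as UTF-8. Decoding as ascii turns each byte of a multi-byte character into its own character, so the text no longer matches the original, while utf8 round-trips cleanly.

```javascript
// Hypothetical sample text containing one non-ASCII character (é, U+00E9).
const original = 'saut\u00e9ed spinach and ratatouille';

// Simulate the API payload: base64 of the file's UTF-8 bytes (assumption).
const base64Content = Buffer.from(original, 'utf8').toString('base64');

// Hypothetical helper: decode base64 content into a string.
// The encoding defaults to utf8, the same default as buffer.toString().
function decodeContent(content, encoding = 'utf8') {
  return Buffer.from(content, 'base64').toString(encoding);
}

const asAscii = decodeContent(base64Content, 'ascii');
const asUtf8 = decodeContent(base64Content);

console.log(asAscii === original); // false: é was split into two wrong characters
console.log(asUtf8 === original);  // true
```

With the real response, the equivalent call would be `decodeContent(repos.data.content)` before passing the result to the YAML parser.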