I know that it is currently possible to download objects by byte range from Google Cloud Storage buckets.
const options = {
  destination: destFileName,
  start: startByte,
  end: endByte,
};
await storage.bucket(bucketName).file(fileName).download(options);
However, I need to read by line range rather than byte range, as the files I deal with are *.csv. Ideally, something like:
await storage
  .bucket(bucketName)
  .file(fileName)
  .download({ destination: '', lineStart: number, lineEnd: number });
I couldn't find any API for it. Could anyone advise on how to achieve the desired behaviour?
You cannot read a file line by line directly from Cloud Storage, because it stores files as objects, as shown in this answer:
The string you read from Google Storage is a string representation of a multipart form. It contains not only the uploaded file contents but also some metadata.
To read the file line by line as desired, I suggest loading it into a variable and then parsing the variable as needed. You could use the sample code provided in this answer:
const { Storage } = require("@google-cloud/storage");
const Papa = require("papaparse");

const storage = new Storage();

// Read the file from Cloud Storage as a stream
// (bucketName and fileName are assumed to be defined elsewhere)
const downloadedFile = storage
  .bucket(bucketName)
  .file(fileName)
  .createReadStream();

// Concatenate the chunks into a single string
let fileBuffer = "";
downloadedFile
  .on("data", function (data) {
    fileBuffer += data;
  })
  .on("error", function (err) {
    console.error("Download failed:", err);
  })
  .on("end", function () {
    // fileBuffer now holds the whole CSV file.
    // PapaParse splits it into rows on line breaks; the field delimiter
    // is auto-detected when it is not specified.
    let rows;
    Papa.parse(fileBuffer, {
      header: false,
      complete: function (results) {
        // Show the parsed data on the console
        console.log("Finished:", results.data);
        rows = results.data;
      },
    });
  });
To parse the data, you could use a library like PapaParse, as shown in this tutorial.
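If you only need a range of lines, buffering the whole file may be unnecessary. Below is a minimal sketch that reads a stream line by line with Node's built-in readline module and stops once the last wanted line has been collected. The helper name readLineRange and its parameters are hypothetical (they are not part of @google-cloud/storage), and it assumes that no quoted CSV field contains an embedded line break.

```javascript
const readline = require("readline");

/**
 * Hypothetical helper: returns lines lineStart..lineEnd (1-based, inclusive)
 * from any readable stream, without buffering the rest of the data.
 */
async function readLineRange(readable, lineStart, lineEnd) {
  const rl = readline.createInterface({
    input: readable,
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  const lines = [];
  let lineNumber = 0;

  try {
    for await (const line of rl) {
      lineNumber += 1;
      if (lineNumber < lineStart) continue; // skip lines before the range
      lines.push(line);
      if (lineNumber >= lineEnd) break; // stop as soon as the range is complete
    }
  } finally {
    rl.close();
    readable.destroy(); // stop consuming the underlying stream
  }

  return lines;
}
```

With the client from the code above, it could be called as readLineRange(storage.bucket(bucketName).file(fileName).createReadStream(), 10, 20). Note that the object is still streamed from its first byte up to the end of the requested range, since Cloud Storage can only seek by byte offset, not by line number.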