Here is my full scenario: I have a static site hosted on S3. Its just an interface for uploading an file. The upload is done using API Gateway as an endpoint, that triggers an lambda function.
This lambda function will read the XLSX file, process it and upload the json to an S3 Bucket. The PUT event on the bucket, triggers another lambda function, which will read this json and send as messages to an SQS Queue. Then, as soon an message arrives it spins up an EC2 instance for processing.
The problem is: I don't know what is going on, but my output is getting totally scrambled. It is something like nothing is decoding the base64 after receiving it.
Here is an example of my output:
The API Gateway has been setup correctly, i'm pretty sure about that:
When testing sending a file trough Postman, it works without problems. Based on this i suppose the API has been setup fine and also the function, its something happening between the browser and the lambda function \ api call.
This is what my log on the lambda shows:
2024-09-03T22:13:31.337Z 2024-09-03T22:13:31.337Z a107932e-0bde-4b4c-ad8d-478625019503 INFO Processed data: [ [ { "numCpfCnpj": "Content-Disposition: form-data", "numAcordo": " name="file"" }
], null, ]....lots of binary data
This part specially:
"numCpfCnpj": "Content-Disposition: form-data", "numAcordo": " name=\"file\"" }
Got my attention, because it looks like something is wrong with the decoding, since those headers are present where there should be data.
This is the code i'm using to Read the XLSX file:
import XLSX from 'xlsx';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
const s3Client = new S3Client({ region: process.env.AWS_REGION });
const processXlsxFile = (fileContent) => { // Corrected arrow function syntax and added closing brace
const workbook = XLSX.read(fileContent, { type: 'buffer' });
const sheet = workbook.Sheets[workbook.SheetNames[0]];
// Convert all rows to JSON, treating every cell as text to preserve leading zeros
const data = XLSX.utils.sheet_to_json(sheet, {
header: 1, // Use the first row as the header
raw: false // Ensure that all cells are treated as strings
}).slice(1); // Skip the header row
// Process the data and pad CPF/CNPJ numbers as needed
return data.reduce((acc, row, index) => { // Corrected arrow function syntax
// Skip empty or invalid rows
if (!row[0] || !row[1]) return acc;
// Determine if the number is CPF (11 digits) or CNPJ (14 digits)
const numCpfCnpj = String(row[0]).length <= 11
? String(row[0]).padStart(11, '0') // Pad CPF to 11 digits
: String(row[0]).padStart(14, '0'); // Pad CNPJ to 14 digits
const groupIndex = Math.floor(index / 2);
if (!acc[groupIndex]) acc[groupIndex] = [];
acc[groupIndex].push({
numCpfCnpj,
numAcordo: String(row[1])
});
return acc;
}, []);
};
export const handler = async (event) => { // Corrected arrow function syntax and added closing brace
try {
console.log("Received event:", JSON.stringify(event, null, 2));
if (event.body) {
// Decode the base64-encoded file content
const fileContent = Buffer.from(event.body, 'base64');
console.log("File content received");
// Process the Excel file content
const processedData = processXlsxFile(fileContent);
console.log("Processed data:", JSON.stringify(processedData, null, 2));
// Flatten the processed data to a single array
const flatData = processedData.flat();
console.log("Flattened data:", JSON.stringify(flatData, null, 2));
// Convert the result to a JSON string
const jsonString = JSON.stringify(flatData);
const jsonFileName = `converted_data_${Date.now()}.json`;
const bucketName = process.env.BUCKET_NAME;
// Upload the JSON to S3
const putObjectParams = {
Bucket: bucketName,
Key: jsonFileName,
Body: jsonString,
ContentType: 'application/json'
};
const command = new PutObjectCommand(putObjectParams);
await s3Client.send(command);
console.log(`JSON file uploaded successfully to S3: ${jsonFileName}`);
return {
statusCode: 200,
body: JSON.stringify({ message: 'File processed and JSON uploaded to S3 successfully.' }),
headers: {
'Content-Type': 'application/json'
}
};
} else {
return {
statusCode: 400,
body: JSON.stringify({ message: 'No file uploaded' }),
headers: {
'Content-Type': 'application/json'
}
};
}
} catch (error) {
console.error("Error processing file:", error);
return {
statusCode: 500,
body: JSON.stringify({
message: 'Internal Server Error',
error: error.message
}),
headers: {
'Content-Type': 'application/json'
}
};
}
};
Any input is appreciated.
Also, if anoyone got any questions, i'll be glad to reply as well.
Edit: HTML Code
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>File Processor</title>
<link rel="stylesheet" href="styles/styles.css"> <!-- Link to external CSS file -->
</head>
<body>
<div class="container">
<h1>Process XLSX File</h1>
<input type="text" id="txtInput" readonly placeholder="No file chosen">
<input type="file" id="fileInput" style="display:none" accept=".xlsx">
<button id="btnBrowse">Procurar arquivo</button>
<button id="btnProcess">Processar</button>
<div id="results"></div>
</div>
<script src="scripts/main.js"></script> <!-- Link to external JavaScript file -->
</body>
</html>
Js Code:
document.addEventListener('DOMContentLoaded', () => {
const fileInput = document.getElementById('fileInput');
const txtInput = document.getElementById('txtInput');
const btnBrowse = document.getElementById('btnBrowse');
const btnProcess = document.getElementById('btnProcess');
const resultsDiv = document.getElementById('results');
btnBrowse.addEventListener('click', () => {
fileInput.click();
});
fileInput.addEventListener('change', () => {
txtInput.value = fileInput.files[0] ? fileInput.files[0].name : 'No file chosen';
});
btnProcess.addEventListener('click', async () => {
if (!fileInput.files.length) {
alert('Please select a file first.');
return;
}
const formData = new FormData();
formData.append('file', fileInput.files[0]);
try {
const response = await fetch('https://5lvbhojaaf.execute-api.sa-east-1.amazonaws.com/conversion/readFile', {
method: 'POST',
body: formData
});
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
const result = await response.json();
// Format JSON output for better readability
resultsDiv.innerHTML = `<pre>${JSON.stringify(result, null, 2)}</pre>`;
} catch (error) {
console.error('Error processing file:', error);
resultsDiv.innerHTML = '<p>Error processing file. Check the console for details.</p>';
}
});
});
Postman endpoint: https://5lvbhojaaf.execute-api.sa-east-1.amazonaws.com/conversion/readFile
The weird characters appears after the function execution. On the CloudWatch logs i can see the data being received as base64, and after the function execution it becomes scrambled.
At the client (html/js) you are not encoding the file as base64, so at the lambda layer, this doesn't make sense:
const fileContent = Buffer.from(event.body, 'base64');
Maybe using the postman, you are sending the file as base64, but in your html/js you are not converting the file to base64. You are sending it as Content-Type: multipart/form-data
(1) Get the content from multipart/form-data is not easy compared when the content type is Content-Type: application/json
(2) Usually is sent the file name and file content (binary)
(3) Usually this content type has sections like, size, disposition, content, etc
Almost no body deal directly with this (in nodejs), that's why libraries in all languages help the final developer. For example multer library makes our live easy:
app.post('/upload', upload.single('file'), function(req, res) {
const title = req.body.title;
const file = req.file;
//file is ready to use (pdf, xls, zip, images, etc)
References
event.body
doesn't have the file ready to use. That's is another of your errors
A quick research, didn't get me some "easy" ways to deal with multipart/form-data using aws lambdas
Reviewing some libraries, I found the magic to get a file from lambda event variable:
module.exports.parse = (event, spotText) => {
const boundary = getBoundary(event);
const result = {};
event.body
.split(boundary)
.forEach(item => {
if (/filename=".+"/g.test(item)) {
result[item.match(/name=".+";/g)[0].slice(6, -2)] = {
type: 'file',
filename: item.match(/filename=".+"/g)[0].slice(10, -1),
contentType: item.match(/Content-Type:\s.+/g)[0].slice(14),
content: spotText? Buffer.from(item.slice(item.search(/Content-Type:\s.+/g) + item.match(/Content-Type:\s.+/g)[0].length + 4, -4), 'binary'):
item.slice(item.search(/Content-Type:\s.+/g) + item.match(/Content-Type:\s.+/g)[0].length + 4, -4),
};
} else if (/name=".+"/g.test(item)){
result[item.match(/name=".+"/g)[0].slice(6, -1)] = item.slice(item.search(/name=".+"/g) + item.match(/name=".+"/g)[0].length + 4, -4);
}
});
return result;
};
This is already implemented in these libraries:
Choose one and try it. I'm sure that the file will be ready to be used in your
const workbook = XLSX.read(file, { type: 'buffer' });