I'm trying to upload an image to AWS Bedrock as part of a conversation using the ConverseCommand (in @aws-sdk/client-bedrock-runtime v3.686). According to the documentation, it requires the bytes as a Uint8Array, which I think is the same as a buffer. However, I have tried many permutations and combinations with .buffer and new TextEncoder().encode(imageBuffer), and the model always replies with something like:
I'm sorry, but you haven't actually shared an image with me yet. Could you please upload an image and I'll take a look at it?
It feels like this should be trivial, but I haven't been able to find any working code using the @aws-sdk/client-bedrock-runtime that includes an image or document as part of the messages. Any ideas?
const fs = require("fs");
const {
  BedrockRuntimeClient,
  ConverseCommand,
} = require("@aws-sdk/client-bedrock-runtime");

const modelId = "anthropic.claude-3-haiku-20240307-v1:0";
let conversation = [];

const askQuestion = async () => {
  const client = new BedrockRuntimeClient({
    region: "us-east-1",
    credentials: {
      accessKeyId: process.env.AWS_ACCESS_KEY_ID,
      secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
    },
  });

  let content = {};
  const imageBuffer = fs.readFileSync("./butterfly.png");
  content.image = {
    format: "png",
    source: {
      bytes: imageBuffer.buffer,
    },
  };
  content.text = "What's in this image?";

  conversation.push({
    role: "user",
    content: [content],
  });

  try {
    const response = await client.send(
      new ConverseCommand({
        modelId,
        messages: conversation,
      })
    );
    conversation.push(response.output.message);
    console.log(response.output.message.content[0].text);
    return response.output.message.content[0].text;
  } catch (err) {
    console.log(`ERROR: Can't invoke '${modelId}'. Reason: ${err}`);
  }
};

askQuestion();
The answer from Oluwafemi Sule was correct, but I also needed to change the way I sent the image. This worked:
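// Inside askQuestion(): content is now an array of blocks, one member per block.
const content = [];

// Copy the file's bytes into a plain Uint8Array. Going through .buffer is
// risky, because the underlying ArrayBuffer can be larger than the Buffer itself.
const imageBuffer = new Uint8Array(fs.readFileSync("./butterfly.png"));

// Image block: only the image field is set.
content.push({
  image: {
    format: "png",
    source: {
      bytes: imageBuffer,
    },
  },
});

// Text block: the question goes in its own block.
content.push({ text: "What's in this image?" });

conversation.push({
  role: "user",
  content,
});

try {
  const response = await client.send(
    new ConverseCommand({
      modelId,
      messages: conversation,
    })
  );
  conversation.push(response.output.message);
  console.log(response.output.message.content[0].text);
  return response.output.message.content[0].text;
} catch (err) {
  console.log(`ERROR: Can't invoke '${modelId}'. Reason: ${err}`);
}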
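For anyone wondering why the earlier attempts with .buffer and TextEncoder did not work, here is a small sketch of how the three values differ. It uses four in-memory bytes (the start of a PNG signature) as a stand-in for the file, so the variable names are illustrative only, and it shows plain Node.js behavior rather than anything specific to the SDK.

```javascript
// Stand-in for fs.readFileSync("./butterfly.png"): the first four bytes of a PNG file.
const fileBuffer = Buffer.from([0x89, 0x50, 0x4e, 0x47]);

// A Node.js Buffer is a subclass of Uint8Array.
const bufferIsUint8Array = fileBuffer instanceof Uint8Array;

// .buffer is the underlying ArrayBuffer, which is not a Uint8Array
// and can be larger than the Buffer that views it.
const arrayBufferIsUint8Array = fileBuffer.buffer instanceof Uint8Array;

// TextEncoder expects a string, so the Buffer is first converted to text;
// bytes that are not valid UTF-8 (such as 0x89) are replaced, corrupting the data.
const reencoded = new TextEncoder().encode(fileBuffer);

// Copying into a plain Uint8Array keeps exactly the original bytes.
const bytes = new Uint8Array(fileBuffer);

console.log(bufferIsUint8Array);      // true
console.log(arrayBufferIsUint8Array); // false
console.log(reencoded.length);        // 6, not 4
console.log(bytes.length);            // 4
```

So .buffer hands over an ArrayBuffer rather than a Uint8Array, and TextEncoder changes the bytes of a binary file.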
The content block is handled as text because its text field is set: when a single block carries both text and image, the implementation prioritizes the text over the image.
The type definition for an image content block allows only the image field to have a value, so an image content block should set the image field and nothing else.
export interface ImageMember {
  text?: never;
  image: ImageBlock;
  document?: never;
  toolUse?: never;
  toolResult?: never;
  guardContent?: never;
  $unknown?: never;
}
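Since plain JavaScript does not enforce this type, a quick runtime check can catch a block that sets more than one member before it is sent. The sketch below is a hypothetical helper (findInvalidBlocks is not part of the SDK); its key list is copied from the type definition above, and the image bytes are placeholders.

```javascript
// Member names taken from the type definition above ($unknown is left out).
const MEMBER_KEYS = ["text", "image", "document", "toolUse", "toolResult", "guardContent"];

// Hypothetical helper: returns the indexes of blocks that do not set exactly one member.
function findInvalidBlocks(content) {
  return content
    .map((block, index) => {
      const setKeys = MEMBER_KEYS.filter((key) => block[key] !== undefined);
      return setKeys.length === 1 ? -1 : index;
    })
    .filter((index) => index !== -1);
}

const imageBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47]); // placeholder bytes

// One block carrying both text and image, as in the question.
const combined = [
  { text: "What's in this image?", image: { format: "png", source: { bytes: imageBytes } } },
];

// One member per block.
const separate = [
  { image: { format: "png", source: { bytes: imageBytes } } },
  { text: "What's in this image?" },
];

console.log(findInvalidBlocks(combined)); // [ 0 ]
console.log(findInvalidBlocks(separate)); // []
```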
To ask about the content of the image, add a separate content block containing the question.
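// Text block: only the text field is set.
const content1 = {
  text: "What's in this image?",
};

// Image block: only the image field is set.
const content2 = {
  image: {
    format: "png",
    source: {
      // bytes must be a Uint8Array; imageBuffer.buffer would be an ArrayBuffer.
      bytes: new Uint8Array(imageBuffer),
    },
  },
};

conversation.push({
  role: "user",
  content: [content1, content2],
});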
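The same idea can be wrapped in a small function that builds the whole user message. This is only a sketch: buildImageMessage and the FORMATS table are made-up names, and the table assumes the image formats the Converse API accepts are png, jpeg, gif and webp. The bytes here are placeholders; in practice they would come from fs.readFileSync(filePath).

```javascript
const path = require("path");

// Assumed mapping from file extension to the image block's format value.
const FORMATS = { ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg", ".gif": "gif", ".webp": "webp" };

// Hypothetical helper, not part of the SDK: builds a user message with
// one image block and one text block.
function buildImageMessage(filePath, fileBytes, question) {
  const format = FORMATS[path.extname(filePath).toLowerCase()];
  if (!format) {
    throw new Error(`Unsupported image type: ${filePath}`);
  }
  return {
    role: "user",
    content: [
      // Copy into a plain Uint8Array so exactly the file's bytes are sent.
      { image: { format, source: { bytes: new Uint8Array(fileBytes) } } },
      { text: question },
    ],
  };
}

const message = buildImageMessage(
  "./butterfly.PNG",
  Buffer.from([0x89, 0x50, 0x4e, 0x47]), // placeholder for the file contents
  "What's in this image?"
);
console.log(message.content.map((block) => Object.keys(block)[0])); // [ 'image', 'text' ]
```

The returned object can be pushed onto conversation and passed as messages to ConverseCommand in the same way as above.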