I have a hosted Blazor WebAssembly application on .NET 8, which means the solution has Client, Server and Shared projects.
In the Server project, I have the following controller action. Its loop runs once for every new entry the backend service provides:
    [HttpPost, Route("/api/chat/communicate_async")]
    public async IAsyncEnumerable<string> DoCommunicateAsync(ChatRequest chat)
    {
        IAsyncEnumerable<string> results = _IChat.DoCommunicateAsync(chat);
        await foreach (var result in results)
        {
            yield return result;
        }
    }
In the Client project, on the Razor page, I have the following code, which should show each iteration of the controller's loop in the UI as it arrives:
    CancellationToken cancellationToken = GetCancellationToken();
    var requestContent = new StringContent(System.Text.Json.JsonSerializer.Serialize(chatRequest), Encoding.UTF8, "application/json");
    using var requestMessage = new HttpRequestMessage(HttpMethod.Post, "api/chat/communicate_async")
    {
        Content = requestContent
    };
    requestMessage.SetBrowserResponseStreamingEnabled(true); // Enable response streaming
    using var response = await Http.SendAsync(requestMessage, HttpCompletionOption.ResponseHeadersRead);
    using Stream stream = await response.Content.ReadAsStreamAsync(cancellationToken);
    var lines = System.Text.Json.JsonSerializer.DeserializeAsyncEnumerable<string>(
        stream,
        new JsonSerializerOptions
        {
            PropertyNameCaseInsensitive = true,
            DefaultBufferSize = 128
        },
        cancellationToken);
    await foreach (string? line in lines)
    {
        chatResponse.response += line;
        StateHasChanged();
    }
I have also tried a different approach on the Client to show each iteration of the controller's loop in the UI:
    CancellationToken cancellationToken = GetCancellationToken();
    chatResponse.response = string.Empty;
    var requestContent = new StringContent(System.Text.Json.JsonSerializer.Serialize(chatRequest), Encoding.UTF8, "application/json");
    var requestMessage = new HttpRequestMessage(HttpMethod.Post, "api/chat/communicate_async")
    {
        Content = requestContent
    };
    //requestMessage.Headers.Accept.Append(new MediaTypeWithQualityHeaderValue("application/stream+json"));
    requestMessage.SetBrowserResponseStreamingEnabled(true); // Enable response streaming
    var response = await Http.SendAsync(requestMessage, HttpCompletionOption.ResponseHeadersRead);
    IAsyncEnumerable<string?> results = response.Content.ReadFromJsonAsAsyncEnumerable<string>();
    await foreach (string? result in results)
    {
        chatResponse.response += result;
        StateHasChanged();
    }
With both approaches, the client seems to wait until the controller has finished all iterations before it enters the JSON loop and updates the UI. I have read a lot of articles so far and tried different things from here: https://www.tpeczek.com/2021/07/aspnet-core-6-and-iasyncenumerable.html and from here: Streaming lines of text over HTTP with Blazor using IAsyncEnumerable.
I have also tried changing DefaultBufferSize, but that does not seem to be the problem, since each item yielded by the controller should trigger an iteration of the loop on the client.
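To tell whether the delay comes from the server or from the JSON deserialization on the client, one option is to read the raw response stream and log when each chunk arrives. This is only a minimal sketch: `ChunkProbe` and `ReadChunksAsync` are hypothetical names of my own, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;

public static class ChunkProbe
{
    // Hypothetical helper: yields the elapsed time and size of every raw chunk
    // read from the stream, without doing any JSON parsing.
    public static async IAsyncEnumerable<(TimeSpan Elapsed, int Bytes)> ReadChunksAsync(
        Stream stream,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var clock = Stopwatch.StartNew();
        var buffer = new byte[4096];
        int read;

        // ReadAsync returns as soon as some bytes are available; 0 means end of stream.
        while ((read = await stream.ReadAsync(buffer.AsMemory(), cancellationToken)) > 0)
        {
            yield return (clock.Elapsed, read);
        }
    }
}
```

In the first client approach, replacing the deserialization loop with `await foreach (var c in ChunkProbe.ReadChunksAsync(stream, cancellationToken)) Console.WriteLine($"{c.Elapsed}: {c.Bytes} bytes");` should show whether chunks arrive while the controller is still iterating or only once it has finished.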
Update 2024/05/28
It seems that the code below, which generates parts of a sentence, is not streamed to the client even though it iterates over an IAsyncEnumerable (this is from a local LLM via the LLamaSharp library):
    await foreach (string? result in session.ChatAsync(new ChatHistory.Message(AuthorRole.User, chat.prompt), inferenceParams, cancellationToken))
    {
        if (result != null)
        {
            yield return result;
        }
    }
The code below, from the Azure.AI library, does seem to be streamed to the client:
    await foreach (StreamingChatCompletionsUpdate chatUpdate in client.GetChatCompletionsStreaming(completionsOptions))
    {
        string? result = chatUpdate.ContentUpdate;
        if (result != null)
        {
            yield return result;
        }
    }
Any ideas?
I opened this as a bug in LLamaSharp, and the response was a mysterious workaround: placing a short Task.Delay right before the yield return result:
    await foreach (string? result in session.ChatAsync(new ChatHistory.Message(AuthorRole.User, chat.prompt), inferenceParams, cancellationToken))
    {
        if (result != null)
        {
            await Task.Delay(TimeSpan.FromMilliseconds(1));
            yield return result;
        }
    }
See the conversation here: https://github.com/SciSharp/LLamaSharp/issues/762 It seems to me that something is going on with buffering in .NET 8.
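One way to probe the buffering hypothesis is to check whether each `MoveNextAsync` call on the source completes synchronously. My assumption (not confirmed) is that the response is only flushed when the enumerator actually has to wait asynchronously, which would explain why `Task.Delay` helps. A minimal sketch: `AsyncCompletionProbe`, `BlockingSource`, `WithDelay` and `ProbeAsync` are hypothetical names, and `BlockingSource` is only a stand-in for a source that does its work without a real await.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncCompletionProbe
{
    // Stand-in source: blocks the calling thread between items and never truly awaits.
    public static async IAsyncEnumerable<string> BlockingSource()
    {
        foreach (var token in new[] { "Hello", " ", "world" })
        {
            Thread.Sleep(20);      // synchronous work, nothing is awaited
            yield return token;
        }
        await Task.CompletedTask;  // already completed, so this never suspends
    }

    // Applies the workaround from the issue: a real asynchronous wait before each item.
    public static async IAsyncEnumerable<string> WithDelay(IAsyncEnumerable<string> source)
    {
        await foreach (var token in source)
        {
            await Task.Delay(TimeSpan.FromMilliseconds(1));
            yield return token;
        }
    }

    // Counts how many MoveNextAsync calls were already complete when they returned.
    public static async Task<(int Synchronous, int Asynchronous)> ProbeAsync(IAsyncEnumerable<string> source)
    {
        int synchronous = 0, asynchronous = 0;
        await using var enumerator = source.GetAsyncEnumerator();
        while (true)
        {
            ValueTask<bool> pending = enumerator.MoveNextAsync();
            if (pending.IsCompleted) synchronous++; else asynchronous++;
            if (!await pending) break;
        }
        return (synchronous, asynchronous);
    }

    public static async Task Main()
    {
        var blocking = await ProbeAsync(BlockingSource());
        var delayed = await ProbeAsync(WithDelay(BlockingSource()));
        Console.WriteLine($"blocking source: {blocking.Synchronous} sync, {blocking.Asynchronous} async");
        Console.WriteLine($"with Task.Delay: {delayed.Synchronous} sync, {delayed.Asynchronous} async");
    }
}
```

Passing the LLamaSharp enumerable to `ProbeAsync` instead of `BlockingSource()` would show whether it behaves like the blocking stand-in. If it does, `await Task.Yield()` might be worth trying in place of the delay, though I have not verified that it has the same effect.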