Tags: azure, cloud, azure-cognitive-search, embedding, acs

Getting 'Request Entity Too Large' error when uploading items to Azure Cognitive Search Index - How to resolve?


I am encountering an error while attempting to upload items (a list of sentences and their embedding vectors) to an Azure Cognitive Search index.

The error message I receive is "RequestEntityTooLargeError: Operation returned an invalid status 'Request Entity Too Large'. Content: The page was not displayed because the request entity is too large." I am using the search_client.upload_documents() method to send a payload of items, but it seems the payload size exceeds the maximum limit allowed by the server, which results in this error.

Below is the code:

from azure.search.documents import SearchClient

search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)
result = search_client.upload_documents(items)

Error log:

RequestEntityTooLargeError                Traceback (most recent call last)
File c:\Users\anaconda3\envs\llm\lib\site-packages\azure\search\documents\_search_client.py:643, in SearchClient._index_documents_actions(self, actions, **kwargs)
    642 try:
--> 643     batch_response = self._client.documents.index(
    644         batch=batch, error_map=error_map, **kwargs
    645     )
    646     return cast(List[IndexingResult], batch_response.results)

File c:\Users\anaconda3\envs\llm\lib\site-packages\azure\core\tracing\decorator.py:76, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     75 if span_impl_type is None:
---> 76     return func(*args, **kwargs)
     78 # Merge span is parameter is set, but only if no explicit parent are passed

File c:\Users\anaconda3\envs\llm\lib\site-packages\azure\search\documents\_generated\operations\_documents_operations.py:1268, in DocumentsOperations.index(self, batch, request_options, **kwargs)
   1267 if response.status_code not in [200, 207]:
-> 1268     map_error(status_code=response.status_code, response=response, error_map=error_map)
   1269     error = self._deserialize.failsafe_deserialize(_models.SearchError, pipeline_response)

File c:\Users\anaconda3\envs\llm\lib\site-packages\azure\core\exceptions.py:109, in map_error(status_code, response, error_map)
    108 error = error_type(response=response)
--> 109 raise error

RequestEntityTooLargeError: Operation returned an invalid status 'Request Entity Too Large'
Content: The page was not displayed because the request entity is too large.



It works when I split the items in half and upload them in two batches, but I could not find any documentation on the limit settings.
Could anyone suggest a solution, provide a reference for configuring this limit on Azure, or point me to where the limit is documented?

Solution

  • The limit is 1000 documents per batch, or about 16 MB per batch. This is documented in the very first paragraph of https://learn.microsoft.com/en-us/rest/api/searchservice/addupdate-or-delete-documents.

    You would need to reduce the batch size so that you stay under these limits to upload the documents successfully, for example by chunking the list as sketched below.
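
    Here is a minimal batching sketch. It assumes items is a list of dicts matching your index schema and reuses the search_client from the question; BATCH_SIZE and upload_in_batches are illustrative names, not part of the azure-search-documents SDK:

    BATCH_SIZE = 500  # stay well under the 1000-documents-per-batch limit

    def upload_in_batches(search_client, items, batch_size=BATCH_SIZE):
        # Slice the payload into chunks so that each request stays under
        # the 1000-document / ~16 MB per-batch service limits.
        results = []
        for i in range(0, len(items), batch_size):
            results.extend(search_client.upload_documents(items[i:i + batch_size]))
        return results

    result = upload_in_batches(search_client, items)

    If a single batch still exceeds 16 MB (embedding vectors can make documents large), lower batch_size further. The azure-search-documents package also provides SearchIndexingBufferedSender, which batches document actions automatically and is an alternative to manual chunking.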