Recently, I have been using Azure AI cognitive services to summarize text with both document summarization and conversation summarization, but the summaries produced by both are very short.
According to the documentation, the maximum number of sentences you can request for a summary is 20.
If you want a summary with more than 20 sentences, you can split your document into parts and summarize each part.
Example: if your document is long, split it by topic or according to your requirements, then summarize each part.
Below is the document I have, with a length of 4779.
Next, split it and summarize it.
Here, I am using the Python SDK to perform an extractive summary.
Code:
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Hardcoded for brevity; prefer loading these from environment variables
# such as "LANGUAGE_KEY" and "LANGUAGE_ENDPOINT".
key = "db2............."
endpoint = "https://<congnitive_name>.cognitiveservices.azure.com/"

# Authenticate the client using your key and endpoint
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
        endpoint=endpoint,
        credential=ta_credential)
    return text_analytics_client

client = authenticate_client()

# Summarize each document and print the number of sentences in its summary
def sample_extractive_summarization(client, doc):
    # max_sentence_count=20 is the largest value the service accepts
    poller1 = client.begin_extract_summary(documents=doc, max_sentence_count=20)
    document_results = poller1.result()
    for i in document_results:
        print(len(i['sentences']))

# `document` is a list containing the full text as a single string
sample_extractive_summarization(client, document)
Output before chunking the document: you can see a maximum of 20 sentences.
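The key and endpoint in the code above are written directly into the script. As a minimal sketch, they could be read from the environment variables "LANGUAGE_KEY" and "LANGUAGE_ENDPOINT" instead. The helper name `load_language_settings` is hypothetical and not part of the Azure SDK.

```python
import os


def load_language_settings(environ=os.environ):
    """Return (key, endpoint) read from environment variables.

    load_language_settings is an illustrative helper, not an SDK function.
    """
    try:
        key = environ["LANGUAGE_KEY"]
        endpoint = environ["LANGUAGE_ENDPOINT"]
    except KeyError as missing:
        # Fail early with the name of the variable that is not set
        raise RuntimeError(
            f"Missing environment variable: {missing.args[0]}"
        ) from None
    return key, endpoint
```

You would then call `key, endpoint = load_language_settings()` before `authenticate_client()`.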
Output after chunking, using the following code for chunking:
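# Split a string into consecutive pieces of at most chunk_size characters
def chunk_string(string, chunk_size):
    chunks = []
    for i in range(0, len(string), chunk_size):
        chunks.append(string[i:i+chunk_size])
    return chunks

chunk_size = 1000
chunks = chunk_string(document[0], chunk_size)
# Each chunk is sent as its own document and gets its own summary
sample_extractive_summarization(client, chunks)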
Here, I am chunking with a chunk size of 1000.
Each chunk is summarized separately, so if you add up the sentence counts printed for the chunks, you get 25 sentences in total.
With the help of chunking, you can increase the summary length.
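To turn the per-chunk results into one longer summary, the sentences of each chunk can be joined together. The following is a rough sketch, assuming each result exposes `is_error` and a `sentences` list whose items have a `text` attribute, as the results of `begin_extract_summary` do. The function name `merge_chunk_summaries` is hypothetical.

```python
def merge_chunk_summaries(results):
    """Join the summary sentences of every successful chunk result.

    Results are processed in the order they are given; failed ones are skipped.
    """
    sentences = []
    for result in results:
        # Documents that failed come back as error objects with is_error set
        if getattr(result, "is_error", False):
            continue
        sentences.extend(sentence.text for sentence in result.sentences)
    return " ".join(sentences)
```

For example, `merge_chunk_summaries(client.begin_extract_summary(documents=chunks, max_sentence_count=20).result())` would return the combined summary text as a single string.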
Note: I just used character indexing for chunking, which can cut a sentence in half. In your case, chunk in a way that makes sense for your document, such as splitting it by topic.
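One simple step up from plain indexing is to break only between sentences. This is a minimal sketch, not a topic-aware splitter: it assumes sentences end with '.', '!' or '?' followed by whitespace, and the name `chunk_by_sentences` is hypothetical.

```python
import re


def chunk_by_sentences(text, max_chars=1000):
    """Group whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # Keep growing the current chunk
            current = candidate
        else:
            # The sentence does not fit: close this chunk and start a new one
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

It can stand in for the index-based function, for example `chunks = chunk_by_sentences(document[0], 1000)`, so that no chunk starts or ends in the middle of a sentence.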