nlp, dataset, sequence-to-sequence

What are the details of a sequence-to-sequence model for text summarization?


It is clear how to train an encoder-decoder model for translation: each source sequence has its corresponding target sequence (its translation). But in the case of text summarization, the abstract is much shorter than its article. According to Urvashi Khandelwal, Neural Text Summarization, each source sentence has its abstract (shorter or longer). But I hardly believe any such dataset exists where each sentence has its corresponding abstract. So, if I am right, what are the possible ways to train such a model? Otherwise, are there any free datasets for text summarization?
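For what it's worth, the training objective itself is identical to translation: the model maximizes the likelihood of the target given the source, and it does not care that the target (the abstract) is much shorter than the source (the article). Below is a minimal sketch of that setup, assuming PyTorch; the vocabulary, dimensions, and the random toy batch are all hypothetical placeholders, not a real dataset.

```python
# Minimal encoder-decoder training sketch (PyTorch assumed).
# The (article, abstract) pair plays exactly the same role as a
# (source, translation) pair; only the target length differs.
# All sizes and data below are toy placeholders.
import torch
import torch.nn as nn

PAD, BOS, EOS = 0, 1, 2
VOCAB = 100          # hypothetical vocabulary size
EMB, HID = 32, 64    # small dimensions for illustration

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB, padding_idx=PAD)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, tgt_in):
        # Encode the (long) article; the final hidden state
        # initialises the decoder over the (short) abstract.
        _, h = self.encoder(self.embed(src))
        dec, _ = self.decoder(self.embed(tgt_in), h)
        return self.out(dec)

# Toy batch: "articles" of length 20, "abstracts" of length 5.
src = torch.randint(3, VOCAB, (8, 20))
tgt = torch.randint(3, VOCAB, (8, 5))
# Shift the target right and prepend BOS for teacher forcing.
tgt_in = torch.cat([torch.full((8, 1), BOS, dtype=torch.long),
                    tgt[:, :-1]], dim=1)

model = Seq2Seq()
opt = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)

for step in range(3):  # a few toy optimisation steps
    logits = model(src, tgt_in)
    loss = loss_fn(logits.reshape(-1, VOCAB), tgt.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In other words, the open question is not the training procedure but where the (article, abstract) pairs come from.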


Solution

  • You're right that there are very few large datasets created specifically for training text summarization models. People tend to take other existing data and find ways to turn it into a summarization problem; one common trick is sketched below. You can read other text summarization papers to see what they do.
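As a concrete illustration of repurposing existing data: headline generation treats a news article's first sentence as the source and its headline as the target summary, which is the setup used in several sentence-summarization papers. This is a hedged sketch; `raw_articles` is a hypothetical in-memory corpus standing in for whatever news data you have.

```python
# Turning existing news data into (source, summary) training pairs:
# first sentence of the body -> headline. `raw_articles` is a
# hypothetical placeholder; substitute your own corpus.
raw_articles = [
    {"headline": "Markets rally on rate cut",
     "body": "Stocks surged on Tuesday after the central bank cut rates. "
             "Analysts said the move was widely expected."},
]

def to_summarization_pairs(articles):
    pairs = []
    for a in articles:
        # Crude sentence split; a real pipeline would use a proper tokenizer.
        first_sentence = a["body"].split(". ")[0]
        pairs.append((first_sentence, a["headline"]))  # (source, target)
    return pairs

print(to_summarization_pairs(raw_articles))
```

The resulting pairs can then be fed to any encoder-decoder training loop, such as the sketch above.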