information-retrieval documents

Text files for Information Retrieval


I am looking for sample .txt files for information retrieval. Ideally these would be sets of documents (around 20 per set) on a single topic, e.g., sports or music.

Thanks


Solution

  • There are many datasets available, for instance:

    Datasets used to evaluate IR systems: http://www.daviddlewis.com/resources/testcollections/

    More IR datasets: http://boston.lti.cs.cmu.edu/callan/Data/

    A comprehensive list of several datasets: http://zitnik.si/mediawiki/index.php?title=Datasets

    The classic 20 Newsgroups dataset: http://scikit-learn.org/stable/datasets/twenty_newsgroups.html

    Much bigger, news articles: http://research.signalmedia.co/newsir16/signal-dataset.html
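
If you want roughly 20 plain .txt files per topic, one option is to export them from the 20 Newsgroups data with scikit-learn. The following is only a sketch under a few assumptions: the helper names `write_topic_files` and `export_newsgroups`, the output folder `newsgroups_txt`, and the two chosen categories are illustrative, not part of any of the datasets above. It uses scikit-learn's `fetch_20newsgroups`, which downloads the data on first use.

```python
from collections import defaultdict
from pathlib import Path


def write_topic_files(texts, labels, out_dir, per_topic=20):
    """Write up to `per_topic` non-empty documents per label to out_dir/<label>/NNN.txt."""
    out_dir = Path(out_dir)
    counts = defaultdict(int)
    for text, label in zip(texts, labels):
        # Skip topics that are already full and documents that are blank.
        if counts[label] >= per_topic or not text.strip():
            continue
        topic_dir = out_dir / label
        topic_dir.mkdir(parents=True, exist_ok=True)
        (topic_dir / f"{counts[label]:03d}.txt").write_text(text, encoding="utf-8")
        counts[label] += 1
    return dict(counts)


def export_newsgroups(out_dir="newsgroups_txt",
                      categories=("rec.sport.hockey", "sci.space"),
                      per_topic=20):
    """Hypothetical wrapper: fetch 20 Newsgroups and export a few files per category."""
    # Imported here so write_topic_files works without scikit-learn installed.
    from sklearn.datasets import fetch_20newsgroups

    data = fetch_20newsgroups(
        subset="train",
        categories=list(categories),
        remove=("headers", "footers", "quotes"),  # keep only the message body
    )
    labels = [data.target_names[i] for i in data.target]
    return write_topic_files(data.data, labels, out_dir, per_topic)
```

Calling `export_newsgroups()` should leave one folder per category, each holding up to 20 numbered .txt files. `write_topic_files` is independent of scikit-learn, so it can be reused with texts and topic labels from any of the other collections.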