repository, bots, information-retrieval, information-extraction

Creating an information repository referencing bot


I would like to create a bot. When someone types "!123", the bot will search the repository for the value "123" and paste the information found for that value back. I'd like this to be universal, meaning it can be used anywhere, so maybe some sort of Firefox plugin.

Can someone provide me with information on where I can start?

I have an understanding of programming in C# and Java.

P.S. There is no intention for this to be some sort of spam bot; I just want to have a collection of information that people can easily reference.


Solution

  • There are multiple parts to your project:

    1. A bot that crawls data from the web and saves it in a database (assuming you plan to build your repository from the web). Search for "web crawler" or "web scraper" to find tools for this.
    2. A data extractor/cleanser that cleans the data and extracts the relevant information from each document. This is important because it lets you tag each document with the information it is relevant to.
    3. A search engine that retrieves relevant data from the repository. Try a vector similarity algorithm (cosine similarity, for example) for this.
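
Since you know Java, here is a minimal sketch of the retrieval part only: it recognises a "!query" message, tries an exact key lookup first, and otherwise falls back to cosine similarity over simple word-count vectors. Everything in it is an assumption for illustration: the class name RepositoryBot, the in-memory map standing in for a real database, and the choice of cosine similarity as the vector similarity measure. Crawling, cleaning, and the browser plugin are left out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: an in-memory repository answering "!query" messages.
public class RepositoryBot {
    // key -> stored text; a real project would back this with a database
    private final Map<String, String> entries = new LinkedHashMap<>();

    public void add(String key, String text) {
        entries.put(key, text);
    }

    // Turns text into a bag-of-words vector: word -> number of occurrences.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two sparse word-count vectors (0 if either is empty).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Returns the reply for a message, or empty if the message is not a
    // "!query" trigger or nothing in the repository matches.
    public Optional<String> reply(String message) {
        String trimmed = message.trim();
        if (!trimmed.startsWith("!") || trimmed.length() == 1) {
            return Optional.empty();
        }
        String query = trimmed.substring(1).trim();

        // 1. Exact key match, e.g. "!123" -> entry stored under "123".
        String exact = entries.get(query);
        if (exact != null) {
            return Optional.of(exact);
        }

        // 2. Fallback: the stored text most similar to the query words.
        Map<String, Integer> queryVector = termFrequencies(query);
        String best = null;
        double bestScore = 0;
        for (String text : entries.values()) {
            double score = cosine(queryVector, termFrequencies(text));
            if (score > bestScore) {
                bestScore = score;
                best = text;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        RepositoryBot bot = new RepositoryBot();
        bot.add("123", "Office printer setup guide");
        bot.add("456", "How to reset the VPN password");
        System.out.println(bot.reply("!123").orElse("(no match)"));
        System.out.println(bot.reply("!vpn password").orElse("(no match)"));
    }
}
```

The exact lookup covers the "!123" case from the question; the similarity fallback only matters if you also want free-text queries. Raw word counts are the simplest possible vectors, so a real repository would likely want better weighting (TF-IDF, for instance) or an existing search library.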