I created a very basic search option for my blog, and it returns results based on topics and keywords. However, some of my articles include links to external websites, for example when I refer to someone else's blog for more information. I would like my search to go through those linked pages and find results from them as well. Is this possible? I don't want to use GCSE. Thanks in advance; it would be a great help.
Thanks again.
Yes, it is possible to write a bot that crawls external websites by following links. I've made one, and it crawled 100K+ website URLs. So you can build one that crawls the links from your blog.
To create a search engine, you'll need to know some internals about how search engines work...
Search Bots work like this:
Parser splits the HTML into pieces, so that data can be extracted from the page. This has 2 sub-components to it, which...
a. Extracts any data from the page that you want to capture & then saves that data into a database.
b. Extracts links & places them back into the crawling queue. This creates a loop that keeps going as long as new links turn up, so your bot never stops crawling on its own... (Unless someone else's malformed URL crashes it, which happens a lot. So be ready to fix it frequently.)
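The two parser steps above can be sketched in Python roughly as follows. This is only a minimal illustration using the standard library, not the bot described in this answer. The names `LinkAndTextParser`, `crawl`, `fetch` and `max_pages` are made up for the example; `fetch` is a hypothetical function you would supply to download a page, the `pages` dict stands in for the database, and the `max_pages` cap is an added safety limit so the loop does end.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkAndTextParser(HTMLParser):
    """Collects href values from <a> tags and the text data of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Note: this also picks up <script>/<style> contents;
        # a real crawler would want to skip those.
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_urls, fetch, max_pages=100):
    """Breadth-first crawl.

    `fetch(url)` is a hypothetical callable returning the page's HTML
    as a string (or raising on failure).
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    pages = {}  # stands in for the database: url -> extracted text
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            parser = LinkAndTextParser()
            parser.feed(fetch(url))
        except Exception:
            continue  # skip pages that fail to download or parse
        # Step a: save the extracted data.
        pages[url] = " ".join(parser.text_parts)
        # Step b: put newly found links back into the queue.
        for href in parser.links:
            try:
                absolute, _fragment = urldefrag(urljoin(url, href))
                scheme = urlparse(absolute).scheme
            except ValueError:
                continue  # malformed URL: skip it instead of crashing
            if scheme in ("http", "https") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Keeping a `seen` set stops the bot from queueing the same URL twice, and catching `ValueError` around the URL handling is one way to survive the malformed links mentioned above.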
Indexer creates lookup indexes, which map keywords to the web page's contents. This has 2 sub-components to it, as it...
a. Creates a Forward Index, which maps each document to keywords that are inside of that document.
doc1 | bird, aviary, robin, dove, blue jay, cardinal
doc2 | birds, bird watching, binoculars
doc3 | cats, eat, birds
doc4 | cats, generally, don't, like, water, nor, neighborhood, dogs
doc5 | dog, shows, look, fun
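b. Creates an Inverted Index from the Forward Index, which reverses the mapping so that each keyword points to the documents that contain it. Keywords are normalized first (for example, plurals are reduced to the singular), so "birds" and "bird" share one entry. This allows users to search by keyword & then the search script looks up & suggests which documents users may want to view. Like so...
bird | doc1, doc2, doc3
cat | doc3, doc4
dog | doc4, doc5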
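As a rough sketch of the indexer steps above, the following Python turns the example Forward Index into an Inverted Index and looks a keyword up in it. The function names (`normalize`, `build_inverted_index`, `search`) are made up for this illustration, and the plural handling is a deliberately naive assumption (lowercase, then strip a trailing "s"); a real search engine would use a proper stemmer. With that normalization, "bird" matches every document whose keyword list contains "bird" or "birds", which includes doc3.

```python
from collections import defaultdict

# Forward Index from the example above: document -> keywords.
forward_index = {
    "doc1": ["bird", "aviary", "robin", "dove", "blue jay", "cardinal"],
    "doc2": ["birds", "bird watching", "binoculars"],
    "doc3": ["cats", "eat", "birds"],
    "doc4": ["cats", "generally", "don't", "like", "water", "nor",
             "neighborhood", "dogs"],
    "doc5": ["dog", "shows", "look", "fun"],
}


def normalize(word):
    """Naive normalization (an assumption): lowercase, strip a trailing 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def build_inverted_index(forward):
    """Reverse the Forward Index: keyword -> list of documents."""
    inverted = defaultdict(list)
    for doc_id, keywords in forward.items():
        for keyword in keywords:
            # Multi-word keywords such as "bird watching" are split up.
            for word in keyword.split():
                term = normalize(word)
                if doc_id not in inverted[term]:
                    inverted[term].append(doc_id)
    return dict(inverted)


def search(inverted, query):
    """Return the documents listed for a single-keyword query."""
    return inverted.get(normalize(query), [])


inverted_index = build_inverted_index(forward_index)
print(search(inverted_index, "cat"))  # ['doc3', 'doc4']
```

The search form on the blog would then only need to pass the user's keyword to something like `search` and render the returned documents as links.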
Search Forms work like this:
Examples:
Searching for:
"bird" returns links to "doc1, doc2, doc3"
"cat" returns links to "doc3, doc4"
"dog" returns links to "doc4, doc5"
Good luck building your search engine for your blog!