I want to implement my search feature into my website and show user results in all possible ways.
I'm currently using this Query
$rs = $conn->query("SELECT * FROM search_engine WHERE soundex(keyword) LIKE soundex('%$q%') UNION SELECT * FROM search_engine WHERE title LIKE '%$q%' OR link LIKE '%$q%' ORDER BY `clicks` DESC");
I know that this query is not good.
This query only shows results when parameters match the title, Keywords, or link
e.g. when I search for Extract all URL from sitemap.xml with PHP CURL
query, I'm getting results but when I try the query How to Extract all URL from sitemap.xml with PHP CURL
I can't get any results, so what can I do to make my search feature powerful.
I build search engines.
I'm going to give you 6 tips to explore, so that you can continue to learn how to program & proceed if desired.
TIP #1: Focus!
First define what you're trying to accomplish. Think about what you really want to do, before trying to build a search engine from scratch. It may not be your end goal AKA what you really want to do.
Do you really want to crawl the web, with this idea: "Extract all url from sitemap.xml with PHP CURL"?
Or do you simply want to add a search box to your website, which gets product data from your product database & displays that product data on your website, with this idea: "I want to implement my own search feature into my website"?
It's kind of hard to tell.
If you want to add a product-based "search feature" to your website, then you don't need to extract content from an XML Sitemap. You'd simply retrieve it from a database like: MySQL, PosgreSQL, Oracle, SQL Server, etc... and display the results onto your search results page. That's usually what people are looking to do, when they want to add a "search feature" into their website.
TIP #2: For searching, simpler is faster.
This is good to remember, when writing code: Simple Always Wins. It's known as the "S.A.W. Principle".
First, let's look at your SQL. It has 2 select statements, which are joined together with a UNION
keyword.
SELECT * FROM search_engine
WHERE soundex(keyword) LIKE soundex('%$q%')
UNION
SELECT * FROM search_engine
WHERE title LIKE '%$q%' OR link LIKE '%$q%'
ORDER BY `clicks` DESC
Since this looks like it's coming from the same database table, then you could combine it as follows... where the change is replacing UNION SELECT * FROM search_engine WHERE
with OR
:
SELECT * FROM search_engine
WHERE soundex(keyword) LIKE soundex('%$q%')
OR title LIKE '%$q%' OR link LIKE '%$q%'
ORDER BY `clicks` DESC
So if you can remove the UNION keyword & combine the 2 select statements into 1 select statement, then the database engine can do less work to fulfill the search query request.
If you're using 2 different tables, then you'll need to do some homework to look up a concept known as an inverted index. The concept is the same: Keep the search as simple as possible... so that the database server does as little work as possible... so that the search experience runs as fast as possible!
Even though simpler means faster, it doesn't mean more accuracy.
TIP #3: Accuracy makes a search engine more relevant for users. Think of this as accuracy = "powerful".
Let's look at these page titles & how a search query works with them:
For these results to match with accuracy, then you'd have to have the 2nd page's full title in your database. If you only have the 1st page's title (without the "How to "), then a query for the 2nd page's title won't find a result for the 1st page's title. That's the problem that you've noticed with your site's search feature.
The reason is that a query for the 1st page's title is an exact partial match inside of the 2nd page's title. However a query for the 2nd page's title is not an exact match, nor even a partial match of the 1st page's title.
To get around that problem, search engines work on the basis of keywords.
TIP #4: Learn about Keywords vs. Stop Words & how to parse them in your search query.
In a search query, there are both relevant key words known as keywords & non-relevant junk words, called stop words. You may want to investigate the concept of what stop words are & how search engines use them or most often, throw them away before the search query is actually performed.
So in your queries, these are your unique & meaningful keywords. They have self-contained concrete meanings, when you think of each word individually.
array('extract', 'url', 'sitemap.xml', 'PHP', 'CURL')
Concrete meanings:
These are most likely the stop words, which have either no meaning to them by themselves or a vague meaning.
array('How', 'to', 'all', 'from', 'with');
Vague Meanings:
Search engines usually strip stop words & query the meaningful keywords for results. A relevance score is how accurate the search results are.
Here is a hypothetical example (which I made up off of the top of my head while writing this): If a query finds a page with 1 of 5 unique keywords, then the relevancy score would be 20%. If it finds a page with 4 of 5 unique keywords, then the relevancy score would be 80%. It's hypothetical, because it's not how any specific search engine currently works. It's just a basic concept to explain a point, using a simple illustration.
The relevancy algorithm & score is really up to the search engine designer/builder to create. The relevancy algorithm can be as simple or as complex as the search engine designers and/or builders want to make it. Search engine developers can spend a lot of time fine-tuning that relevancy algorithm & score. It also depends on the search algorithm, which is used & how well the search bot finds data for those algorithms.
Tip #5: Explore building search bots!
You should look into building search bots, if you really want to accomplish this: "Extract all url from sitemap.xml with PHP CURL".
I've written a search bot too. It has already crawled over 1 million URLs!
PHP Curl isn't what extracts links. It's what fetches content from 1 URL. The search bot has to be written to parse the returned HTML, so that it can figure out what to extract from those search results.
Just a warning: People don't write perfect HTML syntax in their URLs. So your search bot will require a lot of fine tuning to get it to detect sloppy programming, which will crash your search bot. That is a huge time commitment! Just be ready for spending years on this project or even decades, if you decide to pursue building your own search bot. Building a search engine is a long journey! Your search bot WILL CRASH hundreds to thousands of times, before you can get it to crawl millions of URLs.
So... Do you really want to "Extract all url from sitemap.xml" or do you want to query a list of previously uploaded product data, which resides in your database? That latter database querying idea is A LOT FASTER to build & easier to maintain in the future!
Tip #6: If you don't want to spend a lot of time building a search engine from scratch, plus a search bot from scratch, plus a relevancy score algorithm from scratch, then look at some pre-built search engine solutions. Here are a few popular ones. They can be fun to play around with!
Conclusion: Search engines are not easy to build! They can take years to build. Be ready for a significant time commitment (easily months, realistically years, possibly decades), if you really want to accomplish this goal: "I want to show results to user in all possible ways."