I'm trying to build a search engine that lets my users search with natural language commands, much like Google Now. My search engine is more constrained, though: it will mainly be used within an e-commerce site, where users search for certain devices.
Some of the features I want to provide are:
1) Search by brand
2) By model
3) By price range
4) By 3G/4G capability
5) By operating system
etc.
I built a mock version that looks for certain keywords, like "price", "cost", "iphone 5", etc.
Is building my own dictionary/array of keywords the best way to accomplish this?
Or are there existing dictionaries/APIs that can help parse my users' search queries and return the appropriate information?
See the following example:
"find me an android phone with 4 gb ram and at least 16 gb storage."
First of all, you need a list of words that you can extract directly from the input and insert into your search query. This is the easy part.
"find me an android phone with 4 gb ram and at least 16 gb storage."
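A minimal sketch of this first step, assuming a hand-built vocabulary. The `KNOWN_WORDS` dictionary, its field names, and the `extract_keywords` helper are all hypothetical; a real site would fill the vocabulary from its own product catalogue.

```python
import re

# Hypothetical vocabulary: each known word maps to the (field, value) pair
# it should contribute to the search query.
KNOWN_WORDS = {
    "android": ("os", "Android"),
    "ios": ("os", "iOS"),
    "phone": ("category", "phone"),
    "tablet": ("category", "tablet"),
    "samsung": ("brand", "Samsung"),
}

def extract_keywords(query):
    """Return a dict of search filters for every known word in the query."""
    filters = {}
    for token in re.findall(r"[a-z0-9]+", query.lower()):
        if token in KNOWN_WORDS:
            field, value = KNOWN_WORDS[token]
            filters[field] = value
    return filters

query = "find me an android phone with 4 gb ram and at least 16 gb storage."
print(extract_keywords(query))
```

For the example query this picks up "android" and "phone" and ignores everything else.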
Next, there would be numbers in the input.
"find me an android phone with 4 gb ram and at least 16 gb storage."
You would have already extracted the words "android" and "phone" by now. Now you have to extract the numbers - along with 2 or 3 words before and after them.
"find me an android phone with 4 gb ram and at least 16 gb storage."
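One way to pull out each number with its neighbouring words is sketched below. The function name `numbers_with_context` and the default window of 2 words are assumptions made for illustration; widen the window to 3 if you need to catch longer phrases.

```python
import re

def numbers_with_context(query, window=2):
    """Return (number, nearby_words) pairs for each number in the query."""
    # Split into lowercase words and numbers, dropping punctuation.
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?", query.lower())
    results = []
    for i, token in enumerate(tokens):
        if token[0].isdigit():
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            # Keep only words; neighbouring numbers are not context.
            context = [t for t in before + after if not t[0].isdigit()]
            results.append((float(token), context))
    return results

query = "find me an android phone with 4 gb ram and at least 16 gb storage."
for number, context in numbers_with_context(query):
    print(number, context)
```

With the example query, 4 comes back with "phone", "with", "gb", "ram", and 16 comes back with "at", "least", "gb", "storage".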
Your next step is to figure out what these numbers mean. For this, you need a table like the one below (let's call this table "Properties"):
Compare the above table with the numbers and words around the numbers you extracted from the input:
4 --- phone, with, gb, ram
16 --- at, least, gb, storage
Using a decent algorithm and the Properties table, you can figure out what the numbers mean. Compare each number with each property: first check whether the number falls within the range of the property, and then check whether the words around the number in the input match the tags of the property. Now that you know what each of the numbers means (4 = ram, 16 = storage), you have to check for inequalities in the input.
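The matching step could be sketched as follows. The contents of `PROPERTIES` (names, ranges, and tag words) are made up for illustration and are not the actual table, and `identify_property` is a hypothetical helper that scores each in-range property by how many of its tags appear near the number.

```python
# Hypothetical "Properties" table: a plausible value range plus tag words
# for each property. Real ranges and tags would come from your catalogue.
PROPERTIES = [
    {"name": "ram", "min": 0.5, "max": 16, "tags": {"ram", "memory"}},
    {"name": "storage", "min": 4, "max": 1024, "tags": {"storage", "internal", "rom"}},
    {"name": "camera", "min": 2, "max": 200, "tags": {"mp", "megapixel", "camera"}},
]

def identify_property(number, context):
    """Return the best-matching property name, or None if nothing matches."""
    best_name, best_score = None, 0
    for prop in PROPERTIES:
        if not prop["min"] <= number <= prop["max"]:
            continue  # number is outside this property's plausible range
        # Score = how many of the property's tags appear near the number.
        score = len(prop["tags"] & set(context))
        if score > best_score:
            best_name, best_score = prop["name"], score
    return best_name

print(identify_property(4, ["phone", "with", "gb", "ram"]))
print(identify_property(16, ["at", "least", "gb", "storage"]))
```

A number that is in range for several properties is resolved by its tags, and a number with no matching tag returns `None`, so the caller can decide what to do with it.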
Checking inequalities:
If there are any phrases such as "at least", "not more than", "at most", etc., then you have an inequality. In our case, there are no such words near the number 4, but the term "at least" appears near the number 16. This means that the user wants a phone with exactly 4 GB of RAM (use ==), but the internal storage can be greater than or equal to 16 GB (use >=).
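A small sketch of the inequality check, assuming you already have the words that come directly before a number. The `COMPARATORS` phrase list and the `find_operator` helper are hypothetical and far from exhaustive.

```python
# Hypothetical phrase-to-operator map; extend it with whatever phrasing
# your users actually type.
COMPARATORS = {
    "at least": ">=",
    "not less than": ">=",
    "more than": ">",
    "at most": "<=",
    "not more than": "<=",
    "less than": "<",
}

def find_operator(words_before):
    """Map the words directly before a number to a comparison operator."""
    text = " " + " ".join(words_before)
    # Try longer phrases first so "not more than" wins over "more than".
    for phrase in sorted(COMPARATORS, key=len, reverse=True):
        if text.endswith(" " + phrase):
            return COMPARATORS[phrase]
    return "=="  # no inequality phrase found: treat as an exact match

print(find_operator(["phone", "with"]))
print(find_operator(["at", "least"]))
```

Note that three-word phrases such as "not more than" are only detected if you keep at least three words of context before each number.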
Also note that if you are not able to associate a number with a property, it is most likely the price.
There are more features you could add, like letting the user sort the results in increasing or decreasing order of a property. For example:
"find me an android phone with 4 gb ram and at least 16 gb storage. Show the cheap ones first"
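Sorting hints can be handled with the same keyword approach. In this sketch the `SORT_HINTS` table, its property names, and the `detect_sort` helper are assumptions; it simply returns the first hint it finds.

```python
import re

# Hypothetical mapping from sorting words to (property, direction).
SORT_HINTS = {
    "cheap": ("price", "asc"),
    "cheapest": ("price", "asc"),
    "expensive": ("price", "desc"),
    "newest": ("release_date", "desc"),
}

def detect_sort(query):
    """Return (property, direction) for the first sorting hint, else None."""
    for token in re.findall(r"[a-z]+", query.lower()):
        if token in SORT_HINTS:
            return SORT_HINTS[token]
    return None

query = ("find me an android phone with 4 gb ram and at least 16 gb storage. "
         "Show the cheap ones first")
print(detect_sort(query))
```

Here "cheap" is read as "sort by price, ascending"; a query with no sorting word returns `None`, and you can fall back to your default ordering.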