Suggestions on full-text search or existing search algorithms


Can someone suggest an easy way to solve the search problem below? Is there an existing algorithm for it, or will full-text search suffice?

The items are classified as follows:

ItemCategory | ItemCluster     | ItemSubCluster    | SubCluster  | Items
Vegetable    | Root vegetables | Root              | WithOutSkin | potato, sweet potato, yam
Vegetable    | Root vegetables | Root              | WithSkin    | onion, garlic, shallot
Vegetable    | Greens          | Leafy green       | Leaf        | lettuce, spinach, silverbeet
Vegetable    | Greens          | Cruciferous       | Flower      | cabbage, cauliflower, Brussels sprouts, broccoli
Vegetable    | Greens          | Edible plant stem | Stem        | celery, asparagus

The inputs will be something like:

sweet potato, yam
Yam, Potato
garlik, onion
lettuce, spinach, silverbeet
lettuce, silverbeet
lettuce, silverbeet, spinach

From each input, I want to find which ItemCategory, ItemCluster, ItemSubCluster and SubCluster the input items belong to.

Any help will be much appreciated.


Solution

  • You are nearly following the right approach.

    You don't need full text searching here.

    What you can create here is a kind of inverted index as follows:

    Taking potato as an example, create a map entry keyed by the item name that stores its ItemCategory, ItemCluster, ItemSubCluster and SubCluster.

    For example -

    "potato": {
        "ItemCategory": "Vegetable",
        "ItemCluster": "Root vegetables",
        "ItemSubCluster": "Root",
        "SubCluster": "WithOutSkin"
    }
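
    The lookup described above can be sketched in Python as follows. This is only a minimal illustration, not a definitive implementation: the names `ROWS`, `FIELDS`, `build_index` and `classify` are made up for the example, the rows are copied from the question's table, and item names are lower-cased so that inputs like "Yam, Potato" match. The fuzzy fallback through the standard-library `difflib.get_close_matches` is an assumption about how misspelled inputs such as "garlik" should be handled; drop it if only exact matches are wanted.

```python
import difflib

# Classification rows copied from the question's table:
# (ItemCategory, ItemCluster, ItemSubCluster, SubCluster, Items)
ROWS = [
    ("Vegetable", "Root vegetables", "Root", "WithOutSkin", "potato, sweet potato, yam"),
    ("Vegetable", "Root vegetables", "Root", "WithSkin", "onion, garlic, shallot"),
    ("Vegetable", "Greens", "Leafy green", "Leaf", "lettuce, spinach, silverbeet"),
    ("Vegetable", "Greens", "Cruciferous", "Flower",
     "cabbage, cauliflower, Brussels sprouts, broccoli"),
    ("Vegetable", "Greens", "Edible plant stem", "Stem", "celery, asparagus"),
]
FIELDS = ("ItemCategory", "ItemCluster", "ItemSubCluster", "SubCluster")


def build_index(rows):
    """Map each lower-cased item name to its classification dict."""
    index = {}
    for *classification, items in rows:
        for item in items.split(","):
            index[item.strip().lower()] = dict(zip(FIELDS, classification))
    return index


def classify(query, index, cutoff=0.8):
    """Look up every comma-separated item in `query`.

    Exact (case-insensitive) match first; otherwise fall back to the
    closest known name (assumption: near-misses should still resolve).
    Items that cannot be resolved map to None.
    """
    result = {}
    for raw in query.split(","):
        name = raw.strip().lower()
        if name not in index:
            close = difflib.get_close_matches(name, list(index), n=1, cutoff=cutoff)
            name = close[0] if close else None
        result[raw.strip()] = index.get(name)
    return result


INDEX = build_index(ROWS)
```

    With this index, `classify("Yam, Potato", INDEX)` returns the Root / WithOutSkin classification for both names, and `classify("garlik, onion", INDEX)` resolves `garlik` to `garlic` through the fuzzy fallback.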
    

    Storing these strings in full for every item is expensive, because the same category and cluster names are repeated many times.

    You can optimise the storage by using an encoding scheme:

    For example -

    Let the field names be denoted by numbers: ItemCategory by 0, ItemCluster by 1, ItemSubCluster by 2 and SubCluster by 3.

    Let the values be denoted by a similar encoding scheme:

    Vegetable by 0, Root vegetables by 1, Root by 2 and WithOutSkin by 3.

    Now, your mapping becomes:

    "potato": {
        "0": "0",
        "1": "1",
        "2": "2",
        "3": "3"
    }
    

    To optimise this further, you can also maintain an index of the item names themselves. For example, potato can be denoted by 0.

    So your final index becomes:

    "0": {
        "0": "0",
        "1": "1",
        "2": "2",
        "3": "3"
    }
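
    The encoding scheme above could be sketched in Python along these lines. It is a rough illustration under a few assumptions: the `Vocabulary` class, `RAW`, `COMPACT` and `lookup` are hypothetical names, only three items are included to keep it short, and the field codes 0 to 3 are represented implicitly by the position inside each tuple rather than as explicit keys.

```python
class Vocabulary:
    """Assigns consecutive integer IDs to strings and maps them back."""

    def __init__(self):
        self.ids = {}    # string -> integer ID
        self.names = []  # integer ID -> string

    def encode(self, name):
        # Hand out the next free ID the first time a string is seen.
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]

    def decode(self, code):
        return self.names[code]


# item -> (ItemCategory, ItemCluster, ItemSubCluster, SubCluster)
RAW = {
    "potato": ("Vegetable", "Root vegetables", "Root", "WithOutSkin"),
    "yam": ("Vegetable", "Root vegetables", "Root", "WithOutSkin"),
    "onion": ("Vegetable", "Root vegetables", "Root", "WithSkin"),
}

items = Vocabulary()
values = Vocabulary()

# Compact index: item ID -> tuple of value IDs.
# Tuple position i stands for field code i (0 = ItemCategory, ..., 3 = SubCluster).
COMPACT = {
    items.encode(item): tuple(values.encode(v) for v in fields)
    for item, fields in RAW.items()
}


def lookup(item_name):
    """Resolve a name to its ID, fetch the codes and decode them to strings."""
    codes = COMPACT[items.ids[item_name]]
    return tuple(values.decode(c) for c in codes)
```

    Here `COMPACT[0]` is `(0, 1, 2, 3)`, matching the encoded potato entry above, and each repeated string such as "Root vegetables" is stored only once in `values.names`.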