java regex algorithm data-structures

Matching rule sets against input data sets in Java


I'm working on a problem where I have two inputs.

Input1 :

Multiple sets of rules (sample):

RuleSet1:

1. I am $name
2. I am $age years old
3. $bookname is my favorite book
   ....

RuleSet2:

1. I love $sportname
2. $color is my favorite color
   ....

RuleSet3:

1. $fruit is my favorite fruit
2. I am a $diet
3. I speak $language
4. I am from $countryname
   ....

Here $name, $age, $bookname, ... are placeholders. There can be any number of such rule sets.

Input2 :

Multiple sets of input strings.

Set 1:

1. I am 26 years old
2. I am James
   .....

Set 2:

1. I am John
2. ToKillAMockinBird is my favorite book
   .......

Set 3:

1. TuesdaysWithMorrie is my favorite book
2. I am Bill
3. I am 26 years old
   ......

Set 4:

1. I am Jack
2. I am 27 years old
3. WarAndPeace is my favorite book
   ......

Set 5:

1. I am a vegan
2. I speak English
   ......

Set 6:

1. Purple is my favorite color
2. I love football
   ......

Problem Statement :

For each set of strings in Input 2, I need to match it against the rule sets in Input 1 and report whether its strings appear in the same order as the rules they match.

Output :

Set1 --> false
Set2 --> true
Set3 --> false
Set4 --> true
Set5 --> true
Set6 --> false

I tried brute force: iterate over each string in each input set, check whether it matches a rule and, if so, assign it that rule's number, then finally check whether these numbers are in ascending order. But it's not efficient. Both Input 1 and Input 2 could be huge data sets. Is there a better way of solving this?
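For reference, the brute-force check described above might look roughly like the sketch below. The class and method names (`BruteForceOrderCheck`, `toPattern`, `compile`, `inOrder`) are hypothetical, and the sketch assumes that every placeholder stands for a single `\w+` token and that each input line is numbered by the first rule it fully matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class BruteForceOrderCheck {
    // Assumption: every placeholder ($name, $age, ...) stands for one \w+ token.
    static Pattern toPattern(String rule) {
        // Split on placeholders; limit -1 keeps a trailing empty literal.
        String[] literals = rule.split("\\$\\w+", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) regex.append("\\w+");          // one placeholder between literals
            regex.append(Pattern.quote(literals[i])); // literal text, escaped
        }
        return Pattern.compile(regex.toString());
    }

    static List<Pattern> compile(List<String> rules) {
        List<Pattern> patterns = new ArrayList<>();
        for (String rule : rules) patterns.add(toPattern(rule));
        return patterns;
    }

    // Numbers each line by the first rule it fully matches, then checks
    // that the numbers are strictly ascending.
    static boolean inOrder(List<Pattern> rules, List<String> lines) {
        int previous = -1;
        for (String line : lines) {
            int index = -1;
            for (int i = 0; i < rules.size(); i++) {
                if (rules.get(i).matcher(line).matches()) {
                    index = i;
                    break;
                }
            }
            // Fails when no rule matched (-1) or the order is broken.
            if (index <= previous) return false;
            previous = index;
        }
        return true;
    }
}
```

Each input line is tested against every rule of every rule set, which is where the cost comes from on large inputs.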


Solution

  • Here is a thought: concatenate the lines of each rule set, and of each input set, into a single line, separated by a special delimiter (or, alternatively, wrapped in a surrounding pattern).

    So rule set #1 could look like this:
    I am $name ### I am $age years old ### $bookname is my favorite book
    or like this:
    [I am $name] [I am $age years old] [$bookname is my favorite book]

    Then do the same for the input sets and compare. It seems to me that replacing the placeholders with the regex \w+ may be sufficient.
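
    A minimal Java sketch of this idea follows. It is not a definitive implementation: the names `RuleSetMatcher`, `compileRuleSet` and `matchesAny` are hypothetical, and ` ### ` is assumed never to occur inside a rule or an input line. Because the expected output accepts input sets that skip rules (Set 2 has no age line), the sketch also assumes that each rule has to be optional in the combined regex.

```java
import java.util.List;
import java.util.regex.Pattern;

class RuleSetMatcher {
    // Assumed delimiter; it must not occur inside rules or input lines.
    private static final String DELIM = " ### ";

    // Builds one regex per rule set, e.g. for rule set #1:
    // (?:I am \w+ ### )?(?:I am \w+ years old ### )?(?:\w+ is my favorite book ### )?
    static Pattern compileRuleSet(List<String> rules) {
        StringBuilder regex = new StringBuilder();
        for (String rule : rules) {
            regex.append("(?:");
            // Split on placeholders; limit -1 keeps a trailing empty literal.
            String[] literals = rule.split("\\$\\w+", -1);
            for (int i = 0; i < literals.length; i++) {
                if (i > 0) regex.append("\\w+");          // placeholder value
                regex.append(Pattern.quote(literals[i])); // literal text, escaped
            }
            // Each rule is optional, so an input set may skip it.
            regex.append(Pattern.quote(DELIM)).append(")?");
        }
        return Pattern.compile(regex.toString());
    }

    // Joins the input lines with the same delimiter and reports whether
    // the whole string matches at least one compiled rule set.
    static boolean matchesAny(List<Pattern> ruleSets, List<String> lines) {
        StringBuilder joined = new StringBuilder();
        for (String line : lines) joined.append(line).append(DELIM);
        String input = joined.toString();
        for (Pattern ruleSet : ruleSets) {
            if (ruleSet.matcher(input).matches()) return true;
        }
        return false;
    }
}
```

    Each rule set is compiled once and reused for every input set, so an input set costs one regex match per rule set rather than one per rule and line. Note that with this sketch an empty input set matches every rule set.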