I have a dictionary of words put in an 2-3-4 tree. Words have different lengths. Using a telephone keypad i need to find all possible words that can be responding to a specific phone number. Given the keypad:
for instance the number 26678837 could be the word "COMPUTER" but could also be another word. Given the fact that i have all my words in a 2-3-4 tree what would be the best algorithm or just way of searching in order to find all the possible words from a given phone number?
The key here would be to acquire all the different character combinations that can derive from the characters related to the supplied number in the left to right sequence of that number. Obviously the larger the supplied number, the more combinations there will be. The number of 26678837 which you provided in your post for example can generate 11,664 unique character combinations (string words).
2 6 6 7 8 8 3 7
abc mno mno pqrs tuv tuv def pqrs
Each one of these 11,664 generated string words would need to be passed through a dictionary of sorts which in itself may contain let's say 170,000 (or more) words and checked for (case insensitive) equality. This is not going to be an overly quick process since that equates to almost 2 billion inspections for equality. This takes about 1 minute, 20 seconds on my 10 year old desktop Windows box.
In any case the method provided below takes the supplied string number, generates the different character combinations (character words) and compares each one against the words within the supplied dictionary (word) text file. Read the doc's that are attached to the method:
/**
* Converts the supplied string telephone number (or any string of integer
* numbers) to possible words acquired from within the supplied dictionary
* text file.<br><br>
*
* Specific digits relate to specific alpha characters as they do in a telephone
* keypad, for example:<pre>
*
* 1 2 3 4 5 6 7 8 9 0
* ABC DEF GHI JKL MNO PQRS TUV WXYZ
* </pre>
*
* Digits 1 and 0 do not relate to any alpha characters. By looking at the
* above scale you can see how a number like "26678837" can generate 11,664
* unique character combinations (string words). Each one of these combinations
* needs to be check against the supplied dictionary (or word) text file to see
* if matches a specific dictionary word. If it does then we store that word
* and carry on to the next character combination. If the dictionary (word)
* text file contains 170,000 words and there are 11,664 different character
* combinations then a simple calculation (170,000 x 11,664) determines that
* there will need to be 1,982,880,000 (almost 2 billion) inspections done for
* equality. Because of this, many (sometimes thousands) of Executor Service
* threads are used to speed up the processing task. At the end of the day
* however, task speed completely relies on the speed of the Hard Disk Drive
* (HDD) and too many threads can actually degrade performance. It's a tough
* balance. Modify things as you see fit.
*
*
*
* @param dictionaryFilePath (String) The path and file name of the dictionary
* file to gather possible words from. This dictionary (or word) file must
* contain a single word on each line of the file. There can not be more than
* one word on any given file line. Blank lines within the file are ignored.
* Letter case is always ignored during processing however any words found and
* returned will be in Upper Letter Case.<br><br>
*
* If you don't currently have a dictionary or word text file available then
* you can easily acquire one from the Internet.<br>
*
* @param telephoneNumber (String) The telephone number (or any number) to
* convert to a dictionary word. Any non-digit characters within the supplied
* string are automatically removed so that only digits remain. The longer
* the number string, the longer it takes to process this number to a word.
* There is no guarantee that the number you supply will produce a word from
* within the supplied dictionary (word) file.<br>
*
* @param showProcess (boolean) If boolean 'true' is supplied then the processing
* is displayed within the console window. This however slows things down
* considerably but does give you an idea what is going on. Because larger
* string numbers can take some time to complete processing, a modal 'Please
* Wait' dialog with a progress bar is displayed in the middle of your screen.
* When processing has completed this dialog window is automatically closed.<br>
*
* @return (String[] Array) A String[] Array of the word or words from the
* supplied dictionary (word) file which the supplied number correlates to.
*/
public static String[] telephoneNumberToDictionaryWords(final String dictionaryFilePath,
final String telephoneNumber, final boolean showProcess) {
long timeStart = System.currentTimeMillis();
if (dictionaryFilePath == null || dictionaryFilePath.isEmpty()) {
System.err.println("The argument for dictionaryFilePath can not be null or empty!");
return null;
}
else if (telephoneNumber == null || telephoneNumber.trim().isEmpty()) {
System.err.println("The argument for telephoneNumber can not be null or empty!");
return null;
}
else if (!new File(dictionaryFilePath).exists()) {
System.err.println("File Not Found! ("
+ new File(dictionaryFilePath).getAbsolutePath() + ")");
return null;
}
String digits = String.valueOf(telephoneNumber).replaceAll("[^\\d]", "");
if (digits.isEmpty()) {
return null;
}
List<String> foundList = new ArrayList<>();
// ================== The 'Please Wait' Dialog Window ==================
// Depending on the number supplied, this process can take a while to
// complete therefore put a popup 'Please Wait...' window into play
// here.
final javax.swing.JFrame iframe = new javax.swing.JFrame();
iframe.setAlwaysOnTop(true);
iframe.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
iframe.setLocationRelativeTo(null);
final JDialog workingDialog = new JDialog(iframe);
workingDialog.setCursor(java.awt.Cursor.getPredefinedCursor(java.awt.Cursor.WAIT_CURSOR));
final JPanel p1 = new JPanel(new BorderLayout());
javax.swing.border.Border lineBorder = javax.swing.BorderFactory.createLineBorder(Color.red, 4);
p1.setBorder(lineBorder);
p1.setBackground(Color.BLACK);
p1.setPreferredSize(new java.awt.Dimension(255, 150));
final JLabel label = new JLabel();
label.setHorizontalAlignment(JLabel.CENTER);
label.setVerticalAlignment(JLabel.CENTER);
label.setText("<html><font size='5',color=#14e5eb><b><center>Please Wait..."
+ "</center></b></font><br><center>This can take a little "
+ "while.</center></html>");
label.setForeground(Color.WHITE);
p1.add(label, BorderLayout.CENTER);
final JLabel label2 = new JLabel();
label2.setHorizontalAlignment(JLabel.CENTER);
label2.setVerticalAlignment(JLabel.CENTER);
label2.setText("Hello World");
label2.setForeground(Color.WHITE);
p1.add(label2, BorderLayout.BEFORE_FIRST_LINE);
final javax.swing.JProgressBar progressBar = new javax.swing.JProgressBar();
progressBar.setPreferredSize( new java.awt.Dimension (190, 25));
progressBar.setMaximumSize(new java.awt.Dimension(190,25));
progressBar.setStringPainted(true);
p1.add(progressBar, BorderLayout.SOUTH);
workingDialog.setUndecorated(true);
workingDialog.getContentPane().add(p1);
workingDialog.pack();
workingDialog.setLocationRelativeTo(iframe);
workingDialog.setDefaultCloseOperation(JDialog.DISPOSE_ON_CLOSE);
workingDialog.setModal(true);
// =====================================================================
SwingWorker<String, Void> worker = new SwingWorker<String, Void>() {
@Override
protected String doInBackground() throws InterruptedException {
// Get the number of words contained within the supplied dictionary file.
long numberOfDictionaryWords = 0L;
// This will auto-close the stream.
try (java.util.stream.Stream<String> linesStream
= java.nio.file.Files.lines(java.nio.file.Paths.get(dictionaryFilePath))) {
numberOfDictionaryWords = linesStream.count();
}
catch (IOException ex) {
System.err.println(ex);
}
if (showProcess) {
System.out.println("The number of words in dictionary: --> "
+ numberOfDictionaryWords);
}
// Map the alpha characters related to telephone digits
HashMap<Character, String> map = new HashMap<>();
map.put('0', "");
map.put('1', "");
map.put('2', "abc");
map.put('3', "def");
map.put('4', "ghi");
map.put('5', "jkl");
map.put('6', "mno");
map.put('7', "pqrs");
map.put('8', "tuv");
map.put('9', "wxyz");
// Get all the character combinations available for the
// telelphone number (or just number) digits provided.
List<String> phoneCharactersList = new ArrayList<>();
phoneCharactersList.add("");
for (int i = 0; i < digits.length(); i++) {
if (digits.charAt(i) == '0' || digits.charAt(i) == '1') {
continue;
}
ArrayList<String> tempList = new ArrayList<>();
String option = map.get(digits.charAt(i));
for (int j = 0; j < phoneCharactersList.size(); j++) {
for (int p = 0; p < option.length(); p++) {
tempList.add(new StringBuilder(phoneCharactersList.get(j))
.append(option.charAt(p)).toString());
}
}
phoneCharactersList.clear();
phoneCharactersList.addAll(tempList);
}
if (showProcess) {
System.out.println("The number of generated words to process: --> "
+ phoneCharactersList.size());
}
progressBar.setMinimum(0);
progressBar.setMaximum(phoneCharactersList.size());
/* Iterate through all the character combinations now contained
within the 'phoneCharactersList' and compare them to the words
contained within the dictionary file. Store found words into
the foundList List Interface object.
Declare concurrent executor service threads (one for each possible
word to check for in dictionary file). */
java.util.concurrent.ExecutorService executor =
java.util.concurrent.Executors.newFixedThreadPool(phoneCharactersList.size());
for (int i = 0; i < phoneCharactersList.size(); i++) {
String wordToFind = phoneCharactersList.get(i);
label2.setText("<html><center>Processing word: <font color=yellow>"
+ wordToFind + "</font></center></html>");
if (showProcess) {
System.out.println((i + 1) + ") Processing the possible word: " + wordToFind);
}
Runnable threadWorker = new Runnable() {
@Override
public void run() {
try (java.io.BufferedReader br = new java.io.BufferedReader(new java.io.FileReader(dictionaryFilePath))) {
String line;
while ((line = br.readLine()) != null) {
line = line.trim();
if (line.isEmpty() || (line.length() != wordToFind.length())) {
continue;
}
if (line.equalsIgnoreCase(wordToFind)) {
foundList.add(wordToFind.toUpperCase());
break;
}
}
}
catch (Exception ex) {
System.err.println(ex);
}
}
};
executor.execute(threadWorker);
progressBar.setValue(i+1);
}
// Shut down Executor Service threads (when they are done)
executor.shutdown();
// Hold rest of code processing until are executors are terminated.
while (!executor.isTerminated()) {
}
return null;
}
@Override
protected void done() {
workingDialog.dispose();
iframe.dispose();
workingDialog.setCursor(java.awt.Cursor.getPredefinedCursor(java.awt.Cursor.DEFAULT_CURSOR));
}
};
worker.execute();
workingDialog.setVisible(true);
try {
worker.get();
}
catch (Exception e1) {
e1.printStackTrace();
}
java.util.Collections.sort(foundList);
long timeEnd = System.currentTimeMillis();
if (showProcess) {
long seconds = (timeEnd - timeStart) / 1000;
String timeSpan = new StringBuffer("").append(String.valueOf(seconds))
.append(" seconds").toString();
if (seconds > 60) {
timeSpan = new StringBuffer("").append(String.valueOf(seconds/60))
.append(" minutes and ").append(String.valueOf(seconds%60))
.append(" seconds").toString();
}
System.out.println(new StringBuffer("It took ").append(timeSpan)
.append(" to process the number: ").append(telephoneNumber)
.toString());
}
return foundList.toArray(new String[foundList.size()]);
}
Modify the code as you see fit so as to utilize your 2-3-4 Tree, B-Tree, or whatever you like. To use this method you might do it like this:
String phoneNumber = "26678837";
String[] words = telephoneNumberToDictionaryWords("dictionary.txt", phoneNumber, false);
if (words == null || words.length == 0) {
System.out.println("There are no dictionary words available for "
+ "the Phone Number: " + phoneNumber);
}
else {
for (String str : words) {
System.out.println(str);
}
}