I am trying to search a list of 268,000 words to check whether a word the user inputs exists in that list. I have accomplished this with a simple I/O stream, but the search takes about 5 seconds, which is too long. The file is currently located in my app's assets. While looking for more efficient ways to search the file, I came across `MappedByteBuffer`, but it is not clear to me where the file should be stored for the following example to work:
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ReadFiles {

    private static String largeFile = "sowpods.txt";

    public static void read() throws IOException {
        File file = new File(largeFile);
        FileChannel fileChannel = new RandomAccessFile(file, "r").getChannel();
        MappedByteBuffer buffer = fileChannel.map(
                FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());
        System.out.println(buffer.isLoaded());
        System.out.println(buffer.capacity());
        fileChannel.close();
    }
}
If I leave the file in assets, how can I read it? At the moment I am getting a "sowpods.txt: open failed: ENOENT (No such file or directory)" error. Thanks for any tips!
Using a memory-mapped file is a bad idea here. You are essentially wasting OS resources, and a linear scan over a mapping won't get you the best speed anyway.
If you only search once in a while, want to keep things simple, and do not need to keep the file in memory between searches, go with BufferedInputStream. Give it a buffer of, say, 10 kB; it should perform quite fast, and most likely you will saturate the disk.
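A minimal sketch of the buffered, line-by-line scan. On Android the reader would wrap the asset stream, e.g. `new InputStreamReader(getAssets().open("sowpods.txt"))`; here a small in-memory list stands in for the file so the snippet is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class StreamSearch {

    // Scans the word list line by line and stops as soon as the target is found.
    static boolean containsWord(BufferedReader reader, String target) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().equalsIgnoreCase(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the asset stream; on Android you would pass a buffer
        // size explicitly, e.g. new BufferedReader(assetReader, 10 * 1024).
        BufferedReader words = new BufferedReader(new StringReader("AA\nAAH\nAAHED\n"));
        System.out.println(containsWord(words, "aahed")); // true
    }
}
```

The early return matters: on average you scan half the file, and the buffered reader keeps the number of actual disk reads small.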
If you perform a lot of searches, keep the contents in memory between searches. Use a HashSet or TreeSet. If you use a HashSet, give it enough buckets (an initial capacity) up front so it never has to rehash.
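For example, loading the list once into a pre-sized HashSet (a sketch; the capacity of 360,000 is 268,000 words divided by the default 0.75 load factor, and the in-memory reader again stands in for the asset file):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class WordSet {

    // Loads every line into a HashSet sized up front so it never rehashes.
    static Set<String> load(BufferedReader reader) throws IOException {
        Set<String> words = new HashSet<>(360_000); // ~268k / 0.75 load factor
        String line;
        while ((line = reader.readLine()) != null) {
            words.add(line.trim().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the asset stream.
        Set<String> words = load(new BufferedReader(new StringReader("AA\nAAH\nAAHED\n")));
        System.out.println(words.contains("aah")); // true; each lookup is O(1)
    }
}
```

After the one-time load, each user query is a constant-time `contains` call instead of a 5-second file scan.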
If none of this suits you (i.e. you are low on memory, or you have millions of words and still want fast searches), load the words into an SQL database, put them in a single indexed table, and query it. This is exactly what databases excel at, and you should have no trouble finding one that fits your purpose.
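The schema for this is tiny. A hypothetical sketch (table and column names are illustrative; on Android this would live in SQLite):

```sql
-- One row per word; the unique index is what makes lookups fast.
CREATE TABLE words (word TEXT NOT NULL);
CREATE UNIQUE INDEX idx_words_word ON words (word);

-- The existence check becomes a single indexed lookup:
SELECT EXISTS(SELECT 1 FROM words WHERE word = ?);
```

With the index in place the database answers the query via a B-tree lookup rather than scanning all rows, and nothing needs to be held in your app's heap.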
That said, ~300k words is not a lot; they should fit in memory easily, somewhere around 10 MB. Depending on your usage scenario, you might also want to look at a Bloom filter.