
How to get a specific key/value from HDFS via HTTP or the Java API?


How can I get the values of one or more keys from a file in HDFS, via HTTP or the Java API, from a remote client? For example, the file below has a million keys and values, and I only want the values of the 'phone' and 'toys' keys.

MyFile:

book, 5
notebook, 5
phone, 3
toys, 2
.
.
.

Solution

  • HDFS is a distributed file system (block storage), not a key-value store.

    If you need queries like this, your options include Accumulo, HBase, or Hive (plus query engines such as Presto/Trino, Drill, Spark, etc.).

    Otherwise, you must read the entire file and loop over each line, looking for those keys. This is not ideal: HDFS files may be several GB in size, and you shouldn't stream gigabytes of data over HTTP/RPC for simple key-value lookups. Instead, you could use MapReduce or Spark to read the file as a two-column CSV file so that the scan runs on the cluster, but this still iterates over and parses every line; it is not an indexed lookup table.

    Alternatively, use a traditional database, or dump your data into one, so that you can query for specific values.
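
For reference, the "read the whole file and loop over each line" fallback might look roughly like the sketch below. The class name `KeyScan` and the method `lookup` are hypothetical, and the sketch assumes each line has the form `key, value` with unique keys. It reads from an in-memory sample so that it runs on its own; against a real cluster the `BufferedReader` would instead wrap the stream returned by the Hadoop Java API (`FileSystem.open(path)`) or the HTTP response body of a WebHDFS `op=OPEN` request.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of any Hadoop API.
class KeyScan {

    // Scans "key, value" lines sequentially and collects the values of the wanted keys.
    // Assumes keys are unique, so the scan can stop once every wanted key was found.
    static Map<String, String> lookup(BufferedReader reader, Set<String> wanted) throws IOException {
        Map<String, String> found = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // skip lines that are not in "key, value" form
            }
            String key = line.substring(0, comma).trim();
            if (wanted.contains(key)) {
                found.put(key, line.substring(comma + 1).trim());
                if (found.size() == wanted.size()) {
                    break; // all wanted keys seen; no need to read further
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the HDFS file from the question.
        String sample = "book, 5\nnotebook, 5\nphone, 3\ntoys, 2\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(lookup(reader, Set.of("phone", "toys")));
        }
    }
}
```

Even with the early exit, the worst case (a key near the end of the file, or a missing key) still transfers and parses the whole file, which is why an indexed store such as HBase is the better fit for this access pattern.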