I know I can read a local file in Scala like so:
import scala.io.Source
val filename = "laba01/ml-100k/u.data"
for (line <- Source.fromFile(filename).getLines) {
  println(line)
}
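A variant of the loop above that also closes the file handle could look like the sketch below. The helper name readLocalLines and the UTF-8 encoding are assumptions for illustration. Source.fromFile opens the path through java.io (see java.io.FileInputStream in the stack trace further down), so it only ever looks at the local file system.

```scala
import scala.io.Source

// Hypothetical helper: reads all lines of a LOCAL file and closes the handle.
// Assumes the file is UTF-8 encoded.
def readLocalLines(filename: String): List[String] = {
  val source = Source.fromFile(filename, "UTF-8")
  try source.getLines().toList // materialize the lines before closing the source
  finally source.close()
}

// Usage in the notebook:
// readLocalLines("laba01/ml-100k/u.data").foreach(println)
```

Calling toList before close() matters: getLines is lazy, so iterating it after the source is closed would fail.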
This code works fine and prints out the lines from the text file. I run it in JupyterHub with Apache Toree.
I know I can read from HDFS on this server, because when I run the following code in another cell:
import sys.process._
"hdfs dfs -ls /labs/laba01/ml-100k/u.data"!
it works fine too, and I can see this output:
-rw-r--r-- 3 hdfs hdfs 1979173 2020-04-20 17:56 /labs/laba01/ml-100k/u.data
lastException: Throwable = null
warning: there was one feature warning; re-run with -feature for details
Now I want to read this same file from HDFS by running this:
import scala.io.Source
val filename = "hdfs:/labs/laba01/ml-100k/u.data"
for (line <- Source.fromFile(filename).getLines) {
  println(line)
}
but I get this output instead of the file's lines printed out:
lastException = null
Name: java.io.FileNotFoundException
Message: hdfs:/labs/laba01/ml-100k/u.data (No such file or directory)
StackTrace: at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at scala.io.Source$.fromFile(Source.scala:91)
at scala.io.Source$.fromFile(Source.scala:76)
at scala.io.Source$.fromFile(Source.scala:54)
So how do I read this text file from HDFS?
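Source.fromFile will not be able to find any file in HDFS; it's not meant for that. If I'm not wrong, it can only read files on your local file system (file:/// paths). The stack trace shows this: the string is handed to java.io.FileInputStream, which treats "hdfs:/labs/laba01/ml-100k/u.data" as an ordinary local path.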
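Since the question shows that the hdfs command-line client already works from the notebook, one possible workaround is to let that client read the file and capture its output with scala.sys.process. This is a sketch under that assumption; the helper name commandOutputLines is hypothetical, and hdfs dfs -cat is used here to print the file contents. It loads the whole output into memory as one string, which is only reasonable for small files.

```scala
import scala.sys.process._

// Hypothetical helper: runs an external command and returns its stdout as lines.
// `!!` captures standard output and throws a RuntimeException on a non-zero exit code.
def commandOutputLines(cmd: Seq[String]): List[String] =
  Process(cmd).!!.split("\n").toList

// Usage in the notebook (assumes the same hdfs CLI used for `hdfs dfs -ls`):
// commandOutputLines(Seq("hdfs", "dfs", "-cat", "/labs/laba01/ml-100k/u.data")).foreach(println)
```

Passing the command as a Seq avoids shell quoting issues and uses a normal method call instead of the postfix ! operator that triggered the feature warning in the question.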
You need to use hadoop-common.jar to read the data from HDFS. You can find a code example here: https://stackoverflow.com/a/41616512/7857701
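A minimal sketch of reading the file through the Hadoop FileSystem API from hadoop-common might look like this. It assumes hadoop-common is on the classpath (usually the case in a Spark-backed kernel such as Toree) and that the cluster configuration (core-site.xml with fs.defaultFS) is visible to it. The helper name readLinesViaHadoop is hypothetical. The URI scheme decides which FileSystem implementation is used: hdfs:// goes to HDFS, file:// to the local disk.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import scala.io.Source

// Hypothetical helper: reads all lines of a file via the Hadoop FileSystem API.
// new Configuration() loads core-site.xml from the classpath, if present.
def readLinesViaHadoop(uri: String, conf: Configuration = new Configuration()): List[String] = {
  val path = new Path(uri)
  // Picks the FileSystem for the path's scheme; a path without scheme or
  // authority is resolved against fs.defaultFS.
  val fs = path.getFileSystem(conf)
  val in = fs.open(path) // FSDataInputStream
  try Source.fromInputStream(in, "UTF-8").getLines().toList
  finally in.close() // close the stream only: FileSystem instances are cached and shared
}

// Usage in the notebook (assumes the kernel exposes a SparkContext named sc):
// readLinesViaHadoop("/labs/laba01/ml-100k/u.data", sc.hadoopConfiguration).foreach(println)
```

Passing sc.hadoopConfiguration reuses whatever HDFS settings Spark already has, so the plain path /labs/laba01/ml-100k/u.data should resolve to HDFS when fs.defaultFS points there.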