apache-spark luigi hail

Access different types of preset target locations in Luigi


I have a Luigi pipeline. In one file, a Google Cloud path is set as the target location:

https://github.com/macarthur-lab/hail-elasticsearch-pipelines/blob/d6e9dedbce929c04c294c54095663ba94a4de3f0/luigi_pipeline/lib/hail_tasks.py#L37

There is a run_vep() method that calls several others and ultimately ends up calling the following, different run_vep():

https://github.com/macarthur-lab/hail-elasticsearch-pipelines/blob/d6e9dedbce929c04c294c54095663ba94a4de3f0/hail_scripts/v02/utils/hail_utils.py#L103

That code uses a Google Cloud path to access the files, but I now want to access local files instead. Is there a way to temporarily change where Luigi looks for files? I have two locations that Luigi should read files from, and I need both of them to be accessible rather than just one or the other. How can this be handled in Luigi?


Solution

  • It turns out that Hail's hl.vep() function expects a config that contains local paths, not Hadoop ones. After I specified local paths, things worked for me. It would still be interesting to know how to access one file system or the other directly; the source code of hl.vep() may be useful for that.
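
On the open question of keeping both locations reachable in one pipeline, one possible approach is to choose the Luigi target class per path instead of switching a global setting. The sketch below is only an illustration and is not taken from the linked repository: the helper names filesystem_kind and make_target are hypothetical, and it assumes Google Cloud paths start with gs:// while everything else is a plain local path or a file:// URL. It relies on luigi.LocalTarget and luigi.contrib.gcs.GCSTarget, which are imported lazily so the path check itself works without Luigi or Google Cloud credentials.

```python
from urllib.parse import urlparse


def filesystem_kind(path):
    """Classify a path as 'gcs' (gs://...) or 'local' (plain path or file://...)."""
    scheme = urlparse(path).scheme
    if scheme == "gs":
        return "gcs"
    if scheme in ("", "file"):
        return "local"
    # Other schemes (for example hdfs://) are deliberately not handled here.
    raise ValueError(f"Unsupported path scheme: {path!r}")


def make_target(path):
    """Hypothetical helper: build a Luigi target that matches where the path lives."""
    if filesystem_kind(path) == "gcs":
        # Imported lazily: needs Luigi's GCS support and Google Cloud credentials.
        from luigi.contrib.gcs import GCSTarget
        return GCSTarget(path)
    from luigi import LocalTarget
    # Strip the file:// prefix so LocalTarget receives an ordinary path.
    local_path = urlparse(path).path if path.startswith("file://") else path
    return LocalTarget(local_path)
```

A task's output() or requires() could then call make_target() on each path, so a local VEP config and a gs:// dataset can coexist in the same run. Whether this fits the linked pipeline depends on how its tasks construct their targets, which the question does not show.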