How to run ensembl-vep in conda


I’ve installed ensembl-vep like so:

conda install ensembl-vep=105.0-0

And then installed the human cache like this:

vep_install -a cf -s homo_sapiens -y GRCh38 -c /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep --CONVERT

But I can’t get it to run with any commands, e.g.

vep --dir_cache "/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/conda/envs/bioinfo/share/ensembl-vep-105.0-0" \
   -i "/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated.vcf" \
   -o "/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/vep_output.txt"

This gives an error message about downloading caches:

IMPORTANT INFORMATION:
The VEP can read gene data from either a local cache or local/remote databases.

Or this one:

vep --cache \
   -i "/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated.vcf" \
   -o "/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/vep_output.txt"

This gives the error:

MSG: ERROR: Cache directory /mnt/gpfs/home/skgtmdf/.vep/homo_sapiens not found

I don’t suppose anyone would be able to point me in the right direction?


Solution

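  • I found the answer. You need to pass both --cache and --dir_cache. With --dir_cache alone, VEP is never told to use a cache, so it prints the "IMPORTANT INFORMATION" message; with --cache alone, it looks in the default location ($HOME/.vep), as in the error above: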

    vep --cache --dir_cache "/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/conda/envs/bioinfo/share/ensembl-vep-105.0-0" \
       -i "/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated.vcf" \
       -o "/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/vep_output.txt"
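
If it still fails, it may help to confirm that the directory passed to --dir_cache really contains the species subdirectory that the error message above refers to (e.g. <dir_cache>/homo_sapiens). The following is only a minimal sketch: check_vep_cache is a hypothetical helper name, not part of VEP, and it assumes the cache is laid out as <dir_cache>/<species>.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VEP): check that a cache directory
# contains the species subdirectory, e.g. <dir_cache>/homo_sapiens.
# Assumption: the cache layout is <dir_cache>/<species>, as suggested by
# the "Cache directory .../.vep/homo_sapiens not found" error.
check_vep_cache() {
    local cache_dir="$1"
    local species="${2:-homo_sapiens}"   # default species is an assumption
    if [ -d "${cache_dir}/${species}" ]; then
        echo "found: ${cache_dir}/${species}"
        return 0
    fi
    echo "missing: ${cache_dir}/${species}" >&2
    return 1
}

# Example use (the path is a placeholder; substitute your own):
#   check_vep_cache "/path/to/cache" && \
#       vep --cache --dir_cache "/path/to/cache" -i input.vcf -o vep_output.txt
```

If the check reports "missing", the cache was probably installed somewhere else, for example the directory given to vep_install with -c.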