bioinformatics sequence-alignment

BWA fail to locate the index files


I'm new to bioinformatics and am trying to analyze a dataset with BWA. However, as soon as I reach bwa mem, I keep running into the same error:

input --> mirues-macbook:sra ipmiruek$ bwa mem -t 8 Homo_sapiens.GRCh38.dna.chromosome.17.fa ERR3841737/ERR3841737_trimmed.fq.gz > ERR3841737/ERR3841737_mapped.sam

output --> [E::bwa_idx_load_from_disk] fail to locate the index files

I've already indexed the reference chromosome as follows:

bwa index Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz

Is there anything I could do to fix this problem? Thank you.

I tried a different dataset with its corresponding reference chromosome, but it yielded the same result. Is this an issue with the code or with the dataset I'm working with?


Solution

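  • It looks like you indexed a gzip-compressed FASTA file but are supplying an index base (idxbase) without the .gz extension. By default, bwa index names the index files after its input file, so they carry the Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz prefix and bwa mem cannot find them under the .fa name. What you want is: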

    $ bwa mem \
        -t 8 \
        Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz \
        ERR3841737/ERR3841737_trimmed.fq.gz \
        > ERR3841737/ERR3841737_mapped.sam
    

    Alternatively, gunzip the reference FASTA file and index the uncompressed file, so that the index prefix matches the .fa name passed to bwa mem. For example:

    $ gunzip Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz
    $ bwa index Homo_sapiens.GRCh38.dna.chromosome.17.fa
    

    Note that BWA packs the reference sequences (into the .pac file), so you don't even need the FASTA file to run BWA MEM after it's been indexed.
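
To confirm which prefix the index was actually written under, a quick check of the files on disk can help. The following is a minimal sketch, assuming the classic bwa index output of five files (.amb, .ann, .bwt, .pac, .sa) sitting next to the prefix; check_bwa_index is a hypothetical helper name for illustration, not part of BWA.

```shell
# Hypothetical helper (not part of BWA): report which of the five files
# that `bwa index` normally writes are missing for a given index prefix.
# Returns 0 if all are present, 1 otherwise.
check_bwa_index() {
    prefix=$1
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -e "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (the prefix is the same string you would pass to bwa mem):
#   check_bwa_index Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz && echo "index found"
```

If the check passes for the .fa.gz prefix but fails for the .fa prefix, the index exists and only the name given to bwa mem needs to change.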