I'm trying to use rBlast
for protein sequences but somehow it doesn't work. It works fine for nucleotide sequences but for proteins it just doesn't return any match (I used a sequence from the searched dataset, so there can't be no match). In the description it stands "This includes interfaces to blastn, blastp, blastx..." but in the help file in R studio it says "Description Execute blastn from blast+". Did anybody run rBlast
for proteins?
Here's what I ran:
listF<-list.files("Trich_prot_fasta/")
fa<-paste0("Trich_prot_fasta/",listF[i])
makeblastdb(fa, dbtype = "prot", args="")
bl <- blast("Trich_prot_fasta/Tri5640_1_GeneModels_FilteredModels1_aa.fasta", type="blastp")
seq <- readAAStringSet("NDRkinase/testSeq.txt")
cl <- predict(bl, seq)
Result:
> cl <- predict(bl, seq)
Warning message: In predict.BLAST(bl, seq) : BLAST did not return a match!
Tried to reproduce the error but everything worked as expected on my system (macOS BigSur 11.6 / R v4.1.1 / Rstudio v1.4.1717).
Given your blastn was successful, perhaps you are combining multiple fasta files for your protein fasta reference database? If that's the case, try concatenating them together and use the path to the file instead of an R object ("fa") when making your blastdb. Or perhaps:
makeblastdb(file = "Trich_prot_fasta/Tri5640_1_GeneModels_FilteredModels1_aa.fasta", type = "prot)
Instead of:
makeblastdb(fa, dbtype = "prot", args="")
Also, please edit your question to include the output from sessionInfo()
(might help narrow things down).
library(tidyverse)
#BiocManager::install("Biostrings")
#devtools::install_github("mhahsler/rBLAST")
library(rBLAST)
# Download an example fasta file:
# https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Eukaryota/UP000001542/UP000001542_5722.fasta.gz
# Grab the first fasta sequence as "example_sequence.fasta"
listF <- list.files("~/Downloads/Trich_example", full.names = TRUE)
listF
#> [1] "~/Downloads/Trich_example/UP000001542_5722.fasta"
#> [2] "~/Downloads/Trich_example/example_sequence.fasta"
makeblastdb(file = "~/Downloads/Trich_example/UP000001542_5722.fasta", dbtype = "prot")
bl <- blast("~/Downloads/Trich_example/UP000001542_5722.fasta", type = "blastp")
seq <- readAAStringSet("~/Downloads/Trich_example/example_sequence.fasta")
cl <- predict(bl, seq)
cl
#> QueryID SubjectID Perc.Ident Alignment.Length
#> 1 Example_sequence_1 tr|A2D8A1|A2D8A1_TRIVA 100.000 694
#> 2 Example_sequence_1 tr|A2E4L0|A2E4L0_TRIVA 64.553 694
#> 3 Example_sequence_1 tr|A2E4L0|A2E4L0_TRIVA 32.436 669
#> 4 Example_sequence_1 tr|A2D899|A2D899_TRIVA 64.344 488
#> 5 Example_sequence_1 tr|A2D899|A2D899_TRIVA 31.004 458
#> 6 Example_sequence_1 tr|A2D899|A2D899_TRIVA 27.070 314
#> 7 Example_sequence_1 tr|A2D898|A2D898_TRIVA 54.915 468
#> 8 Example_sequence_1 tr|A2D898|A2D898_TRIVA 33.691 653
#> 9 Example_sequence_1 tr|A2D898|A2D898_TRIVA 32.936 671
#> 10 Example_sequence_1 tr|A2D898|A2D898_TRIVA 29.969 654
#> 11 Example_sequence_1 tr|A2D898|A2D898_TRIVA 26.694 487
#> 12 Example_sequence_1 tr|A2D898|A2D898_TRIVA 25.000 464
#> 13 Example_sequence_1 tr|A2F4I3|A2F4I3_TRIVA 39.106 716
#> 14 Example_sequence_1 tr|A2F4I3|A2F4I3_TRIVA 30.724 677
#> 15 Example_sequence_1 tr|A2F4I3|A2F4I3_TRIVA 29.257 417
#> 16 Example_sequence_1 tr|A2F4I3|A2F4I3_TRIVA 23.438 640
#> 17 Example_sequence_1 tr|A2F4I3|A2F4I3_TRIVA 22.981 718
#> 18 Example_sequence_1 tr|A2F4I3|A2F4I3_TRIVA 24.107 112
#> 19 Example_sequence_1 tr|A2FI39|A2FI39_TRIVA 33.378 740
#> 20 Example_sequence_1 tr|A2FI39|A2FI39_TRIVA 31.440 722
#> Mismatches Gap.Openings Q.start Q.end S.start S.end E Bits
#> 1 0 0 1 694 1 694 0.00e+00 1402.0
#> 2 243 2 1 692 163 855 0.00e+00 920.0
#> 3 410 15 22 671 1 646 3.02e-94 312.0
#> 4 173 1 205 692 1 487 0.00e+00 644.0
#> 5 308 7 22 476 1 453 3.55e-55 198.0
#> 6 196 5 13 294 173 485 4.12e-25 110.0
#> 7 211 0 1 468 683 1150 8.48e-169 514.0
#> 8 420 11 2 647 501 1147 1.61e-91 309.0
#> 9 396 10 2 666 363 985 5.78e-89 301.0
#> 10 406 11 16 664 195 801 1.01e-66 238.0
#> 11 297 10 208 662 21 479 1.60e-36 147.0
#> 12 316 7 11 469 29 465 3.04e-36 147.0
#> 13 386 4 2 667 248 963 1.72e-149 461.0
#> 14 411 10 2 625 66 737 8.34e-83 283.0
#> 15 286 5 129 542 14 424 2.66e-52 196.0
#> 16 421 15 5 607 365 972 3.07e-38 152.0
#> 17 407 21 77 662 27 730 1.25e-33 138.0
#> 18 81 3 552 661 3 112 2.10e-01 35.4
#> 19 421 9 3 675 394 1128 1.12e-115 375.0
#> 20 409 15 2 647 163 874 1.21e-82 285.0
...
Created on 2021-09-30 by the reprex package (v2.0.1)