rfastadna-sequence

Collapse a list of DNAstringsets into a single DNAStingset in order to apply writeXStringSet() and turn it into fasta file in R


Using R for bioinformatics here: I have a list of DNAstringsSets(seen below) and want to use the writeXstringset() function which takes a DNAstringset object as an argument in order to save as a FASTA file.Anyone knows how is it possible to collapse the list of DNAstringsets into a single DNAStringset object and use it as an argument?

$NM_008866
  A DNAStringSet instance of length 13
     width seq                                                        names               
 [1]   693 ATGTGCGGCAACAACATGTCCGCTCCGA...GATAAGCTCCTACCTCCAATTGATTGA NM_008866
 [2]    72 ATGGATGGGCAGAAGCCTTTGCAGGTAT...AATACATCTGTCCACATGCCCCTGTGA NM_008866
 [3]   114 ATGGGCAGAAGCCTTTGCAGGTATCAAA...GAATATGGCTATGCCTTCTTGGTTTGA NM_008866
 [4]   213 ATGGCATTCCTTCTAACAGGATTATTTT...AGTGCCATGGAGATTGTGACCCTTTAG NM_008866
 [5]    63 ATGTCAAGCACTTCATTGATAAGCTCCT...TTGATTGACATCACTAAGAGGCCTTGA NM_008866
 ...   ... ...
 [9]   219 ATGGCCCTTCTATTGGGAGACCAGGCTT...CAGAGGCAGGCGGATCTCTGTCAATAG NM_008866
[10]   144 ATGTTATGCTTAAAACCAAATACTGTTC...CAGTCTCCTGTACAAATATTAAAATAA NM_008866
[11]    78 ATGTTGCAAAAATTATGGTTATTTCTGA...CCAACCAACCAAGAAGCACCTTTATAA NM_008866
[12]    75 ATGGTTATTTCTGAACGGTTGCTTTTCT...AGAAGCACCTTTATAAACAGGTGCTAA NM_008866
[13]    90 ATGTCTGGATTTAAAACAATTTCAAACA...AATTTACTTCAGTTATTCTATCTGTAA

$NM_001159750
  A DNAStringSet instance of length 9
   width seq                                                         names               
[1]   903 ATGGAGGACGAGGTGGTTCGCATTGCCA...ATGTGGAAATCGGTGGAAGTTCTGTTGA NM_001159750
[2]   105 ATGGACCATCAACTGATAAAGACCCTGA...AGAGAAGAAAGTTCCAGCAGCAATGTAA NM_001159750
[3]    75 ATGAGACAAATGCTCGAGATACATATGT...CCAAGCACTTCTGATTCTGTGCGATTAA NM_001159750
[4]    75 ATGATTATGTTGCAATTGGAGCTGATGA...ATTGAGGAAGCTATATATCAAGAAATAA NM_001159750
[5]   129 ATGAATGTGGAAATCGGTGGAAGTTCTG...GCCAGGCAACTCGTTTCCTTGCAAGTGA NM_001159750
[6]    63 ATGTGGAAATCGGTGGAAGTTCTGTTGA...AGAATTGGCAAAGTATCTGGACCATTAA NM_001159750
[7]   102 ATGTGTCCCACTTGTTTTGCTAGTAATA...TATAGTAAAGGCCACTTTTATAAATTAA NM_001159750
[8]   102 ATGGAAAACAATATGTCCATGTTAAAAG...CGGGAGGCAGAGGCAGGCGGATTTCTGA NM_001159750
[9]    75 ATGGATAATTTCTGTCACTTTAAAAATA...TAGTTTAAAAGTAATAAGGTTAAAATAG NM_001159750

$NM_011541
  A DNAStringSet instance of length 9
    width seq                                                         names               
[1]   906 ATGGAGGACGAGGTGGTTCGCATTGCCA...ATGTGGAAATCGGTGGAAGTTCTGTTGA NM_011541
[2]   108 ATGGACCATCAACTGATAAAGACCCTGA...GAAGAAAGTAGTTCCAGCAGCAATGTAA NM_011541
[3]    75 ATGAGACAAATGCTCGAGATACATATGT...CCAAGCACTTCTGATTCTGTGCGATTAA NM_011541
[4]    75 ATGATTATGTTGCAATTGGAGCTGATGA...ATTGAGGAAGCTATATATCAAGAAATAA NM_011541
[5]   129 ATGAATGTGGAAATCGGTGGAAGTTCTG...GCCAGGCAACTCGTTTCCTTGCAAGTGA NM_011541
[6]    63 ATGTGGAAATCGGTGGAAGTTCTGTTGA...AGAATTGGCAAAGTATCTGGACCATTAA NM_011541
[7]   102 ATGTGTCCCACTTGTTTTGCTAGTAATA...TATAGTAAAGGCCACTTTTATAAATTAA NM_011541
[8]   102 ATGGAAAACAATATGTCCATGTTAAAAG...CGGGAGGCAGAGGCAGGCGGATTTCTGA NM_011541
[9]    75 ATGGATAATTTCTGTCACTTTAAAAATA...TAGTTTAAAAGTAATAAGGTTAAAATAG NM_011541

Solution

  • A very minimal reproducible example. Interestingly, this will not work if each element of the list has a name (i.e. just returns the same list). Make sure that names(dna_list) <- NULL. I am unsure of the specific reason for this, perhaps someone else may know and would care to comment.

    require(Biostrings)
    x0 <- DNAStringSet(c("CTCCCAGTAT", "TTCCCGA", "TACCTAGAG"))
    x1 <- DNAStringSet(c("AGGTCGT", "GTCAGTGGTCCCC", "CATTTTAGG"))
    x2 <- DNAStringSet(c("TGCTAGCTA", "AGTCTTGC", "AGCTTTCGAG"))
    dna_list <- list(x0, x1, x2)
    > dna_list
    [[1]]
      A DNAStringSet instance of length 3
        width seq
    [1]    10 CTCCCAGTAT
    [2]     7 TTCCCGA
    [3]     9 TACCTAGAG
    
    [[2]]
      A DNAStringSet instance of length 3
        width seq
    [1]     7 AGGTCGT
    [2]    13 GTCAGTGGTCCCC
    [3]     9 CATTTTAGG
    
    [[3]]
      A DNAStringSet instance of length 3
        width seq
    [1]     9 TGCTAGCTA
    [2]     8 AGTCTTGC
    [3]    10 AGCTTTCGAG
    
    do.call(c, dna_list)
    > do.call(c, dna_list)
      A DNAStringSet instance of length 9
        width seq
    [1]    10 CTCCCAGTAT
    [2]     7 TTCCCGA
    [3]     9 TACCTAGAG
    [4]     7 AGGTCGT
    [5]    13 GTCAGTGGTCCCC
    [6]     9 CATTTTAGG
    [7]     9 TGCTAGCTA
    [8]     8 AGTCTTGC
    [9]    10 AGCTTTCGAG