Tags: r, unix, bioinformatics, genetics, genomics

Getting the namespace from an OBO-formatted text file


I want a way to extract the namespace associated with each GO term from files in the OBO format:

obo database

I tried converting the OBO file to a text file with the OBO-Data-Converter (https://github.com/PNNL-Comp-Mass-Spec/OBO-Data-Converter), but could not get it to work. I also tried the R package "ontologyIndex", without success. Any help is more than welcome.


Solution

  • Assuming each term is stored in a [Term] stanza of "tag: value" lines like this one:

    [Term]  
    id: GO:0000070
    name: mitotic sister chromatid segregation
    namespace: biological_process
    alt_id: GO:0016359
    def: "The cell cycle process in which replicated homologous chromosomes are organized and then physically separated and apportioned to two sets during the mitotic cell cycle. Each replicated chromosome, composed of two sister chromatids, aligns at the cell equator, paired with its homologous partner. One homolog of each morphologic type goes into each of the resulting chromosome sets." [GOC:ai, GOC:jl]
    subset: goslim_pombe
    synonym: "mitotic chromosome segregation" EXACT []
    synonym: "mitotic sister-chromatid adhesion release" NARROW []
    is_a: GO:0000819 ! sister chromatid segregation
    is_a: GO:1903047 ! mitotic cell cycle process
    relationship: part_of GO:0140014 ! mitotic nuclear division
    ... ...

    etc

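    Read the file so that each line ends up as one row of the single column V1:

    # OBO lines contain no tabs, so read.delim() keeps each line in one field
    dat <- read.delim("go.obo", header = FALSE)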
    

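    Build a data frame with one row per term from the tags that occur at most once per term (id, name, namespace, def, is_obsolete):

    # requires R >= 4.1 (for \(x) and |>) and the data.table package
    starts <- grep("\\[Term]", dat$V1)               # rows holding a [Term] header
    ends <- starts - 1 + diff(c(starts, nrow(dat)))  # row just before the next header

    head(
    do.call(rbind, 
      apply(cbind(starts + 1, ends), 1, \(x){
        # split every "tag: value" line of the stanza into its pieces
        res <- data.table::tstrsplit(dat[x[1]:x[2],], ": ")
        # positions of the five tags of interest
        id <- grep("^id|name|namespace|def|is_obso", res[[1]])
        # only res[[2]] is kept, so a value containing ": " is cut there (see row 6);
        # padding to length 5 leaves NULL where a tag is missing
        `length<-`(as.list(res[[2]][id]), 5)})) |> 
      data.frame() |> 
      setNames(c("id", "name", "namespace", "definition", "is_obsolete")))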
    

    Output (first six terms):

              id                                                     name
    1 GO:0000001                                mitochondrion inheritance
    2 GO:0000002                         mitochondrial genome maintenance
    3 GO:0000003                                    obsolete reproduction
    4 GO:0000005                    obsolete ribosomal chaperone activity
    5 GO:0000006    high-affinity zinc transmembrane transporter activity
    6 GO:0000007 low-affinity zinc ion transmembrane transporter activity
               namespace
    1 biological_process
    2 biological_process
    3 biological_process
    4 molecular_function
    5 molecular_function
    6 molecular_function
                                                                                                                                                                                                                                                                   definition
    1                                         The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton. [GOC:mcc, PMID:10873824, PMID:11389764]
    2                                                                                                      The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome. [GOC:ai, GOC:vw]
    3                                                                     OBSOLETE. The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms. [GOC:go_curators, GOC:isa_complete, GOC:jl, ISBN:0198506732]
    4                                                          OBSOLETE. Assists in the correct assembly of ribosomes or ribosomal subunits in vivo, but is not a component of the assembled ribosome when performing its normal biological function. [GOC:jl, PMID:12150913]
    5 Enables the transfer of zinc ions (Zn2+) from one side of a membrane to the other, probably powered by proton motive force. In high-affinity transport the transporter is able to bind the solute even if it is only present at very low concentrations. [TC:2.A.5.1.1]
    6                                                                                                                                                          Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction
      is_obsolete
    1        NULL
    2        NULL
    3        true
    4        true
    5        NULL
    6        NULL
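
Since the question only asks for the namespace of each GO term, a narrower approach may be enough. The following is a minimal base-R sketch, not part of the answer above: the helper name `obo_namespaces` and the small demo file are made up for illustration, and it assumes every term sits in a `[Term]` stanza with one `id: ` line and at most one `namespace: ` line, as in the excerpt shown earlier.

```r
# Hypothetical helper (an illustration, not from the original answer):
# returns one row per [Term] stanza with its id and namespace.
obo_namespaces <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))

  # Number the stanzas: every line starting with "[" ([Term], [Typedef], ...)
  # opens a new one; header lines before the first stanza get number 0.
  stanza <- cumsum(startsWith(lines, "["))

  # Keep only lines that belong to a stanza opened by "[Term]".
  in_term <- stanza %in% stanza[lines == "[Term]"]

  is_id <- in_term & startsWith(lines, "id: ")
  is_ns <- in_term & startsWith(lines, "namespace: ")

  ids <- data.frame(stanza = stanza[is_id],
                    id = sub("^id: ", "", lines[is_id]),
                    stringsAsFactors = FALSE)
  ns <- data.frame(stanza = stanza[is_ns],
                   namespace = sub("^namespace: ", "", lines[is_ns]),
                   stringsAsFactors = FALSE)

  # Join on the stanza number, so a term without a namespace line gets NA.
  out <- merge(ids, ns, by = "stanza", all.x = TRUE)
  out <- out[order(out$stanza), c("id", "namespace")]
  rownames(out) <- NULL
  out
}

# Made-up demo file with two terms and one non-term stanza.
demo <- c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: GO:0000001",
  "name: mitochondrion inheritance",
  "namespace: biological_process",
  "",
  "[Term]",
  "id: GO:0000006",
  "name: high-affinity zinc transmembrane transporter activity",
  "namespace: molecular_function",
  "",
  "[Typedef]",
  "id: part_of",
  "name: part of"
)
tmp <- tempfile(fileext = ".obo")
writeLines(demo, tmp)

res <- obo_namespaces(tmp)
print(res)
```

For the real file, the call would be `obo_namespaces("go.obo")`. Matching whole `tag: ` prefixes instead of splitting on `": "` avoids cutting values that themselves contain a colon followed by a space, and grouping lines by stanza number keeps `[Typedef]` entries out of the result.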