I want a way to extract the namespace associated with each GO term from files in the OBO format.
I tried converting the OBO file to a text file with the OBO Data Converter (https://github.com/PNNL-Comp-Mass-Spec/OBO-Data-Converter), but could not get it to work. I also tried the R package "ontologyIndex", without success. Any help is more than welcome.
Assuming each term stanza in the file looks like this:
[Term] | |
---|---|
id: | GO:0000070 |
name: | mitotic sister chromatid segregation |
namespace: | biological_process |
alt_id: | GO:0016359 |
def: | "The cell cycle process in which replicated homologous chromosomes are organized and then physically separated and apportioned to two sets during the mitotic cell cycle. Each replicated chromosome, composed of two sister chromatids, aligns at the cell equator, paired with its homologous partner. One homolog of each morphologic type goes into each of the resulting chromosome sets." [GOC:ai, GOC:jl] |
subset: | goslim_pombe |
synonym: | "mitotic chromosome segregation" EXACT [] |
synonym: | "mitotic sister-chromatid adhesion release" NARROW [] |
is_a: | GO:0000819 ! sister chromatid segregation |
is_a: | GO:1903047 ! mitotic cell cycle process |
relationship: | part_of GO:0140014 ! mitotic nuclear division |
... | ... |
etc
Reading the file
dat <- read.delim("go.obo", header = FALSE)
Building a data frame from the non-repeated tags of each term
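starts <- grep("\\[Term]", dat$V1)                      # row of every [Term] header
ends <- starts - 1 + diff(c(starts, nrow(dat) + 1))     # last row before the next header (or end of file)
head(
  do.call(rbind,
    apply(cbind(starts + 1, ends), 1, \(x) {
      # split every "tag: value" row of this term into tags and values
      res <- data.table::tstrsplit(dat[x[1]:x[2], ], ": ")
      # keep only the id, name, namespace, def and is_obsolete tags
      id <- grep("^id|name|namespace|def|is_obso", res[[1]])
      # pad to length 5 so terms without is_obsolete still fill a full row
      `length<-`(as.list(res[[2]][id]), 5)
    })) |>
    data.frame() |>
    setNames(c("id", "name", "namespace", "definition", "is_obsolete")))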
Output
id name
1 GO:0000001 mitochondrion inheritance
2 GO:0000002 mitochondrial genome maintenance
3 GO:0000003 obsolete reproduction
4 GO:0000005 obsolete ribosomal chaperone activity
5 GO:0000006 high-affinity zinc transmembrane transporter activity
6 GO:0000007 low-affinity zinc ion transmembrane transporter activity
namespace
1 biological_process
2 biological_process
3 biological_process
4 molecular_function
5 molecular_function
6 molecular_function
definition
1 The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton. [GOC:mcc, PMID:10873824, PMID:11389764]
2 The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome. [GOC:ai, GOC:vw]
3 OBSOLETE. The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms. [GOC:go_curators, GOC:isa_complete, GOC:jl, ISBN:0198506732]
4 OBSOLETE. Assists in the correct assembly of ribosomes or ribosomal subunits in vivo, but is not a component of the assembled ribosome when performing its normal biological function. [GOC:jl, PMID:12150913]
5 Enables the transfer of zinc ions (Zn2+) from one side of a membrane to the other, probably powered by proton motive force. In high-affinity transport the transporter is able to bind the solute even if it is only present at very low concentrations. [TC:2.A.5.1.1]
6 Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction
is_obsolete
1 NULL
2 NULL
3 true
4 true
5 NULL
6 NULL
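Splitting on every ": " drops whatever follows a second ": " inside a value, which is why the definition in row 6 above stops at "according to the reaction". A minimal sketch of an alternative, assuming the file is read with readLines() and that each tag sits on its own line as `tag: value`, splits only at the first ": " and skips stanzas other than [Term] (such as [Typedef]). The helper name parse_obo_terms and the toy input, including its made-up def line, are illustrative only and not part of any package:

```r
# Toy stand-in for readLines("go.obo"); the def line is made up for illustration
obo_lines <- c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: GO:0000001",
  "name: mitochondrion inheritance",
  "namespace: biological_process",
  "",
  "[Term]",
  "id: GO:0000005",
  "name: obsolete ribosomal chaperone activity",
  "namespace: molecular_function",
  "def: \"Made-up definition with a colon: kept whole.\" []",
  "is_obsolete: true",
  "",
  "[Typedef]",
  "id: part_of",
  "name: part of"
)

# Hypothetical helper: one row per [Term] stanza, one column per requested tag
parse_obo_terms <- function(lines, tags = c("id", "name", "namespace")) {
  is_header <- grepl("^\\[.+\\]$", lines)            # stanza headers: [Term], [Typedef], ...
  stanza    <- cumsum(is_header)                     # stanza number per line (0 = file header)
  kind      <- c("", lines[is_header])[stanza + 1]   # header of the stanza each line sits in
  keep      <- kind == "[Term]" & grepl(": ", lines, fixed = TRUE)
  tag       <- sub(": .*$", "", lines[keep])         # text before the first ": "
  value     <- sub("^[^:]*: ", "", lines[keep])      # text after the first ": "
  term_no   <- stanza[keep]
  terms     <- unique(term_no)
  first     <- !duplicated(paste(term_no, tag))      # first occurrence of a tag within a term
  out <- lapply(tags, function(tg) {
    hit <- first & tag == tg
    value[hit][match(terms, term_no[hit])]           # NA when a term lacks the tag
  })
  names(out) <- tags
  data.frame(out, stringsAsFactors = FALSE)
}

terms <- parse_obo_terms(obo_lines, c("id", "name", "namespace", "def", "is_obsolete"))
print(terms[, c("id", "namespace")])
```

For the real file, replacing the toy vector with `obo_lines <- readLines("go.obo")` should be enough; `terms$namespace` then holds the namespace of each GO term, and missing tags come back as NA rather than NULL.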