This morning I was trying to link some of my genes to pathways, so of course my first stop was KEGG.
What in the world is KEGG? KEGG (the Kyoto Encyclopedia of Genes and Genomes) is a database of pathways. The pathway identifiers for human are numbered and prefixed with hsa (for example, here is one). So if I want to get the links between a gene and its hsa pathway identifiers, I can make a REST call like this: http://rest.kegg.jp/link/pathway/hsa:100101267
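For anyone who wants to try that call outside of R, here is a minimal Python sketch. It is not the R script from this post. It assumes KEGG's link operation answers in plain text, one tab-separated "gene, pathway" pair per line, with an empty body when there are no links. The helper names (link_url, parse_link_response, fetch_gene_pathways) are hypothetical.

```python
from urllib.request import urlopen

# Base URL taken from the REST call above
KEGG_REST_BASE = "http://rest.kegg.jp"


def link_url(gene_id: str) -> str:
    """Build the link URL for one gene, e.g. 'hsa:100101267'."""
    return f"{KEGG_REST_BASE}/link/pathway/{gene_id}"


def parse_link_response(text: str) -> list[tuple[str, str]]:
    """Parse a link response: one 'source<TAB>target' pair per non-empty line."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines; an empty body means no links
        source, target = line.split("\t")
        pairs.append((source, target))
    return pairs


def fetch_gene_pathways(gene_id: str, timeout: float = 30.0) -> list[str]:
    """Fetch the pathway IDs linked to one gene (requires network access)."""
    with urlopen(link_url(gene_id), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return [target for _, target in parse_link_response(body)]
```

Calling fetch_gene_pathways("hsa:100101267") sends exactly one request, so looping it over a long gene list means one round trip per gene.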
That’s great, so I wrote a script in R to do this for a pretty reasonable set of genes (about 800). That is when things went sideways.
It made it through 200-something genes the first time. Then R froze. The next time, it only made it to 38. I will not have any of this, KEGG database! So I downloaded the entire database (with other methods) and parsed it into an R object. You can download it from one of my repositories.
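One way to avoid hundreds of separate requests (not necessarily the "other methods" used here) is to ask KEGG for the whole human gene-to-pathway link table in a single call to /link/pathway/hsa, then do every lookup locally. Below is a Python sketch under the same assumption about the tab-separated response format. BULK_URL and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.request import urlopen

# Assumed endpoint: the link operation with the whole hsa database as the source
BULK_URL = "http://rest.kegg.jp/link/pathway/hsa"


def fetch_bulk_links(timeout: float = 120.0) -> str:
    """Download every hsa gene -> pathway link in one request (requires network)."""
    with urlopen(BULK_URL, timeout=timeout) as response:
        return response.read().decode("utf-8")


def build_gene_index(text: str) -> dict[str, set[str]]:
    """Turn 'gene<TAB>pathway' lines into a gene -> set-of-pathways table."""
    index: dict[str, set[str]] = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        gene, pathway = line.split("\t")
        index[gene].add(pathway)
    return dict(index)


def lookup(index: dict[str, set[str]], genes: list[str]) -> dict[str, set[str]]:
    """Look up many genes locally; genes without a pathway get an empty set."""
    return {gene: set(index.get(gene, ())) for gene in genes}
```

Building the index once with build_gene_index(fetch_bulk_links()) means looking up all ~800 genes costs no further requests, so a dropped connection midway through the list is no longer a concern.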
Here are the details:
KEGG database with 30739 entries, downloaded 9/22/2014
gene2hsa: maps gene names to hsa identifiers
gene2path: maps gene symbols to pathway descriptions
hsa2gene: maps hsa pathway identifiers to gene symbols
hsa2path: maps hsa identifiers to pathways
questions: email vsochat [at] stanford [.] edu
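The four lookups describe one many-to-many relationship between genes and pathways, viewed from each side, plus a table of pathway names. Here is a Python sketch of how they relate. It assumes (the description above doesn't pin this down) that gene2hsa maps a gene symbol to its hsa pathway IDs and that hsa2path maps a pathway ID to its description. Under that assumption, hsa2gene is the inverse of gene2hsa, and gene2path is gene2hsa composed with hsa2path. The function names and data shapes are hypothetical.

```python
from collections import defaultdict


def invert(mapping: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a one-to-many map, e.g. gene -> pathway IDs into pathway ID -> genes."""
    inverted: dict[str, list[str]] = defaultdict(list)
    for key, values in mapping.items():
        for value in values:
            inverted[value].append(key)
    return dict(inverted)


def compose(gene2hsa: dict[str, list[str]], hsa2path: dict[str, str]) -> dict[str, list[str]]:
    """Map each gene symbol to pathway descriptions via its pathway IDs."""
    return {
        gene: [hsa2path[pid] for pid in pathway_ids if pid in hsa2path]
        for gene, pathway_ids in gene2hsa.items()
    }
```

With made-up toy inputs, invert(gene2hsa) plays the role of hsa2gene and compose(gene2hsa, hsa2path) plays the role of gene2path. Storing all four tables trades memory for constant-time lookups in every direction.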
My diabolical lab mate also just pointed me to a much better resource:
Sochat, Vanessa. "KEGG Human (hsa) Database for R." @vsoch (blog), 22 Sep 2014, https://vsoch.github.io/2014/kegg-human-hsa-database-for-r/ (accessed 20 Mar 23).