This morning I was trying to link some of my genes to pathways, so of course my first stop was KEGG.

What in the world is KEGG? KEGG is a database of pathways. The pathway identifiers for human are numbered and prefixed with hsa; for example, here is one. So if I want to get a link between a gene and an hsa pathway identifier, I can do a REST call like this: http://rest.kegg.jp/link/pathway/hsa:100101267
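
Here is a minimal sketch of that REST call in base R. It assumes the response is plain text with one tab-separated gene/pathway pair per line. The helper names `kegg_link_url` and `parse_kegg_link` are made up for illustration and are not part of any package.

```r
# Build the KEGG REST "link" URL for one gene identifier, e.g. "hsa:100101267".
kegg_link_url <- function(gene_id) {
  paste0("http://rest.kegg.jp/link/pathway/", gene_id)
}

# Parse response lines into a data frame with one row per gene/pathway pair.
# Assumption: each non-empty line looks like "<gene id><TAB><pathway id>".
parse_kegg_link <- function(lines) {
  lines <- lines[nzchar(lines)]  # drop empty lines
  if (length(lines) == 0) {
    return(data.frame(gene = character(0), pathway = character(0),
                      stringsAsFactors = FALSE))
  }
  parts <- strsplit(lines, "\t", fixed = TRUE)
  data.frame(
    gene = vapply(parts, function(p) p[1], character(1)),
    pathway = vapply(parts, function(p) p[2], character(1)),
    stringsAsFactors = FALSE
  )
}

# The network call itself is left commented out so the block runs offline:
# lines <- readLines(kegg_link_url("hsa:100101267"), warn = FALSE)
# parse_kegg_link(lines)
```

Keeping the parsing separate from the download means the parser can be checked on a couple of hand-written lines before hitting the server 800 times.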

That’s great, so I wrote a script to do this in R, for a pretty reasonable set of genes (about 800). This is when I felt like this:

[image: cat]

It made it through 200-something genes the first time. Then R froze. Then it only made it to 38. I will not have any of this, KEGG database! So I downloaded the entire database (with other methods) and parsed the whole thing into an R object. You can download it from one of my repositories.
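
I never worked out why the loop kept dying, so this is not a diagnosis. It is a generic defensive sketch for looping over many requests: pause between calls, retry failures, and save partial results so a freeze does not throw everything away. `fetch_all` and its arguments are hypothetical names, and `fetch_fun` stands in for whatever function performs a single request.

```r
# Loop over gene ids, retrying failures and caching results to disk.
# fetch_fun: hypothetical function taking one id; it returns a result
#            or raises an error when the request fails.
fetch_all <- function(gene_ids, fetch_fun, cache_file = NULL,
                      pause = 0.5, max_tries = 3) {
  # Resume from an earlier partial run if a cache file exists.
  results <- if (!is.null(cache_file) && file.exists(cache_file)) {
    readRDS(cache_file)
  } else {
    list()
  }
  # Only request ids that are not already in the cache.
  for (id in setdiff(gene_ids, names(results))) {
    for (attempt in seq_len(max_tries)) {
      res <- tryCatch(fetch_fun(id), error = function(e) NULL)
      if (!is.null(res)) {
        results[[id]] <- res
        break
      }
      Sys.sleep(pause)  # wait before retrying
    }
    # Save after every gene so a crash loses at most one request.
    if (!is.null(cache_file)) saveRDS(results, cache_file)
    Sys.sleep(pause)  # be polite to the server between genes
  }
  results
}
```

Ids that still fail after `max_tries` attempts are simply left out of the result, so running the function again with the same `cache_file` picks up only the missing ones.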

kegg.Rda

Here are details:


KEGG database with 30739 entries, downloaded 9/22/2014
gene2hsa: maps gene names to hsa identifiers
gene2path: maps gene symbols to pathway descriptions
hsa2gene: maps hsa pathway identifiers to gene symbols
hsa2path: maps hsa identifiers to pathways.
questions: email vsochat [at] stanford [.] edu
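
Once the file is downloaded, something like the following should let you poke at it. This is a sketch: the path `"kegg.Rda"` is an assumption about where you saved the file, `load_rda` is a made-up helper, and it does not assume anything about how the object inside is structured beyond what `ls()` and `str()` report.

```r
# Load an .Rda file into its own environment and return that environment,
# so the loaded objects do not overwrite anything in the global workspace.
load_rda <- function(path) {
  env <- new.env()
  load(path, envir = env)
  env
}

# Usage after downloading (path is an assumption):
# kegg_env <- load_rda("kegg.Rda")
# ls(kegg_env)                                              # object names in the file
# str(mget(ls(kegg_env), envir = kegg_env), max.level = 2)  # peek at the structure
```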

My diabolical lab mate also just pointed me to a much better resource:

http://www.pathwaycommons.org/about/

Awesome!




Suggested Citation:
Sochat, Vanessa. "KEGG Human (hsa) Database for R." @vsoch (blog), 22 Sep 2014, https://vsoch.github.io/2014/kegg-human-hsa-database-for-r/ (accessed 28 Nov 24).