This morning I was trying to link some of my genes to pathways, so of course my first stop was KEGG.

What in the world is KEGG? KEGG is a database of pathways. The human pathway identifiers are numbered and prefixed with hsa; for example, here is one. So if I want to get a link between a gene and this hsa identifier, I can do a REST call like this:
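A minimal sketch of such a call in R, assuming the public KEGG REST `link` operation under https://rest.kegg.jp. The helper names (`kegg_link_url`, `parse_kegg_link`, `fetch_gene_pathways`) are made up for illustration, the gene identifier hsa:10458 is just an example, and the sample lines are hand-written in the tab-separated shape of the response, not real query output.

```r
# Hypothetical helpers. The base URL and the "link" operation follow the
# public KEGG REST API; check the current KEGG documentation before relying on them.
kegg_link_url <- function(gene_id, base = "https://rest.kegg.jp") {
  paste0(base, "/link/pathway/", gene_id)
}

# Parse the tab-separated response: one "gene<TAB>path:hsaNNNNN" pair per line.
parse_kegg_link <- function(lines) {
  lines <- lines[nzchar(lines)]                      # drop empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  data.frame(
    gene = vapply(fields, function(f) f[1], character(1)),
    pathway = sub("^path:", "", vapply(fields, function(f) f[2], character(1))),
    stringsAsFactors = FALSE
  )
}

# The network call, wrapped in a function so nothing is fetched until you ask.
fetch_gene_pathways <- function(gene_id) {
  parse_kegg_link(readLines(kegg_link_url(gene_id), warn = FALSE))
}

# Hand-written sample lines (illustrative only) so the parser can be tried offline.
sample_lines <- c("hsa:10458\tpath:hsa04520", "hsa:10458\tpath:hsa04810")
links <- parse_kegg_link(sample_lines)
print(links)
```

Calling `fetch_gene_pathways("hsa:10458")` would hit the live service, which is exactly the step that gets painful when repeated for hundreds of genes.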

That’s great, so I wrote a script to do this in R, for a pretty reasonable set of genes (about 800). This is when I felt like this:


It made it through 200-something genes the first time. Then R froze. On the next try it only made it to 38. I will not have any of this, KEGG database! So I downloaded the entirety (with other methods) and parsed the whole thing into an R object. You can download it from one of my repositories.
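For anyone who would rather stay with the REST service, here is a rough sketch of one way to be gentler with it: ask for every human gene-to-pathway link in a single request and retry with a pause when a request fails. This is not the method used for the download above (that one is not described here). `with_retry`, `fetch_all_links`, and `flaky` are hypothetical names, and the bulk URL assumes the KEGG `link` operation accepts the organism code `hsa` as its source.

```r
# Hypothetical helper: call `fetch()` up to `tries` times, pausing between attempts.
with_retry <- function(fetch, tries = 3, pause = 1) {
  for (attempt in seq_len(tries)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < tries) Sys.sleep(pause)
  }
  stop("all ", tries, " attempts failed: ", conditionMessage(result))
}

# One request for all human gene-to-pathway links instead of ~800 small ones.
# Assumption: the "link" operation takes an organism code as the source database.
bulk_url <- "https://rest.kegg.jp/link/pathway/hsa"
fetch_all_links <- function() {
  with_retry(function() readLines(bulk_url, warn = FALSE), tries = 3, pause = 5)
}

# Offline demonstration: a stand-in fetcher that fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated timeout")
  "hsa:10458\tpath:hsa04520"   # hand-written sample line, not real output
}
recovered <- with_retry(flaky, tries = 5, pause = 0)
```

The pause lengths and retry counts are arbitrary; tune them to whatever the service tolerates.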


Here are details:

KEGG database with 30739 entries, downloaded 9/22/2014
gene2hsa: maps gene names to hsa identifiers
gene2path: maps gene symbols to pathway descriptions
hsa2gene: maps hsa pathway identifiers to gene symbols
hsa2path: maps hsa identifiers to pathways.
questions: email vsochat [at] stanford [.] edu
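To make the four mappings concrete, here is a small sketch that builds lookups with the same names from a toy table. It assumes each mapping behaves like a named list, which may not match the layout of the downloadable object, and the three rows are toy data for illustration rather than entries taken from it.

```r
# Toy rows for illustration only; not taken from the downloadable object.
toy <- data.frame(
  gene = c("TP53", "TP53", "CDK1"),
  hsa  = c("hsa04115", "hsa04110", "hsa04110"),
  path = c("p53 signaling pathway", "Cell cycle", "Cell cycle"),
  stringsAsFactors = FALSE
)

# Assumption: each mapping is a named list keyed by gene symbol or hsa identifier.
gene2hsa  <- split(toy$hsa,  toy$gene)                  # gene symbol -> hsa identifiers
gene2path <- split(toy$path, toy$gene)                  # gene symbol -> pathway descriptions
hsa2gene  <- split(toy$gene, toy$hsa)                   # hsa identifier -> gene symbols
hsa2path  <- lapply(split(toy$path, toy$hsa), unique)   # hsa identifier -> pathway description

# Look up the pathways for one gene, and the genes in one pathway.
print(gene2hsa[["TP53"]])
print(hsa2gene[["hsa04110"]])
```

With lookups shaped like this, linking a list of about 800 genes to pathways is a local `lapply` over `gene2hsa` with no network calls at all.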

My diabolical lab mate also just pointed me to a much better resource:


Suggested Citation:
Sochat, Vanessa. "KEGG Human (hsa) Database for R." @vsoch (blog), 22 Sep 2014, (accessed 28 Nov 22).