This morning I was trying to link some of my genes to pathways, so of course my first stop was KEGG.

What in the world is KEGG? KEGG is a database of pathways. The pathway identifiers for human are numbered and prefixed with hsa (for example, here is one). So if I want to get the links between a gene and its hsa pathway identifiers, I can make a REST call like this: http://rest.kegg.jp/link/pathway/hsa:100101267
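For anyone who wants to see what that call looks like in code, here is a minimal sketch in Python (not the R script from this post). It assumes the link endpoint returns plain text with one tab-separated `gene<TAB>pathway` row per line; the function names `link_url`, `parse_link_response`, and `fetch_pathways` are made up for illustration.

```python
from urllib.request import urlopen

KEGG_REST = "http://rest.kegg.jp"  # base URL as used in the post


def link_url(gene_id, target_db="pathway"):
    """Build the /link URL for one KEGG gene identifier such as 'hsa:100101267'."""
    return f"{KEGG_REST}/link/{target_db}/{gene_id}"


def parse_link_response(text):
    """Parse tab-separated 'gene<TAB>pathway' lines into (gene, pathway) pairs.

    Assumption: each non-empty line of the response has exactly two columns.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            continue  # skip anything that is not a two-column row
        pairs.append((fields[0], fields[1]))
    return pairs


def fetch_pathways(gene_id, timeout=30):
    """Fetch and parse the pathway links for one gene (makes a network call)."""
    with urlopen(link_url(gene_id), timeout=timeout) as response:
        return parse_link_response(response.read().decode("utf-8"))
```

Calling `fetch_pathways("hsa:100101267")` would hit the URL above. The parsing is kept in its own function so it can be checked on a saved response without touching the network.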

That’s great, so I wrote a script to do this in R, for a pretty reasonable set of genes (about 800). This is when I felt like this:

The first time, it made it through 200-something genes. Then R froze. On the next try it only made it to 38. I will not have any of this, KEGG database! So I downloaded the entire database (with other methods) and parsed the whole thing into an R object. You can download it from one of my repositories.
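If you would rather keep making one call per gene, a loop that retries failures and remembers what it already fetched can survive this kind of flakiness. The following is only a sketch in Python, not what was actually done for this post: `fetch_all` and its parameters are hypothetical, `fetch` stands for whatever function makes a single request, and the pause lengths are arbitrary guesses rather than documented KEGG limits.

```python
import time


def fetch_all(gene_ids, fetch, done=None, max_retries=3, pause=1.0, sleep=time.sleep):
    """Call fetch(gene_id) for each id, retrying failures and skipping ids already done.

    Returns (results, failed): a dict of gene_id -> fetched value, and a list
    of the ids that still failed after max_retries attempts.
    """
    results = dict(done or {})  # start from results saved by an earlier run
    failed = []
    for gene_id in gene_ids:
        if gene_id in results:
            continue  # already fetched, no need to ask the server again
        for attempt in range(max_retries):
            try:
                results[gene_id] = fetch(gene_id)
                break
            except OSError:
                # Network errors (urllib's URLError is an OSError subclass):
                # back off for pause, 2*pause, 4*pause, ... before retrying.
                sleep(pause * (2 ** attempt))
        else:
            failed.append(gene_id)  # every attempt raised
        sleep(pause)  # short pause between genes to go easy on the server
    return results, failed
```

Saving `results` to disk after a run and passing it back in as `done` lets a restarted script pick up where it stopped instead of starting over from the first gene.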

# kegg.Rda

Here are details:


KEGG database with 30739 entries, downloaded 9/22/2014
gene2hsa: maps gene names to hsa identifiers
gene2path: maps gene symbols to pathway descriptions
hsa2gene: maps hsa pathway identifiers to gene symbols
hsa2path: maps hsa identifiers to pathways.
questions: email vsochat [at] stanford [.] edu
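To give a rough idea of how lookup tables like these can be built, here is a small Python sketch. It is an illustration under assumptions, not the code that produced kegg.Rda: it assumes you already have a list of (gene, pathway) pairs, and the names `build_lookups`, `gene_to_pathways`, and `pathway_to_genes` are hypothetical.

```python
from collections import defaultdict


def build_lookups(pairs):
    """Turn (gene, pathway) pairs into forward and reverse lookup tables."""
    gene_to_pathways = defaultdict(set)
    pathway_to_genes = defaultdict(set)
    for gene, pathway in pairs:
        gene_to_pathways[gene].add(pathway)   # gene -> all of its pathways
        pathway_to_genes[pathway].add(gene)   # pathway -> all of its genes
    # Convert to plain dicts so a missing key raises KeyError instead of
    # silently creating an empty entry.
    return dict(gene_to_pathways), dict(pathway_to_genes)
```

Sets are used because a gene can sit in several pathways and a pathway holds many genes, so both directions are one-to-many and duplicates should collapse.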



My diabolical lab mate also just pointed me to a much better resource: