Wikipedia Equation Embeddings


12 Jan 2019



The Wikipedia Equation Embeddings dataset consists of two datasets, each containing word2vec embeddings generated from LaTeX equations extracted from Wikipedia statistics and math articles, along with metadata for the articles themselves. See the documentation in each of the math and statistics subfolders for details on how the data was generated and how to use it.
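As a rough illustration of what word2vec embeddings make possible, the sketch below finds the tokens nearest to a query token by cosine similarity. The token names and vector values are made-up stand-ins, not data from this repository; loading the actual files from the math or statistics subfolders is not shown, since their format is described in those folders.

```python
import math

# Hypothetical toy embeddings for illustration only. Real vectors would come
# from the files in the math/ or statistics/ subfolders (format not assumed here).
embeddings = {
    "\\frac": [0.9, 0.1, 0.0],
    "\\sum": [0.8, 0.3, 0.1],
    "\\int": [0.7, 0.2, 0.4],
    "\\alpha": [0.0, 0.9, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(token, vectors, topn=2):
    """Return the topn (token, score) pairs closest to `token`, best first."""
    query = vectors[token]
    scores = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != token
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]
```

Calling `most_similar("\\frac", embeddings)` ranks the other toy tokens by how closely their vectors point in the same direction as the query vector. With real embeddings, the same idea can surface equation tokens that tend to appear in similar contexts.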


Both datasets are provided in the GitHub repository:

git clone


Here are some interesting questions these datasets might help answer:

Other questions?

If you have other questions or want help with your project, please don't hesitate to open an issue. If you use any of the datasets in your work, please remember to cite the DOI.