The Wikipedia Equation Embeddings dataset includes two datasets, each containing word2vec embeddings generated from LaTeX equations extracted from Wikipedia statistics and math articles, along with metadata for the articles themselves. See the README.md in each of the math and statistics subfolders for information on how the data was generated and how to use it.
Both datasets are available from the GitHub repository, which you can clone or download as a versioned archive:
git clone https://www.github.com/vsoch/wikipedia-equations
wget https://github.com/vsoch/wikipedia-equations/archive/0.0.1.zip
wget https://github.com/vsoch/wikipedia-equations/archive/0.0.1.tar.gz
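Once the repository is on disk, a small loader is usually the first thing you need. The sketch below is only an illustration and makes assumptions the text above does not confirm: it assumes an embeddings file stored as delimited text, with a label (for example an equation or article name) in the first column and the vector components in the remaining columns. The function names `load_embeddings` and `cosine_similarity` are hypothetical helpers, not part of the repository, so check the README.md in each subfolder for the actual file names and layout before relying on it.

```python
import csv
import math


def load_embeddings(path, delimiter="\t"):
    """Read vectors from a delimited text file into a dict of label -> list of floats.

    Assumed layout (hypothetical, check the subfolder README): one row per
    item, label in the first column, numeric vector components after it.
    """
    embeddings = {}
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters inside LaTeX labels untouched.
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or label-only rows
            try:
                vector = [float(value) for value in row[1:]]
            except ValueError:
                continue  # skip a header row or any other non-numeric row
            embeddings[row[0]] = vector
    return embeddings


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_a * norm_b)
```

With the vectors loaded, comparing two entries is a single call such as `cosine_similarity(embeddings[a], embeddings[b])`, where `a` and `b` are labels present in the file you loaded.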
Here are some interesting questions these datasets might help answer:
If you have other questions, or want help with your project, please don’t hesitate to open an issue. If you use any of the datasets in your work, please remember to cite the DOI.