# Wikipedia Equation Embeddings

12 Jan 2019

## Summary

The Wikipedia Equation Embeddings dataset consists of two datasets, each
containing word2vec embeddings generated from LaTeX equations extracted from
Wikipedia statistics and math articles, along with metadata for the articles
themselves. See the README.md in each of the math and statistics subfolders
for details on how the data was generated and how to use it.

### Download

Both datasets are provided via the GitHub repository:

```
# Clone the full repository...
git clone https://www.github.com/vsoch/wikipedia-equations
# ...or download the 0.0.1 release archive as a zip or a tarball
wget https://github.com/vsoch/wikipedia-equations/archive/0.0.1.zip
wget https://github.com/vsoch/wikipedia-equations/archive/0.0.1.tar.gz
```

### Questions

Here are some interesting questions these datasets might help answer:

- Which domains of math are most strongly associated with other domains (of science, or something else)?
- Can you predict equations from terms? For example, you could build a search engine where a user searches for a term and gets back the equations associated with it.
- Can you predict terms from equations? For example, you could take a new equation and generate terms that describe how it has been used or described.
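
As a rough illustration of the search engine idea, here is a minimal sketch that ranks equations by cosine similarity to a query vector. It assumes that terms and equations are embedded in the same vector space, which may not hold for the actual data. The function names `cosine` and `rank_equations` and the three-dimensional toy vectors are hypothetical and invented for this example; they are not taken from the datasets. See the README.md in each subfolder for the real file formats.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_equations(query_vector, equation_vectors, top_k=3):
    """Return the top_k (equation, score) pairs most similar to query_vector."""
    scored = [(cosine(query_vector, vec), name) for name, vec in equation_vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:top_k]]


# Toy vectors, made up for illustration only (not real embeddings).
toy_equations = {
    r"\sigma^2 = E[(X - \mu)^2]": [0.9, 0.1, 0.0],
    r"a^2 + b^2 = c^2": [0.0, 0.2, 0.9],
    r"\bar{x} = \frac{1}{n}\sum_i x_i": [0.8, 0.3, 0.1],
}
# Hypothetical stand-in for the embedding of a search term such as "variance".
toy_query = [1.0, 0.2, 0.0]

results = rank_equations(toy_query, toy_equations, top_k=2)
for equation, score in results:
    print(f"{score:.3f}  {equation}")
```

The reverse direction (terms from equations) could follow the same pattern, with an equation vector as the query and term vectors as the candidates.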

### Other questions?

If you have other questions or want help with your project, please don’t hesitate to open an issue. If you use any of the datasets in your work,
please remember to include the DOI.