**Vector Quantization** is a method to compress data, and it’s considered a lossy technique because we are creating a new representation for a set of N observations that loses some of our information. This happens because we represent each of our observations based on a “prototype vector.” You can think of it like doing k-means clustering, and then representing each observation vector based on its centroid, and throwing away the raw data. A simple algorithm goes something like this:

- Divide observations into groups having approximately the same number of points closest to them (or I don’t see why you couldn’t use k-means or some variation of that)

Repeat until convergence {

- Define a vector of sensitivity values for each centroid (some small value)
- Pick a sample point at random, find the centroid with the smallest distance - sensitivity
- Move the centroid a little bit toward the sample point
- Set the centroids sensitivity to zero so it will be less likely to be chosen again

}

What does convergence mean? We can use simulated annealing, which broadly lets us find a good global optimum in a search space by temporarily accepting less optimal solutions. We do this by having a function, P, that takes in the energies of two states (a measure of their goodness) and a “temperature” T, that must start off higher (close to 1) when the algorithm starts, and eventually go to zero. This function P needs to be positive when our transition state is more optimal than our current state. When T is large, we are willing to step uphill (because it might be a local optimum!) and when T is small, we only go downhill (we reach the end of our annealing schedule, and are ready to converge to a solution).

In the context of Vector Quantization, using an annealing schedule probably means that we look at training by moving a vector based on one point as a state, and the next state being the next move we make with our next point. We stop adjusting with new points when some metric (the function P) that evaluates the energy of each state (taking into account our annealing schedule) is less than a randomly generated probability. See the pseudocode for more clear explanation.

### Benefits of Vector Quantization

This is cool because you can see how the densities can easily be updated with live data, and you can also see how we could deal with missing data. If we match a new point to the closest centroid (based on the information that we do have), we can then ascribe the average of the missing parameters to the data point.