We usually use d-dimension vector to represent an image, but if we consider the usage of memory, we hope to use smaller dimension binary vector to represent an image. But when we do this way, we may have some error, the error is better to be smaller. And the other way, for a new image, the algorithm should be efficient to calculate the binary representation vector.
We hope that each bit contains more information, means that the variance should be high for each bit, and the inner-product for each pair-wise binary code, so we have the following formula:
If we set the distribution of a bit to be half 1s and half 0s, we can easily find the maximum, so we rewrite the formula as follow:
The objective function is the same as what PCA does, so we take the first c eigenvectors of XTX to be the initial W.
In PCA, the variance for each component is not the same, so we want to rotate the components generated by PCA, makes the variance closer. Here is the image illustrating this idea:
With rotation, the quantization error in (c) is the smallest. We rewrite the objective function as follows:
Where V=XW, B is the encoding result, and R is the rotation matrix, here we proposed a method called ITQ, for each iteration, two steps are implemented, (1)fix R and update B, (2)fix B, update R, with higher iterations, quantization error gets smaller, here is the experiment result:





沒有留言:
張貼留言