2016年3月16日 星期三

Iterative Quantization: A Procrustean Approach to Learning Binary Codes

      We usually use d-dimension vector to represent an image, but if we consider the usage of memory, we hope to use smaller dimension binary vector to represent an image. But when we do this way, we may have some error, the error is better to be smaller. And the other way, for a new image, the algorithm should be efficient to calculate the binary representation vector.
    We hope that each bit contains more information, means that the variance should be high for each bit, and the inner-product for each pair-wise binary code, so we have the following formula:
    If we set the distribution of a bit to be half 1s and half 0s, we can easily find the maximum, so we rewrite the formula as follow:
    The objective function is the same as what PCA does, so we take the first c eigenvectors of XTX to be the initial W.
    In PCA, the variance for each component is not the same, so we want to rotate the components generated by PCA, makes the variance closer. Here is the image illustrating this idea: 
    With rotation, the quantization error in (c) is the smallest. We rewrite the objective function as follows:
    Where V=XW, B is the encoding result, and R is the rotation matrix, here we proposed a method called ITQ, for each iteration, two steps are implemented, (1)fix R and update B, (2)fix B, update R, with higher iterations, quantization error gets smaller, here is the experiment result:


2016年3月15日 星期二

Aggregating local descriptors into a compact image representation

    This paper focuses on searching images from a large dataset (10 million images or more). But here we have three parts to optimize.

  1. The representation of an image
  2. The dimension reduction of these representation vectors.
  3. The indexing algorithm.

    In this paper, it uses a descriptor called VLAD, which is derived from Bag-of-Words and Fisher Kernel, then aggregates SIFT descriptors.
    In the second part, it uses asymmetric distance computation to do approximate nearest neighbors search. Then it uses PCA to do dimension reduction.
   We mainly focus on part 1 and part 2. And here is the flow chart[1] to describe what this paper does.





     [1] http://users.auth.gr/espyromi/publications/slides/spyromitrosWIAMIS2012slides.pdf

Online Dictionary Learning for Sparse Coding

     When we write a sentence, we would like to use less words to express what we want to say. In sparse coding, if we have some vectors (called D), we are likely to do simple linear combination of these vectors to represent an image.
    Based on our goal, we have the following cost function:

    The loss function acts like a distance function, we should use D to represent images but don't want lose much information. Suppose the each x is m-dimensional vector, then D is a m*k-dimensional matrix, where k is smaller than the numbers of images.(e.g. k=200, n=100,000)
    The detail of the cost function is as follow:
    The second term is the regularization term, if we decrease alpha, make D larger, we can minimize the second term, but it helps nothing. We hope that we have the simplest alpha, so we rewrite the object function:  
    The above formula limits the size for each column of D, and the cost function follows the limitation:
    
    In the paper, we use online learning, which means we add an images each iteration, then update alpha and D, here is the algorithm:
     In the procedure of updating D, we use D_(t-1) as the initial point, here is the algorithm of updating D.
    The length for each column of D would be less than 1, which follows our constraint. Here we discuss some conditions we may counter:
    (1) In real data, we may have fewer images, which means we may select the same image in different iterations, suppose at time t0 we draw an image, at time t we draw the same image again, here is the way we update A:
    We discard the information of time t0. In the implementation, we hardly memory the history of images drawed.
   (2) Here we use a batch of images to update:
   (3) Some columns of D contain nothing, we should discard the columns when we update D.