The task can be simply described by this figure:
Suppose we are given the following clues to figure out which category an image belongs to. How should we proceed?
(a) There are C classes, K themes, T codewords.
(b) For each class, we know that some of the themes are likely to be seen.
(c) Given a class and some themes, it is easy to describe an image.
So we can follow this procedure:
(a) Select a class c based on a probability distribution, called 'eta'.
(b) Then select some themes that are likely to be seen given class c, based on a distribution called 'theta'.
(c) Using the chosen class and themes, choose some codewords from the codebook to describe the image.
The description of an image is likewise drawn from a distribution, called 'beta'.
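The three steps above can be sketched as a small sampling routine. This is only a minimal sketch under simplifying assumptions, not the exact model: 'eta' is taken to be a length-C probability vector over classes, 'theta' a C x K table whose row c is a distribution over themes, and 'beta' a K x T table whose row k is a distribution over codewords. The function name sample_image, the parameter n_patches, and the toy sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_image(eta, theta, beta, n_patches, rng):
    """Draw one image description by following steps (a) to (c).

    Assumed shapes (illustrative, not from the original post):
      eta   : (C,)   probabilities over classes
      theta : (C, K) row c is a distribution over themes for class c
      beta  : (K, T) row k is a distribution over codewords for theme k
    """
    # (a) select a class c from eta
    c = rng.choice(len(eta), p=eta)
    # (b) select one theme per patch from the theme distribution of class c
    themes = rng.choice(theta.shape[1], size=n_patches, p=theta[c])
    # (c) select one codeword per patch from the distribution of its theme
    words = np.array([rng.choice(beta.shape[1], p=beta[z]) for z in themes])
    return c, themes, words

# Toy sizes, chosen arbitrarily for the example
rng = np.random.default_rng(0)
C, K, T = 3, 4, 10
eta = np.full(C, 1.0 / C)                    # uniform over classes
theta = rng.dirichlet(np.ones(K), size=C)    # random theme distributions
beta = rng.dirichlet(np.ones(T), size=K)     # random codeword distributions

c, themes, words = sample_image(eta, theta, beta, n_patches=20, rng=rng)
```

Here an image is represented as a bag of 20 codeword indices. Classifying a new image would go the other way: find the class c under which the observed codewords are most probable.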
So here is the illustration of the procedure:
Dataset
The dataset contains 13 categories and about 3.7K images.
The following two figures show (a) the accuracy for each category and (b) the size of each distribution.



