Thursday, May 19, 2016

Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

Introduction:

DNNs are powerful in computer vision tasks such as image classification and detection. Their drawback is that they have too many parameters. If we can discard many parameters while keeping the performance, we can save memory. This paper describes 3 stages for shrinking the storage that the parameters need.

Here is a brief summary of the method:


Stage 1: Pruning

If the magnitude of a weight is below a threshold, we set it to zero. We then only need to keep the values and positions of the nonzero weights. Instead of storing absolute positions, we store the difference between consecutive positions. Here is the illustration:
Pruning reduced the number of parameters by 9× and 13× for the AlexNet and VGG-16 models, respectively.
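
The pruning and relative-position idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the threshold value, and the convention of measuring the first gap from index -1 are made up for this example, and it ignores any limit on how large a stored gap may be.

```python
def prune(weights, threshold):
    """Zero out every weight whose magnitude is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def to_relative_sparse(weights):
    """Keep nonzero values plus the gap to the previous nonzero position."""
    values, gaps = [], []
    prev = -1  # assumed convention: the first gap is measured from index -1
    for i, w in enumerate(weights):
        if w != 0.0:
            values.append(w)
            gaps.append(i - prev)
            prev = i
    return values, gaps


def from_relative_sparse(values, gaps, length):
    """Rebuild the dense weight list from the values and gaps."""
    dense = [0.0] * length
    pos = -1
    for v, g in zip(values, gaps):
        pos += g
        dense[pos] = v
    return dense


weights = [0.9, -0.05, 0.02, -0.7, 0.01, 0.0, 0.3, -0.04]
pruned = prune(weights, threshold=0.1)
values, gaps = to_relative_sparse(pruned)
print(values)  # [0.9, -0.7, 0.3]
print(gaps)    # [1, 3, 3]
```

The gaps are small integers, so they can be stored in far fewer bits than absolute positions would need.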

Stage 2: Quantization and weight sharing

Next we group the weights into clusters, and all weights in a cluster share one value. In the figure, instead of storing each weight as a 32-bit floating-point number, we store only a 2-bit index, since there are only 4 clusters, plus a small table holding the 4 shared values.
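
A minimal sketch of weight sharing, assuming a plain 1-D k-means with centroids initialized evenly between the smallest and largest weight. The weights, helper names, and iteration count are illustrative choices, and the sketch leaves out any fine-tuning of the shared values after clustering.

```python
import math


def nearest(w, centroids):
    """Index of the centroid closest to weight w."""
    return min(range(len(centroids)), key=lambda c: abs(w - centroids[c]))


def kmeans_1d(weights, k, iterations=20):
    """Cluster scalar weights into k groups (requires k >= 2)."""
    lo, hi = min(weights), max(weights)
    # Assumption: start with centroids evenly spaced over [lo, hi].
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        assignment = [nearest(w, centroids) for w in weights]
        for c in range(k):
            members = [w for w, a in zip(weights, assignment) if a == c]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = sum(members) / len(members)
    return centroids, [nearest(w, centroids) for w in weights]


weights = [2.09, -0.98, 1.48, 0.09, 0.05, -0.14, -1.08, 2.12,
           -0.91, 1.92, 0.0, -1.03, 1.87, 0.0, 1.53, 1.49]
codebook, indices = kmeans_1d(weights, k=4)

bits_per_index = math.ceil(math.log2(len(codebook)))         # 2 bits for 4 clusters
original_bits = 32 * len(weights)                             # 32-bit floats
compressed_bits = bits_per_index * len(weights) + 32 * len(codebook)
print(original_bits, compressed_bits)  # 512 160
```

Each weight is replaced by `codebook[indices[i]]`, so the cost is one small index per weight plus the shared table.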


Stage 3: Huffman Coding

The basic idea of Huffman coding is to use fewer bits to represent symbols that occur more frequently.
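
As a rough illustration, the textbook Huffman construction can be applied to a list of cluster indices. The index list below is invented for this example, and the function is a generic sketch, not the paper's implementation.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to its Huffman bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, {symbol: code so far}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        n2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical cluster indices: index 0 is much more common than the rest.
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
code = huffman_code(indices)
huffman_bits = sum(len(code[i]) for i in indices)
fixed_bits = 2 * len(indices)  # fixed-length 2-bit indices
print(huffman_bits, fixed_bits)  # 28 32
```

The saving grows as the distribution of indices becomes more skewed. With a uniform distribution, Huffman coding gives no gain over fixed-length indices.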

Experiment:

1. Parameters are reduced while accuracy is retained.


2. Statistics about compressing AlexNet


3. Speedup

Text Understanding from Scratch



Introduction:

When processing sentences, many NLP models extract word-level or semantic features, for example with word2vec or n-gram models. This work instead encodes sentences as character-level features, which it reports perform better than the former.

ConvNet

The main component of the ConvNet is the convolutional module, which computes a 1-D convolution between the input and a kernel.

The idea is briefly illustrated in this figure.
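
A single-channel 1-D convolution can be sketched as below. This is a simplified illustration: it assumes no padding, one input channel, and one kernel, whereas a real convolutional layer sums over many input channels and learns many kernels.

```python
def conv1d(signal, kernel, stride=1):
    """1-D convolution without padding: slide the flipped kernel over the signal."""
    k = len(kernel)
    flipped = kernel[::-1]  # true convolution flips the kernel
    out = []
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        out.append(sum(w * x for w, x in zip(flipped, window)))
    return out


print(conv1d([1, 2, 3, 4, 5], [1, 0, -1]))            # [2, 2, 2]
print(conv1d([1, 2, 3, 4, 5], [1, 0, -1], stride=2))  # [2, 2]
```

For text, the signal is the sequence of encoded characters, and each position along the sequence is treated as one time step.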

Character Quantization

Given a sentence, we quantize each character using 1-of-m encoding, where m is the size of the alphabet. It is very simple, and it works much like Braille, which helps blind people read.

If a sentence is longer than L characters, we discard the characters beyond the first L.
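
The quantization step can be sketched as follows. The alphabet here is a made-up short one, and several choices are assumptions for illustration only: lowercasing the input, mapping characters outside the alphabet to all-zero vectors, and padding short sentences with all-zero vectors up to length L.

```python
# Hypothetical alphabet for illustration only.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def quantize(text, max_len, alphabet=ALPHABET):
    """Encode text as max_len one-hot vectors of size m = len(alphabet)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    m = len(alphabet)
    frames = []
    for ch in text.lower()[:max_len]:  # drop characters beyond max_len
        vec = [0] * m
        if ch in index:                # assumed: unknown characters stay all-zero
            vec[index[ch]] = 1
        frames.append(vec)
    # Assumed: pad short inputs with all-zero vectors.
    frames.extend([0] * m for _ in range(max_len - len(frames)))
    return frames


frames = quantize("Hi!", max_len=5)
print(len(frames), len(frames[0]))    # 5 36
print(frames[0].index(1))             # 7, the position of "h"
```

The result is an L × m matrix of zeros and ones that a 1-D ConvNet can consume directly.
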
We use

Model Design

We design two ConvNets, both with 6 convolutional layers and 3 fully connected (fc) layers.
They differ in frame size. Here is an illustration of the model.


Data Augmentation

Text datasets are often too small. When there is not enough data, we need data augmentation. Here we replace some words in a sentence with their synonyms.
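
A minimal sketch of synonym replacement, assuming a tiny hand-written thesaurus and a fixed per-word replacement probability. Both are simplifications invented for this example; how words and synonyms are actually sampled is not covered here.

```python
import random

# Hypothetical thesaurus for illustration only.
THESAURUS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "quick": ["fast", "rapid"],
}


def augment(sentence, thesaurus=THESAURUS, replace_prob=0.5, rng=None):
    """Replace each word that has synonyms with probability replace_prob."""
    rng = rng or random.Random()
    out = []
    for word in sentence.split():
        synonyms = thesaurus.get(word.lower())
        if synonyms and rng.random() < replace_prob:
            out.append(rng.choice(synonyms))
        else:
            out.append(word)
    return " ".join(out)


rng = random.Random(0)  # seeded so repeated runs give the same output
print(augment("a good and quick movie", rng=rng))
```

Calling `augment` several times on the same sentence yields different variants, each of which keeps the original label.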

Dataset

Here we use 5 datasets to evaluate our method.

(1) DBpedia: 14 classes, 560K training samples, 70K testing samples.
(2) Amazon reviews: 5 classes, 3M training samples, 650K testing samples.
(3) Yahoo! Answers: 10 classes, 1.4M training samples, 60K testing samples.
(4) AG's news corpus: 4 classes, 120K training samples, 7.6K testing samples.
(5) Sogou News: 5 classes, 360K training samples, 60K testing samples.

Here are the results for (1):


Here are the results for (2):


Here are the results for (3):


Here are the results for (4):


Here are the results for (5):


Thursday, May 12, 2016

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Introduction:
Most work on face recognition consists of 4 stages: detect, align, represent, and classify. This paper focuses on the alignment and representation stages. With this method, the performance beats the state-of-the-art methods and comes close to human-level performance.

Face Alignment:
The pipeline is briefly introduced as follows:

(a) Use 6 base points to bound the face.
(b) Use another 67 points to obtain the 3D shape of the face.


Feature Representation:
The frontalized crop is the input to the following DNN architecture.



Experiment:

Dataset:

Social Face Classification (SFC), 4.4M images, 4030 people.
Labeled Faces in the Wild (LFW), 13.2K images, 5749 people.
YouTube Faces (YTF), 3425 YouTube videos, 1595 subjects.

Result:
DeepFace beats the state-of-the-art methods and comes close to human-level performance.