[Paper Reading Group] Deep Clustering for Unsupervised Learning of Visual Features

©2018 ARISE analytics
2018/09/21
Presenter: 堀越
Deep Clustering for Unsupervised Learning of
Visual Features

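Overview
Title: Deep Clustering for Unsupervised Learning of Visual Features
Authors: Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze
https://arxiv.org/abs/1807.05520
In a nutshell: Repeatedly extract features with a CNN and cluster them with k-means
What's impressive: A CNN can learn to extract features even without supervision
Impressions: Unsupervised pre-training with DeepCluster → fine-tuning on a small number of images looks usable for real-world problems too
Motivation: Labeling training data is laborious, so we want to do without supervision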

Motivation
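Background:
CNNs have become a very important technique in image recognition, and ImageNet, an open large-scale dataset, has played a major role in that.
However, even though various new methods have been proposed in recent years, performance has plateaued. Could it be that ImageNet can no longer fully evaluate the performance of SOTA methods?
Goal:
Build datasets larger than ImageNet without incurring the cost of manual annotation.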

What is ImageNet?
ImageNet
- 14 million images, more than 20,000 classes
ILSVRC2012
A subset of ImageNet
- 1,000 classes
- Training data: 1.2 million images
- Validation data: 50,000 images
- Test data: 100,000 images

Limitations of ImageNet
ConvNets and ImageNet Beyond Accuracy: Understanding Mistakes and
Uncovering Biases (Stock et al. 2017)
Top: Performance evolution of various
CNN architectures on ImageNet.
Bottom: Some images sampled from
the Internet and misclassified by a
ResNet-101.
Some test samples misclassified by a ResNet-101 (first row)
and a Densenet-161(second row).
The predicted class is indicated in red, the ground truth in
black and in parenthesis. All those examples gathered more
than four (4 or 5) positive answers over 5 on AMT. Note that
no adversarial noise has been added to the images.

Clustering Methods Using Deep Learning
(Prior Work)

Clustering methods using Deep Learning
COIL20: Columbia University Image Library
Clustering with Deep Learning: Taxonomy and New Methods (Aljalbout et al.
2018)
Taxonomy table from Aljalbout et al. 2018, with one row added in red:
Splitting GAN (Grinblat et al. 2017) | GAN | Critic output | Wasserstein loss | k-Means loss | Pretraining and fine tuning | k-Means

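Methods using an Encoder / Decoder
Unsupervised Deep Embedding for Clustering Analysis (Xie et al. 2015)
The representation learned by the encoder / decoder is clustered so as to minimize a KL divergence defined on the cluster assignments (between the soft assignments and a sharpened target distribution).
Deep Embedded Clustering (DEC)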
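As a rough illustration of the KL objective in DEC, the sketch below computes Student's-t soft assignments q, the sharpened target distribution p, and KL(p‖q) with NumPy. The function names and the random toy data are assumptions for illustration, not code from Xie et al.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij of embeddings z to centroids."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p_ij: q_ij^2 / (soft cluster size), renormalized per row."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q) summed over all samples and clusters."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))           # toy embeddings (hypothetical data)
centroids = rng.normal(size=(3, 2))   # toy cluster centres
q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In DEC this loss is minimized with respect to both the encoder parameters and the centroids; the sketch only evaluates it once.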

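Methods that embed the clustering step into DL
Joint Unsupervised Learning of Deep Representations and Image Clusters
(Yang et al. 2016)
The merge steps of agglomerative (hierarchical) clustering are treated as a recurrent process, and an RCNN is trained on it.
Joint Unsupervised Learning (JULE)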

Methods using CNN representations
CNN-Based Joint Clustering and Representation Learning with Feature Drift
Compensation for Large-Scale Image Data (Hsu et al. 2016)
CNN training and k-means are alternated at the mini-batch level.

Methods using a GAN
Class-Splitting Generative Adversarial Networks (Grinblat et al. 2017)
The representation in the last layer of the Critic (Discriminator) is clustered, and the Generator is trained with the cluster labels.

Deep Clustering for Unsupervised Learning of
Visual Features

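Overview
A CNN is trained on the training images without supervision, and the representation from its last layer is clustered. The cluster labels from the previous epoch are used as the training targets (pseudo-labeling).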
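The alternation can be sketched with a toy model. In the code below a single tanh layer stands in for the CNN, the data are random vectors, and the classifier head is re-initialized every epoch because cluster indices are arbitrary from one clustering to the next. All of these choices are assumptions made to keep the example small; this is not the authors' implementation.

```python
import numpy as np

def kmeans(feats, k, rng, iters=10):
    """Plain Lloyd's k-means; returns hard assignments (pseudo-labels)."""
    centroids = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = feats[labels == j].mean(axis=0)
    return labels

def train_step(x, labels, W, V, lr=0.1):
    """One gradient step of softmax cross-entropy on the pseudo-labels."""
    h = np.tanh(x @ W)                       # stand-in "convnet" features
    logits = h @ V
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    g = p.copy()
    g[np.arange(len(x)), labels] -= 1.0      # dL/dlogits = softmax - one-hot
    g /= len(x)
    grad_V = h.T @ g
    grad_W = x.T @ ((g @ V.T) * (1.0 - h ** 2))
    return W - lr * grad_W, V - lr * grad_V

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 8))                # toy "images" (hypothetical data)
k, d = 4, 5
W = rng.normal(scale=0.5, size=(8, d))
history = []
for epoch in range(3):
    feats = np.tanh(x @ W)                   # 1) forward all data
    pseudo = kmeans(feats, k, rng)           # 2) cluster the features
    V = rng.normal(scale=0.1, size=(d, k))   # 3) fresh classifier head (assumption)
    W, V = train_step(x, pseudo, W, V)       # 4) train on the pseudo-labels
    history.append(pseudo)
```

A real run would replace `train_step` with a full epoch of mini-batch SGD over a convnet; the point here is only the order of the four steps.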

Formulation
• Given a training set X = {x1, x2, ..., xN} of N images.
• Each image xn is associated with a label yn in {0, 1}^k.
• This label represents the image's membership to one of k possible predefined classes.
• fθ is the convnet mapping, where θ is the set of corresponding parameters.
• k-means takes the features fθ(xn) produced by the convnet and clusters them into k distinct groups based on a geometric criterion.
• It jointly learns a d × k centroid matrix C and the cluster assignments yn of each image n by solving the k-means problem.
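For reference, the two optimization problems that these definitions lead to can be written as below. This is a reconstruction assuming the paper's notation; g_W (a classifier on top of the features, with parameters W) and ℓ (the multinomial logistic loss) are symbols that do not appear on this slide.

```latex
% Classification objective: learn \theta and W from the (pseudo-)labels y_n
\min_{\theta, W} \; \frac{1}{N} \sum_{n=1}^{N}
  \ell\bigl(g_W(f_\theta(x_n)),\, y_n\bigr)

% k-means objective: learn the centroids C and the assignments y_n
\min_{C \in \mathbb{R}^{d \times k}} \; \frac{1}{N} \sum_{n=1}^{N}
  \min_{y_n \in \{0,1\}^k}
  \bigl\lVert f_\theta(x_n) - C\, y_n \bigr\rVert_2^2
  \quad \text{such that} \quad y_n^\top 1_k = 1
```

DeepCluster alternates between the two: the second problem produces the assignments y_n, which are then used as pseudo-labels in the first.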

Implementation
- Standard AlexNet architecture
- Five convolutional layers with 96, 256, 384, 384 and 256 filters.
- Three fully connected layers.
- Remove the Local Response Normalization layers and use batch
normalization.
- For the clustering, features are PCA-reduced to 256 dimensions, whitened
and l2-normalized.
Image Transformation
- Sobel Filtering
Data Augmentation
- Random horizontal flips
- Crops of random sizes and aspect ratios
Pipeline: Preprocessing → CNN (AlexNet) → Clustering (PCA to 256 dimensions → k-means)
http://nocotan.github.io/chainer/2017/08/04/chainercnn-copy.html
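The feature post-processing before k-means (PCA reduction, whitening, l2-normalization) could look roughly like this in NumPy. The function name, the `eps` regularizer, and the toy dimensions (32 → 8 instead of the 256 used in the slides) are assumptions for illustration.

```python
import numpy as np

def pca_whiten_l2(feats, out_dim, eps=1e-5):
    """PCA-reduce to out_dim, whiten, then l2-normalize each row."""
    centered = feats - feats.mean(axis=0)
    cov = centered.T @ centered / (len(feats) - 1)
    eigval, eigvec = np.linalg.eigh(cov)        # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:out_dim]    # indices of the largest ones
    proj = centered @ eigvec[:, top]            # PCA projection
    proj /= np.sqrt(eigval[top] + eps)          # whitening: ~unit variance per axis
    norms = np.linalg.norm(proj, axis=1, keepdims=True)
    return proj / np.maximum(norms, 1e-12)      # l2-normalize each feature vector

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 32)) @ rng.normal(size=(32, 32))  # correlated toy features
reduced = pca_whiten_l2(feats, out_dim=8)
```

After this step every feature vector lies on the unit sphere, so Euclidean k-means behaves like clustering by cosine similarity.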

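Implementation tricks: preprocessing
For object classification, edge information matters more than color, but when training on raw images the first layer ends up extracting color information.
→ Apply a Sobel filter to the images to extract edges beforehand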
Filters from the first layer of an
AlexNet trained on unsupervised
ImageNet on raw RGB input (left) or
after a Sobel filtering (right).
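A minimal NumPy sketch of Sobel preprocessing follows. Converting to grayscale by averaging channels and combining the two gradients into a single magnitude map are assumptions of this sketch; the slides only state that a Sobel filter is applied.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """'Valid' 3x3 cross-correlation of a 2-D image with a kernel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def sobel_edges(rgb):
    """Grayscale conversion followed by the Sobel gradient magnitude."""
    gray = rgb.mean(axis=2)
    gx = filter3x3(gray, SOBEL_X)   # horizontal intensity change
    gy = filter3x3(gray, SOBEL_Y)   # vertical intensity change
    return np.hypot(gx, gy)

# Toy image: left half dark, right half bright, i.e. one vertical edge.
img = np.zeros((6, 6, 3))
img[:, 3:, :] = 1.0
edges = sobel_edges(img)
```

The output responds only around the dark-to-bright boundary and is zero in the flat regions, and any per-pixel color difference between channels is discarded.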

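Implementation tricks: clustering
Empty clusters:
• When a cluster becomes empty, pick a non-empty cluster at random and slightly perturb its centroid to create two clusters
Trivial parametrization:
• To prevent imbalanced clusters, sample the training data from a uniform distribution over the pseudo-labels
or
• Weight the loss function by the inverse of the cluster size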
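Both tricks fit in a few lines of NumPy. The helper names, the perturbation scale `eps`, and the toy labels below are assumptions for illustration; the sketch only moves the centroid and leaves the re-splitting of points to the next k-means assignment step.

```python
import numpy as np

def fix_empty_clusters(centroids, labels, rng, eps=1e-4):
    """Give each empty cluster a slightly perturbed copy of the centroid
    of a randomly chosen non-empty cluster."""
    counts = np.bincount(labels, minlength=len(centroids))
    centroids = centroids.copy()
    non_empty = np.flatnonzero(counts > 0)
    for j in np.flatnonzero(counts == 0):
        src = rng.choice(non_empty)
        centroids[j] = centroids[src] + eps * rng.normal(size=centroids.shape[1])
    return centroids

def inverse_size_weights(labels, k):
    """Per-cluster loss weights proportional to 1 / cluster size."""
    counts = np.bincount(labels, minlength=k).astype(float)
    weights = np.zeros(k)
    weights[counts > 0] = 1.0 / counts[counts > 0]
    return weights

def uniform_label_sampler(labels, n_samples, rng):
    """Draw image indices so that every non-empty pseudo-label is equally likely."""
    w = inverse_size_weights(labels, labels.max() + 1)[labels]
    return rng.choice(len(labels), size=n_samples, p=w / w.sum())

rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 0, 0, 0, 2, 2])          # cluster 1 is empty
centroids = np.array([[0.0, 0.0], [9.0, 9.0], [1.0, 1.0]])
new_centroids = fix_empty_clusters(centroids, labels, rng)
weights = inverse_size_weights(labels, k=3)
batch = uniform_label_sampler(labels, n_samples=1000, rng=rng)
```

Sampling with probability proportional to the inverse cluster size and weighting the loss by it are two routes to the same goal: a large cluster should not dominate the gradient.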

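Evaluation
a) Normalized mutual information (NMI) between the clusters and the true labels
b) NMI between the clusters of an epoch and those of the previous epoch
c) Effect of the number of clusters k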

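Evaluation
a) Normalized mutual information (NMI) between the clusters and the true labels
Shows how well the cluster labels agree with the true labels. The true labels are not used during training, yet the cluster labels move closer to the true labels as training progresses.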
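For concreteness, NMI between two labelings A and B can be computed as I(A;B) / sqrt(H(A) H(B)). The NumPy sketch below uses that definition with natural logarithms; the helper names and the toy labelings are assumptions for illustration.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(A) of a labeling, in nats."""
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(a, b):
    """Mutual information I(A;B) from the joint histogram of two labelings."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def nmi(a, b):
    """NMI(A;B) = I(A;B) / sqrt(H(A) H(B))."""
    denom = np.sqrt(entropy(a) * entropy(b))
    return mutual_information(a, b) / denom if denom > 0 else 0.0

true_labels = np.array([0, 0, 1, 1, 2, 2])
clusters_same = np.array([2, 2, 0, 0, 1, 1])    # same partition, clusters renamed
clusters_indep = np.array([0, 1, 0, 1, 0, 1])   # independent of the true labels
```

NMI is invariant to renaming the clusters, which is why it suits pseudo-labels whose indices carry no meaning: `nmi(true_labels, clusters_same)` is 1 and `nmi(true_labels, clusters_indep)` is 0.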

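Evaluation
b) NMI between the clusters of an epoch and those of the previous epoch
Each epoch repeats the cycle "train the CNN → re-train with the labels obtained by k-means clustering", so the contents of the clusters keep changing. As the epochs progress, fewer images switch clusters = the clusters stabilize.
That said, the NMI plateaus at around 0.8, so a non-negligible fraction of the images is assigned to a different cluster at every epoch. In practice this is apparently not much of a problem, and training converges to a single model (really?)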

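Evaluation
c) Effect of the number of clusters k
The number of clusters k is varied on a log scale; after training on ImageNet for 300 epochs, the mAP on another dataset* is measured. The best performance was obtained with k=10,000.
Choosing a number of clusters somewhat larger than the true number of classes (1,000) seems to work well.
* Validation set of Pascal VOC 2007, 20 classes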

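Q&A
• Can it learn at all?
• How long does it take?
• How good are the learned representations?
• Does it work on other datasets?
• Does it work with other models?
• Can other clustering algorithms be used?
• Can it be used for other tasks?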

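Can it learn at all?
A. Yes
It is known that Deep Learning can drive the training error to 0 even when the labels of the training data are randomized. In the same way, even if training starts from clusters given by random centroids, do the intermediate layers gradually learn appropriate representations?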
Understanding Deep Learning Requires Rethinking Generalization (Zhang et al.
2016)
Randomization tests.
...we train several standard architectures on a copy of the data where the
true labels were replaced by random labels. Our central finding can be
summarized as:
Deep neural networks easily fit random labels.
More precisely, when trained on a completely random labeling of the true
data, neural networks achieve 0 training error.

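How long does it take?
A. About 12 days in the following setup
- 500 epochs
- Pascal P100 GPU
- Market price around 1 million yen?
About 1/3 of the total is time spent on k-means
→ Because all data must be forwarded through the network for clustering, it takes at least 1.5× the time of ordinary training?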
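The 1.5× figure follows from simple arithmetic, under the assumption that everything other than k-means counts as "ordinary training" time:

```python
from fractions import Fraction

total_days = 12                          # reported wall-clock time
kmeans_share = Fraction(1, 3)            # share of the total spent on k-means
training_days = total_days * (1 - kmeans_share)   # remaining time, in days
slowdown = Fraction(total_days) / training_days   # total time / training-only time
```

With these numbers, 8 of the 12 days go to training proper, so the clustering step makes the run 3/2 = 1.5 times longer than training alone.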

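How good are the learned representations?
A. At every layer of the CNN, it learns better representations than other unsupervised methods
Evaluation method:
For each layer of the CNN pre-trained without supervision, a linear classifier is placed right after that layer and fine-tuned, and its accuracy is evaluated (if the performance is good, that layer should have learned a better representation)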
MIT Places database
http://places.csail.mit.edu/

Deeper layers capture larger features. However, some filters in the last convolutional layer appear to merely re-capture features already captured by earlier layers (bottom row)

Looking at the filters of the last layer, some filters seem to correspond to a particular class or pattern

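Does it work on other datasets?
A. Yes
Evaluation method:
ImageNet has a balanced number of images per class, which makes it favorable data for DeepCluster.
To check this effect, the accuracy was examined using 1 million images randomly selected from YFCC100M*
* Yahoo Flickr Creative Commons 100 Million dataset. Its classes are heavily imbalanced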

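Does it work with other models?
A. Yes
As in the supervised case, using a deeper model improves performance
Evaluation method:
Starting from a model pre-trained without supervision on ImageNet data, evaluate the mAP obtained when fine-tuning on PASCAL VOC 2007 data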

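Can other clustering algorithms be used?
A. Yes
When PIC (Power Iteration Clustering) was tried, it performed well on large-scale datasets
Evaluation method:
Starting from a model pre-trained without supervision on the data in each row, evaluate the accuracy obtained when fine-tuning on the data in each column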

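Can it be used for other tasks?
A. Yes
Evaluation method:
Starting from a model trained without supervision on ImageNet, evaluate the mAP obtained for image retrieval on the data in each column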
http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/
http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/
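As a reminder of how retrieval mAP is computed, here is a small pure-Python sketch. It assumes each query comes with a ranked list of 0/1 relevance flags that contains every relevant image, and it ignores benchmark-specific rules (such as how the Oxford and Paris protocols treat certain images), so it illustrates the metric rather than the official evaluation script.

```python
def average_precision(ranked_relevance):
    """AP for one query, given 0/1 relevance flags in ranked order."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)   # precision at this recall point
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(all_rankings):
    """mAP: the mean of the per-query average precisions."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)

queries = [
    [1, 0, 1, 0],   # relevant at ranks 1 and 3: AP = (1/1 + 2/3) / 2
    [0, 1, 0, 0],   # relevant at rank 2:        AP = 1/2
]
score = mean_average_precision(queries)
```

The per-query APs here are 5/6 and 1/2, so the mAP is 2/3.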

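Summary
Title: Deep Clustering for Unsupervised Learning of Visual Features
Authors: Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze
In a nutshell: Repeatedly extract features with a CNN and cluster them with k-means
What's impressive: A CNN can learn to extract features even without supervision
Impressions: Unsupervised pre-training with DeepCluster → fine-tuning on a small number of images looks usable for real-world problems too
Motivation: Labeling training data is laborious, so we want to do without supervision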

GitHub implementation
https://github.com/facebookresearch/deepcluster
