Commit 360cd9f

merge and add links
2 parents 693cd2b + e09dbda commit 360cd9f

3 files changed: +200 −21 lines

README.md

Lines changed: 50 additions & 18 deletions
@@ -22,7 +22,7 @@
 ## News

 ### July 2022
-- Inference code and model weights to run our [retrieval-augmented diffusion models](https://arxiv.org/abs/2204.11824) are now available. See ##RDM.
+- Inference code and model weights to run our [retrieval-augmented diffusion models](https://arxiv.org/abs/2204.11824) are now available. See [this section](#rdm).
 ### April 2022
 - Thanks to [Katherine Crowson](https://github.com/crowsonkb), classifier-free guidance received a ~2x speedup and the [PLMS sampler](https://arxiv.org/abs/2202.09778) is available. See also [this PR](https://github.com/CompVis/latent-diffusion/pull/51).

@@ -49,15 +49,16 @@ If you use any of these models in your work, we are always happy to receive a [c
 ![rdm-figure](assets/rdm-preview.jpg)
 We include inference code to run our retrieval-augmented diffusion models (RDMs) as described in [https://arxiv.org/abs/2204.11824](https://arxiv.org/abs/2204.11824).

-To get started, install the following dependencies into the `ldm` conda environment,
-```bash
-pip install transformers==4.19.2 scann kornia
+
+To get started, install the additionally required Python packages into your `ldm` environment:
+```shell script
+pip install transformers==4.19.2 scann kornia==0.6.4
 ```
-and download the weights:
+and download the trained weights:
+
 ```bash
-mkdir -p models/rdm/rdm768x768
-wget -O models/rdm/rdm768x768/model.ckpt TODO
-wget -O models/rdm/rdm768x768/config.yaml TODO
+mkdir -p models/rdm/rdm768x768/
+wget -O models/rdm/rdm768x768/model.ckpt https://ommer-lab.com/files/rdm/model.ckpt
 ```
 As these models are conditioned on a set of CLIP image embeddings, our RDMs support different inference modes,
 which are described in the following.
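
Aside (not part of this commit): the CLIP image embeddings mentioned above can be computed with the OpenAI `clip` package, using the `ViT-L/14` model that `scripts/knn2img.py` defaults to via `retriever_version='ViT-L/14'`. A minimal sketch, with `example.jpg` as a placeholder input:

```python
# Hedged sketch: compute a CLIP ViT-L/14 image embedding of the kind the
# RDMs are conditioned on. Assumes `pip install git+https://github.com/openai/CLIP.git`.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder path
with torch.no_grad():
    emb = model.encode_image(image)             # shape (1, 768)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # unit-normalize, as the retrieval code does
```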
@@ -70,27 +71,45 @@ python scripts/knn2img.py --prompt "a happy bear reading a newspaper, oil on ca
 ```

 #### RDM with text-to-image retrieval
-Download the retrieval-databases which contain the retrieval-datasets (OpenImages and ArtBench) compressed into CLIP image embeddings:
+
+To run an RDM conditioned on a text prompt and, additionally, on images retrieved from this prompt, you will also need to download the corresponding retrieval database.
+We provide two distinct databases extracted from the [OpenImages](https://storage.googleapis.com/openimages/web/index.html) and [ArtBench](https://github.com/liaopeiyuan/artbench) datasets.
+Interchanging the databases results in different capabilities of the resulting semi-parametric model, as visualized below #TODO, although the learned weights are the same in both cases.
+
+Download the retrieval databases, which contain the retrieval datasets ([OpenImages](https://storage.googleapis.com/openimages/web/index.html) (~11GB) and [ArtBench](https://github.com/liaopeiyuan/artbench) (~82MB)) compressed into CLIP image embeddings:
 ```bash
-mkdir -p data/rdm/openimages
-mkdir -p data/rdm/artbench
-wget -O data/rdm/openimages/data.p TODO
-wget -O data/rdm/artbench/data.p TODO
+mkdir -p data/rdm/retrieval_databases
+wget -O data/rdm/retrieval_databases/artbench.zip https://ommer-lab.com/files/rdm/artbench_databases.zip
+wget -O data/rdm/retrieval_databases/openimages.zip https://ommer-lab.com/files/rdm/openimages_database.zip
+unzip data/rdm/retrieval_databases/artbench.zip -d data/rdm/retrieval_databases/
+unzip data/rdm/retrieval_databases/openimages.zip -d data/rdm/retrieval_databases/
 ```
-We also provide trained [ScaNN]()/[faiss]() search indices [here](TODO). Download via
+We also provide trained [ScaNN](https://github.com/google-research/google-research/tree/master/scann) search indices for ArtBench. Download and extract via
 ```bash
-wget -O data/rdm/openimages/searcher.p TODO
-wget -O data/rdm/artbench/searcher TODO
+mkdir -p data/rdm/searchers
+wget -O data/rdm/searchers/artbench.zip https://ommer-lab.com/files/rdm/artbench_searchers.zip
+unzip data/rdm/searchers/artbench.zip -d data/rdm/searchers
 ```

+Since the index for OpenImages is large (~21 GB), we provide a script to create and save it for usage during sampling. Note, however,
+that sampling with the OpenImages database will not be possible without this index. Run the script via
+```bash
+python scripts/train_searcher.py
+```
+
+After this, retrieval-based text-guided sampling with visual nearest neighbors can be started via
+```bash
+python scripts/knn2img.py --prompt "a happy bear reading a newspaper, oil on canvas" --use_neighbors --knn <number_of_neighbors>
+```
+Note that the maximum supported number of neighbors is 20. The database can be changed via the cmd parameter `--database`, which can be one of `[openimages, artbench-art_nouveau, artbench-baroque, artbench-expressionism, artbench-impressionism, artbench-post_impressionism, artbench-realism, artbench-renaissance, artbench-romanticism, artbench-surrealism, artbench-ukiyo_e]`.

-#### RDM with image-to-image retrieval (maybe?, TODO)
-- simple modification of above section, support image encoding

 #### Coming Soon
 - better models
 - more resolutions
+- image-to-image retrieval

 ## Text-to-Image
 ![text2img-figure](assets/txt2img-preview.png)
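
For intuition, the retrieval step behind `--use_neighbors` can be pictured as: embed the prompt with CLIP's text encoder, then query the serialized ScaNN index for the nearest CLIP image embeddings. A hedged sketch under the setup above (not the exact code path of `scripts/knn2img.py`), assuming the OpenAI `clip` package:

```python
# Sketch: retrieve nearest CLIP *image* embeddings for a CLIP *text*
# embedding of the prompt, using a searcher serialized as above.
import clip
import scann
import torch

searcher = scann.scann_ops_pybind.load_searcher("data/rdm/searchers/artbench-surrealism")

model, _ = clip.load("ViT-L/14", device="cpu")
with torch.no_grad():
    q = model.encode_text(clip.tokenize(["a happy bear reading a newspaper"]))
    q = q / q.norm(dim=-1, keepdim=True)

# the released models support at most 20 neighbors
neighbor_ids, distances = searcher.search_batched(q.numpy(), final_num_neighbors=20)
```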
@@ -323,6 +342,19 @@ Thanks for open-sourcing!
 archivePrefix={arXiv},
 primaryClass={cs.CV}
 }
+
+@misc{https://doi.org/10.48550/arxiv.2204.11824,
+  doi = {10.48550/ARXIV.2204.11824},
+  url = {https://arxiv.org/abs/2204.11824},
+  author = {Blattmann, Andreas and Rombach, Robin and Oktay, Kaan and Ommer, Björn},
+  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
+  title = {Retrieval-Augmented Diffusion Models},
+  publisher = {arXiv},
+  year = {2022},
+  copyright = {arXiv.org perpetual, non-exclusive license}
+}
+
 ```

scripts/knn2img.py

Lines changed: 3 additions & 3 deletions
@@ -63,8 +63,8 @@ def __init__(self, database, retriever_version='ViT-L/14'):
         assert database in DATABASES
         # self.database = self.load_database(database)
         self.database_name = database
-        self.searcher_savedir = f'models/searchers/{self.database_name}'
-        self.database_path = f'data/retrieval_databases/{self.database_name}'
+        self.searcher_savedir = f'data/rdm/searchers/{self.database_name}'
+        self.database_path = f'data/rdm/retrieval_databases/{self.database_name}'
         self.retriever = self.load_retriever(version=retriever_version)
         self.database = {'embedding': [],
                          'img_id': [],
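
After this change, both paths live under `data/rdm/`. Assuming the downloaded zips extract one folder per database name (as the per-database f-strings suggest), the layout looks roughly like:

```
data/rdm/
├── retrieval_databases/
│   ├── artbench-surrealism/   # *.npz files with CLIP image embeddings
│   └── openimages/
└── searchers/
    ├── artbench-surrealism/   # serialized ScaNN index
    └── openimages/            # built via scripts/train_searcher.py
```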
@@ -287,7 +287,7 @@ def __call__(self, x, n):
 parser.add_argument(
     "--database",
     type=str,
-    default=DATABASES[0],
+    default='artbench-surrealism',
     choices=DATABASES,
     help="The database used for the search, only applied when --use_neighbors=True",
 )
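
With the new default, retrieval-augmented sampling works out of the box with the small ArtBench-surrealism index, while OpenImages must be selected explicitly (and requires the index built by `scripts/train_searcher.py`):

```bash
# uses the new default database, artbench-surrealism
python scripts/knn2img.py --prompt "a happy bear reading a newspaper, oil on canvas" --use_neighbors --knn 10

# explicitly select the large OpenImages database instead
python scripts/knn2img.py --prompt "a happy bear reading a newspaper, oil on canvas" --use_neighbors --knn 10 --database openimages
```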

scripts/train_searcher.py

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
import os, sys
import numpy as np
import scann
import argparse
import glob
from multiprocessing import cpu_count
from tqdm import tqdm

from ldm.util import parallel_data_prefetch


def search_bruteforce(searcher):
    return searcher.score_brute_force().build()


def search_partitioned_ah(searcher, dims_per_block, aiq_threshold, reorder_k,
                          partitioning_trainsize, num_leaves, num_leaves_to_search):
    return searcher.tree(num_leaves=num_leaves,
                         num_leaves_to_search=num_leaves_to_search,
                         training_sample_size=partitioning_trainsize). \
        score_ah(dims_per_block, anisotropic_quantization_threshold=aiq_threshold).reorder(reorder_k).build()


def search_ah(searcher, dims_per_block, aiq_threshold, reorder_k):
    return searcher.score_ah(dims_per_block, anisotropic_quantization_threshold=aiq_threshold).reorder(
        reorder_k).build()


def load_datapool(dpath):

    def load_single_file(saved_embeddings):
        compressed = np.load(saved_embeddings)
        database = {key: compressed[key] for key in compressed.files}
        return database

    def load_multi_files(data_archive):
        database = {key: [] for key in data_archive[0].files}
        for d in tqdm(data_archive, desc=f'Loading datapool from {len(data_archive)} individual files.'):
            for key in d.files:
                database[key].append(d[key])

        return database

    print(f'Load saved patch embedding from "{dpath}"')
    file_content = glob.glob(os.path.join(dpath, '*.npz'))

    if len(file_content) == 1:
        data_pool = load_single_file(file_content[0])
    elif len(file_content) > 1:
        data = [np.load(f) for f in file_content]
        prefetched_data = parallel_data_prefetch(load_multi_files, data,
                                                 n_proc=min(len(data), cpu_count()), target_data_type='dict')

        data_pool = {key: np.concatenate([od[key] for od in prefetched_data], axis=1)[0] for key in prefetched_data[0].keys()}
    else:
        raise ValueError(f'No npz-files in specified path "{dpath}". Does this directory exist?')

    print(f'Finished loading of retrieval database of length {data_pool["embedding"].shape[0]}.')
    return data_pool


def train_searcher(opt,
                   metric='dot_product',
                   partitioning_trainsize=None,
                   reorder_k=None,
                   # todo tune
                   aiq_thld=0.2,
                   dims_per_block=2,
                   num_leaves=None,
                   num_leaves_to_search=None):

    data_pool = load_datapool(opt.database)
    k = opt.knn

    if not reorder_k:
        reorder_k = 2 * k

    # unit-normalize the embeddings, so that the dot_product metric amounts to cosine similarity
    searcher = scann.scann_ops_pybind.builder(data_pool['embedding'] / np.linalg.norm(data_pool['embedding'], axis=1)[:, np.newaxis], k, metric)
    pool_size = data_pool['embedding'].shape[0]

    print(*(['#'] * 100))
    print('Initializing scaNN searcher with the following values:')
    print(f'k: {k}')
    print(f'metric: {metric}')
    print(f'reorder_k: {reorder_k}')
    print(f'anisotropic_quantization_threshold: {aiq_thld}')
    print(f'dims_per_block: {dims_per_block}')
    print(*(['#'] * 100))
    print('Start training searcher....')
    print(f'N samples in pool is {pool_size}')

    # this reflects the recommended design choices proposed at
    # https://github.com/google-research/google-research/blob/aca5f2e44e301af172590bb8e65711f0c9ee0cfd/scann/docs/algorithms.md
    if pool_size < 2e4:
        print('Using brute force search.')
        searcher = search_bruteforce(searcher)
    elif 2e4 <= pool_size < 1e5:
        print('Using asymmetric hashing search and reordering.')
        searcher = search_ah(searcher, dims_per_block, aiq_thld, reorder_k)
    else:
        print('Using partitioning, asymmetric hashing search and reordering.')

        if not partitioning_trainsize:
            partitioning_trainsize = data_pool['embedding'].shape[0] // 10
        if not num_leaves:
            num_leaves = int(np.sqrt(pool_size))

        if not num_leaves_to_search:
            num_leaves_to_search = max(num_leaves // 20, 1)

        print('Partitioning params:')
        print(f'num_leaves: {num_leaves}')
        print(f'num_leaves_to_search: {num_leaves_to_search}')
        searcher = search_partitioned_ah(searcher, dims_per_block, aiq_thld, reorder_k,
                                         partitioning_trainsize, num_leaves, num_leaves_to_search)

    print('Finish training searcher')
    searcher_savedir = opt.target_path
    os.makedirs(searcher_savedir, exist_ok=True)
    searcher.serialize(searcher_savedir)
    print(f'Saved trained searcher under "{searcher_savedir}"')


if __name__ == '__main__':
    sys.path.append(os.getcwd())
    parser = argparse.ArgumentParser()
    parser.add_argument('--database',
                        '-d',
                        default='data/rdm/retrieval_databases/openimages',
                        type=str,
                        help='path to folder containing the clip features of the database')
    parser.add_argument('--target_path',
                        '-t',
                        default='data/rdm/searchers/openimages',
                        type=str,
                        help='path to the target folder where the searcher shall be stored.')
    parser.add_argument('--knn',
                        '-k',
                        default=20,
                        type=int,
                        help='number of nearest neighbors for which the searcher shall be optimized')

    opt, _ = parser.parse_known_args()

    train_searcher(opt)
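
The defaults reproduce the OpenImages setup from the README; via the flags above, the same script can build a searcher for any other database folder, e.g. (paths following the layout expected by `scripts/knn2img.py`):

```bash
python scripts/train_searcher.py \
    --database data/rdm/retrieval_databases/artbench-surrealism \
    --target_path data/rdm/searchers/artbench-surrealism \
    --knn 20
```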
