README.md: 50 additions & 18 deletions
@@ -22,7 +22,7 @@
22
22
## News
23
23
24
24
### July 2022
25
-
- Inference code and model weights to run our [retrieval-augmented diffusion models](https://arxiv.org/abs/2204.11824) are now available. See ##RDM.
25
+
- Inference code and model weights to run our [retrieval-augmented diffusion models](https://arxiv.org/abs/2204.11824) are now available. See [this section](#rdm).
26
26
### April 2022
27
27
- Thanks to [Katherine Crowson](https://github.com/crowsonkb), classifier-free guidance received a ~2x speedup and the [PLMS sampler](https://arxiv.org/abs/2202.09778) is available. See also [this PR](https://github.com/CompVis/latent-diffusion/pull/51).
28
28
@@ -49,15 +49,16 @@ If you use any of these models in your work, we are always happy to receive a [c
49
49

50
50
We include inference code to run our retrieval-augmented diffusion models (RDMs) as described in [https://arxiv.org/abs/2204.11824](https://arxiv.org/abs/2204.11824).
51
51
52
-
To get started, install the following dependencies into the `ldm` conda environment,
53
-
```bash
54
-
pip install transformers==4.19.2 scann kornia
52
+
53
+
To get started, install the additional required Python packages into your `ldm` environment.
As these models are conditioned on a set of CLIP image embeddings, our RDMs support different inference modes,
63
64
which are described in the following.
@@ -70,27 +71,45 @@ python scripts/knn2img.py --prompt "a happy bear reading a newspaper, oil on ca
70
71
```
71
72
72
73
#### RDM with text-to-image retrieval
73
-
Download the retrieval-databases which contain the retrieval-datasets (OpenImages and ArtBench) compressed into CLIP image embeddings:
74
+
75
+
To run an RDM conditioned on a text prompt and, additionally, on images retrieved for this prompt, you also need to download the corresponding retrieval database.
76
+
We provide two distinct databases extracted from the [OpenImages](https://storage.googleapis.com/openimages/web/index.html) and [ArtBench](https://github.com/liaopeiyuan/artbench) datasets.
77
+
Interchanging the databases results in different capabilities
78
+
of the resulting semi-parametric model, as visualized below (#TODO), although the learned weights are the same in both cases.
79
+
80
+
Download the retrieval databases, which contain the retrieval datasets ([OpenImages](https://storage.googleapis.com/openimages/web/index.html) (~11GB) and [ArtBench](https://github.com/liaopeiyuan/artbench) (~82MB)) compressed into CLIP image embeddings:
We also provide trained [ScaNN]()/[faiss]() search indices [here](TODO). Download via
88
+
We also provide trained [ScaNN](https://github.com/google-research/google-research/tree/master/scann) search indices for ArtBench. Download and extract via
Since the index for OpenImages is large (~21 GB), we provide a script to create and save it for use during sampling. Note, however,
96
+
that sampling with the OpenImages database will not be possible without this index. Run the script via
97
+
```bash
98
+
python scripts/train_searcher.py
99
+
```
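For intuition about what a search index over CLIP image embeddings is used for, here is a minimal brute-force sketch of nearest-neighbor retrieval. It is not the code in `scripts/train_searcher.py`: the function name `top_k_neighbors`, the 512-dimensional random vectors, and the choice of cosine similarity are all assumptions made for illustration. A ScaNN index performs an approximate version of this kind of search, which matters once the database is too large to scan exhaustively.

```python
import numpy as np


def top_k_neighbors(query, database, k):
    """Return indices and cosine similarities of the k database rows closest to query.

    Hypothetical helper for illustration only; exhaustive search, no index.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                    # cosine similarity of every row to the query
    idx = np.argsort(-sims)[:k]      # indices of the k largest similarities
    return idx, sims[idx]


# Stand-in data: random vectors instead of real CLIP embeddings (assumption).
rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 512)).astype(np.float32)

# A query that is a slightly perturbed copy of row 42.
query = database[42] + 0.01 * rng.normal(size=512).astype(np.float32)

idx, sims = top_k_neighbors(query, database, k=4)
print(idx, sims)
```

The exhaustive scan costs one dot product per database entry for every query, which is the cost a trained index is meant to avoid.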
100
+
101
+
After this, retrieval-based text-guided sampling with visual nearest neighbors can be started via
102
+
```
103
+
python scripts/knn2img.py --prompt "a happy bear reading a newspaper, oil on canvas" --use_neighbors --knn <number_of_neighbors>
104
+
```
105
+
Note that the maximum supported number of neighbors is 20. The database can be changed via the command-line parameter `--database`, which accepts one of `[openimages, artbench-art_nouveau, artbench-baroque, artbench-expressionism, artbench-impressionism, artbench-post_impressionism, artbench-realism, artbench-renaissance, artbench-romanticism, artbench-surrealism, artbench-ukiyo_e]`.
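The constraints above (at most 20 neighbors, a fixed set of database names) could be enforced at argument-parsing time. The following is a sketch using the standard library's `argparse`, not the actual parser in `scripts/knn2img.py`: the helper names `knn_count` and `build_parser`, the lower bound of 1, and both default values are assumptions for illustration.

```python
import argparse

MAX_NEIGHBORS = 20  # maximum stated in this README

ARTBENCH_STYLES = (
    "art_nouveau", "baroque", "expressionism", "impressionism",
    "post_impressionism", "realism", "renaissance", "romanticism",
    "surrealism", "ukiyo_e",
)
DATABASES = ["openimages"] + [f"artbench-{style}" for style in ARTBENCH_STYLES]


def knn_count(value):
    """Parse --knn and reject values outside 1..MAX_NEIGHBORS (lower bound assumed)."""
    n = int(value)
    if not 1 <= n <= MAX_NEIGHBORS:
        raise argparse.ArgumentTypeError(
            f"number of neighbors must be between 1 and {MAX_NEIGHBORS}, got {n}"
        )
    return n


def build_parser():
    """Hypothetical parser mirroring the flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Illustrative argument check")
    parser.add_argument("--prompt", type=str, required=True)
    parser.add_argument("--use_neighbors", action="store_true")
    parser.add_argument("--knn", type=knn_count, default=10)  # default is assumed
    parser.add_argument("--database", choices=DATABASES,
                        default="artbench-surrealism")        # default is assumed
    return parser


args = build_parser().parse_args(
    ["--prompt", "a happy bear reading a newspaper, oil on canvas",
     "--use_neighbors", "--knn", "10", "--database", "artbench-surrealism"]
)
print(args.knn, args.database)
```

With this setup, passing `--knn 21` or an unknown database name makes `argparse` exit with a usage error before any model is loaded.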
86
106
87
107
88
-
#### RDM with image-to-image retrieval (maybe?, TODO)
89
-
- simple modification of above section, support image encoding
90
108
91
109
#### Coming Soon
92
110
- better models
93
111
- more resolutions
112
+
- image-to-image retrieval
94
113
95
114
## Text-to-Image
96
115

@@ -323,6 +342,19 @@ Thanks for open-sourcing!
323
342
archivePrefix={arXiv},
324
343
primaryClass={cs.CV}
325
344
}
345
+
346
+
@misc{https://doi.org/10.48550/arxiv.2204.11824,
347
+
doi = {10.48550/ARXIV.2204.11824},
348
+
url = {https://arxiv.org/abs/2204.11824},
349
+
author = {Blattmann, Andreas and Rombach, Robin and Oktay, Kaan and Ommer, Björn},
350
+
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},