45 changes: 45 additions & 0 deletions src/diffusers/plus_pipelines/pulid/README.md
@@ -0,0 +1,45 @@
# ELLA

## Overview

ELLA was proposed in [ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment](https://arxiv.org/pdf/2403.05135) by Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, and Gang Yu.

The abstract from the paper is the following:
*Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation. However, most widely used models still employ CLIP as their text encoder, which constrains their ability to comprehend dense prompts, encompassing multiple objects, detailed attributes, complex relationships, long-text alignment, etc. In this paper, we introduce an Efficient Large Language Model Adapter, termed ELLA, which equips text-to-image diffusion models with powerful Large Language Models (LLM) to enhance text alignment without training of either U-Net or LLM. To seamlessly bridge two pre-trained models, we investigate a range of semantic alignment connector designs and propose a novel module, the Timestep-Aware Semantic Connector (TSC), which dynamically extracts timestep-dependent conditions from LLM. Our approach adapts semantic features at different stages of the denoising process, assisting diffusion models in interpreting lengthy and intricate prompts over sampling timesteps. Additionally, ELLA can be readily incorporated with community models and tools to improve their prompt-following capabilities. To assess text-to-image models in dense prompt following, we introduce Dense Prompt Graph Benchmark (DPGBench), a challenging benchmark consisting of 1K dense prompts. Extensive experiments demonstrate the superiority of ELLA in dense prompt following compared to state-of-the-art methods, particularly in multiple object compositions involving diverse attributes and relationships.*
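The abstract describes the Timestep-Aware Semantic Connector (TSC) only at a high level. One way to picture it is a fixed set of learnable query vectors that attend over the LLM's token features, with the diffusion timestep modulating the queries first. The dependency-free toy below is a sketch under that assumption; `toy_tsc`, the scale-and-shift modulation, and the single attention head without learned projections are illustrative simplifications, not the ELLA implementation. It does show one useful property: the output has one row per query, whatever the prompt length.

```python
import math
from typing import List

Vector = List[float]


def timestep_embedding(t: int, dim: int) -> Vector:
    """Sinusoidal timestep embedding of even length `dim` (a common diffusion-model choice)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_tsc(queries: List[Vector], llm_tokens: List[Vector], t: int) -> List[Vector]:
    """Hypothetical timestep-aware connector: single head, no learned projections.

    Each query is modulated by a timestep-dependent scale and shift, then
    attends over every LLM token feature. The result has len(queries) rows
    no matter how many tokens the LLM produced.
    """
    dim = len(queries[0])
    emb = timestep_embedding(t, 2 * dim)
    scale, shift = emb[:dim], emb[dim:]
    out = []
    for q in queries:
        q_t = [x * (1.0 + s) + b for x, s, b in zip(q, scale, shift)]
        scores = [sum(a * k for a, k in zip(q_t, tok)) / math.sqrt(dim) for tok in llm_tokens]
        weights = softmax(scores)
        out.append([sum(w * tok[j] for w, tok in zip(weights, llm_tokens)) for j in range(dim)])
    return out


queries = [[0.1, 0.2, 0.0, 0.3], [0.3, -0.1, 0.2, 0.0]]  # 2 "learnable" queries (toy values)
short_prompt = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # 2 LLM tokens
long_prompt = short_prompt * 20  # 40 LLM tokens
print(len(toy_tsc(queries, short_prompt, t=500)), len(toy_tsc(queries, long_prompt, t=500)))  # 2 2
```

Because the modulation depends on `t`, the same prompt yields different conditioning at different denoising steps, which is the behaviour the abstract attributes to the TSC.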

## Examples:

### Import the required classes
```python
from diffusers_plus_plus import EllaDiffusionPipeline, ELLA, DPMSolverMultistepScheduler
```

### Load pretrained ELLA weights from the hub provided by the authors of the paper
```python
ella = ELLA.from_pretrained("shauray/ELLA_SD15")
```

### Load the remaining pipeline components (scheduler, UNet, VAE, etc.); the pipeline can also be combined with adapters such as T2I-Adapter and IP-Adapter
```python
ella_pipeline = EllaDiffusionPipeline.from_pretrained("Justin-Choo/epiCRealism-Natural_Sin_RC1_VAE", ELLA=ella, requires_safety_checker=False)
ella_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(ella_pipeline.scheduler.config)
ella_pipeline = ella_pipeline.to("cuda")
```

### Provide a prompt; the pipeline encodes it with the LLM and feeds the resulting token features through ELLA
```python
prompt = "a beautiful portrait of an empress in her garden"
negative_prompt = ""
```
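Diffusers' Stable Diffusion pipelines use classifier-free guidance: the UNet's noise prediction for the negative prompt (here the empty string) is pushed toward the prediction for the actual prompt by the guidance scale. Assuming the ELLA pipeline keeps that standard combination (this README does not say so explicitly), the per-element arithmetic is sketched below, with plain lists standing in for tensors and a hypothetical helper name.

```python
from typing import List


def classifier_free_guidance(noise_uncond: List[float], noise_cond: List[float], guidance_scale: float) -> List[float]:
    """guided = uncond + guidance_scale * (cond - uncond), applied element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


uncond = [0.0, 0.5]  # toy prediction for the negative ("") prompt
cond = [1.0, 0.25]  # toy prediction for the actual prompt
print(classifier_free_guidance(uncond, cond, 1.0))  # [1.0, 0.25]  (scale 1 = no extra guidance)
print(classifier_free_guidance(uncond, cond, 7.0))  # [7.0, -1.25]
```

Larger scales extrapolate further away from the negative-prompt prediction, which usually tightens prompt adherence at some cost in diversity.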

### Generate and save the image
```python
image = ella_pipeline(prompt, negative_prompt=negative_prompt, guidance_scale=7, num_inference_steps=30, height=768, width=512).images[0]

image.save("empress.png")
```
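The `height=768, width=512` arguments set the output resolution. For Stable Diffusion 1.5 the VAE downsamples by a factor of 8 and the UNet operates on 4-channel latents, so both dimensions should be multiples of 8. The helper below is hypothetical (not part of the pipeline); it checks that constraint and reports the latent shape the denoiser would work on.

```python
from typing import Tuple

VAE_SCALE_FACTOR = 8  # SD 1.5 VAE: three 2x downsampling stages
LATENT_CHANNELS = 4  # SD 1.5 UNet input channels


def latent_shape(height: int, width: int, batch_size: int = 1) -> Tuple[int, int, int, int]:
    """Return (batch, channels, latent_height, latent_width) for an SD 1.5 generation."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError(f"height and width must be divisible by {VAE_SCALE_FACTOR}, got {height}x{width}")
    return (batch_size, LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


print(latent_shape(768, 512))  # (1, 4, 96, 64)
```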

### Inference Example
| ELLA (non-fixed embedding length) | ELLA (fixed embedding length) | SD 1.5 |
| ----------- | ----------- | ----------- |
| ![Example Image](https://drive.google.com/uc?id=1zgFb3ELhftBem2PTmZVhhbBahxQBSYX0) | ![Example Image](https://drive.google.com/uc?id=1m4vjEnguRWM8ZTGdXTA25A4xZeoKuyhh) | ![Example Image](https://drive.google.com/uc?id=1Te5V1Htku-3zZyiFS1ws4LL15zfhlvDh) |
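The README does not define the first two column labels. One plausible reading, assumed here, is whether the prompt's LLM token sequence is padded or truncated to a fixed length before it reaches ELLA, or passed at its natural length. A minimal sketch of the two preprocessing options; the pad id of 0, the length of 128, and the token ids are all made up for illustration.

```python
from typing import List, Optional

PAD_ID = 0  # hypothetical padding token id


def prepare_token_ids(token_ids: List[int], fixed_length: Optional[int] = None) -> List[int]:
    """Keep the natural length (fixed_length=None) or truncate/pad to exactly fixed_length."""
    if fixed_length is None:
        return list(token_ids)
    truncated = token_ids[:fixed_length]
    return truncated + [PAD_ID] * (fixed_length - len(truncated))


ids = [37, 1832, 5229, 13, 46, 1]  # made-up ids for a short prompt
print(len(prepare_token_ids(ids)))  # 6
print(len(prepare_token_ids(ids, 128)))  # 128
```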
65 changes: 65 additions & 0 deletions src/diffusers/plus_pipelines/pulid/__init__.py
@@ -0,0 +1,65 @@
from typing import TYPE_CHECKING

from ...utils import (
    DIFFUSERS_SLOW_IMPORT,
    OptionalDependencyNotAvailable,
    _LazyModule,
    get_objects_from_module,
    is_flax_available,
    is_k_diffusion_available,
    is_k_diffusion_version,
    is_onnx_available,
    is_torch_available,
    is_transformers_available,
    is_transformers_version,
)


_dummy_objects = {}
_additional_imports = {}
_import_structure = {"pipeline_output": ["StableDiffusionPipelineOutput"]}

try:
    if not (is_transformers_available() and is_torch_available()):
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    from ...utils import dummy_torch_and_transformers_objects  # noqa F403

    _dummy_objects.update(get_objects_from_module(dummy_torch_and_transformers_objects))
else:
    _import_structure["pipeline_pulid_sdxl"] = [
        "EllaFixedDiffusionPipeline",
        "EllaFlexDiffusionPipeline",
    ]
    _import_structure["safety_checker"] = ["StableDiffusionSafetyChecker"]


if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
    try:
        if not (is_transformers_available() and is_torch_available()):
            raise OptionalDependencyNotAvailable()

    except OptionalDependencyNotAvailable:
        from ...utils.dummy_torch_and_transformers_objects import *

    else:
        from .pipeline_output import StableDiffusionPipelineOutput
        from .pipeline_pulid_sdxl import (
            EllaFixedDiffusionPipeline,
            EllaFlexDiffusionPipeline,
        )
        from .safety_checker import StableDiffusionSafetyChecker
else:
    import sys

    sys.modules[__name__] = _LazyModule(
        __name__,
        globals()["__file__"],
        _import_structure,
        module_spec=__spec__,
    )

    for name, value in _dummy_objects.items():
        setattr(sys.modules[__name__], name, value)
    for name, value in _additional_imports.items():
        setattr(sys.modules[__name__], name, value)
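The `__init__.py` above defers pipeline imports through diffusers' `_LazyModule`: `_import_structure` maps submodule names to the symbols they export, and a submodule is only imported when one of its names is first accessed. The same idea can be sketched with the standard library alone using a module-level `__getattr__` (PEP 562). This illustrates the pattern only, not diffusers' `_LazyModule` itself; `make_lazy_module` is a hypothetical name and stdlib modules stand in for the pipeline submodules.

```python
import importlib
import types
from typing import Dict, List

# Hypothetical import structure: submodule -> exported names.
# Stdlib modules stand in for pipeline submodules so the sketch runs anywhere.
IMPORT_STRUCTURE: Dict[str, List[str]] = {
    "json": ["dumps", "loads"],
    "fractions": ["Fraction"],
}


def make_lazy_module(name: str, import_structure: Dict[str, List[str]]) -> types.ModuleType:
    module = types.ModuleType(name)
    owner = {symbol: sub for sub, symbols in import_structure.items() for symbol in symbols}

    def __getattr__(attr: str):
        # PEP 562: only called when normal attribute lookup on the module fails.
        if attr not in owner:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(owner[attr]), attr)
        setattr(module, attr, value)  # cache, so the next lookup is a plain attribute hit
        return value

    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo", IMPORT_STRUCTURE)
print("Fraction" in vars(lazy))  # False: nothing resolved yet
print(lazy.Fraction(1, 3) + lazy.Fraction(1, 6))  # 1/2
print("Fraction" in vars(lazy))  # True: cached after first access
```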
45 changes: 45 additions & 0 deletions src/diffusers/plus_pipelines/pulid/pipeline_output.py
@@ -0,0 +1,45 @@
from dataclasses import dataclass
from typing import List, Optional, Union

import numpy as np
import PIL.Image

from ...utils import BaseOutput, is_flax_available


@dataclass
class StableDiffusionPipelineOutput(BaseOutput):
    """
    Output class for Stable Diffusion pipelines.

    Args:
        images (`List[PIL.Image.Image]` or `np.ndarray`):
            List of denoised PIL images of length `batch_size` or NumPy array of shape `(batch_size, height, width,
            num_channels)`.
        nsfw_content_detected (`List[bool]`):
            List indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content or
            `None` if safety checking could not be performed.
    """

    images: Union[List[PIL.Image.Image], np.ndarray]
    nsfw_content_detected: Optional[List[bool]]


if is_flax_available():
    import flax

    @flax.struct.dataclass
    class FlaxStableDiffusionPipelineOutput(BaseOutput):
        """
        Output class for Flax-based Stable Diffusion pipelines.

        Args:
            images (`np.ndarray`):
                Denoised images as an array of shape `(batch_size, height, width, num_channels)`.
            nsfw_content_detected (`List[bool]`):
                List indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content
                or `None` if safety checking could not be performed.
        """

        images: np.ndarray
        nsfw_content_detected: List[bool]
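Per the docstrings, `nsfw_content_detected` is `None` when safety checking could not be performed, which is likely when the pipeline runs without a safety checker (the README loads it with `requires_safety_checker=False`). Code that consumes the output therefore has to handle both cases. A minimal sketch with a stdlib stand-in for the output class; `ToyPipelineOutput` and `unflagged_images` are hypothetical names, and file names stand in for PIL images.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyPipelineOutput:
    """Hypothetical stand-in with the same two fields; images are file names here."""

    images: List[str]
    nsfw_content_detected: Optional[List[bool]]


def unflagged_images(output: ToyPipelineOutput) -> List[str]:
    """Drop images flagged as NSFW; if no safety check ran (None), keep every image."""
    if output.nsfw_content_detected is None:
        return list(output.images)
    return [img for img, flagged in zip(output.images, output.nsfw_content_detected) if not flagged]


print(unflagged_images(ToyPipelineOutput(["a.png", "b.png"], [False, True])))  # ['a.png']
print(unflagged_images(ToyPipelineOutput(["a.png", "b.png"], None)))  # ['a.png', 'b.png']
```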