ModelsLab · shauray8 · Sep 15, 2024
diff --git a/src/diffusers/plus_pipelines/pulid/README.md b/src/diffusers/plus_pipelines/pulid/README.md
@@ -0,0 +1,45 @@
+# ELLA
+
+## Overview
+
+ELLA was proposed in [ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment](https://arxiv.org/pdf/2403.05135) by Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, and Gang Yu.
+
+The abstract from the paper is the following:
+*Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation. However, most widely used models still employ CLIP as their text encoder, which constrains their ability to comprehend dense prompts, encompassing multiple objects, detailed attributes, complex relationships, long-text alignment, etc. In this paper, we introduce an Efficient Large Language Model Adapter, termed ELLA, which equips text-to-image diffusion models with powerful Large Language Models (LLM) to enhance text alignment without training of either U-Net or LLM. To seamlessly bridge two pre-trained models, we investigate a range of semantic alignment connector designs and propose a novel module, the Timestep-Aware Semantic Connector (TSC), which dynamically extracts timestep-dependent conditions from LLM. Our approach adapts semantic features at different stages of the denoising process, assisting diffusion models in interpreting lengthy and intricate prompts over sampling timesteps. Additionally, ELLA can be readily incorporated with community models and tools to improve their prompt-following capabilities. To assess text-to-image models in dense prompt following, we introduce Dense Prompt Graph Benchmark (DPGBench), a challenging benchmark consisting of 1K dense prompts. Extensive experiments demonstrate the superiority of ELLA in dense prompt following compared to state-of-the-art methods, particularly in multiple object compositions involving diverse attributes and relationships.*
+
+## Examples:
+
+### Importing all the required pipelines
+```python
+from diffusers_plus_plus import EllaDiffusionPipeline, ELLA, DPMSolverMultistepScheduler
+```
+
+### Load pretrained ELLA weights from the hub provided by the authors of the paper
+```python
+ELLA = ELLA.from_pretrained('shauray/ELLA_SD15')
+```
+
+### Load the remaining parts of the pipeline (scheduler, UNet, VAE, etc.); the pipeline can also be used with adapters such as T2I-Adapter and IP-Adapter
+```python
+ella_pipeline = EllaDiffusionPipeline.from_pretrained("Justin-Choo/epiCRealism-Natural_Sin_RC1_VAE", ELLA=ELLA, requires_safety_checker=False)
+ella_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(ella_pipeline.scheduler.config)
+ella_pipeline = ella_pipeline.to("cuda")
+```
+
+### Provide a prompt, which is converted into LLM token outputs and then fed through ELLA
+```python
+prompt = "a beautiful portrait of an empress in her garden"
+negative_prompt = ""
+```
+
+### Generate and save the image
+```python
+image = ella_pipeline(prompt, negative_prompt=negative_prompt, guidance=7, num_inference_steps=30, height=768, width=512).images[0]
+
+image.save("black_to_blue.png")
+```
+
+### Inference Example
+| ELLA (non-fixed embedding length) | ELLA (fixed embedding length) | SD 1.5 |
+| ----------- | ----------- | ----------- |
+| ![Example Image](https://drive.google.com/uc?id=1zgFb3ELhftBem2PTmZVhhbBahxQBSYX0) | ![Example Image](https://drive.google.com/uc?id=1m4vjEnguRWM8ZTGdXTA25A4xZeoKuyhh) |  ![Example Image](https://drive.google.com/uc?id=1Te5V1Htku-3zZyiFS1ws4LL15zfhlvDh) |
diff --git a/src/diffusers/plus_pipelines/pulid/__init__.py b/src/diffusers/plus_pipelines/pulid/__init__.py
@@ -0,0 +1,65 @@
+from typing import TYPE_CHECKING
+
+from ...utils import (
+    DIFFUSERS_SLOW_IMPORT,
+    OptionalDependencyNotAvailable,
+    _LazyModule,
+    get_objects_from_module,
+    is_flax_available,
+    is_k_diffusion_available,
+    is_k_diffusion_version,
+    is_onnx_available,
+    is_torch_available,
+    is_transformers_available,
+    is_transformers_version,
+)
+
+
+_dummy_objects = {}
+_additional_imports = {}
+_import_structure = {"pipeline_output": ["StableDiffusionPipelineOutput"]}
+
+try:
+    if not (is_transformers_available() and is_torch_available()):
+        raise OptionalDependencyNotAvailable()
+except OptionalDependencyNotAvailable:
+    from ...utils import dummy_torch_and_transformers_objects  # noqa F403
+
+    _dummy_objects.update(get_objects_from_module(dummy_torch_and_transformers_objects))
+else:
+    _import_structure["pipeline_pulid_sdxl"] = [
+        "EllaFixedDiffusionPipeline",
+        "EllaFlexDiffusionPipeline",
+    ]
+    _import_structure["safety_checker"] = ["StableDiffusionSafetyChecker"]
+
+
+if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
+    try:
+        if not (is_transformers_available() and is_torch_available()):
+            raise OptionalDependencyNotAvailable()
+
+    except OptionalDependencyNotAvailable:
+        from ...utils.dummy_torch_and_transformers_objects import *
+
+    else:
+        from .pipeline_stable_diffusion import (
+            EllaFixedDiffusionPipeline,
+            EllaFlexDiffusionPipeline,
+            StableDiffusionPipelineOutput,
+            StableDiffusionSafetyChecker,
+        )
+else:
+    import sys
+
+    sys.modules[__name__] = _LazyModule(
+        __name__,
+        globals()["__file__"],
+        _import_structure,
+        module_spec=__spec__,
+    )
+
+    for name, value in _dummy_objects.items():
+        setattr(sys.modules[__name__], name, value)
+    for name, value in _additional_imports.items():
+        setattr(sys.modules[__name__], name, value)
diff --git a/src/diffusers/plus_pipelines/pulid/pipeline_output.py b/src/diffusers/plus_pipelines/pulid/pipeline_output.py
@@ -0,0 +1,45 @@
+from dataclasses import dataclass
+from typing import List, Optional, Union
+
+import numpy as np
+import PIL.Image
+
+from ...utils import BaseOutput, is_flax_available
+
+
+@dataclass
+class StableDiffusionPipelineOutput(BaseOutput):
+    """
+    Output class for Stable Diffusion pipelines.
+
+    Args:
+        images (`List[PIL.Image.Image]` or `np.ndarray`)
+            List of denoised PIL images of length `batch_size` or NumPy array of shape `(batch_size, height, width,
+            num_channels)`.
+        nsfw_content_detected (`List[bool]`)
+            List indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content or
+            `None` if safety checking could not be performed.
+    """
+
+    images: Union[List[PIL.Image.Image], np.ndarray]
+    nsfw_content_detected: Optional[List[bool]]
+
+
+if is_flax_available():
+    import flax
+
+    @flax.struct.dataclass
+    class FlaxStableDiffusionPipelineOutput(BaseOutput):
+        """
+        Output class for Flax-based Stable Diffusion pipelines.
+
+        Args:
+            images (`np.ndarray`):
+                Denoised images of array shape of `(batch_size, height, width, num_channels)`.
+            nsfw_content_detected (`List[bool]`):
+                List indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content
+                or `None` if safety checking could not be performed.
+        """
+
+        images: np.ndarray
+        nsfw_content_detected: List[bool]