Date: 2024-01-16

PIXART-delta is a next-generation text-to-image model that targets both the speed and the quality of image generation. It integrates a Latent Consistency Model (LCM) with a ControlNet-Transformer architecture, which substantially accelerates inference and improves the quality of the generated images.

Revolutionizing Inference Speed

The most visible advance in PIXART-delta is the incorporation of LCM, which lets the model produce high-quality samples far faster. Generating a 1024x1024 image takes 0.5 seconds on an A100 GPU, a seven-fold speedup over its predecessor, PIXART-alpha.
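To make the source of the speedup concrete, here is a minimal sketch of few-step consistency sampling, the general idea that LCM builds on. It is not PIXART-delta's code. It assumes a toy 1-D Gaussian data distribution with the simple noising rule x_t = x_0 + t * z, for which the consistency function can be written in closed form. The names `consistency_fn` and `multistep_sample` and the constants `MU`, `S`, `EPS`, and `T_MAX` are all hypothetical. The point is only that each sample costs a handful of model calls rather than the dozens of steps of a conventional diffusion sampler.

```python
import math
import random

# Toy setting (an assumption for illustration, not PIXART-delta itself):
# data is 1-D Gaussian N(MU, S^2), noised as x_t = x_0 + t * z.
MU, S = 2.0, 0.5
EPS, T_MAX = 0.002, 80.0


def consistency_fn(x, t):
    """Closed-form consistency function for the toy Gaussian.

    Maps a noisy point at time t to the start (t = EPS) of its
    probability-flow ODE trajectory in a single evaluation.
    """
    scale = math.sqrt(S**2 + EPS**2) / math.sqrt(S**2 + t**2)
    return MU + (x - MU) * scale


def multistep_sample(timesteps, rng):
    """Few-step sampling: jump to a clean estimate, re-noise to a
    smaller time, and jump again.

    `timesteps` is a decreasing list of intermediate times in (EPS, T_MAX).
    Returns the sample and the number of model calls used.
    """
    x = rng.gauss(0.0, 1.0) * T_MAX      # start from (approximately) pure noise
    x = consistency_fn(x, T_MAX)         # first jump straight to a clean estimate
    calls = 1
    for tau in timesteps:
        noisy = x + math.sqrt(tau**2 - EPS**2) * rng.gauss(0.0, 1.0)
        x = consistency_fn(noisy, tau)   # one model call per extra step
        calls += 1
    return x, calls


rng = random.Random(0)
sample, calls = multistep_sample([10.0, 2.0, 0.5], rng)
print(f"sample={sample:.3f} after {calls} model calls")
```

In a real model the closed-form function is replaced by a trained network, so the per-image cost is roughly the number of calls times the cost of one forward pass.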

Introducing ControlNet-Transformer

The second notable feature of PIXART-delta is a ControlNet module adapted for Transformer-based models. The module is applied selectively, to the initial base blocks of the model rather than to every block, which is enough to steer the generation process effectively. This design choice improves both the model's controllability and its performance.
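The wiring described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the model's implementation: blocks are simple functions on lists of floats, the control signal is added to the hidden state before the first trainable copy, and each copy's output passes through a zero-initialized projection before being added to the matching base block's output. The names `make_block`, `ZeroLinear`, and `controlled_forward` are hypothetical, and the exact point where the control signal enters is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_block(weight: float, bias: float) -> Block:
    """Stand-in for a Transformer block: a residual elementwise affine map."""
    def block(x: Vector) -> Vector:
        return [xi + weight * xi + bias for xi in x]
    return block


@dataclass
class ZeroLinear:
    """Projection whose weight starts at zero, so the control branch
    contributes nothing until training moves it away from zero."""
    scale: float = 0.0

    def __call__(self, x: Vector) -> Vector:
        return [self.scale * xi for xi in x]


def controlled_forward(base_blocks, copy_blocks, zero_layers, x, control):
    """Run all base blocks; only the first len(copy_blocks) of them
    receive a contribution from the control branch."""
    h, c = x, None
    for i, block in enumerate(base_blocks):
        out = block(h)
        if i < len(copy_blocks):
            # First copy sees hidden state + control (an assumption);
            # later copies continue from the previous copy's output.
            c_in = [hi + ci for hi, ci in zip(h, control)] if i == 0 else c
            c = copy_blocks[i](c_in)
            out = [o + z for o, z in zip(out, zero_layers[i](c))]
        h = out
    return h


base = [make_block(0.1 * (i + 1), 0.0) for i in range(4)]
copies = [make_block(0.1 * (i + 1), 0.0) for i in range(2)]  # copies of the first 2 blocks
x, control = [1.0, -2.0, 0.5], [0.3, 0.3, 0.3]

untrained = controlled_forward(base, copies, [ZeroLinear() for _ in copies], x, control)
print(untrained == controlled_forward(base, [], [], x, control))
```

With the projections at zero the controlled model reproduces the base model exactly, which is why this kind of branch can be attached to a pretrained network without disturbing it at the start of training.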

Breaking Barriers in Training Efficiency and Quality

Training efficiency is another strength of PIXART-delta. The model can be trained within a 32GB GPU memory limit, which keeps training within reach of a single high-memory GPU rather than a large cluster. Across various hardware platforms, it runs inference faster than comparable methods while maintaining high image quality. Training uses Latent Consistency Distillation (LCD), a refinement of the original Consistency Distillation algorithm, and the results are evaluated with FID and CLIP scores.
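The distillation objective mentioned above can be illustrated with a toy version of the original Consistency Distillation loss. This sketch does not reproduce LCD itself, which works in latent space with additional refinements. It assumes a 1-D Gaussian data distribution, uses one Euler step of the known probability-flow ODE as a stand-in for a pretrained teacher plus solver, and passes the target network in explicitly (in training it would be a slowly updated copy of the student). All names and constants here are hypothetical.

```python
import math
import random

MU, S = 2.0, 0.5   # toy data distribution N(MU, S^2); an assumption for illustration
EPS = 0.002        # smallest time on the trajectory


def teacher_ode_step(x, t_from, t_to):
    """One Euler step of the probability-flow ODE for the toy Gaussian,
    standing in for a pretrained diffusion teacher plus ODE solver."""
    drift = t_from * (x - MU) / (S**2 + t_from**2)
    return x + (t_to - t_from) * drift


def exact_student(x, t):
    """Analytic consistency function for the toy problem."""
    return MU + (x - MU) * math.sqrt(S**2 + EPS**2) / math.sqrt(S**2 + t**2)


def untrained_student(x, t):
    """A deliberately poor student that ignores the noise level."""
    return x


def distillation_loss(student, target, times, rng, n_samples=2000):
    """Monte Carlo estimate of the consistency distillation loss.

    The student's output at a noisier point should match the target
    network's output at the teacher-denoised point on the same trajectory.
    `times` is an increasing grid of noise levels.
    """
    total = 0.0
    for _ in range(n_samples):
        n = rng.randrange(len(times) - 1)
        t_lo, t_hi = times[n], times[n + 1]
        x0 = rng.gauss(MU, S)
        x_hi = x0 + t_hi * rng.gauss(0.0, 1.0)      # noisy sample at t_hi
        x_lo = teacher_ode_step(x_hi, t_hi, t_lo)   # teacher moves it to t_lo
        total += (student(x_hi, t_hi) - target(x_lo, t_lo)) ** 2
    return total / n_samples


times = [EPS + 0.25 * i for i in range(21)]
good = distillation_loss(exact_student, exact_student, times, random.Random(0))
bad = distillation_loss(untrained_student, untrained_student, times, random.Random(0))
print(f"consistent student: {good:.5f}, untrained student: {bad:.5f}")
```

A student that is self-consistent along the teacher's trajectories drives this loss toward zero, and once trained it can map noise to a sample in very few evaluations.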

An ablation study evaluated the impact of the ControlNet-Transformer and showed faster convergence and better overall performance. The ControlNet-Transformer was particularly strong at handling detailed outlines of human faces and bodies. Together, these properties make PIXART-delta a strong candidate for real-time text-to-image applications.