PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

¹ByteDance   ²UT Austin   ³NUS


Gallery


8-step inference


[Image gallery: 16 samples generated with 8-step inference]

4-step inference


[Image gallery: 16 samples generated with 4-step inference]




Features



PeRFlow trains piecewise-linear rectified flow models for fast sampling. These models can be initialized from pretrained diffusion models such as Stable Diffusion (SD). The resulting PeRFlow weights serve as a general accelerator module that is compatible with various fine-tuned, stylized SD models as well as SD-based generation and editing pipelines. Specifically, \(\Delta W\) is computed as the PeRFlow weights minus the pretrained SD weights. One can fuse PeRFlow-\(\Delta W\) into various SD pipelines for (conditional) image generation and editing to enable high-quality few-step inference.
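The weight arithmetic described above can be sketched as follows. This is a minimal illustration, not the project's actual loading code: the helper names `compute_delta` and `fuse_delta`, the parameter name `unet.w`, and the toy values are all hypothetical, and plain Python lists stand in for real weight tensors. It assumes the three checkpoints share the same architecture and parameter names.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flattened weight values.
Weights = Dict[str, List[float]]


def compute_delta(perflow: Weights, base: Weights) -> Weights:
    """Delta W = PeRFlow weights minus the pretrained base weights, per parameter."""
    return {
        name: [p - b for p, b in zip(perflow[name], base[name])]
        for name in perflow
    }


def fuse_delta(target: Weights, delta: Weights) -> Weights:
    """Add Delta W to another model that shares the base architecture."""
    return {
        name: [t + d for t, d in zip(target[name], delta[name])]
        for name in target
    }


# Hypothetical checkpoints with a single parameter each.
sd_base = {"unet.w": [0.10, -0.20, 0.30]}   # pretrained SD
perflow = {"unet.w": [0.15, -0.25, 0.20]}   # PeRFlow trained from sd_base
stylized = {"unet.w": [0.40, -0.10, 0.35]}  # a fine-tuned, stylized SD variant

delta_w = compute_delta(perflow, sd_base)
accelerated_stylized = fuse_delta(stylized, delta_w)
```

Fusing `delta_w` back into `sd_base` recovers the PeRFlow weights, which is a quick sanity check that the subtraction and addition are consistent.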


Video demos for real-time generation

fast text-to-image generation
instant multiview generation



Compatibility with SD pipelines




Wonder3D

PeRFlow-accelerated Wonder3D enables instant (one-step) multiview generation.



ControlNet - Tile

Here, we combine PeRFlow with ControlNet-Tile for fast image enhancement. Given a low-resolution input (64x64), the pipeline generates a high-resolution (1024x1024) image with rich details.

[Figure: Low-resolution input | PeRFlow | LCM]
All results are generated with 8-step inference unless otherwise stated.

ControlNet - Depth / Edge / Pose

Plug PeRFlow-\(\Delta W\) into other ControlNet pipelines.

[Figure: image grid with columns Condition | Sample 1 | Sample 2]


IP-Adapter FaceID


[Figure: image grid with columns Face ID | Sample 1 | Sample 2]


Prompt-to-prompt


[Figure: four prompt-to-prompt edits]

a cat → a dog
eating steak → eating rice
a dog → a bull
with crown → with hat


Img2Img


[Figure: six Img2Img example images]



Comparison with LCM



Better compatibility with fine-tuned SD models


PeRFlow shows a smaller gap to the oracle than LCM in terms of image quality, e.g., color style and layout. It preserves the aesthetic qualities of fine-tuned stylized models (vividness, proper brightness and contrast). PeRFlow also supports classifier-free guidance (CFG) and negative prompts, so one can reuse the well-crafted negative and positive prompts provided with various fine-tuned stylized models to generate astonishing images.
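For readers unfamiliar with CFG, the standard guidance rule can be sketched in a few lines. This is the generic formulation, not PeRFlow's internal code (which the page does not describe): the function name `cfg_combine` and the toy prediction values are hypothetical, and the negative-prompt prediction simply takes the place of the unconditional one.

```python
from typing import List


def cfg_combine(pred_pos: List[float], pred_neg: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: start from the negative-prompt prediction and
    extrapolate towards the positive-prompt prediction by `scale`."""
    return [n + scale * (p - n) for p, n in zip(pred_pos, pred_neg)]


# Hypothetical model outputs for one latent under the two prompts.
pred_positive = [1.0, 2.0, -1.0]
pred_negative = [0.5, 1.0, 0.0]

guided = cfg_combine(pred_positive, pred_negative, scale=4.0)
```

With `scale=1.0` the rule returns the positive-prompt prediction unchanged; larger scales push the result further away from the negative prompt.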

[Figure: image grid with columns Oracle* | PeRFlow | LCM]

*We regard the 25-step sampling results of the pretrained SD-v1.5 as oracle.


Better sampling diversity


[Figure: samples from PeRFlow and from LCM for the prompts below]
  • A dog playing in the garden, snow.
  • A boy with bright blue eyes, looking at the viewer with a toothy smile.
  • A man with brown skin and a beard, looking at the viewer with dark eyes.
  • A young woman with a crown and a masterpiece necklace, at a royal event.


Consistency across different numbers of inference steps

Because PeRFlow sampling amounts to numerically solving an ODE, increasing the number of sampling steps monotonically improves image quality. Given a fixed random seed, PeRFlow generates results with a similar appearance across step counts, so users can preview many candidates with 4-step inference and then choose a few of interest for final high-quality generation. In contrast, LCM results obtained with different numbers of inference steps may look very different.

[Figure: PeRFlow with 4, 8, and 16 steps | LCM with 4, 8, and 16 steps]
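The preview-then-refine workflow rests on a general property of deterministic ODE sampling: the same seed gives the same starting noise, and more solver steps move the result closer to the exact solution. The sketch below illustrates this on a toy scalar ODE. The velocity field `2.0 - z` and all function names are hypothetical and unrelated to the actual PeRFlow network; the example shows solver behaviour only, not image quality.

```python
import math
import random


def velocity(z: float, t: float) -> float:
    """Hypothetical toy velocity field dz/dt = 2 - z (stands in for a learned flow)."""
    return 2.0 - z


def euler_sample(seed: int, num_steps: int) -> float:
    """Draw the initial noise from `seed`, then integrate from t=0 to t=1 with Euler steps."""
    z = random.Random(seed).gauss(0.0, 1.0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        z += dt * velocity(z, i * dt)
    return z


def exact_sample(seed: int) -> float:
    """Closed-form solution of the toy ODE at t=1 for the same initial noise."""
    z0 = random.Random(seed).gauss(0.0, 1.0)
    return 2.0 + (z0 - 2.0) * math.exp(-1.0)


seed = 0
samples = {n: euler_sample(seed, n) for n in (4, 8, 16)}
errors = {n: abs(z - exact_sample(seed)) for n, z in samples.items()}
```

For this toy ODE, `errors` shrinks as the step count grows from 4 to 8 to 16, while all three samples stay close to one another because they share the same seed.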



Method



Rectified Flow constructs flow-based generative models via linear interpolation, and the trajectories of the learned flow can be straightened with a special operation called reflow. However, the reflow procedure requires generating a synthetic dataset by simulating the entire pre-trained probability flow, which consumes a huge amount of storage and time and makes it impractical for training large-scale foundation models. To address this limitation, we propose piecewise rectified flow. By dividing the pre-trained probability flow into multiple segments and straightening the intermediate flow inside each segment with reflow, we obtain a piecewise-linear probability flow that can be sampled in very few steps. This divide-and-conquer strategy avoids the cumbersome simulation of the whole ODE trajectory, which allows us to perform the piecewise reflow operation online during training.
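One way to picture the per-segment reflow step is the toy sketch below. It is an illustration under stated assumptions, not the authors' training code: the equal-width windows, the scalar teacher velocity `2.0 - z`, the Euler teacher solver, and every function name are hypothetical, and the start point at the left edge of the window is taken as given (how it is sampled is not covered here). The key point is that the teacher ODE is simulated over a single window only, and the regression target is the constant velocity of the straight line joining the window's two endpoints.

```python
import random
from typing import Callable, List, Tuple


def window_edges(num_windows: int) -> List[float]:
    """Split [0, 1] into equal time windows (equal widths are an assumption)."""
    return [k / num_windows for k in range(num_windows + 1)]


def solve_teacher(v: Callable[[float, float], float], z: float,
                  t0: float, t1: float, steps: int = 50) -> float:
    """Simulate the teacher ODE dz/dt = v(z, t) over ONE window with Euler steps."""
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        z += dt * v(z, t)
        t += dt
    return z


def reflow_pair(v: Callable[[float, float], float], z_start: float,
                t0: float, t1: float, rng: random.Random) -> Tuple[float, float, float]:
    """Build one training example (z_t, t, target velocity) inside the window [t0, t1]."""
    z_end = solve_teacher(v, z_start, t0, t1)
    t = rng.uniform(t0, t1)
    alpha = (t - t0) / (t1 - t0)
    z_t = (1.0 - alpha) * z_start + alpha * z_end  # point on the straight segment
    target = (z_end - z_start) / (t1 - t0)         # constant velocity of that segment
    return z_t, t, target


def teacher_velocity(z: float, t: float) -> float:
    """Hypothetical curved teacher flow."""
    return 2.0 - z


rng = random.Random(0)
edges = window_edges(4)
k = rng.randrange(4)  # pick a window at random
z_t, t, target = reflow_pair(teacher_velocity, 0.5, edges[k], edges[k + 1], rng)
# A student v_theta would be trained to minimize (v_theta(z_t, t) - target) ** 2.
```

Because each example needs only a short teacher rollout inside one window, pairs like this can be generated on the fly during training instead of being precomputed for the full trajectory.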

[Figure: a curved pre-trained probability flow and its two-segment PeRFlow counterpart]
As shown in the figure, the pre-trained probability flow (which can be derived from a pre-trained diffusion model) maps the random noise distribution \(\pi_0\) to the data distribution \(\pi_1\). Sampling from this curved flow with ODE solvers requires many steps. Instead, PeRFlow divides the sampling trajectories into multiple segments (two in this example) and straightens each segment with the reflow operation. Because the resulting flow is piecewise linear, a well-trained PeRFlow can generate high-quality images in very few steps.
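The benefit of piecewise linearity can be checked on a toy example. The sketch below defines a hypothetical two-window scalar flow in which every trajectory is exactly straight inside each window (the affine maps in `WINDOWS` are invented for illustration), then samples it with Euler steps. A trained PeRFlow network is only approximately straight, so this shows the idealized case.

```python
from typing import List, Tuple

# Hypothetical two-window flow: inside a window, a trajectory starting at z0
# moves along a straight line to a * z0 + b.
WINDOWS: List[Tuple[float, float, float, float]] = [
    # (t_start, t_end, a, b)
    (0.0, 0.5, 0.5, 1.0),
    (0.5, 1.0, 2.0, -0.5),
]


def velocity(z: float, t: float) -> float:
    """Velocity of the toy flow: constant along each trajectory within a window."""
    for t0, t1, a, b in WINDOWS:
        if t0 <= t < t1:
            alpha = (t - t0) / (t1 - t0)
            # Recover the point where this trajectory entered the window.
            z0 = (z - alpha * b) / (1.0 + alpha * (a - 1.0))
            return ((a - 1.0) * z0 + b) / (t1 - t0)
    raise ValueError("t must lie in [0, 1)")


def euler_sample(z: float, steps_per_window: int) -> float:
    """Integrate from t=0 to t=1 with the same number of Euler steps in every window."""
    for t0, t1, _, _ in WINDOWS:
        dt = (t1 - t0) / steps_per_window
        for i in range(steps_per_window):
            z += dt * velocity(z, t0 + i * dt)
    return z


z_noise = 0.8
two_step = euler_sample(z_noise, steps_per_window=1)    # one step per window
eight_step = euler_sample(z_noise, steps_per_window=4)  # four steps per window
```

Here one Euler step per window already lands on the exact endpoint, and adding more steps does not change the result beyond floating-point error. A curved flow would not have this property.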

FID        LAION5B-30k         SD-v1.5             COCO2014-30k
           4-step   8-step     4-step   8-step     4-step   8-step
PeRFlow    9.74     8.62       9.46     5.05       11.31    14.16
LCM        15.38    19.21      15.63    21.19      23.49    29.63

Quantitative Results: We train a PeRFlow model on LAION-aesthetic-v2 data to accelerate SD-v1.5. We compare the FID with respect to three datasets: (1) a subset of 30K images from LAION, (2) a set of 30K images generated by SD-v1.5 with the JourneyDB prompts, and (3) the validation set of MS-COCO2014. For each dataset, we generate 30K images with the different models using the corresponding text prompts. The results are presented in the table above. PeRFlow achieves a lower FID than LCM in all three comparisons, at both 4 and 8 steps.
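For reference, FID is commonly defined as the Fréchet distance between Gaussian fits to Inception features of the two image sets. The symbols below are notation introduced here for illustration, not taken from the page: \(\mu_r, \Sigma_r\) are the feature mean and covariance of the reference set, and \(\mu_g, \Sigma_g\) those of the generated set. Lower values indicate closer distributions.

\[
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
\]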

BibTeX

@article{yan2024perflow,
    title   = {PeRFlow: Accelerating Diffusion models via Piecewise Rectified Flow},
    author  = {Yan, Hanshu and Liu, Xingchao and Pan, Jiachun and Liew, Jun Hao and Liu, Qiang and Feng, Jiashi},
    year    = {2024}
}