CogVideoX-5B: Results and Deployment Tutorial

WilliamSun, 2024/11/16

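Introduction

This time I am using CogVideoX-5B-I2V, open-sourced by Zhipu Qingying (智谱清影).

It is an image-to-video model, so it needs one image as the starting input. You can have ChatGPT or SD3.5 generate one (a post on SD3.5 will follow later).

On an RTX 3090 Ti it runs at roughly 20.66 s/it, and generating a 50-step video takes about 10-15 minutes, depending partly on the prompt.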
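As a rough sanity check, the per-step rate can be converted into a wall-clock estimate for the denoising loop. The sketch below is plain arithmetic: `estimate_minutes` is a hypothetical helper, not part of any library, and it ignores model loading, VAE decoding, and video export.

```python
def estimate_minutes(steps: int, seconds_per_step: float) -> float:
    """Minutes spent in the denoising loop alone (no loading, decoding, or export)."""
    return steps * seconds_per_step / 60


# Estimate both step counts used in this post at the measured rate.
for steps in (25, 50):
    print(f"{steps} steps: about {estimate_minutes(steps, 20.66):.1f} min")
```

At a steady 20.66 s/it this gives about 8.6 minutes for 25 steps and 17.2 minutes for 50 steps, so the 10-15 minute figure suggests the per-step time varies somewhat between runs.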

First, the results:

Input image:

Prompt (written by ChatGPT-4o):

A person slowly walks from the left side of the screen and exits the frame, with smooth movement and natural pacing.

Output video at 25 steps (7 FPS):

Output video at 50 steps (7 FPS):

The overall result is acceptable, apart from the sun, which visibly glitches.

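Deployment

Model repository: THUDM/CogVideoX-5b-I2V at main

You can use git or the official Hugging Face tooling to download the model into a local folder such as ./model. I will not go into the details here.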
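For the Hugging Face route, one option is `snapshot_download` from the `huggingface_hub` package. This is a minimal sketch rather than the exact procedure used for this post: `download_model` is a hypothetical wrapper, and the repo id and the `./model` target are the ones named above. The checkpoint is large, so the function is only defined here, not called.

```python
def download_model(repo_id: str = "THUDM/CogVideoX-5b-I2V",
                   local_dir: str = "./model") -> str:
    """Download every file of the model repo into local_dir and return that path."""
    # Imported lazily so that defining the function needs neither the package
    # nor network access; both are required only when it is called.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_model()` once should leave the weights under `./model`. If you need a mirror, set the `HF_ENDPOINT` environment variable before `huggingface_hub` is first imported.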

Official package requirements:

# diffusers>=0.30.3
# transformers>=4.44.2
# accelerate>=0.34.0
# imageio-ffmpeg>=0.5.1
pip install --upgrade transformers accelerate diffusers imageio-ffmpeg
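After installing, it can help to confirm that the environment actually meets these minimums. The sketch below uses only the standard library. `MINIMUMS`, `as_tuple`, and `check_requirements` are hypothetical names, and the `transformers` minimum is assumed to be 4.44.2, since that library uses 4.x version numbers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the requirements list.
# Assumption: transformers is 4.44.2 (the library has no 0.44.x release line).
MINIMUMS = {
    "diffusers": "0.30.3",
    "transformers": "4.44.2",
    "accelerate": "0.34.0",
    "imageio-ffmpeg": "0.5.1",
}


def as_tuple(ver: str) -> tuple:
    """Turn '0.30.3' into (0, 30, 3); suffixes such as 'rc1' or '.dev0' are dropped."""
    parts = []
    for piece in ver.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '1.0'
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums: dict = MINIMUMS) -> dict:
    """Return a {package: status} report for every required package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = as_tuple(installed) >= as_tuple(minimum)
        status = "ok" if ok else "below " + minimum
        report[name] = installed + " (" + status + ")"
    return report


for name, status in check_requirements().items():
    print(f"{name}: {status}")
```

The comparison is deliberately simple and treats pre-releases like the final release, so take it as a quick check rather than a full version resolver.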

Code for calling the model (based on the official example, with a few changes):

import os

# Set the mirror before importing diffusers: huggingface_hub reads HF_ENDPOINT when it is first imported.
# With local_files_only=True below, nothing is downloaded at load time anyway.
os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"

import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

prompt = "A person slowly walks from the left side of the screen and exits the frame, with smooth movement and natural pacing."



image = load_image(image="./input.jpg")  # input image, loaded from the local file ./input.jpg
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "./model",  # path to the locally downloaded model
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)

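# Memory-saving options: keep weights on the CPU until needed, and decode the VAE in tiles and slices
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()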

video = pipe(
    prompt=prompt,
    image=image,  # input image
    num_videos_per_prompt=1,
    num_inference_steps=50,  # number of inference steps
    num_frames=48,  # total number of frames to generate; together with fps below, this sets the video length
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(76755),  # random seed, change it freely
).frames[0]

os.makedirs("./output", exist_ok=True)  # make sure the output folder exists
export_to_video(video, "./output/output.mp4", fps=3)  # output path and fps

Fill in the parameters by following the comments and you are ready to go.
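The link between frame count, fps, and clip length is simple division. Here is a small sketch with a hypothetical helper, `clip_seconds`, assuming the pipeline returns about as many frames as `num_frames` requests. In the script itself, `len(video)` gives the actual count.

```python
def clip_seconds(frame_count: int, fps: float) -> float:
    """Playback length in seconds of a clip with frame_count frames shown at fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


# Compare the fps used in the script (3) with the fps of the sample videos (7).
for fps in (3, 7):
    print(f"48 frames at {fps} fps: {clip_seconds(48, fps):.1f} s")
```

With 48 frames, fps=3 gives a 16-second clip and fps=7 gives just under 7 seconds. Changing fps at export only stretches or compresses playback and does not generate new frames.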