Talking-Face Generation: Installing hallo on Windows

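Preface

Note: I previously installed hallo, a talking-face system, on Linux. hallo is one of the better virtual talking-face video generation systems currently in use; compared with SadTalker, its facial expressions are more realistic and the characters more lifelike. This post records the installation of the Windows version of hallo. Thanks to liuning for taking part.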

I. Installation

Set up CUDA, Python, and the other prerequisites, then clone the repository


Install Python

The Python version must be between 3.8 and 3.11; we installed 3.10.11. Follow the link to the official Python website
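and find the matching Python version.

Once the download finishes, start the installer, choose the custom installation, and make sure to tick the option that adds Python to PATH.

Click Next.

Choose your own installation path (preferably not on the C: drive), then click Install.

After installation, check whether Python was added to the environment variables. First, search for "Edit the system environment variables".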
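Then click Environment Variables.

Select Path.

Check whether the Python installation paths are already listed. If they are not, add them.

Finally, check that Python installed correctly: open cmd (Win+R).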

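In the terminal that opens, type python. If the interpreter starts and prints its version (3.10.11 in our case), the installation succeeded. Then type exit() to quit.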
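The version requirement can also be checked from a script. The following is a minimal sketch, assuming the 3.8 to 3.11 range stated above is inclusive at both ends; the names MIN_VERSION, MAX_VERSION, and is_supported are made up for this illustration and are not part of hallo.

```python
import sys

# Range taken from the text above; treating both ends as inclusive is an assumption.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported(version_info=sys.version_info):
    """Return True when the major.minor version lies in the supported range."""
    major_minor = (version_info[0], version_info[1])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = "{}.{}.{}".format(*sys.version_info[:3])
    if is_supported():
        print(f"Python {current} is within the 3.8-3.11 range.")
    else:
        print(f"Python {current} is outside the 3.8-3.11 range.")
```

Save it under any file name and run it with the interpreter you just installed; it only reads sys.version_info and changes nothing on the system.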

Install ffmpeg

Go to the official ffmpeg website and download the Windows build.
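After extracting the archive, rename the folder to ffmpeg and move it to the root of the C: drive.

Then run cmd as administrator and set the environment variable: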

setx /m PATH "C:\ffmpeg\bin;%PATH%"

Check that the environment variables now include C:\ffmpeg\bin.

Restart the computer and verify the installation by running:

ffmpeg -version


Install CUDA 12.1

Go to the official CUDA website and select CUDA version 12.1.
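Select the installer options for your system on the download page.

Once the download finishes, run the installer. The automatic installation is the best choice; if you choose the custom installation, just remember the installation path.
Next, configure the environment variables (same steps as above). First go to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1. Make sure you can actually open this directory; if you cannot, locate your NVIDIA GPU Computing Toolkit installation directory and go to ./CUDA/v12.1 inside it.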
Finally, check that the installation succeeded: open cmd and enter

nvcc --version

If nvcc prints its version information (release 12.1), the installation succeeded.
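Both command-line checks (ffmpeg -version and nvcc --version) can be repeated in one go from Python. This is only a sketch using the standard library: the helper name tool_output_lines is made up here, and the script merely reports what it finds on PATH.

```python
import shutil
import subprocess


def tool_output_lines(command):
    """Run a command and return its output lines, or None if it is not on PATH or fails to start."""
    executable = shutil.which(command[0])
    if executable is None:
        return None
    try:
        result = subprocess.run(
            [executable, *command[1:]],
            capture_output=True, text=True, errors="replace",
            timeout=30, check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # Some tools print version information to stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()


if __name__ == "__main__":
    # The same two commands used in the steps above.
    for command in (["ffmpeg", "-version"], ["nvcc", "--version"]):
        lines = tool_output_lines(command)
        if lines is None:
            print(f"{command[0]}: not found on PATH")
        else:
            print(f"{command[0]}: found")
            for line in lines[:5]:
                print("    " + line)
```

If a tool is reported as not found right after editing the environment variables, open a new terminal (or restart, as suggested above) so that the updated PATH is picked up.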

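Install with PowerShell: run install.ps1 or install-cn.ps1 (for Chinese users)

When we run install.ps1 in PowerShell, what it really does is install requirement.txt: it installs the dependency libraries and downloads the models they rely on. The downloaded models must follow this layout: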

./pretrained_models/
|-- audio_separator/
|   |-- download_checks.json
|   |-- mdx_model_data.json
|   |-- vr_model_data.json
|   `-- Kim_Vocal_2.onnx
|-- face_analysis/
|   `-- models/
|       |-- face_landmarker_v2_with_blendshapes.task  # face landmarker model from mediapipe
|       |-- 1k3d68.onnx
|       |-- 2d106det.onnx
|       |-- genderage.onnx
|       |-- glintr100.onnx
|       `-- scrfd_10g_bnkps.onnx
|-- motion_module/
|   `-- mm_sd_v15_v2.ckpt
|-- sd-vae-ft-mse/
|   |-- config.json
|   `-- diffusion_pytorch_model.safetensors
|-- stable-diffusion-v1-5/
|   `-- unet/
|       |-- config.json
|       `-- diffusion_pytorch_model.safetensors
`-- wav2vec/
    `-- wav2vec2-base-960h/
        |-- config.json
        |-- feature_extractor_config.json
        |-- model.safetensors
        |-- preprocessor_config.json
        |-- special_tokens_map.json
        |-- tokenizer_config.json
        `-- vocab.json

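Model download errors are the main reason the installation fails. To work around this, I copied the models already downloaded on Linux directly into the corresponding folders, which resolved the download problem. The result of a successful installation is shown below.

(Figure to be added.)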
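Since missing or half-downloaded models are the usual culprit, it can help to compare the local folder against the layout above before running anything. The sketch below does that with the standard library only; the file list is copied from the tree above, while EXPECTED_FILES and missing_files are names made up for this illustration. It checks only that each file exists, not that its contents are complete.

```python
from pathlib import Path

# Relative paths copied from the ./pretrained_models/ tree above.
EXPECTED_FILES = [
    "audio_separator/download_checks.json",
    "audio_separator/mdx_model_data.json",
    "audio_separator/vr_model_data.json",
    "audio_separator/Kim_Vocal_2.onnx",
    "face_analysis/models/face_landmarker_v2_with_blendshapes.task",
    "face_analysis/models/1k3d68.onnx",
    "face_analysis/models/2d106det.onnx",
    "face_analysis/models/genderage.onnx",
    "face_analysis/models/glintr100.onnx",
    "face_analysis/models/scrfd_10g_bnkps.onnx",
    "motion_module/mm_sd_v15_v2.ckpt",
    "sd-vae-ft-mse/config.json",
    "sd-vae-ft-mse/diffusion_pytorch_model.safetensors",
    "stable-diffusion-v1-5/unet/config.json",
    "stable-diffusion-v1-5/unet/diffusion_pytorch_model.safetensors",
    "wav2vec/wav2vec2-base-960h/config.json",
    "wav2vec/wav2vec2-base-960h/feature_extractor_config.json",
    "wav2vec/wav2vec2-base-960h/model.safetensors",
    "wav2vec/wav2vec2-base-960h/preprocessor_config.json",
    "wav2vec/wav2vec2-base-960h/special_tokens_map.json",
    "wav2vec/wav2vec2-base-960h/tokenizer_config.json",
    "wav2vec/wav2vec2-base-960h/vocab.json",
]


def missing_files(root):
    """Return the expected relative paths that do not exist as files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("./pretrained_models")
    if missing:
        print("Missing model files:")
        for rel in missing:
            print("    " + rel)
    else:
        print("All files from the layout above are present.")
```

Run it from the repository root so that ./pretrained_models resolves correctly; any path it lists is a file to copy over from the Linux machine.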

II. Inference

1. Run run_inference.ps1 with PowerShell

The code of run_inference.ps1 is as follows:

$source_image="assets/zgr.jpg"
$driving_audio="assets/feng3cut.wav"
$output="test.mp4"
$face_expand_ratio=""   # leave empty to skip --face_expand_ratio

Set-Location $PSScriptRoot
.\venv\Scripts\activate

$Env:HF_HOME = "huggingface"
$Env:XFORMERS_FORCE_DISABLE_TRITON = "1"
$ext_args = [System.Collections.ArrayList]::new()   # optional arguments
if ($output) {
    [void]$ext_args.Add("--output=$output")
}
if ($face_expand_ratio) {
    [void]$ext_args.Add("--face_expand_ratio=$face_expand_ratio")
}

python.exe "./scripts/inference.py" `
--source_image=$source_image `
--driving_audio=$driving_audio `
$ext_args

Note that what we need to change are the file paths assigned to source_image and driving_audio.
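For readers who prefer to drive the script from Python, the way run_inference.ps1 assembles its command line can be sketched as follows. The function name build_inference_command is made up for this illustration; the flags are the ones that appear in the script above, and optional flags are added only when a value is given, mirroring the if blocks there.

```python
def build_inference_command(source_image, driving_audio, output="", face_expand_ratio=""):
    """Build the argument list that run_inference.ps1 passes to scripts/inference.py."""
    command = [
        "python",
        "./scripts/inference.py",
        f"--source_image={source_image}",
        f"--driving_audio={driving_audio}",
    ]
    # Optional arguments are appended only when they are non-empty.
    if output:
        command.append(f"--output={output}")
    if face_expand_ratio:
        command.append(f"--face_expand_ratio={face_expand_ratio}")
    return command


if __name__ == "__main__":
    # Same values as in the PowerShell script above.
    print(build_inference_command("assets/zgr.jpg", "assets/feng3cut.wav", output="test.mp4"))
```

The returned list can be handed to subprocess.run from inside the activated virtual environment; the environment variables set in the PowerShell script (HF_HOME, XFORMERS_FORCE_DISABLE_TRITON) would still need to be set separately.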
The audio file given to driving_audio should not be too large; otherwise inference becomes very time-consuming.
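One way to catch an overly long clip before starting is to read its duration first. The sketch below uses Python's standard wave module, which only handles uncompressed PCM WAV files; the 30-second limit is an arbitrary placeholder chosen for this illustration, not a figure from hallo, and wav_duration_seconds is a made-up helper name.

```python
import wave
from pathlib import Path

# Hypothetical threshold for this sketch; pick whatever your GPU and patience allow.
MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


if __name__ == "__main__":
    audio_path = Path("assets/feng3cut.wav")  # the driving_audio from the script above
    if audio_path.is_file():
        duration = wav_duration_seconds(audio_path)
        print(f"{audio_path}: {duration:.1f} s")
        if duration > MAX_SECONDS:
            print("Consider trimming the clip before running inference.")
    else:
        print(f"{audio_path} not found")
```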

The run then proceeds, with output like the following:

  
# Step 1: process the background image first, then the face
Processed and saved: ./.cache\trump1_sep_background.png
Processed and saved: ./.cache\trump1_sep_face.png
# Step 2: convert the audio file into vectors
Some weights of Wav2VecModel were not initialized from the model checkpoint at ./pretrained_models/wav2vec/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:audio_separator.separator.separator:Separator version 0.17.2 instantiating with output_dir: ./.cache/audio_preprocess, output_format: WAV
INFO:audio_separator.separator.separator:Operating System: Linux #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2
INFO:audio_separator.separator.separator:System: Linux Node: ubuntu22-E500-G9-WS760T Release: 6.5.0-44-generic Machine: x86_64 Proc: x86_64
INFO:audio_separator.separator.separator:Python Version: 3.10.14
INFO:audio_separator.separator.separator:PyTorch Version: 2.2.2+cu121
INFO:audio_separator.separator.separator:FFmpeg installed: ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
INFO:audio_separator.separator.separator:ONNX Runtime GPU package installed with version: 1.18.0
INFO:audio_separator.separator.separator:CUDA is available in Torch, setting Torch device to CUDA
INFO:audio_separator.separator.separator:ONNXruntime has CUDAExecutionProvider available, enabling acceleration
INFO:audio_separator.separator.separator:Loading model Kim_Vocal_2.onnx...
INFO:audio_separator.separator.separator:Load model duration: 00:00:00
INFO:audio_separator.separator.separator:Starting separation process for  audio_file_path: assets/zhiguibing3cut.wav
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00,  1.24s/it]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 17.42it/s]
INFO:audio_separator.separator.separator:Saving Vocals stem to 1_(Vocals)_Kim_Vocal_2.wav...
INFO:audio_separator.separator.separator:Clearing input audio file paths, sources and stems...
INFO:audio_separator.separator.separator:Separation duration: 00:00:10
## This takes roughly 2 minutes to run
# Step 3: feed the multimodal features into the UNet of the diffusion model
The config attributes {'center_input_sample': False, 'out_channels': 4} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: ['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight']
# Run the SD UNet
INFO:hallo.models.unet_3d:loaded temporal unet's pretrained weights from pretrained_models/stable-diffusion-v1-5/unet ...
The config attributes {'center_input_sample': False} were passed to UNet3DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
# Run the motion module to generate the animation
Load motion module params from pretrained_models/motion_module/mm_sd_v15_v2.ckpt
INFO:hallo.models.unet_3d:Loaded 453.20928M-parameter motion module
# Run the hallo framework
loaded weight from  ./pretrained_models/hallo/net.pth
# Generate over 31 iterations
[1/31]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:27<00:00,  1.45it/s]
100%|
....
....
....
Moviepy - Building video .cache/output.mp4.
MoviePy - Writing audio in outputTEMP_MPY_wvf_snd.mp4
MoviePy - Done.                                                                                                                                                       
# Write the output mp4 file
Moviepy - Writing video .cache/output.mp4
Moviepy - Done !                                                                                                                                                      
Moviepy - video ready .cache/output.mp4
————————————————                        
Original article: https://blog.csdn.net/wqthaha/article/details/140696292