Creating a virtual streamer

Introduction

The goal of this post is to show how I built a virtual streamer capable of answering questions from its Twitch chat with lip-synced videos.

An example of a response to a question.

[Embedded TikTok video by @truebookwisdom: "Jesus talks about the contradictions of man"]

Ask your question on Twitch

The system can be summarized with the following diagram:

architecture diagram

The system is composed of three parts:

The ChatReader. It reads the Twitch chat; when new messages arrive, they are added to a high-priority queue.
The VideoBuilder. It builds the video response to the question in three steps, detailed later.
The Streamer. It takes care of streaming videos to Twitch; when new videos arrive, they are played first.
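
The "new items first" behaviour of the ChatReader and the Streamer can be sketched with a small priority queue. This is only an illustration, not the code running behind the stream: the `PriorityBuffer` class, the two priority levels, and the idea that low-priority filler content exists are all assumptions made for the example.

```python
import itertools
import queue

HIGH_PRIORITY = 0  # new items: fresh chat questions or freshly built answer videos
LOW_PRIORITY = 1   # assumed fallback content, e.g. filler questions or idle clips


class PriorityBuffer:
    """Serves high-priority items first and keeps arrival order within a level."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # Monotonic counter: breaks ties so equal-priority items stay FIFO
        # and the payloads themselves never need to be comparable.
        self._order = itertools.count()

    def put(self, item, priority=LOW_PRIORITY):
        self._queue.put((priority, next(self._order), item))

    def get(self):
        _priority, _order, item = self._queue.get()
        return item

    def empty(self):
        return self._queue.empty()


questions = PriorityBuffer()
questions.put("filler question")                    # hypothetical background item
questions.put("viewer question 1", HIGH_PRIORITY)   # arrives later, served first
questions.put("viewer question 2", HIGH_PRIORITY)
served = [questions.get() for _ in range(3)]
print(served)  # ['viewer question 1', 'viewer question 2', 'filler question']
```

The same structure could sit in front of the Streamer, with video file paths as items instead of questions.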

The VideoBuilder chains the main steps needed to go from a question in text format to a video answering that question.

I used:

ChatGPT to answer user questions
TTS to convert the response to an audio format
VideoRetalking to create a lip-synced video from an input video and an input audio track.
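
As a rough sketch, the three steps can be chained as plain functions. The class layout, the field names, and the stub functions below are hypothetical; in a real setup each callable would wrap ChatGPT, the TTS engine, and VideoRetalking respectively, and those integrations are deliberately left out here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VideoBuilder:
    """Chains question -> answer text -> audio file -> lip-synced video."""

    answer_question: Callable[[str], str]    # assumed wrapper around the chat model
    synthesize_speech: Callable[[str], str]  # answer text -> path of an audio file
    lip_sync: Callable[[str, str], str]      # (face video, audio) -> output video path
    face_video: str                          # input video of the virtual streamer

    def build(self, question: str) -> str:
        answer = self.answer_question(question)
        audio_path = self.synthesize_speech(answer)
        return self.lip_sync(self.face_video, audio_path)


# Stub steps so the sketch runs without any external service.
def fake_answer(question: str) -> str:
    return f"Answer to: {question}"


def fake_tts(text: str) -> str:
    return "answer.wav"


def fake_lip_sync(face_video: str, audio_path: str) -> str:
    return f"{face_video}+{audio_path}.mp4"


builder = VideoBuilder(fake_answer, fake_tts, fake_lip_sync, face_video="face.mp4")
print(builder.build("What is wisdom?"))  # face.mp4+answer.wav.mp4
```

Passing the steps in as callables keeps each one replaceable, for example to try another voice or a faster lip-sync model.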

As I was hosting everything on my desktop computer, I had to tweak it to make it more responsive.

The system was originally very slow, so I made a few changes to the VideoRetalking code.

The consequence is obvious video artefacts in the output.

But these changes allowed the system to reach half real-time speed on an RTX 4080.
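
One way to put a number on this is a real-time factor: the duration of the generated video divided by the time spent generating it. The sketch below assumes that "half real time" means one second of video takes about two seconds to produce; the durations are made-up values for illustration, not measurements.

```python
def real_time_factor(video_seconds: float, processing_seconds: float) -> float:
    """Return video duration divided by processing time (1.0 means real time)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return video_seconds / processing_seconds


# Illustrative numbers only: a 10 s answer that takes 20 s to generate.
factor = real_time_factor(10.0, 20.0)
print(factor)  # 0.5
```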

Next, I would like to:

test different prompts to get more interesting answers
make the code even faster to reach real time
improve the voice quality, which is much lower for French voices than for the available English ones
make the whole system more production-ready