Creating a virtual streamer

2023, Aug 29    

Introduction

This post explains how I built a virtual streamer capable of:

  • interacting with a chat
  • responding with text-to-speech
  • animating a video, and more precisely the lips of the character

TL;DR:

An example of a response to a question.

@truebookwisdom #Jesus talks about the contradictions of man #fr #learn #AI #IA ♬ original sound - truebookwisdom

Ask your question on Twitch

General overview

The system can be summarized with the following diagram:

architecture diagram

The system is composed of 3 parts:

  • The ChatReader. It reads the Twitch chat; when new messages arrive, they are added to a high-priority queue.
  • The VideoBuilder. It builds the video response to the question in 3 steps, detailed later.
  • The Streamer. It takes care of streaming videos to Twitch; when new videos arrive, they are played first.
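The two "newest first" rules above (chat messages jump the queue, fresh videos are streamed first) can be sketched with a small priority queue. This is only an illustration, not the actual code of the system: the names `PriorityInbox`, `next_video`, `HIGH`, `LOW`, and the idea of a fallback idle clip are assumptions made for the example.

```python
import itertools
import queue

# Hypothetical priority levels: a lower number is served first.
HIGH, LOW = 0, 1


class PriorityInbox:
    """Thread-safe queue: high-priority items first, FIFO within a priority."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # The counter breaks ties so items of equal priority keep arrival order.
        self._counter = itertools.count()

    def put(self, item, priority=LOW):
        self._queue.put((priority, next(self._counter), item))

    def get_nowait(self):
        """Return the next item, or None when the queue is empty."""
        try:
            _, _, item = self._queue.get_nowait()
        except queue.Empty:
            return None
        return item


def next_video(responses, idle_video):
    """Streamer policy (assumed): play a fresh response if any, else an idle clip."""
    video = responses.get_nowait()
    return video if video is not None else idle_video


# ChatReader side: chat messages overtake lower-priority work.
questions = PriorityInbox()
questions.put("scheduled topic", priority=LOW)
questions.put("chat: who are you?", priority=HIGH)
questions.put("chat: what is wisdom?", priority=HIGH)
order = [questions.get_nowait() for _ in range(3)]
print(order)
```

Using the standard library `queue.PriorityQueue` keeps the sketch safe to share between a reader thread and a builder thread, which is one plausible way to connect the three parts.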

Overview of the VideoBuilder

The VideoBuilder covers the main steps needed to go from a question in text format to a video answering that question.

I used:

  • ChatGPT to answer user questions
  • TTS to convert the response to an audio format
  • VideoRetalking to take an input video and an input audio track and create a lip-synced video
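The three steps chain naturally into a small pipeline. The sketch below is a hedged illustration of that chaining only: the `VideoBuilder` class shape, its field names, and the stub lambdas are hypothetical, and no real ChatGPT, TTS, or VideoRetalking call is shown. Each step is injected as a plain callable so that real implementations could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VideoBuilder:
    """Chains the three steps; every step is an injected callable (assumed design)."""

    answer: Callable[[str], str]         # question text -> answer text (e.g. a chat model)
    synthesize: Callable[[str], str]     # answer text -> path of the generated audio file
    lip_sync: Callable[[str, str], str]  # (base video path, audio path) -> output video path
    base_video: str                      # video of the character to animate

    def build(self, question: str) -> str:
        text = self.answer(question)
        audio_path = self.synthesize(text)
        return self.lip_sync(self.base_video, audio_path)


# Stub steps, only to show the data flow from question to video path.
builder = VideoBuilder(
    answer=lambda question: f"Answer to: {question}",
    synthesize=lambda text: "answer.wav",
    lip_sync=lambda video, audio: f"{video}+{audio}.mp4",
    base_video="character.mp4",
)
result = builder.build("What is wisdom?")
print(result)
```

Keeping the steps as separate callables also makes it easy to time each one and see which dominates the total latency.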

Making it efficient

As I was hosting everything on my desktop computer, I had to tweak the system to make it more responsive.

The system was originally very slow, so I made a few changes to the VideoRetalking code.

The consequence is the obvious video artifacts visible in the output.

But it allowed the system to reach half real-time speed on an RTX 4080.
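To make the speed figure concrete, here is a small sketch of a real-time factor calculation. It assumes that "half real time" means one second of output video takes about two seconds to generate; that reading, and the function names, are assumptions for illustration rather than measurements from the system.

```python
def real_time_factor(video_seconds: float, processing_seconds: float) -> float:
    """Seconds of video produced per second of processing (1.0 means real time)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return video_seconds / processing_seconds


def processing_time(video_seconds: float, factor: float) -> float:
    """Time needed to generate a clip of the given length at a given factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return video_seconds / factor


# Hypothetical numbers: a 20 s answer generated in 40 s is half real time.
factor = real_time_factor(video_seconds=20.0, processing_seconds=40.0)
wait = processing_time(video_seconds=20.0, factor=factor)
print(factor, wait)
```

Under that assumption, a viewer would wait roughly twice the length of the answer before it could be streamed, which is why shorter answers feel noticeably more responsive.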

Next steps

I would like to:

  • test different prompts to get more interesting answers
  • make the code even faster to reach real time
  • improve the voice quality, which is much lower in French than with the available English voices
  • make the whole system more production ready