Infinite motivational video creation with ChatGPT
Introduction
What became clear to me, and to many other users, is that ChatGPT can act like a better Google. It knows about a lot of concepts, from Pokemon to Friends.
So most of the building blocks are now in place to go very far in automatic content creation.
What can we do ?
TLDR :
Infinite inspirational videos
@truebookwisdom The right mindset for the wanted life by the Zander #mindset #life #happy #inspirational #motivational
1 - The architecture
Here is the logic I developed:
- Prompt engineering is used to generate a scenario
- The scenario is broken down into pieces
- Each piece gets its own scene (a generated image) and audio (a generated voice)
- Everything will be combined together with ffmpeg to create a video
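The four steps above can be sketched as a single driver function. This is only a minimal sketch, not the code behind the videos: `Scene`, `build_video` and the stage callables are hypothetical names, and each stage is passed in as a function so that the real ChatGPT, Stable Diffusion, TTS and ffmpeg calls can be plugged in later.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One piece of the scenario: what is said and what is shown."""
    speech: str
    visual: str


def build_video(
    topic: str,
    write_scenario: Callable[[str], str],          # prompt engineering step
    split_scenario: Callable[[str], List[Scene]],  # parsing step
    render_image: Callable[[Scene, int], str],     # returns an image path
    render_voice: Callable[[Scene, int], str],     # returns an audio path
    merge_clip: Callable[[str, str, int], str],    # image + audio -> clip path
    concat_clips: Callable[[List[str]], str],      # clips -> final video path
) -> str:
    """Run the stages in order and return the path of the final video."""
    scenario = write_scenario(topic)
    clips = []
    for index, scene in enumerate(split_scenario(scenario)):
        image_path = render_image(scene, index)
        audio_path = render_voice(scene, index)
        clips.append(merge_clip(image_path, audio_path, index))
    return concat_clips(clips)
```

Keeping the stages as arguments makes it possible to test the flow with fake functions before paying for any GPU time.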
Let’s look at each part in more detail.
1 - a) Prompt engineering
A prompt template is tailored to produce a scenario that we can then parse:
Generate a top of life advices. Make Several small sentences rather than big ones.
Use this template format : after a quote, separated by a | , the visual of the scene is described.
Title : Top advices from "The art of War"
Narrator : "If you know the enemy and know yourself, you need not fear the result of a hundred battles." | A man is wielding a sword and faces the camera
Narrator : Similarly, in life. It is important to understand your strengths and weaknesses. | A person looking at his hands
Narrator : But also those of your adversaries. | A person weakness
Narrator : This knowledge can help you to navigate challenges and make strategic decisions. | A book, a medieval helmet and a knife
Now, generate a set of advices from {prompt} with the format defined above :
We force the model to add a scene description to each line, so that it can feed the image generation later on.
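Filling the template can be sketched as below. The template text is copied verbatim from above; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the call that sends the result to ChatGPT is deliberately left out.

```python
# The template is copied as-is; {prompt} is the only placeholder it contains.
PROMPT_TEMPLATE = """Generate a top of life advices. Make Several small sentences rather than big ones.
Use this template format : after a quote, separated by a | , the visual of the scene is described.
Title : Top advices from "The art of War"
Narrator : "If you know the enemy and know yourself, you need not fear the result of a hundred battles." | A man is wielding a sword and faces the camera
Narrator : Similarly, in life. It is important to understand your strengths and weaknesses. | A person looking at his hands
Narrator : But also those of your adversaries. | A person weakness
Narrator : This knowledge can help you to navigate challenges and make strategic decisions. | A book, a medieval helmet and a knife
Now, generate a set of advices from {prompt} with the format defined above :"""


def build_prompt(source: str) -> str:
    """Return the full prompt text for a given book or topic."""
    return PROMPT_TEMPLATE.format(prompt=source)
```

For example, `build_prompt('"How to Win Friends and Influence People"')` returns the template with the last line asking for advice from that book.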
1 - b) Text parsing
We need to retrieve :
- The character speaking
- The scene prompt
- And the speech of the character
There are some specificities to this format. We want to keep the sentences short and energetic, so we may want to break them down into smaller chunks.
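A minimal parsing sketch, assuming the model answers with one `Character : speech | scene` line per piece, as in the template. `Line`, `parse_scenario` and `split_sentences` are hypothetical names, and splitting on sentence-ending punctuation is just one possible way to get smaller chunks.

```python
import re
from typing import List, NamedTuple


class Line(NamedTuple):
    character: str     # who is speaking
    speech: str        # what is said
    scene_prompt: str  # description of the image to generate


def parse_scenario(text: str) -> List[Line]:
    """Parse `Character : speech | scene` lines; other lines (e.g. Title) are skipped."""
    parsed = []
    for raw in text.splitlines():
        # The scene description sits after the last "|"
        spoken, pipe, scene_prompt = raw.rpartition("|")
        # The character name sits before the first ":"
        character, colon, speech = spoken.partition(":")
        if not pipe or not colon:
            continue
        parsed.append(Line(character.strip(),
                           speech.strip().strip('"'),
                           scene_prompt.strip()))
    return parsed


def split_sentences(speech: str) -> List[str]:
    """Break a speech into chunks after '.', '!' or '?' followed by whitespace."""
    return [part for part in re.split(r"(?<=[.!?])\s+", speech.strip()) if part]
```

Each chunk returned by `split_sentences` could then reuse the scene prompt of the line it came from.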
1 - c) Media generation with StableDiffusion and TTS
Each line of the parsed scenario yields one speech synthesis and one image.
For the image generation :
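import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

# Load Stable Diffusion 2.1 in half precision with the Euler scheduler
model_id = "stabilityai/stable-diffusion-2-1"
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

N_STEPS = 150
GUIDANCE_SCALE = 15
# scene_prompt is the scene description parsed from the scenario
image = pipe(scene_prompt, num_inference_steps=N_STEPS, guidance_scale=GUIDANCE_SCALE).images[0]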
For the TTS generation :
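import torch

language = 'en'
model_id = 'v3_en'
sample_rate = 48000
speaker = 'en_1'
# Download the Silero TTS model from torch.hub (its `speaker` argument takes the model id)
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=model_id)
# text is the speech parsed from the scenario, audio_path is the output .wav file
model.save_wav(text=text,
               audio_path=audio_path,
               speaker=speaker,
               sample_rate=sample_rate)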
1 - d) Automation with ffmpeg
All the pieces are merged together using ffmpeg.
One example, merging an audio file and an image:
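import subprocess

def combine_img_audio(png_file_path, audio_path, mp4_file_path):
    # Mux one audio track and one still image into an mp4 clip
    rez = subprocess.run(["ffmpeg", "-y", "-i", audio_path, "-i", png_file_path,
                          "-framerate", "1", mp4_file_path])
    # ffmpeg signals failure with any non-zero exit code, not only 1
    if rez.returncode != 0:
        raise Exception("ffmpeg audio+image failed")
    return mp4_file_path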
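The per-scene clips then have to be joined into one video. The original code for that step is not shown, so the following is only a sketch of one common approach, ffmpeg's concat demuxer with stream copy. The function names are hypothetical, and it assumes all clips share the same codecs, resolution and audio settings (which is the case if they all come out of the same command) and that no path contains a single quote.

```python
import subprocess
from pathlib import Path
from typing import List


def concat_list_text(clip_paths: List[str]) -> str:
    """Build the list file read by the concat demuxer: one `file '<path>'` line per clip.

    Relative paths are resolved by ffmpeg relative to the list file's directory.
    """
    return "".join(f"file '{path}'\n" for path in clip_paths)


def concat_command(list_path: str, output_path: str) -> List[str]:
    """Build the ffmpeg call that joins the clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output_path]


def concat_clips(clip_paths: List[str], list_path: str, output_path: str) -> str:
    """Write the list file, run ffmpeg and return the path of the final video."""
    Path(list_path).write_text(concat_list_text(clip_paths), encoding="utf-8")
    rez = subprocess.run(concat_command(list_path, output_path))
    if rez.returncode != 0:
        raise RuntimeError("ffmpeg concat failed")
    return output_path
```

Stream copy (`-c copy`) avoids a second encoding pass; if the clips differ in format, re-encoding would be needed instead.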
2 - Some examples
Experiment 1 : A Pokemon episode example
@aipokemonscripts Trapped in the loop #ai #pokemon #ytp #youtubepoop
Review
The negative
- Poor image quality
- We would expect more focus on the character who is talking
- It is hard to keep track of who is speaking
The positive
- Some rhythm
- An original proposition
- Can go very viral
Experiment 2 : An inspirational video
@truebookwisdom Life learnings from Winning friends and Influence people #coach #motivational #lifehack #inspirational
Review
The negative
- Image content can be unexpected
- Content needs to be more precise
The positive
- Enjoyable
- Overall close to a human production (when pic quality is high)
Experiment 3 : An inspirational video with a better image engine
@truebookwisdom How to develop influence by Robert Cialdini #learn #motivational #inspirational #lifehack
Review
The negative
- After 5 to 10 videos, the content repeats itself
- Issues with fingers and multiple arms
The positive
- Very high image quality