TheAITraveler Improved

I posted a few weeks ago about my AI Traveler project, and how I built some scripts and tools to completely automate a basic YouTube channel.

It’s been running automatically for about 2 weeks now and I’ve made lots of little changes and tweaks, and I wanted to share my findings for anyone else playing in this space.

It’s an interesting collection of AI & Technical quirks that sometimes disappoint, sometimes entertain.

Prompt Engineering

If you’ve played at all in the new LLM space, you’ve heard the term “Prompt Engineering”. What is it? Wikipedia says:

Prompt engineering or prompting is the process of structuring sentences so that they can be interpreted and understood by a generative AI model in such a way that its output is in accord with the user's intentions.

So that doesn’t really help, does it? Let me give you a concrete example. In the first version of the script I used a prompt like:

Write about the following topic: {prompt}. Write in short sentences separated by . Write about {nLen} complete sentences.

Generally that worked. But there were a few main problems:

Using a period as a separator only works if there isn’t one in your title, like “St. Peter’s Basilica”:

history of St. Peter's Basilica - Vatican City

https://www.youtube.com/shorts/D04EE3Tcfdw

Even when told to write short sentences, it can generate some really long run-on compound sentences, which don’t work well in this use case.

Sometimes it can be too literal and literally return “sentences”. Like, the single word. 🤦
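To make the first problem concrete, here is a minimal sketch of what happens when you split on every period. This is not the project's actual code; the function name `naive_split` and the sample text are made up for illustration.

```python
def naive_split(text: str) -> list[str]:
    """Split on every period, the way a '.' separator implies."""
    return [part.strip() for part in text.split(".") if part.strip()]


# Hypothetical model output about a topic whose name contains a period.
sample = "St. Peter's Basilica is in Vatican City. It is a popular place to visit."

for piece in naive_split(sample):
    print(repr(piece))
```

The abbreviation "St." gets cut off as its own "sentence", so the first caption on screen would be just "St".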

It took some tinkering, but I eventually rewrote the prompt to return the result as a JSON array of proper sentences. That made things much more structured and easier to work with. However, the structure of the JSON would vary just a bit from run to run. Sometimes you get a basic array. Sometimes you get a key-value pair where the value is the array. Sometimes you get just the array without the bounding braces. It took a combination of prompt work and Python code to build something robust, but it’s been a good week now without any failed executions.
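For anyone fighting the same problem, here is a rough sketch of how that kind of normalizing could look in Python. It is an illustration under assumptions, not the code running on the channel: the function name `parse_sentences` is hypothetical, and it only covers the three shapes described above (a plain array, an object wrapping the array, and output missing its outer brackets or braces).

```python
import json


def parse_sentences(raw: str) -> list[str]:
    """Normalize a few JSON shapes an LLM might return into a list of sentences."""
    text = raw.strip()

    # Try the text as-is first, then retry with the outer [] or {} added back.
    for candidate in (text, "[" + text + "]", "{" + text + "}"):
        try:
            data = json.loads(candidate)
            break
        except json.JSONDecodeError:
            continue
    else:
        raise ValueError("could not parse model output as JSON")

    # Object form: use the first value that is a list, whatever the key is called.
    if isinstance(data, dict):
        lists = [value for value in data.values() if isinstance(value, list)]
        if not lists:
            raise ValueError("JSON object did not contain an array")
        data = lists[0]

    # A single bare string counts as a one-sentence result.
    if isinstance(data, str):
        data = [data]
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of sentences")

    # Keep only non-empty strings.
    return [item.strip() for item in data if isinstance(item, str) and item.strip()]


print(parse_sentences('["One.", "Two."]'))
print(parse_sentences('{"sentences": ["One.", "Two."]}'))
print(parse_sentences('"One.", "Two."'))
```

All three calls should give the same two-item list. Anything that still fails to parse raises a `ValueError`, so the caller can simply re-run the prompt.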

Resolution

I’m still running this on a Raspberry Pi 3B+, which was capped at 720p resolution. It’s my own fault: I grabbed one that was handy without looking too closely, and it only had 1 GB of RAM. I switched to a 4 GB unit, and now it can generate 1080p videos.

A 55s short takes 45 minutes to encode

A 3 minute video takes 2 hours to encode

Which leads into the next topic:

API Limits

Google approved my request for a Quota increase, so I reconfigured the tools:

Generate a 55s short every 2 hours

Generate a 3 minute video 3x/day
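As a quick sanity check on that schedule, here is the arithmetic using the encode times from the Resolution section. It assumes those times apply to every video and that the Pi encodes one video at a time, which may not match the real setup exactly.

```python
# Encode times quoted above, in minutes.
SHORT_ENCODE_MIN = 45
VIDEO_ENCODE_MIN = 120

# Schedule: one short every 2 hours, three longer videos per day.
shorts_per_day = 24 // 2
videos_per_day = 3

# Assumption: encodes run back to back, never in parallel.
total_minutes = shorts_per_day * SHORT_ENCODE_MIN + videos_per_day * VIDEO_ENCODE_MIN
total_hours = total_minutes / 60
idle_hours = 24 - total_hours

print(f"{shorts_per_day} shorts + {videos_per_day} videos = {total_hours:.0f} hours of encoding per day")
print(f"{idle_hours:.0f} hours of headroom")
```

Under those assumptions the Pi spends about 15 hours a day encoding, leaving about 9 hours for everything else.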

I also added a yake pass to the video description to generate hashtags for the video.
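A hashtag pass along those lines could be sketched like this. The helper names `to_hashtags` and `hashtags_from_description` are made up, and the `n=2` / `top` settings are guesses rather than the values the channel uses. The `yake` package is third-party, so it is only imported inside the function that needs it.

```python
import re


def to_hashtags(keywords: list[str], limit: int = 5) -> list[str]:
    """Turn keyword phrases into unique CamelCase hashtags."""
    tags: list[str] = []
    for phrase in keywords:
        words = re.findall(r"[A-Za-z0-9]+", phrase)
        if not words:
            continue
        # Upper-case only the first letter so acronyms like "UK" survive.
        tag = "#" + "".join(word[0].upper() + word[1:] for word in words)
        if tag not in tags:
            tags.append(tag)
    return tags[:limit]


def hashtags_from_description(description: str, limit: int = 5) -> list[str]:
    """Extract keywords with YAKE, then format them as hashtags."""
    import yake  # third-party: pip install yake

    extractor = yake.KeywordExtractor(lan="en", n=2, top=limit)
    # extract_keywords returns (keyword, score) pairs; a lower score means more relevant.
    ranked = extractor.extract_keywords(description)
    return to_hashtags([keyword for keyword, _score in ranked], limit)


print(to_hashtags(["United Kingdom", "wildflowers", "united kingdom"]))
```

Because YAKE works purely from word statistics in the text, the tags are only as good as the description it is fed.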

Results

All of these together have made a noticeable improvement to the quality of the videos. However, it’s still far from perfect. Some astute viewers have even noticed comical items like this video:

plants of United Kingdom (UK)

Plants of the United Kingdom are quite varied and diverse. The UK is home to many species of wildflowers, grasses, trees and shrubs. The climate of the UK is...

https://www.youtube.com/shorts/9-XmA7aOrpA

A video about plants that only shows bridges and buildings.

I’m still far from anything profitable on the channel. Even with 1000+ subscribers, I’m under 5% of the viewer metrics required to even enable monetization. However, it’s been a fun project and I’ve learned a lot about the capabilities of these systems.

With all of this running, my only real cost is the ChatGPT API usage, which comes to about $0.12/day, or less than $4/month.

If I keep working on it, I may try to replace some of the yake elements with ChatGPT. I would hope that can generate more relevant keywords instead of the current context-free system, but I would have to do some work to integrate that against my image search algorithms to handle query failures.