I posted a few weeks ago about my AI Traveler project, and how I built some scripts and tools to completely automate a basic YouTube channel.
It’s been running automatically for about 2 weeks now and I’ve made lots of little changes and tweaks, and I wanted to share my findings for anyone else playing in this space.
It’s an interesting collection of AI & Technical quirks that sometimes disappoint, sometimes entertain.
Prompt Engineering
If you’ve played at all in the new LLM space, you’ve heard the term “Prompt Engineering”. What is it? Wikipedia says:
Prompt engineering or prompting is the process of structuring sentences so that they can be interpreted and understood by a generative AI model in such a way that its output is in accord with the user's intentions.
So that doesn’t really help, does it? Let me give you a concrete example. In the first version of the script I used a prompt like:
Write about the following topic: {prompt}. Write in short sentences separated by . Write about {nLen} complete sentences.
Generally that worked. But there were a few main problems:
- Using a period as a separator only works if there is no period in your title, like “St. Peter’s Basilica”.
- Even when told to use short sentences, it can generate some really long run-on compound sentences, which don’t work well in this use case.
- Sometimes it can be too literal and literally return “sentences”. Like, the single word. 🤦
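The first problem is easy to reproduce. Here is a minimal sketch, assuming the reply is split on periods the way the original prompt implied. The `naive_split` helper and the sample reply are made up for illustration and are not taken from the actual script.

```python
def naive_split(reply):
    """Split a model reply into sentences by treating every period as a separator."""
    return [part.strip() for part in reply.split(".") if part.strip()]


# Hypothetical reply for a topic whose title contains a period.
reply = "St. Peter's Basilica is in Vatican City. It is very large."

print(naive_split(reply))
# ['St', "Peter's Basilica is in Vatican City", 'It is very large']
```

The abbreviation "St." is cut off as its own "sentence", so the first caption would just read "St".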
It took some tinkering, but I eventually rewrote the prompt to return the result as a JSON array of proper sentences. That made things much more structured and easier to work with. However, the structure of the JSON would vary just a bit from run to run. Sometimes you get a basic array. Sometimes you get a key-value pair wrapping the array. Sometimes you get just the array without the bounding braces. It took a combination of prompt work and Python code to build something robust, but it’s been a good week now without any failed executions.
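A minimal sketch of that kind of defensive parsing might look like the following. It is not the exact code from the project: the `parse_sentences` and `_load_lenient` names are hypothetical, and the three reply shapes it handles are one reading of the variations described above (a bare array, an object wrapping the array, and content missing its outer brackets or braces).

```python
import json


def _load_lenient(text):
    """Try the reply as-is, then with the outer [] or {} put back."""
    for candidate in (text, "[" + text + "]", "{" + text + "}"):
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("reply is not recoverable JSON")


def parse_sentences(raw):
    """Coerce a model reply into a flat list of sentence strings."""
    data = _load_lenient(raw.strip())

    # Shape: {"sentences": [...]} -> use the first list value found.
    if isinstance(data, dict):
        lists = [value for value in data.values() if isinstance(value, list)]
        if not lists:
            raise ValueError("no array found in JSON object")
        data = lists[0]

    # Shape: a single JSON string -> treat it as one sentence.
    if isinstance(data, str):
        data = [data]

    if not isinstance(data, list):
        raise ValueError("unexpected JSON type: " + type(data).__name__)

    # Keep only non-empty strings.
    return [item.strip() for item in data if isinstance(item, str) and item.strip()]
```

All of these hypothetical replies come back as the same two-item list:

```python
parse_sentences('["A.", "B."]')
parse_sentences('{"sentences": ["A.", "B."]}')
parse_sentences('"A.", "B."')
parse_sentences('"sentences": ["A.", "B."]')
```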
Resolution
I’m still running this on a Raspberry Pi 3B+, which was capped at 720p resolution. It’s my own fault: I grabbed one that was handy without looking too closely, and it only had 1 GB of RAM. I switched to a 4 GB unit, and now it can generate 1080p videos.
- A 55s short takes 45 minutes to encode
- A 3 minute video takes 2 hours to encode
Which leads into the next topic:
API Limits
Google approved my request for a Quota increase, so I reconfigured the tools:
- Generate a 55s short every 2 hours
- Generate a 3 minute video 3x/day
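For a rough sense of what this schedule asks of the Pi, here is a back-of-the-envelope sketch. It assumes the encode times quoted earlier (45 minutes per short, 2 hours per 3 minute video) hold at this cadence and that encodes run one at a time. The constant and function names are made up for illustration.

```python
SHORT_ENCODE_MIN = 45    # encode time for one 55s short
VIDEO_ENCODE_MIN = 120   # encode time for one 3 minute video
SHORTS_PER_DAY = 24 // 2 # one short every 2 hours
VIDEOS_PER_DAY = 3


def daily_encode_minutes():
    """Total minutes per day spent encoding, assuming no overlap between jobs."""
    return SHORTS_PER_DAY * SHORT_ENCODE_MIN + VIDEOS_PER_DAY * VIDEO_ENCODE_MIN


total = daily_encode_minutes()
print(total, "minutes, or", total / 60, "hours per day")
# 900 minutes, or 15.0 hours per day
```

Under those assumptions the Pi spends about 15 hours a day encoding, which still fits in a day but does not leave a lot of headroom.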
I also added a yake pass to the video description to generate hashtags for the video.
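As an illustration of what such a pass can look like, here is a minimal sketch. The `to_hashtags` and `hashtags_from_description` helpers, the CamelCase tag format, and the extractor settings (`n=2`, five keywords) are assumptions for this example, not the settings used on the channel. It relies on the third-party `yake` package (`pip install yake`), which is only imported when the second function is called.

```python
import re


def to_hashtags(phrases, limit=5):
    """Turn keyword phrases into de-duplicated CamelCase hashtags."""
    tags = []
    for phrase in phrases:
        words = re.findall(r"[A-Za-z0-9]+", phrase)
        if not words:
            continue
        tag = "#" + "".join(word.capitalize() for word in words)
        if tag not in tags:
            tags.append(tag)
    return tags[:limit]


def hashtags_from_description(description, limit=5):
    """Extract keywords from the description with yake and format them as hashtags."""
    import yake  # third-party dependency, imported lazily

    extractor = yake.KeywordExtractor(lan="en", n=2, top=limit)
    # extract_keywords returns (phrase, score) pairs; a lower score means more relevant.
    phrases = [phrase for phrase, _score in extractor.extract_keywords(description)]
    return to_hashtags(phrases, limit)


print(to_hashtags(["roman bridges", "Roman Bridges", "tiber river"]))
# ['#RomanBridges', '#TiberRiver']
```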
Results
All of these together have made a noticeable improvement to the quality of the videos. However, it’s still far from perfect. Some astute viewers have even noticed comical results, like this video:
A video of plants that only shows bridges and buildings.
I’m still far from anything profitable on the channel. Even with 1000+ subscribers I’m under 5% of the viewer metrics required to even enable monetization. However, it’s been a fun project and I’ve learned a lot about the capabilities of these systems.
With all of this running, my only real cost is the ChatGPT API usage, which comes to about $0.12/day, or less than $4/month.
If I keep working on it, I may try to replace some of the yake elements with ChatGPT. I would hope that can generate more relevant keywords instead of the current context-free system, but I would have to do some work to integrate that against my image search algorithms to handle query failures.