The AI Traveler

Tags
Computer Science
Software Development
Projects
Thought Experiments
Published
July 26, 2023
Over the last few weeks, I’ve been experimenting with AI and LLM systems. ChatGPT, Google Bard, LLaMA, Dalai, Midjourney: there are so many of them these days. I thought ChatGPT was neat, but when I unlocked ChatGPT Plus, with access to Code Interpreter and other plugins, I really started to understand its power.
I then decided I needed a project that would integrate some of these systems. My chosen project: a fully autonomous YouTube channel.
You can see my results at https://www.youtube.com/@TheAITraveler. All of the artwork (icons, banners, and such) was created in Midjourney. The video subjects come from ChatGPT, along with the video titles and descriptions. Most of the videos were generated by the AI Generation feature in Kapwing. Over the last few weeks, though, I’ve been experimenting with removing Kapwing and making the whole thing fully autonomous.
Here is what I found:

The basic steps of Automation

I broke the project down into a few fundamental steps:
  • Create a Script
  • Collect art to go with the script
  • Combine the Script and Art into a Video
  • Upload the Video
With these 4 parts done, it’s easy to set up a cron job to automatically create and upload a video on a schedule, with no human intervention.

Creating the Script

To create the script, I first created two databases:
  • One of popular locations - For this database, I asked ChatGPT a few times for things like “a list of 100 countries”, “a list of 50 popular travel destinations”, “a list of 50 exciting historical monuments”
  • One of topic phrases - I manually created this database with entries like “history of”, “tourism destinations near”, “cuisine of”, etc
I then wrote a simple Python script to:
  • Randomly choose one entry from each of the two databases.
  • Use the Python openai module to request a simple short script.
There was some massaging required in the prompt to ask it to avoid compound sentences and avoid pronouns. After a bit of tinkering, I had it reliably outputting short scripts that I could easily split on periods to get one simple line per screen for my video.
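A minimal sketch of that generator, assuming the pre-1.0 openai Python SDK that was current in mid-2023 (the function and database names here are mine, not the project's):

```python
import random

def make_script(locations, modifiers, api_key):
    """Pick a random topic from the two databases and ask ChatGPT for a script."""
    import openai  # third-party; the pre-1.0 SDK interface is assumed here
    openai.api_key = api_key
    topic = f"{random.choice(modifiers)} {random.choice(locations)}"
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                f"Write a short narration script about the {topic}. "
                "Use only simple sentences. No compound sentences. "
                "Do not use pronouns."
            ),
        }],
    )
    return split_lines(resp["choices"][0]["message"]["content"])

def split_lines(text):
    """One sentence per on-screen line: split on periods, drop empties."""
    return [s.strip() for s in text.split(".") if s.strip()]
```

The prompt constraints (simple sentences, no pronouns) are the "massaging" described above; they keep each line short and self-contained when shown on screen.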

Collecting Art

For this part, I wanted to retrieve an image per line of the script. I planned each image would then be part of a simple slideshow-style video, combined with some simple background music and a simple effect like a pan or zoom to give it a bit of motion.
After some digging, I settled on two possible image banks, one of which was Pixabay (which comes up again below). Both services offer royalty-free images and videos, and provide a nice API for query and download. However, I quickly found out that simply throwing the script string at the service typically resulted in 0 hits. I needed to distill the sentences to their essence for the search.
To do this, I found a Python library called yake. With it, I can easily break a sentence down into keywords, then iterate over or randomly select keywords from that list for searching.
There were still lots of edge cases to handle (API errors, 0 hits, download timeouts, etc) but with a bit of work I was able to iterate over all the lines and download a series of images.
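The keyword extraction and fallback search can be sketched like this (the Pixabay endpoint and parameters follow its public API; the fallback ordering and helper names are my own):

```python
import requests

def extract_keywords(line, top=5):
    """Rank keyword phrases for a script line (lower yake score = more relevant)."""
    import yake  # third-party: pip install yake
    return yake.KeywordExtractor(lan="en", n=2, top=top).extract_keywords(line)

def queries_for_line(keywords):
    """Build an ordered list of fallback queries from yake output.

    `keywords` is yake's output, a list of (phrase, score) tuples.
    Try the most relevant phrase first, then progressively simpler ones,
    ending with the single words of the best phrase.
    """
    ranked = [kw for kw, _ in sorted(keywords, key=lambda pair: pair[1])]
    if ranked:
        ranked += [w for w in ranked[0].split() if w not in ranked]
    return ranked

def search_pixabay(query, api_key, timeout=10):
    """Query the Pixabay image API; return the list of hits ([] on any failure),
    so the caller can simply move on to the next fallback query."""
    try:
        resp = requests.get(
            "https://pixabay.com/api/",
            params={"key": api_key, "q": query, "image_type": "photo"},
            timeout=timeout,
        )
        resp.raise_for_status()
        return resp.json().get("hits", [])
    except requests.RequestException:
        return []
```

Iterating `search_pixabay` over `queries_for_line(...)` until a query returns hits handles the 0-hit and timeout cases in one loop.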
As an extra bonus, Pixabay also offers royalty-free background music! So I built a small library of 10 background tracks that I could randomly sample.

Making the movie

This was by far the most time consuming part. To break down the process here:
  • Iterate over every line of the script
    • Composite the Script Line onto a semitransparent background (for readability)
    • Composite that onto the selected image
    • Apply a simple motion effect to the image
  • Concatenate all of these clips
  • Apply Background audio
  • Write to MP4
Thanks to the moviepy package, I was able to get this started pretty easily. The devil was in the details, though.
  • First, the single line of text needed to be broken into multiple lines to fit the frame.
  • Text size needed to be adjusted for length (smaller font sizes for longer text).
  • Adding motion to clips isn’t immediately obvious in moviepy; thankfully there are lots of examples.
  • There were lots of issues with timing and clip length.
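The first two bullets boil down to a small layout helper. A sketch (the wrap width and sizing steps are my guesses, not the post's exact numbers):

```python
import textwrap

def layout_text(line, max_chars=28, base_size=48, min_size=28):
    """Wrap a script line for the screen and shrink the font for long text.

    Returns (wrapped_text, font_size): drop ~4pt per extra wrapped row,
    but never go below min_size.
    """
    wrapped = textwrap.wrap(line, width=max_chars)
    size = max(min_size, base_size - 4 * (len(wrapped) - 1))
    return "\n".join(wrapped), size
```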
After a week or so of tinkering, though, I had it working. It works surprisingly well, and the results are now online to be seen!
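Putting it together, the per-line compositing loop looks roughly like this, assuming moviepy 1.x with ImageMagick available for TextClip (the pacing heuristic and function names are mine):

```python
def line_duration(line, per_word=0.4, floor=3.0):
    """Rough reading time per line (assumed pacing, not from the original post)."""
    return max(floor, per_word * len(line.split()))

def assemble_video(lines, image_paths, audio_path, out="short.mp4", size=(720, 1280)):
    """Slideshow assembly: text over image, slow zoom, concatenate, add audio."""
    # local import: third-party, and TextClip additionally needs ImageMagick
    from moviepy.editor import (AudioFileClip, CompositeVideoClip, ImageClip,
                                TextClip, concatenate_videoclips)
    clips = []
    for line, img in zip(lines, image_paths):
        dur = line_duration(line)
        base = (ImageClip(img).set_duration(dur)
                .resize(height=size[1])
                .resize(lambda t: 1 + 0.02 * t)       # slow zoom for motion
                .set_position("center"))
        txt = (TextClip(line, fontsize=48, color="white",
                        bg_color="rgba(0,0,0,0.5)",   # semitransparent backdrop
                        method="caption", size=(size[0] - 80, None))
               .set_duration(dur)
               .set_position(("center", "bottom")))
        clips.append(CompositeVideoClip([base, txt], size=size))
    video = concatenate_videoclips(clips)
    video = video.set_audio(AudioFileClip(audio_path).set_duration(video.duration))
    video.write_videofile(out, fps=24)
```

The time-varying `resize` call is the non-obvious motion trick mentioned above: passing a function of `t` makes the scale change over the clip's duration.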

Uploading to Youtube

This one was easy, thanks to tokland’s youtube-upload. Simply download, configure, and it’s good to go.
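A thin wrapper for this step might look like the following (the flags are as documented in tokland's README; splitting out the command builder is my choice, so the command line itself is easy to check):

```python
import subprocess

def build_cmd(video_path, title, description, secrets="client_secrets.json"):
    """Assemble the youtube-upload command line for one video."""
    return [
        "youtube-upload",
        f"--title={title}",
        f"--description={description}",
        f"--client-secrets={secrets}",
        "--privacy=public",
        video_path,
    ]

def upload(video_path, title, description):
    """Shell out to tokland's youtube-upload CLI; raises on a non-zero exit."""
    subprocess.run(build_cmd(video_path, title, description), check=True)
```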
The one thing I didn’t realize until a few days into the project is YouTube’s API quotas. They use a “Quota Unit” system, where a free default account gets you 10,000 units per day. Sounds good, until you realize that uploading a video (of any size, shorts included) consumes 1,600 units. That means you get 6 uploads a day.
There is a process to request an API limit increase. I’ve started it, but hadn’t heard back at the time of writing.
 

Putting it into Production

With all of the parts complete, it was easy to write a simple shell script to connect them all in series and hook it up to a cron job that runs every 4 hours (which makes 6 videos a day). I currently have it running on a Raspberry Pi in my garage, where it takes about 30 minutes to create a 55-second short. The slowest part is encoding the MP4, which is compute- and RAM-intensive. The Pi doesn’t have enough RAM for full 1080x1920, so I’m currently generating 720x1280 videos.
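The glue is just a few lines of shell plus a crontab entry. A sketch, where the stage script names and paths are mine, not from the post:

```shell
#!/bin/sh
# run_pipeline.sh - run the four stages in series (stage names are illustrative)
set -e                      # abort the whole run if any stage fails

cd /home/pi/ai-traveler
python3 make_script.py      # 1. pick a topic and generate the script
python3 fetch_art.py        # 2. extract keywords, search, and download images
python3 build_video.py      # 3. composite text over images and encode the MP4
python3 upload_video.py     # 4. push the result to YouTube

# crontab entry: every 4 hours = 6 videos/day, matching the upload quota
# 0 */4 * * * /home/pi/ai-traveler/run_pipeline.sh >> /home/pi/ai-traveler/run.log 2>&1
```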
 

Wrapup

So the project is done and running. Things I may continue to tinker with:
  • Updating it to generate higher-resolution videos.
  • Improving the Art Gathering phase - right now the keyword extractor frequently loses context, which results in some hilariously mismatched art: script lines about great bars and music paired with images of African wildlife, because the keyword extractor keyed on too simple a word.
  • Improving the Script Generation - this one is a ChatGPT prompt issue. I ask for sentences and then split on periods, which works great until you have a location with a period in the name, for example St. Peter’s Basilica. I could probably request that output be provided in JSON or another structured format for better results.
    • I might also be able to request the image keywords along with the sentences, solving the previous issue as well.
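One possible structured format, assuming a prompt that asks ChatGPT for JSON with one object per sentence (the schema is my proposal, not what the project currently does):

```python
import json

def parse_script_json(raw):
    """Parse a structured script response: each sentence carries its own image
    keywords, so splitting on periods (and the St. Peter's breakage) goes away."""
    data = json.loads(raw)
    return [(item["sentence"], item["keywords"]) for item in data["lines"]]

# Example of the response shape the prompt would ask for:
sample = """{"lines": [
  {"sentence": "St. Peter's Basilica towers over Vatican City.",
   "keywords": ["St. Peter's Basilica"]}
]}"""
```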
This entire system, running on a Raspberry Pi (one I had left over from an old OctoPi setup), costs about $0.01 per day in ChatGPT API costs. That’s it.
I have all the code up on GitHub (in a private repo for now). I may publish it publicly in the future, but right now I worry that this system (and ones like it) are the beginning of a tidal wave of AI-generated content that is going to flood social media. In addition to driving AI models mad, it’s polluting the internet for those of us looking for “true knowledge”.
 
Regardless, it was a fun project. I hope you learned a thing or two along the way. Happy to address questions in the comments below!