ARTIFICIAL INTELLIGENCE (AND ARTIFICIAL DEADLINES)
Since the mid-’90s, my journey in animation has evolved from being an early Flash Animation enthusiast and animator to contributing significantly to the 2D animation software revolution. That movement led us to amazing tools like Toon Boom Harmony and TV Paint, which have reshaped 2D animation production forever.
Now, as I embark on the next wave of animation evolution, the AI revolution, I’m excited to share not just my first AI-supported animated short but also the insightful journey behind it. With three decades of animation and storytelling expertise, particularly in software use, I aim to provide a detailed roadmap of this process, offering valuable insights for both seasoned professionals and those new to the field. This article is a window into the challenging yet rewarding world of AI in animation, a blend of hard work, innovative software, and relentless learning.
Setting out to create a professional-grade animated short, one with a steady flow of lip sync, using only AI tools may sound ambitious to the point of delusion, but it was a challenge I couldn’t resist. I came equipped with a plan, months of tinkering with software, and an optimism that was both my armor and Achilles’ heel.
The plan? Deceptively simple: concoct a two-minute animated short, leaning heavily, almost recklessly, on the burgeoning tech of AI software packages. All this in under a month. You see, in the animation world, or my corner of it at least, deadlines are not just markers; they’re motivators. They’re what gets the coffee brewed and the midnight oil burning. Projects are never really finished — they’re surrendered, with a mix of relief and resignation, to the unforgiving gods of time.
I decided to include only humanoid characters in the short — AI-assisted lip sync, after all, still doesn’t work all that well with non-humanoid figures. But here’s the kicker: most AI tools are like sprinters who can’t yet run a marathon. They give you three, maybe four seconds of gold, then they’re out, gasping for breath. So, what did I do? I took these constraints and spun them into a creative plan. I decided on a spoof trailer format — quick, punchy, a rapid-fire of scenes.
So, after a few hours of brainstorming, “Inventing Anna of Arendelle” emerged as the front-runner. Picture it: a parody mixing the intrigue of Netflix’s “Inventing Anna” miniseries with the whimsy and visual appeal of Disney’s “Frozen.”
But let’s be real for a moment. After sifting through an arsenal of over fifty software options, I had to face the music. As much as AI has evolved, morphing into this tantalizing tool of endless possibilities, it couldn’t quite go the full distance. Not yet. So, part of this journey, part of this story, is about where I had to go old school, where human touch still had to intervene in the digital dance.
It’s about finding that balance, that precarious, often maddening balance, between the new frontier and the time-tested ways of the trade. Stick around, and I’ll walk you through it, every step of the way.
CONCEPT AND PRE-PRODUCTION: Sketching Out the Vision
Alright, let’s dig into the meat of this thing — pre-production. This is where the magic starts, or in my case, where the coffee kicks in and the reality of what I’ve signed up for begins to dawn.
The next phase of “Inventing Anna of Arendelle” began with a simple prompt to ChatGPT. It was like lobbing a ball over the net to a surprisingly adept player. I first served up a concept, and back came a script laced with potential. Meanwhile, I was playing the field, testing out other chatbots, but none of them could keep pace with ChatGPT.
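For the technically curious: if you’d rather run that volley through the API than the chat window, the exchange looks roughly like this. It’s a minimal sketch using OpenAI’s Python SDK; the concept text, system prompt, and model name are illustrative stand-ins, not my actual prompts.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Placeholder concept, not the exact pitch I used.
concept = (
    "A two-minute spoof trailer mixing the intrigue of Netflix's "
    "'Inventing Anna' with the whimsy of Disney's 'Frozen'."
)

response = client.chat.completions.create(
    model="gpt-4",  # use whatever tier you have access to
    messages=[
        {"role": "system", "content": "You are a sharp comedy screenwriter."},
        {"role": "user", "content": f"Write a short trailer script for: {concept}"},
    ],
)

print(response.choices[0].message.content)
```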
As the self-appointed story editor, the task was like thrift store shopping: wade through the volume of AI-generated material to find the storytelling gold. Read, smirk, cringe, edit. It was a relentless search for the holy trinity of film: producibility, comedy, and clarity. Plus, I needed a script lean enough to be doable in a month yet meaty enough to be worth the trouble. And throughout, I resisted the urge to write much, hoping that this short would serve as an example of what these tools are capable of.
Storyboarding posed a unique challenge, constrained by the specific limitations of the AI tools. For instance, I knew I couldn’t move the camera too much, as the animation process I was using wouldn’t allow for it, and I could only hold on a speaking character for 3 seconds at any one time, due to the time limits currently existing on platforms like Pika and Leonardo. Those constraints, combined with the current state of AI storyboarding solutions (promising, but not super useful just yet), meant I did it the ol’ fashioned way. Well, no pencils were harmed in the making of this, so not that old, but it was my stylus and me, with zero AI help.
Voice tracks, however, were a whole other ball game. Thanks to the sorcerers at ElevenLabs, I had a text-to-speech tool that could mimic voices with uncanny accuracy. For the more nuanced lines, their speech-to-speech tool was a godsend. It was like piecing together a Frankenstein’s monster of voice acting — stitching words and phrases to hit the right notes. It’s no substitute for working with a real actor, but in the world of AI, it’s as close as you get to the real deal.
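I ran everything through their web interface, but ElevenLabs also exposes a REST API if you want to batch lines. A rough sketch of rendering a single line might look like the following; the key, voice ID, and line of dialogue are placeholders, and the endpoint and parameters reflect their docs as of this writing, so double-check before leaning on it.

```python
import requests

API_KEY = "YOUR_ELEVENLABS_KEY"   # placeholder
VOICE_ID = "YOUR_VOICE_ID"        # placeholder; every cloned voice has an ID

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "text": "The con of the century is coming to Arendelle.",
        "model_id": "eleven_multilingual_v2",  # model names change; check the docs
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
)
resp.raise_for_status()

# The response body is the rendered audio (MP3 by default).
with open("line_01.mp3", "wb") as f:
    f.write(resp.content)
```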
After locking in the animatic, I shifted gears to the intricate world of production design. It’s the stage where this abstract vision starts to look real, and let me tell you, there’s nothing quite like seeing a rough idea chiseled into gorgeous designs.
PRODUCTION DESIGN: The Art of Digital Herding
So, production design. The big thing I wanted to nail down was character model consistency. It’s crucial, right? You can’t have your protagonist’s face shape-shifting and morphing around from shot to shot. It’s disconcerting. For anyone watching early successes in the AI animation space, you know that this is one of the harder things to nail down. Thankfully, Midjourney was up to the task, steady and reliable. I’ve also seen great stuff out of Stable Diffusion, but for now I’m hooked on Midjourney.
But, oh, the backgrounds – they’re the silent assassins lurking in the shadows of this AI process. Sneaky, complicated creatures. The same Midjourney tricks I use with characters fall flat with backgrounds. So, I improvised, cheated a bit – especially in that courtroom sequence. I took loads of AI renders into Photoshop, sliced them up, creating layers upon layers, each pieced back together in Premiere. It was like reconstructing a shattered vase, hoping no one notices the glue.
And then, it was on to the text graphics. ChatGPT 4, my digital wordsmith, outshone Midjourney like an Olympic sprinter lapping a weekend jogger. ChatGPT 4’s graphics were next-level good. The “This Winter” card in the opener was more than cool – it was a plunge into the Arctic. The “Inventing Anna of Arendelle” title card at the end simply leaps off the screen, demanding attention. And the “New City Gazette” layout deserves its own headline in this write-up.
I found that, in this realm of AI production design, I served as a conductor, editor, and a little bit of a trickster all rolled into one. You’re bending the digital world to your will, one pixel at a time, all the while hoping the seams don’t show. It’s art, it’s craft, it’s a little bit of deception – but then, isn’t that what all creation is about?
PRODUCTION: The Digital Puppetry Dance
So after about two weeks, I found myself at the production stage, the point where all the planning and design either pays off or falls apart. It’s a bit like being on stage, the curtain’s up, and you’re praying you remember your lines.
The process starts with the Midjourney characters, each posed just so, staring back at me from their pristine white backgrounds. First order of business? Get rid of that white and replace it with green, ready for the chroma key magic.
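However you do the swap, the operation itself is simple enough to script if you’re working in bulk. Here’s a minimal sketch with Pillow and NumPy (file names hypothetical); note that a naive threshold like this will also eat any near-white highlights inside the character, which is exactly why a human eye still needs to check the result.

```python
import numpy as np
from PIL import Image

def white_to_green(src: str, dst: str, threshold: int = 240) -> None:
    """Swap a near-white background for pure chroma green."""
    img = np.array(Image.open(src).convert("RGB"))
    # Treat a pixel as background when all three channels are near white.
    mask = (img >= threshold).all(axis=-1)
    img[mask] = (0, 255, 0)  # classic keying green
    Image.fromarray(img).save(dst)

white_to_green("anna_pose_01.png", "anna_pose_01_green.png")
```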
Then came the input for the three AI animation software solutions I chose for this project – Pika, Runway, and Leonardo. I needed the character art and the right prompts. At first I was just writing the prompts myself, but the results weren’t coming out right. Whenever I got stuck, I asked myself if AI could help – that was the whole spirit of this exercise, right? And here, AI was very much up to the task. ChatGPT in particular: I uploaded my character art, new green background in place, and asked it to describe the image. The responses? They were like poetic insights into my own creations, seeing nuances I’d missed, capturing the essence in words I hadn’t even considered.
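I did this through the ChatGPT interface, but the same trick works against the API with any vision-capable model. A rough sketch, where the model choice and file name are my assumptions, not what I actually ran:

```python
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

with open("anna_pose_01_green.png", "rb") as f:  # hypothetical file name
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model works
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this character's appearance, pose, and mood in "
                     "one richly detailed paragraph I can reuse as an "
                     "image-to-video prompt."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)
```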
With these prompts and images, I was ready to dive into the animation rendering process. At this stage, I wasn’t fussing over lip-syncing. No, I was looking for emotion, gestures, the subtleties of body language – and trying to get those secondary animations, like wind in the hair or the flutter of clothes. It’s about adding a touch of realism to figures that are, essentially, pixels pretending to be human.
Pika did the heavy lifting. Sure, some of the shots made me cringe – they were far from perfect. But this wasn’t about creating a masterpiece; it was about exploring the possibilities of AI in animation. I only employed Leonardo late in the process, having discovered it halfway through production, and I was very impressed.
With my characters all moving, it was over to Synclabs to sync the lips to the dialogue. This is where the characters really came to life. I’d feed in the animated shots along with the dialogue track from my animatic, hoping for that magical moment where movement and voice synced perfectly. Sometimes it worked; sometimes it was back to the drawing board. With Synclabs, I found that the key was the human-like quality of the face – the closer to real anatomy, the better the sync. I had a judge with very ‘pushed’ features, and it broke the software.
I played around with other lip-sync solutions, but Synclabs was the front-runner. Its output wasn’t perfect – the resolution was lower than I wanted – so I’d give it a facelift in Topaz, uprezzing to the crispness and clarity needed for a standard 1920×1080 format.
There were a couple of shots where AI just couldn’t cut it. Like shot 17, where Anna reveals her concept art – that was all Midjourney, Photoshop, and then a few quick keyframed motions in Adobe Animate. And then there was shot 40, the close-up of the judge banging his gavel. Leonardo came close, but it needed a human touch for that subtle easing, the blur that brings it to life. Sometimes, you just can’t beat the old ways.
So there it was: a mix of AI wizardry and old-school animation, a dance between the cutting-edge and the time-tested. It was a balancing act, finding harmony between the synthetic and the organic, bringing to life a world that straddled the line between reality and pixelated fantasy.
ASSEMBLY IN PREMIERE
Now we come to the assembly in Premiere, the stage where all the pieces of this digital puzzle start to fit together, or at least, they’re supposed to. It’s a bit like cooking a complex dish, hoping the flavors will blend just right.
Some characters, blessedly, didn’t need the whole lip sync song and dance. They were simply upscaled from Pika or Leonardo and plopped into Premiere alongside their more vocal counterparts. I layered all of the characters over the storyboard panels, keying out the green backgrounds with Ultra Key and resizing them to fit the layout on the board panel.
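Ultra Key does this with far more finesse (spill suppression, edge softening, and so on), but the core idea of a chroma key is simple enough to sketch. Here’s a single-frame version in NumPy and Pillow, with hypothetical file names; real footage would run this per frame, and you’d want a softer matte than a hard threshold.

```python
import numpy as np
from PIL import Image

def key_over(fg_path: str, bg_path: str, out_path: str, tol: int = 60) -> None:
    """Composite a green-screened character over a storyboard panel."""
    fg = np.array(Image.open(fg_path).convert("RGB")).astype(int)
    bg = np.array(
        Image.open(bg_path).convert("RGB")
        .resize((fg.shape[1], fg.shape[0]))  # match sizes; a real comp would also reposition
    ).astype(int)

    r, g, b = fg[..., 0], fg[..., 1], fg[..., 2]
    # A pixel counts as green screen when green clearly dominates red and blue.
    is_green = (g - np.maximum(r, b)) > tol

    out = np.where(is_green[..., None], bg, fg).astype(np.uint8)
    Image.fromarray(out).save(out_path)

key_over("anna_keyed_frame.png", "panel_17.png", "comp_17.png")
```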
However, certain shots appeared static and devoid of the desired liveliness. A bunch of the animated performances I went with weren’t all that ‘animated,’ so I needed to bring more life to the shot. So, back to the drawing board I’d go, creating more characters for the background to add a bit of depth, a bit of life. These extras would get the blur treatment and find their way into the frame, subtly enhancing the scene. Take, for instance, the birds in the sky in shot 17 – designed in Midjourney, brought to life in Pika, and then seamlessly blended into the scene in Premiere.
POST-PRODUCTION: The Final Touches
Post-production was where I added the final garnishes, the special effects. They were simple yet effective – like the photo flashes in shot 42, adding a paparazzi feel to the scene, or the snow fluttering down in the Statue of Liberty sequence. Each effect was carefully keyed and placed, a touch here, a sprinkle there. No AI was used here, so that’s yet another opportunity for some crafty software engineers.
Camera moves were sparse but impactful. A gentle zoom here, a subtle push there, like in shot 15, where we zoom up a building, or shot 17, where Anna reveals her plans. The camera was mostly a silent observer, capturing the unfolding story without much fuss. This was by design, as I mentioned above: the AI animation tools simply can’t handle much camera movement yet.
The audio was a whole other adventure. I dove into the world of music software, tinkering with Loudly, Suno, and occasionally Beatoven. Some of the more dramatic sounds, the builds and stings, I snagged from Envato – my go-to treasure trove.
Sound effects, though, I kept traditional. The AI offerings felt like square pegs in round holes, so I leaned on Envato’s assets, mixing and matching to find the perfect auditory accompaniment.
While AI software for song mixing was available, finding tools for dialogue-driven audio assemblies proved elusive. So the final mix was manual, done right in Premiere.
And the cherry on top – I used the AI dubbing tool Rask to create a German-language version. That whole process took about 30 minutes.
CONCLUSION
And there you have it. It’s done and ready for viewing. The entire project, wrapped up in less than a month, a spare-time venture that cost me about $500 in software subscriptions. I’m still in awe of the power of these tools, the way they’re reshaping the animation landscape. But it’s clear – we’re still a ways off from achieving those high-end animation performances effortlessly. AI technology is racing ahead, but crafting something truly remarkable takes time, patience, and a human touch. For my next project, I’ll likely take a more active role in the writing, go deeper into Stable Diffusion along with OpenPose and AnimateDiff, and see if I can create more elaborate animation posing and action. But for now, I’m content with the progress, excited for the future, and always, always learning.