More human than human

Epic’s new motion-capture animation tech has to be seen to be believed

"MetaHuman Animator" goes from iPhone video to high-fidelity 3D movement in minutes.

Would you believe that creating this performance took only minutes of video processing and no human tweaking?

SAN FRANCISCO—Every year at the Game Developers Conference, a handful of competing companies show off their latest motion-capture technology, which transforms human performances into 3D animations that can be used on in-game models. Usually, these technical demonstrations involve a lot of specialized hardware for the performance capture and a good deal of computer processing and manual artist tweaking to get the resulting data into a game-ready state.

Epic's upcoming MetaHuman facial animation tool looks set to revolutionize that kind of labor- and time-intensive workflow. In an impressive demonstration at Wednesday's State of Unreal stage presentation, Epic showed off the new machine-learning-powered system, which needed just a few minutes to generate impressively real, uncanny-valley-leaping facial animation from a simple head-on video taken on an iPhone.

The potential to get quick, high-end results from that kind of basic input "has literally changed how [testers] work or the kind of work they can take on," Epic VP of Digital Humans Technology Vladimir Mastilovic said in a panel discussion Wednesday afternoon.

A stunning demo

The new automatic animation technology builds on Epic's MetaHuman modeling tool, which launched in 2021 as a way to manually create highly detailed human models in Unreal Engine 5. Since that launch, more than 1 million users have created millions of MetaHumans, Epic said, some generated from just three photos of a face after a few minutes of processing.

The main problem with these MetaHumans, as Mastilovic put it on stage Wednesday morning, is that "animating them still wasn't easy." Even skilled studios would often need a detailed "4D capture" from specialized hardware, plus "weeks or months of processing time" and human tweaking, to get game-usable animation, he said.

Watch Melina Juergens' performance transformed into a stunningly accurate 3D animation in just minutes.

MetaHuman Animator is designed to vastly streamline that process. To demonstrate that, Epic relied on Ninja Theory performance artist Melina Juergens, known for her role as Senua in 2017's Hellblade: Senua's Sacrifice.

Juergens' 15-second on-stage performance was captured on a stock iPhone mounted on a tripod in front of her. The resulting video of that performance was then processed on a high-end AMD machine in less than a minute, creating a 3D animation that was practically indistinguishable from the original video.

The speed and fidelity of the result drew a huge round of applause from the developers gathered at the Yerba Buena Center for the Arts, and it really needs to be seen to be believed. Tiny touches in Juergens' performance—from bared teeth to minuscule mouth quivers to sideways glances—are all carried over faithfully into the animation. Even realistic tongue movements are extrapolated from the captured audio, using an "audio to tongue" algorithm that "is what it sounds like," Mastilovic said.

What's more, Epic also showed how all those facial tics could be applied not just to Juergens' own MetaHuman model but to any model built on the same MetaHuman standard. Seeing Juergens' motions and words coming from the mouth of a highly stylized cartoon character, just minutes after she performed them, was striking, to say the least.
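Epic hasn't published the internals of how that retargeting works, but the general idea behind a shared rig standard is straightforward: the capture is stored as per-frame weights on a standardized set of named facial controls, so any character that implements the same control set can play the performance back without remapping. Here is a minimal, purely illustrative Python sketch of that idea (not Epic's API; all control names and classes are hypothetical):

```python
# Illustrative sketch only -- not Epic's code. It shows why a shared rig
# standard makes retargeting trivial: captured animation is stored as
# per-frame weights on a fixed set of named controls, so any character
# exposing the same controls can play it back. All names are hypothetical.
from dataclasses import dataclass

# Hypothetical subset of a standardized facial control set.
CONTROLS = ("jaw_open", "mouth_corner_pull_l", "mouth_corner_pull_r", "eye_look_l")

@dataclass
class AnimationClip:
    """Per-frame weights (0.0-1.0) for each control in the shared standard."""
    frames: list[dict[str, float]]

class RiggedCharacter:
    """Any face mesh (realistic or stylized) that implements the control set."""
    def __init__(self, name: str):
        self.name = name
        self.pose = {c: 0.0 for c in CONTROLS}

    def apply(self, weights: dict[str, float]) -> None:
        # Control names are standardized, so no per-character remapping is
        # needed; each mesh maps the same weights to its own deformations.
        for control, weight in weights.items():
            self.pose[control] = weight

def play(clip: AnimationClip, character: RiggedCharacter) -> None:
    for i, frame in enumerate(clip.frames):
        character.apply(frame)
        print(f"{character.name}, frame {i}: {character.pose}")

# The same captured clip drives two very different characters, unchanged.
clip = AnimationClip(frames=[
    {"jaw_open": 0.6, "mouth_corner_pull_l": 0.2,
     "mouth_corner_pull_r": 0.2, "eye_look_l": 0.0},
    {"jaw_open": 0.1, "mouth_corner_pull_l": 0.8,
     "mouth_corner_pull_r": 0.8, "eye_look_l": 0.4},
])
play(clip, RiggedCharacter("realistic_metahuman"))
play(clip, RiggedCharacter("stylized_cartoon"))
```

Each character supplies its own mesh deformations for the shared controls, which is why the same weight curves can read as photorealistic on one model and cartoonish on another.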

The presentation finished with the debut of a performance-focused trailer for the upcoming Senua's Saga: Hellblade II. The trailer is made all the more impressive by Mastilovic's claim that Juergens' full-body motion-captured performance in it "hasn't been polished or edited in any way and took a MetaHuman animator just minutes to process, start to finish."
