LA Noire: The future of facial capture?
On the day of its European release, 3D World takes a look at the MotionScan system behind LA Noire, and its promise of facial capture with no manual clean-up
LA Noire places the player in the role of Cole Phelps. Phelps, and the 400-odd characters he encounters, provide believable performances captured by the MotionScan system
The concept of facial motion capture isn’t a new one. Ever since Robert Zemeckis’ The Polar Express steamed onto cinema screens in 2004, many of Hollywood’s highest-profile movies have been entirely performance captured.
Even its application to real-time projects isn’t new, with recent games from Enslaved to Red Dead Redemption making extensive use of facial capture.
The idea of capturing usable facial data in real time with no manual clean-up, however… now that’s something novel.
Let’s qualify that last statement a little. MotionScan, the facial-capture system used on Rockstar Games’ PlayStation 3 and Xbox 360 title LA Noire, isn’t literally a real-time technology.
But at 15-20 minutes of processed animation data per day – and that’s data ready for use in the game engine, not raw data prior to cleaning – its throughput is well within the limits claimed for ‘real-time’ rendering systems.
Equally importantly, MotionScan is markerless, requires no special make-up and is based solely on off-the-shelf hardware.
With the technology now being marketed to studios worldwide in fields from VFX to forensic animation, we spoke to Depth Analysis, the company responsible, to assess what MotionScan brings to the industry.
But first, some history
Brendan McNamara, founder of LA Noire developer Team Bondi, is no stranger to large projects.
His previous game, PS2 crime series The Getaway, called for the recreation of 25 square miles of central London.
For his new project, the key challenge wasn’t the environments, but the characters – and he quickly became aware that existing facial-capture technology wasn’t up to the job.
“From the very early stages, Brendan had decided that the script was going to be big,” says Depth Analysis’ head of research, Oliver Bao, who began work on the project in April 2004.
“Originally, we estimated that 2,000 separate characters would need to be scanned. So from day one, we decided the process should be automatic.”
Although the cast eventually dropped to a slightly more manageable 400 actors, LA Noire still clocks in at over 2,000 pages of script.
The resulting 55 hours of recorded footage, 25-30 of which appear in-game, dwarfs even films such as The Polar Express and Beowulf. Moreover, the facial capture had to be accurate.
LA Noire places the player in the shoes of 1940s LAPD detective Cole Phelps. As he navigates through a seedy underworld of vice and corruption, it falls to the player to uncover the motivations of the people Phelps encounters.
Put bluntly, the animation has to be good enough for you to tell when a character is lying.
The MotionScan process
Actor John Noble during facial capture
Before facial capture can begin, each actor goes through hair and make-up.
Unlike Mova’s Contour system, which uses UV-reflective make-up as an intrinsic part of the capture process, MotionScan simply requires an actor to be made up to reduce the shininess of their skin, avoiding unwanted specular highlights.
For LA Noire, hair geometry was captured along with that of the actors’ faces: a process facilitated by the slicked-back styles of the period.
While it can take up to three hours to marshal long hair into a style compact enough to scan, in most cases it takes only 30-60 minutes.
Once made up, an actor moves to the capture stage: a brightly lit white space Bao describes as “like being in 2001: A Space Odyssey”.
The room is soundproofed to enable Depth Analysis to record audio at the same time as facial performance, so the actor's only contact with the outside world is through a monitor displaying the script, storyboards and a live feed of the director or the character they're acting against.
Since the capture volume is relatively small (actors can turn their heads 45° left and right and 20-30° up and down), the actor remains seated at all times. The in-game characters’ body and neck movements come from separate full-body shoots.
The 32 cameras trained on the actor’s head are arranged in pairs, enabling the MotionScan system to recreate facial geometry through stereo-matching techniques.
“We have [that many] cameras because we need to cover the entire head, from underneath to the back,” says Bao.
“Each has 50 per cent overlap with its neighbours, so if one of the pairs fails, we have built-in redundancy.”
In all, around 30 separate operations are required to generate final textured geometry from the live image streams.
First, stereo image pairs are compared to obtain disparity maps. These are then used to generate surface patches in point cloud space.
Patches are merged to generate the entire head, then a mesh is fitted to the point-cloud data. Noise is clipped away and the entire data series subjected to temporal filtering to stabilise the surface, then the video textures are projected onto the geometry.
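The first of those steps, recovering depth from a calibrated camera pair, rests on standard stereo triangulation. The sketch below shows the generic geometry, not Depth Analysis's actual code; the focal length and baseline figures are purely illustrative.

```python
# Generic stereo triangulation: depth from disparity for one calibrated
# camera pair. Illustrative only -- not the MotionScan implementation.

def disparity_to_depth(disparity_px, focal_px, baseline_mm):
    """Depth (mm) of a point matched in both images of a stereo pair.

    disparity_px: horizontal pixel offset of the point between
                  the left and right images
    focal_px:     focal length expressed in pixels
    baseline_mm:  distance between the two camera centres
    """
    if disparity_px <= 0:
        raise ValueError("point not matched in both views")
    # Similar triangles: depth / baseline = focal / disparity
    return focal_px * baseline_mm / disparity_px

# A point with 20 px of disparity, seen by a pair with a 2,000 px focal
# length and a 100 mm baseline, sits 10 m from the cameras.
print(disparity_to_depth(20, 2000, 100))  # 10000.0 (mm)
```

Computing this for every matched pixel yields the disparity map; back-projecting each depth value gives the surface patch in point-cloud space described above.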
No clean-up required
A single operator oversees the capture session. The MotionScan procedure is specifically designed to avoid manual clean-up
It takes around 10 minutes to generate a ‘neutral head’ that the director can use to check the 3D output.
Typically, the only input required is the selection of a ready-made template designed to take account of the actor’s facial proportions.
“For an overweight character, you need to allow more volume for the neck region, for example,” says Bao.
The animated facial geometry can then be exported to an application such as Maya or MotionBuilder in FBX format, or in the proprietary format required by LA Noire’s game engine – a compression process that reduces the data rate from 1GB/s to 100kB/s.
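Those quoted data rates imply a compression ratio of roughly 10,000:1, easy to verify (assuming decimal units, i.e. 1GB = 10⁹ bytes):

```python
# Compression ratio implied by the quoted data rates
# (decimal units assumed: 1 GB = 1e9 bytes, 1 kB = 1e3 bytes).
raw_rate = 1e9       # bytes per second, raw multi-camera capture
engine_rate = 100e3  # bytes per second, engine-ready format
print(raw_rate / engine_rate)  # 10000.0 -- roughly 10,000:1
```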
Facial animation isn’t linked to an underlying rig. “There’s no control point to adjust, say, the eyebrow,” explains Bao.
“We allow [Team Bondi’s artists] to touch the textures – the specular maps, the shader look-up maps, those kinds of things – but other than that, they can pretty much just attach props like glasses or more hair.”
This policy of ‘run as recorded’ is a deliberate decision on Depth Analysis’s part. “We didn’t want the animators touching up the data,” says Bao.
“People tend to use their own faces as reference, and every time you do that, you lose a little bit of the [original actor’s] personality. We wanted LA Noire to be as authentic as possible.”
Equally importantly, eliminating the need for manual data clean-up increases throughput.
“We can generate 15 minutes of animation a day. An animator could spend a whole week just touching that up,” says Bao.
“[To adjust the output], it’s quicker just to capture different versions of a performance and mix the takes.”
Comparing the market
The men behind MotionScan: Oliver Bao (left), head of research at technology company Depth Analysis, and Brendan McNamara (right), founder of game developer Team Bondi
So how does MotionScan compare to other facial-capture technologies on the market?
While the raw resolution of 1-3mm is lower than that of Vicon’s marker-based or Mova’s make-up-based systems, both of which claim sub-millimetre accuracy, the in-game footage from LA Noire suggests that it’s more than adequate for the job.
“Whereas actors are traditionally told to ‘go bigger’ so the markers can be read easily, we tell them to be as natural as they can,” says Bao.
“Sometimes we’ve had complaints from the QA people that the facial performance needs to be [made less subtle] to make gameplay easier!”
While the need for the actor to remain seated on the capture stage is restrictive, the lack of manual clean-up more than offsets the time required for a separate full-body shoot.
Bao is currently working on increasing the capture resolution of the system and extending the same techniques to full-body capture.
While he admits that the latter – given the issues posed by self-occlusion and motion blur – is “a totally different ball game”, he envisages that this shouldn’t take as long as the development of the original system.
Depth Analysis also aims to expand its capture facilities from a single studio in Culver City, California, to a set of satellite stations around the world, and is in talks with as-yet-undisclosed potential clients in sectors ranging from visual effects to medical and forensic animation.
Based on their feedback, Bao is working on adding re-targeting capabilities to the system, to enable studios to transfer capture data to a character with different facial proportions.
For most people, however, the first proof of MotionScan’s efficacy comes with the release of LA Noire.
Early demos of the game technology at E3 last year generated a wave of excited press coverage, with Official PlayStation Magazine editor-in-chief Tim Clark declaring that the facial animation “blew him away” – an experience Bao claims is common among beta testers.
“Gamers usually stop talking entirely,” he says. “But they can’t keep their mouths closed, either. They just sit there staring at the screen in shock. With the more studio-type people, you can see them thinking, ‘How much is this going to cost me?’”
Bao himself admits to a mixture of excitement and relief to be coming to the end of a seven-year journey, including 80 full days of capture sessions.
So will he play LA Noire on its release?
“Probably not on launch day one,” he laughs. “I’ve seen too much of it already. But I can’t wait to hear what audiences think of it.”
on Friday, May 20th, 2011 at 5:33 pm under Features, Technology.