HUMAN BEINGS are good at locating the sources of sounds. Even when blindfolded, most people can point to within ten degrees of the true direction of a sound’s origin. This is a useful knack for evading danger. It is also an extraordinary cerebral feat. Partly, it is a matter of detecting minute differences of volume in each ear. Partly, it comes from tiny disparities in the time it takes a sound to reach two ears that are not equidistant from its source. The heavy lifting of sound-location, however, involves something else entirely.
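To give a feel for how tiny those timing disparities are, here is a rough sketch that uses Woodworth's spherical-head approximation, a textbook formula that is not mentioned in the article. The head radius (8.75cm), the speed of sound (343 metres a second) and the function name are illustrative assumptions, and a real head is not a sphere.

```python
import math


def interaural_time_difference(azimuth_deg, head_radius_m=0.0875,
                               speed_of_sound_m_s=343.0):
    """Estimate the arrival-time gap between the two ears, in seconds.

    Uses Woodworth's spherical-head approximation (an assumption of this
    sketch). azimuth_deg is 0 for a source straight ahead and 90 for a
    source directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound_m_s * (theta + math.sin(theta))


# Print the estimated gap for a few directions.
for angle in (0, 10, 45, 90):
    gap_us = interaural_time_difference(angle) * 1e6
    print(f"{angle:>2} degrees: {gap_us:.0f} microseconds")
```

Under these assumptions even a sound coming from directly to one side reaches the far ear only about two-thirds of a millisecond late.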
Audio buffs call it the head-related transfer function. A sound is modulated by the body parts it encounters before it reaches the eardrums. In particular, the various tissues of the head attenuate higher frequencies, weakening the top notes of sound waves that have passed to an eardrum through the skull compared with those from the same source that have arrived directly through the air. The cartilaginous ridges, troughs and protuberances of the outer ear also alter sound before it is transduced into nerve signals. Sounds arriving from different angles are therefore modified in consistent ways that the brain learns to recognise.
For all of their acoustic spatial awareness, however, brains can still be fooled by appropriate technology into believing a sound is coming from somewhere that it is not. That sounds like the basis of a big business. And it is.
One way to simulate the “immersive” sound of reality through a pair of earbuds is by using a pair of recordings made with microphones embedded in the ear canals of a special dummy head. These heads are made to have the same shape and density as those of their flesh-and-blood counterparts. That means they modulate sound waves passing through them in a realistic manner. Recordings made using them therefore log what would arrive at the ear canals of someone listening to the sound in question for real. When they are played back, what a user hears recapitulates that experience, including the apparent directions from which the sounds are coming.
Dummy-based binaural recordings of this sort have been around for a while. But making them is clunky. It is also expensive. A good dummy head can cost $10,000, and time in a professional recording studio is hardly cheap. These days, though, the process can be emulated inside a computer. And that is leading to a creative explosion.
The trick that the emulator must master is a process called phase modulation. This involves retarding a sound’s high, medium and low frequencies by the slight but varying fractions of a second by which those frequencies would be delayed by different parts of the ears and head in reality. So writing the appropriate software starts by collecting a lot of data on how sound waves interact with a human head, and that means going back to the studio to conduct special binaural recordings, often using people instead of dummies. The resulting signals can then be decomposed into their component frequencies, which yields an understanding of how to modulate a given frequency to make it seem as if it is arriving from a particular location.
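The idea can be sketched in a few lines, though only as a toy. The example below filters a mono signal separately for each ear: the ear nearer the source hears it unchanged, while the farther ear hears it late and with its high frequencies dulled. Everything in it (the function names, the spherical-head delay formula, the eight-tap smoothing filter standing in for the head's shadow) is an assumption made for illustration. Commercial products rely on measured responses, not on made-up ones like these.

```python
import math


def convolve(signal, impulse_response):
    """Direct convolution: filter a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def toy_hrir_pair(azimuth_deg, sample_rate=48000,
                  head_radius_m=0.0875, speed_of_sound_m_s=343.0):
    """Return made-up (left, right) impulse responses for one direction.

    Positive azimuth means the source is to the listener's right.
    Angles beyond 90 degrees are clamped to keep the toy model simple.
    """
    theta = math.radians(min(abs(azimuth_deg), 90.0))
    # Extra travel time to the far ear (spherical-head approximation).
    delay_s = head_radius_m / speed_of_sound_m_s * (theta + math.sin(theta))
    delay_samples = round(delay_s * sample_rate)
    # Crude "head shadow": a short decaying tail that smooths the far-ear
    # signal and so weakens its high frequencies. Strength grows with angle.
    shadow = 0.5 * math.sin(theta)
    tail = [(1.0 - shadow) * shadow ** k for k in range(8)]
    near = [1.0]                           # near ear: signal unchanged
    far = [0.0] * delay_samples + tail     # far ear: late and duller
    return (far, near) if azimuth_deg >= 0 else (near, far)


def render_binaural(mono, azimuth_deg, sample_rate=48000):
    """Turn a mono signal into (left, right) channels for headphones."""
    left_ir, right_ir = toy_hrir_pair(azimuth_deg, sample_rate)
    return convolve(mono, left_ir), convolve(mono, right_ir)


# A single click placed directly to the listener's right.
left, right = render_binaural([1.0], 90)
```

With measured responses in place of `toy_hrir_pair`, the same convolution step would apply a recorded head's timing and filtering to any mono source.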
Demand for software to mix sound in this way has shot up, says Lars Isaksson of Dirac Research, a firm in Uppsala, Sweden. Dirac developed its own version of such software, known as Dirac 3D Audio, by using a year’s worth of recordings it made that encompassed each degree of rotation, both side to side and up and down, around a listener’s head. This panaudicon provided, Mr Isaksson says, notable smoothness in the simulated movement of sound sources. Makers of video games are a big market for such stuff.
Dirac is not alone. Half a dozen other firms, including Dolby Laboratories of America and Sennheiser of Germany, also now make immersive software. To use it, a sound engineer employs a graphic interface that includes a representation of a sphere surrounding an icon representing the listener. The engineer uses a mouse to move sound channels—vocals, percussion and so on, if the product is music—to the points in the sphere from which their outputs are intended to originate. Software of this sort provides a way to take any recording and “project it in 3D”, says Véronique Larcher, co-director of Sennheiser’s division for immersive audio.
Sennheiser’s product is called AMBEO. Dolby’s is called Atmos. Atmos has generated the soundtracks of more than 20 video games and 2,500 films and television shows, as well as many pieces of music.
Immersive sound may even come to videoconferencing. Dirac is promoting software that makes the voices of participants seem to emerge from the spots on the screen where their images appear. The software uses a laptop’s camera to track listeners’ heads. To those who look, say, left, it will sound as though their interlocutors are off to the right. Dirac is in talks with videoconferencing firms including BlueJeans, Lifesize and Zoom.
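The geometry behind that head-tracking effect is simple enough to sketch. The snippet below is not Dirac's method, merely the subtraction that any such system would need to perform. The function name and the sign convention (negative angles to the left, positive to the right) are assumptions.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of a sound relative to the listener's nose, in degrees.

    Both angles are measured from the screen's centre line, negative to
    the left and positive to the right. The result is wrapped into the
    range from -180 (inclusive) to 180 (exclusive).
    """
    difference = source_azimuth_deg - head_yaw_deg
    return (difference + 180.0) % 360.0 - 180.0


# A speaker shown in the middle of the screen, heard by a listener who
# has turned 30 degrees to the left.
print(relative_azimuth(0, -30))
```

A positive result means the voice should be rendered to the listener's right, which is the behaviour the paragraph above describes.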
Facebook, a social-media company, is also designing “spatialised audio” for video calls that use its Oculus virtual-reality headsets. Ravish Mehra, head of audio research at Facebook Reality Labs, is coy about how long it will take his team to perfect the aural illusion that this is intended to create. But he says software the firm has in development can modify the frequencies and volumes of sounds so that they match the virtual surroundings chosen for a call, as well as the speaker’s perceived position. The acoustics of a beach, he notes, are unlike those of a room.
Tin pan alley
Such stuff is for the professionals. But amateurs can play too. For the man or woman in the street who wants to jazz up a record collection, many simpler programs now make existing recordings feel more immersive by modulating their sounds.
Programs of this sort cannot handle different parts of a recording differently in the way that studio-based systems manage, but they do create an illusion of sonic space around the listener. Isak Olsson of Stockholm, who has put together two such packages, 8D Audio and Audioalter, describes them as seeming to increase the size of the room. This helps to overcome a phenomenon known as the “in-the-head experience”. And, as Michael Kelly, head of engineering at Xperi, an immersive-software firm based in California, observes, sounds that appear to come from outside the head are more comfortable.
At the other end of the technological scale from such do-it-yourself kits, a number of firms, Dirac, Dolby, Facebook, Sony and Xperi among them, are working on a bespoke approach to sonic immersion. They are tailoring it, in other words, to an individual listener’s anatomy.
One method, that being used by Sony, is to ask potential customers to upload photographs of their ears. Another, which may be adopted by Xperi, is to repurpose data from the face-recognition systems that now unlock many people’s smartphones. If this way of thinking works, it will bring with it the ultimate in high fidelity. This is a recognition that, in the real world, even if what they are hearing is the same set of sound waves, every listener’s experience is different—and that this needs to be replicated in the world of recorded sound, too. With that realisation, acknowledgment of the head-related transfer function’s importance has reached its logical conclusion. And the term “headbanging” may take on a new and positive meaning.■
This article appeared in the Science & technology section of the print edition under the headline “An auricular spectacular”