Sound Business: The Promise of Audio Machine Learning Technologies
New machine learning technologies offer potential value creation through sound detection, analysis, and creation.
Carolyn Geason-Beissel/MIT SMR | Getty Images
Sounds are everywhere — the chatter and babble of humans and animals, the whirring and thrumming of machines, the background hum of the natural environment, and the murmur of bees on a summer day. These sounds provide crucial input to our decision-making, whether as pedestrians crossing the road or as engineers testing the safety of a vehicle or machine.
Until recently, however, the systematic analysis of sounds in dynamic settings such as a busy train station, a shopping mall, or an urban park was difficult because of the sheer number of complex acoustic signals interacting at once. That is now changing, thanks to major advances in sensor technology and deep learning algorithms that can harvest enormous quantities of acoustic input and rapidly extract key information.
Two branches of sound-related machine learning are emerging: one focused on the detection and analysis of sounds and the other on the AI-powered creation of sounds. Both have significant potential for business and societal value creation. In fact, according to one estimate, the global market for AI audio-recognition technologies is set to more than triple, from $4.1 billion in 2021 to $14.1 billion by 2030.
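As a rough check on the arithmetic behind that estimate, the short Python sketch below computes the growth multiple and the compound annual growth rate implied by the two endpoint figures. It assumes smooth year-over-year compounding across the nine years from 2021 to 2030, which the estimate itself does not state; the function name `implied_cagr` is illustrative.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article: $4.1 billion (2021) and $14.1 billion (2030).
START_BILLIONS, END_BILLIONS = 4.1, 14.1
YEARS = 2030 - 2021  # assumption: growth compounds evenly over 9 years

growth_multiple = END_BILLIONS / START_BILLIONS
cagr = implied_cagr(START_BILLIONS, END_BILLIONS, YEARS)

print(f"Growth multiple: {growth_multiple:.2f}x")
print(f"Implied annual growth rate: {cagr:.1%}")
```

Under that assumption, the quoted figures work out to a multiple of about 3.4 and an implied growth rate of roughly 15% per year.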
Deep learning algorithms are now being used to pioneer innovations across a diverse range of industries and sectors. Applications of sound detection and analysis include the following:
Commercial and household security. Every year, U.S. businesses and consumers spend billions of dollars protecting buildings and other physical assets. Smart home devices, such as Amazon’s Echo, already use AI-driven voice recognition technology to authenticate different users and provide personalized entertainment and shopping experiences. But now AI systems’ sensors and deep learning algorithms can analyze nonvoice ambient sounds from every part of an office, a factory, or a military facility, distinguishing between innocuous sounds and those that may indicate an emerging threat, such as breaking glass.
Health care. AI sound technologies could transform many areas of health care, particularly by enabling low-cost, rapid diagnoses of diseases in their early stages. They could be used to provide real-time measurements of an array of biometric data, such as heart rate, blood pressure, respiration rate, and stress level. Deep learning has been used to extract and classify the “crackles” and “wheezes” of different lung diseases. Cochl, a South Korean startup, is pioneering AI applications to rapidly identify health problems based on the coughs and sneezes of patients. Such early-warning systems could prove to be pivotal in the fight against COVID-19 and future virus outbreaks.
Hearables. Acoustic deep learning technologies are being pioneered in the “hearables” market (which includes headphones, earphones, and other listening devices) to improve people’s listening experience. Such technologies can screen out unwanted noise or alert users to potential dangers, such as when someone is wearing earphones while jogging near traffic. Machine learning algorithms can also be used to tailor listening content to different contexts, such as by playing more relaxing or softer sounds when breathing signals indicate signs of stress, or more stirring content when the user is exercising.
Retail and leisure industries. Machine learning technologies can now recognize individuals through the sounds of their footsteps by isolating these distinctive footfall echoes from background noises and the sounds of other pedestrians. Sound-based gait-recognition technologies offer significant advantages over other surveillance systems, given that they work even in poor lighting conditions and are less intrusive than facial recognition, computer vision, or biometric identification systems.
One of the biggest potential applications of this technology is in footfall-intensive industries such as retail. Sound-based footfall recognition can be used to recognize returning customers and help identify linger points on the shopper’s journey — places where they pause to compare different products or respond to particular discounts or promotions — or pinpoint the time sensitivity of customers at different times of the day or week, based on the rapidity of footsteps.
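To make the footstep-rapidity idea concrete, here is a minimal Python sketch. It assumes an upstream acoustic model has already produced a list of footstep onset times (in seconds) for one shopper; that detector, the function names, and the 3-second pause threshold are all hypothetical and are not drawn from any specific product.

```python
def cadence_steps_per_minute(step_times):
    """Average walking cadence over the whole sequence of footstep times."""
    if len(step_times) < 2:
        raise ValueError("need at least two footsteps to measure cadence")
    duration = step_times[-1] - step_times[0]
    if duration <= 0:
        raise ValueError("footstep times must be increasing")
    return 60.0 * (len(step_times) - 1) / duration


def linger_intervals(step_times, min_pause_s=3.0):
    """Gaps between consecutive footsteps long enough to count as a pause."""
    return [(earlier, later)
            for earlier, later in zip(step_times, step_times[1:])
            if later - earlier >= min_pause_s]


# Hypothetical footstep onset times in seconds (illustrative data only).
step_times = [0.0, 0.5, 1.0, 1.5, 6.5, 7.0, 7.5]

cadence = cadence_steps_per_minute(step_times)
pauses = linger_intervals(step_times)

print(f"Average cadence: {cadence:.0f} steps per minute")
print("Possible linger points (start, end):", pauses)
```

In this toy sequence, the five-second gap between 1.5 and 6.5 seconds is the only interval treated as a possible linger point. A real deployment would also need to map those timestamps to store locations, which this sketch leaves out.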
Predictive maintenance and early-warning systems. Deep learning algorithms can analyze acoustic signals such as noise pressure and reverberations from machines and engine parts to assess wear and tear and predict when a particular part is likely to need replacing. NASA has used sound-sensing algorithms to monitor the functioning of equipment aboard the International Space Station. Deep learning algorithms are being used to classify underwater acoustic signals to potentially develop an early-warning system for deep-sea earthquakes and tsunamis.
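The sketch below illustrates the general idea of acoustic condition monitoring in Python. It is deliberately much simpler than the deep learning systems described above: it swaps in a plain statistical threshold on loudness (the RMS level of each frame), compared against a baseline recorded while the machine is assumed to be healthy. The synthetic signal, frame size, and threshold are illustrative assumptions.

```python
import math
import statistics


def rms(frame):
    """Root-mean-square level of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def flag_anomalous_frames(signal, frame_size, baseline_frames, threshold=3.0):
    """Return indices of frames whose RMS level exceeds the baseline mean
    by more than `threshold` baseline standard deviations."""
    frames = [signal[i:i + frame_size]
              for i in range(0, len(signal) - frame_size + 1, frame_size)]
    levels = [rms(f) for f in frames]
    baseline = levels[:baseline_frames]
    mean = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    if spread == 0:
        return []  # baseline has no variation, so a z-score is undefined
    return [i for i in range(baseline_frames, len(levels))
            if (levels[i] - mean) / spread > threshold]


# Synthetic data (assumption): a 50 Hz machine hum sampled at 1 kHz.
# Ten "healthy" frames vary slightly in amplitude; the last two are louder.
SAMPLE_RATE = 1000
FRAME_SIZE = 100
amplitudes = [1.0 + 0.01 * (k % 3) for k in range(10)] + [1.5, 1.5]
signal = [amp * math.sin(2 * math.pi * 50 * (k * FRAME_SIZE + n) / SAMPLE_RATE)
          for k, amp in enumerate(amplitudes)
          for n in range(FRAME_SIZE)]

flagged = flag_anomalous_frames(signal, FRAME_SIZE, baseline_frames=10)
print("Frames flagged for inspection:", flagged)
```

Here only the two louder frames at the end are flagged. Production systems typically look at far richer features than overall loudness, such as how energy is distributed across frequencies, but the baseline-then-compare structure is a reasonable mental model.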
Marketing and media content production. Speech-to-text technologies have been around for some time, but the reverse is now happening with the growth of AI-powered text-to-speech or video-to-sound technologies. AudioStack, a U.K.-based startup, provides AI-enabled audio creation, drawing on a database of more than 600 voices in more than 60 languages. Use cases include generating audio ads with different regional nuances, music, or voice tones, or generating synthetic or cloned voices for podcasts or narrated news flashes based on textual content. The algorithm that DeepZen uses for artificial voice creation from text is able to infer different emotional tones from the content, such as excitement, enthusiasm, or comfort. AutoFoley was developed to replicate the role of Foley artists, who add audio effects — such as the gallop of a horse or footsteps on a staircase — to movies. Applications like these have huge potential in industries such as gaming, marketing, and publishing, where global demand for voiceovers is forecast to grow by 9% per year to reach $2.3 billion by 2026.
Recommendations for Business Leaders
The rise of AI-based sound detection and creation will raise several challenges for businesses, especially in the intellectual property arena. To capitalize on the opportunities and mitigate the risks, business leaders should consider these recommendations:
Understand and protect your sonic assets. Most large companies will trademark or register a well-known jingle or catchphrase associated with their products or services, but in a world of AI-based sonification, businesses will also need to carefully assess the sonic print of their entire product portfolios — the distinctive whir of a washing machine or vacuum cleaner, the signature ignition of a sports car, the pop of a soda can opening. Counterfeiting of sounds will become a growing problem that businesses will need to prepare for.
“Sonify” the product experience. Businesses will pay increasing attention to the sonic signatures of their products, either to diminish cacophonous elements (such as a noisy clothes dryer) or to enhance aesthetically pleasing sounds (such as the tick-tock of a car’s turn indicator). In the future, manufacturers could incorporate sonic signatures that track the product life cycle — for example, by varying a sound profile from a consumer durable as its components start to wear out. Businesses of all types will need to make sonic signatures a core part of user interface design, by identifying both the right functional and informational sounds for different uses and contexts, and by combining these with visual and other sensory elements.
Prompt for sound. Generative AI systems, such as large language models and related models capable of creating and synthesizing text, sounds, speech, images, and code, are increasingly enabling businesses to create completely new sounds and sonic combinations, or to combine audio with video, text, or images to create new product campaigns. With a single prompt, marketers will be able to create an advertising tune for a new brand of ice cream, turn a wordy blog or product description into a music video, or alter a product jingle to make it more appealing to younger or older consumers or potential customers in a new market overseas. With “prompt engineering” now becoming a major discipline in the AI and automation fields, businesses will need to train designers, marketers, and product developers to write efficient prompts for sound-based use cases, taking into account factors such as brand values, product tone of voice, intellectual property rules, and ethics.
As the saying goes, “Wisdom is the reward for a lifetime of listening.” Today, AI-powered acoustic technologies are opening up a world of sounds previously inaccessible to us. They’re providing new insights and opportunities for businesses and policy makers in areas as diverse as consumer behavior, health care, urban planning, commercial security, and infrastructure management. Businesses that choose to listen to this new world, and act upon the insights, will be well positioned for future success.