Microsoft researchers have announced a new application that uses artificial intelligence to mimic a person’s voice with just a few seconds of training. Voice models can be used for text-to-speech applications.

The application, called VALL-E, can synthesize high-quality personalized speech with only a three-second enrolled recording of the speaker as an acoustic prompt, the researchers wrote in a paper published online on arXiv, a free distribution service and open-access archive for scholarly articles.

There are now programs that can cut and paste speech into an audio stream, and that speech is converted from typed text to the speaker’s voice. However, the program must be trained to simulate a person’s voice, which can take an hour or more.

“One of the extraordinary things about this model is that it does it in a matter of seconds. It’s very impressive,” Ross Rubin, principal analyst at Reticle Research, a consumer technology advisory firm in New York City, told TechNewsWorld.

According to the researchers, VALL-E outperforms current state-of-the-art text-to-speech (TTS) systems in both speech naturalness and speaker similarity.

In addition, VALL-E can preserve the speaker’s emotions and acoustic environment. So if a speech sample was recorded on a phone, for example, text using that voice would sound like it was being read through a phone.

‘Super Impressive’

VALL-E is a noticeable improvement over previous state-of-the-art systems such as YourTTS, released in early 2022, said Giacomo Miceli, a computer scientist and creator of a website featuring an AI-generated, never-ending discussion with the synthesized speech of Werner Herzog and Slavoj Zizek.

“The interesting thing about VALL-E is not just that it only needs three seconds of audio to clone a voice, but also how closely it can match that voice, emotional timing, and any background noise,” Miceli told TechNewsWorld.

Ritu Jyoti, group vice president of AI and automation at IDC, a global market research company, called VALL-E “significant and highly impactful.”

“This is a significant improvement over previous models, which required a much longer training period to generate a new voice,” Jyoti told TechNewsWorld.

“It’s still early days for this technology, and more improvements are expected to make it more human-like,” she added.

Emotion Simulation Questioned

Unlike OpenAI, the creator of ChatGPT, Microsoft has not opened VALL-E to the public, so questions remain about its performance. For example, are there factors that could cause degradation of the speech produced by the application?

“The longer the audio snippet generated, the more likely a human is to hear things that sound a bit off,” Miceli said. “Words in synthesized speech may be slurred, omitted, or duplicated.”

“It’s also possible that switching between emotional registers will feel unnatural,” he said.

There is also doubt about the application’s ability to simulate the speaker’s emotions. “It will be interesting to see how strong the capability really is,” said Mark N. Vena, president and principal analyst at SmartTech Research in San Jose, Calif.

“Their claim that it can do this with only a few seconds of audio is hard to believe,” he continued, “given the current limitations of AI algorithms, which require a lot of voice samples.”

Ethical Concerns

Experts see beneficial applications for VALL-E as well as some not-so-beneficial ones. Jyoti cited speech editing and replacing voice actors. Miceli said the technology could be used to build editing tools for podcasters and customize the sound of smart speakers, and could be incorporated into messaging systems and chat rooms, video games and even navigation systems.

“The other side of the coin is that a malicious user could clone a politician’s voice and have them say things that sound absurd or inflammatory, or just spread false information or propaganda in general,” Miceli said.

If it’s as good as Microsoft claims, Vena sees huge potential for abuse in the technology. “At the level of financial services and security, it is not difficult to imagine use cases by rogue actors who could do really harmful things,” he said.

Jyoti also sees ethical concerns emerging around VALL-E. “As technology advances, the voices produced by VALL-E and similar technologies will become more convincing,” she explained. “This would open the door to realistic spam calls that mimic the voices of real people a potential victim knows.”

“Politicians and other public figures can also be impersonated,” she added.

“There could be potential security concerns,” she continued. “For example, some banks allow voice passwords, which raises concerns about misuse. We can expect an escalation of the arms race between AI-generated content and AI-detecting software to prevent misuse.”

“It is important to note that currently VALL-E is not available,” Jyoti said. “Overall, it is important to regulate AI. We will have to see what measures Microsoft takes to regulate the use of VALL-E.”

Enter the Lawyers

Legal issues may also arise around the technology. “Unfortunately, there may not be existing, adequate legal tools to deal directly with such issues, and instead, a hodgepodge of laws covering how the technology is misused may be used to curtail such misuse,” said Michael L. Teich, a principal at Harness IP, a national intellectual property law firm.

“For example,” he continued, “voice cloning can result in a deepfake of a real person’s voice that can be used to deceive a listener or even be used to mimic the voice of an election candidate.” While such misuse would raise legal issues in the areas of fraud, defamation, or election misinformation laws, there is a lack of specific AI laws that would deal with the use of the technology itself.

“Further, depending on how the initial voice sample was obtained, there may be implications under the federal Wiretap Act and state wiretap laws if the voice sample was obtained over, for example, a telephone line,” he said.

“Finally,” Teich said, “in limited circumstances, there may be First Amendment concerns if such voice cloning is used by a government actor to silence, delegitimize, or dilute legitimate voices exercising their free speech rights.”

“As these technologies mature, there may be a need for specific laws to directly address the technology and prevent its misuse as the technology advances and becomes more accessible,” he said.

Making Smart Investments

In recent weeks, Microsoft has been making headlines with AI. It is expected to incorporate ChatGPT into its Bing search engine, and possibly its Office apps, this year. It also reportedly plans to invest $10 billion in OpenAI. And now, VALL-E.

“I think they’re making a lot of smart investments,” said Bob O’Donnell, founder and principal analyst at Technalysis Research in Foster City, Calif., a technology market research and consulting firm.

“They jumped on the OpenAI bandwagon several years ago, so they’ve been behind the scenes on this for quite some time. Now it’s coming out in a big way,” O’Donnell told TechNewsWorld.

“They’ve had to play catch-up with Google, which is known for its AI, but Microsoft is making some aggressive moves to come to the forefront,” he continued. “They’re jumping on the popularity and the incredible coverage that all these things are getting.”

Rubin said, “Microsoft, having been the leader in productivity for the last 30 years, is looking to preserve and extend that leadership. AI may hold the key to that.”

Microsoft has announced a hands-on preview for commercial customers of its new Teams Premium product, designed to make meetings more personal, intelligent and secure.

The premium product includes many attractive features, such as:

  • Using artificial intelligence to provide live translation and intelligent recaps of meetings with autogenerated chapters and suggested action items and insights;
  • Advanced security with the use of watermarks, end-to-end encryption, and sensitivity labels to prevent copy and pasting of chat sessions;
  • Tools for creating and managing high-quality webinars;
  • Virtual Appointment Dashboard to control the end-to-end virtual appointment experience; and
  • Ability to extend the company’s image in meetings with branded backgrounds.

“This is an opportunity for Microsoft to open up monetization opportunities beyond Microsoft 365,” said Ross Rubin, principal analyst at Reticle Research, a consumer technology advisory firm in New York City.

“You’ll get basic-level functionality, but more functionality at the higher price levels,” Rubin told TechNewsWorld.

Race for AI Solutions

The AI features do many things that often are not done in meetings, such as providing outlines, notes and translations for attendees, said Rob Enderle, president and principal analyst at the Enderle Group, an advisory services firm in Bend, Ore.

“I expect this AI component to be the defining difference between platforms in the future,” he told TechNewsWorld. “Powerful conferencing solutions are racing to see who can provide the most powerful AI-based solution.”

Intelligent Recap holds a lot of promise for helping organizations get the most out of meetings, said J.P. Gownder, vice president and principal analyst at national market research company Forrester Research.

“Too often, follow-ups and action items are forgotten after the meeting,” Gownder told TechNewsWorld. “Those who missed the meeting struggle to find the value of the meeting.”

“Intelligent Recap promises to automate the process of extracting follow-ups, action items, and meeting content,” he continued. “It will take some time to learn from real-world meetings, but it promises to increase the value of meetings and connect them to business actions.”

More Efficient Meetings

In some ways, the new tools in Teams Premium make virtual meetings more efficient than in-person meetings, said Michael Inouye, a principal analyst at ABI Research, a global technology intelligence firm.

“By more efficient, I mean making access to information from previous meetings and follow-up more streamlined and easier,” Inouye told TechNewsWorld.

He explained that in a face-to-face meeting, any work on the whiteboard may not be included in the meeting notes. Similarly, notes are often not shared among participants or may be specific to an individual’s note-taking style.

“Creating chapters and tagging recorded meetings makes searching through the archives much more efficient,” he continued. “Instead of trying to remember the date of a particular meeting by checking your notes, you can search for a topic or other information of interest.”

“These tools can benefit in-person meetings as well, because those conversations can be recorded and processed in the same way, so it’s not exclusive to virtual,” he added.

Features Too Good for Paywall

New security features in Teams Premium have also drawn praise. Forrester analyst Will McKeon-White said, “The security enhancements like copy/paste controls and E2E encryption for groups are all excellent.”

However, he questioned the limitation of features to a premium offering. “It’s strange that these are split off from the standard Teams platform,” he told TechNewsWorld.

McKeon-White also praised the addition of webinars to Teams Premium. “There is a great need for offering webinars from a competition point of view and this will help organizations further strengthen an offering,” he added.

While praising the product’s translation feature, he also lamented its restriction to the premium tier. “Live translation is going to be transformative for how organizations communicate,” he predicted. “It’s a shame to see this locked behind the paywall.”

One feature of Teams Premium that is getting mixed reviews is its branding feature.

“I think the branding-focused features are interesting and potentially different than what Zoom and Cisco’s solutions are offering,” said Mark N. Vena, president and principal analyst at SmartTech Research in San Jose, Calif.

“It appears that Microsoft is moving toward a more personalized experience with the premium version of Teams, which I think will be useful,” Vena told TechNewsWorld.

“The ability to create more customized experiences will be valued by some users, and I think the ability to add your brand will be valued by users who create video podcasts or conduct webinars,” he said.

Ahead of Its Time

While an interesting idea, the brand extension feature could be problematic, stressed Enderle. “Using a tool like this to push a brand requires marketing to have a direct say in the outcome, which isn’t the case here,” he added.

“It would be like providing a medical device without medical oversight,” he explained. “I don’t think you can do a brand feature without deeply involving marketing in the resulting process. That’s not the case here.”

Inouye said that branding is usually not prioritized in virtual types of communication and collaboration (C&C). “Going forward, if virtual C&C becomes more widespread or commonplace, I can see it becoming more valuable,” he added.

“It may be a little ahead of its time,” he continued. “No harm done, but it won’t be a significant selling point.”

Inouye said Teams Premium should help position Microsoft in the communications and collaboration market. However, he added, “it’s hard to say whether this will change the competitive landscape in a meaningful way, at least not yet.”

“Companies have reduced virtual events, which means a company may see less value from a more integrated solution,” he explained. “For a handful of events, going with a third party can be as good an option as a more integrated solution.”

Word of Caution

Vena argued that Microsoft is playing catch-up in the video conferencing space as Zoom became the market leader during the pandemic, focusing on ease of use.

“But Zoom continues to face pushback over security concerns, and Microsoft has a perceived advantage when it comes to protecting privacy,” he said. “This new solution should move the ball forward in increasing the appeal of Teams, especially with enterprise and SMB users.”

McKeon-White said all of the enhancements are logical and add value to the Teams platform. “My biggest issue with Microsoft’s approach is that they are compartmentalized behind a paywall,” he said.

“Any time AI/ML features are split off from a platform, it provides an opening for competitors,” he said.

There’s a lot to like in this rollout, as Microsoft continues to evolve its AI to deliver more valuable features, noted Wayne Kurtzman, vice president of collaboration and communities research at IDC, a global market research company.

However, he cautioned: “Microsoft will need to add more benefits to maintain the premium value, as some of their competitors are likely to include some of these features in their core product. Regardless, feature innovation is likely to maintain a high velocity.”