Stability AI has introduced what it calls an enterprise-grade audio generation model.

On Wednesday, the Stable Diffusion maker launched Stable Audio 2.5, an audio model designed for enterprise-grade sound production to help customers create customizable, high-quality audio at scale.

The model has an inference time of under two seconds on a GPU, allowing it to generate three-minute tracks within seconds. It can also respond to prompts that describe moods such as “uplifting,” according to Stability AI. Besides supporting audio-to-audio and text-to-audio generation, the model allows for audio inpainting, meaning users can apply AI tools to their own audio files.
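As a rough illustration of how a text-to-audio request to such a model might be assembled, the sketch below builds a JSON request body from a mood-style prompt and a track duration. The field names, parameter ranges, and overall shape here are assumptions for illustration only, not Stability AI's actual API; consult the official Stable Audio API documentation for the real interface.

```python
import json

def build_text_to_audio_request(prompt: str, duration_seconds: int = 180) -> str:
    """Build a hypothetical JSON body for a text-to-audio request.

    All field names ("prompt", "duration", "output_format") are illustrative
    assumptions, not Stability AI's documented schema.
    """
    payload = {
        "prompt": prompt,              # descriptive mood, e.g. "uplifting"
        "duration": duration_seconds,  # e.g. the ~3-minute tracks mentioned above
        "output_format": "mp3",        # assumed output option
    }
    return json.dumps(payload)

# Example: request an uplifting three-minute track.
body = build_text_to_audio_request("uplifting ambient track with soft piano")
print(body)
```

The point of the sketch is simply that, unlike speech models, a music-generation request centers on mood and duration rather than a transcript.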

Stable Audio 2.5 comes about 17 months after Stability AI introduced Stable Audio 2.0 in April 2024.

Audio models and the music industry

Stable Audio models bring different functionality to the enterprise compared with other popular generative AI models that focus on speech, text or image.

“We haven’t seen a lot of music/audio models in the enterprise,” according to Arun Chandrasekaran, an analyst at Gartner. “This is a very nascent and niche use case. There are not a lot of other providers that are doing that. In that sense, it’s unique.”

Bradley Shimmin, an analyst at Futurum Group, added that many model makers have tended to be conservative about music generation because of the music industry and the risk of copyright infringement, which could lead to lawsuits. But with Stable Audio 2.5, that appears to be changing.