TL;DR
- SAM Audio is Meta’s first unified AI model that isolates specific sounds using text, visual, and time-span prompts.
- The model simplifies audio editing by allowing users to remove background noise or isolate instruments without advanced tools.
- It is accessible through the Segment Anything Playground, where users can upload files or try sample media.
- SAM Audio supports creative and research use cases in music, film, podcasts, accessibility, and scientific analysis.
- Meta offers the model for download, enabling integration into offline workflows and existing development pipelines.
Meta has officially released SAM Audio, a new AI model that isolates sounds from mixed audio using text, visual, or time-span prompts. The tool is available now through the Segment Anything Playground, where users can try it with sample media or upload their own files. The model can also be downloaded for custom implementations, making it accessible for creative and technical applications across multiple industries.
SAM Audio Supports Flexible Prompts for Accurate Sound Editing
Meta confirmed SAM Audio is the first unified model designed to perform sound segmentation using three prompt types: text, visuals, and time spans, making the editing process easier and more accurate for users working with real-world audio recordings. The model supports tasks such as removing unwanted background sounds, isolating specific instruments, or extracting dialogue with minimal effort, all while mimicking the natural way people perceive sound.
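To make the three prompt types concrete, here is a minimal toy sketch of prompt-driven separation. This is not SAM Audio's actual API: the function name, prompt format, and masking logic are illustrative assumptions only, and a time-span prompt is implemented as a simple waveform mask to stand in for what the real model does.

```python
# Hypothetical sketch only -- SAM Audio's real interface is not shown here.
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate in Hz

def separate(audio: np.ndarray, prompt: dict) -> np.ndarray:
    """Return the portion of `audio` selected by a prompt.

    Only the "time_span" prompt (start/end in seconds) is sketched here;
    text and visual prompts would require the actual model.
    """
    if prompt["type"] == "time_span":
        start = int(prompt["start"] * SAMPLE_RATE)
        end = int(prompt["end"] * SAMPLE_RATE)
        mask = np.zeros_like(audio)
        mask[start:end] = 1.0  # keep only the prompted time window
        return audio * mask
    raise NotImplementedError("text/visual prompts need the real model")

# Example: keep seconds 1-2 of a 3-second clip, silencing the rest.
clip = np.ones(3 * SAMPLE_RATE, dtype=np.float32)
out = separate(clip, {"type": "time_span", "start": 1.0, "end": 2.0})
```

In the real model, a text prompt ("remove traffic noise") or a visual prompt (a region in a video frame) would drive the separation instead of a fixed time window, but the idea is the same: the prompt selects which sounds survive.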
It processes various types of complex audio with precision, offering support for media professionals, hobbyists, and researchers. Meta described the system as intuitive and accessible, saying, “This unified approach reflects how people naturally engage with sound.” Users can interact with SAM Audio in Meta’s newly launched Segment Anything Playground, where audio and video samples are available, or upload personal files for instant testing.
The model performs audio separation efficiently, letting users achieve professional-level edits with simple instructions, such as filtering out traffic noise from a video shot outdoors or separating vocals from a band performance. Thanks to its multi-modal input system, SAM Audio delivers these results without requiring technical expertise or separate tools. Meta emphasized that the model simplifies editing while supporting a wide range of use cases within a single framework.
Creative, Technical, and Scientific Use Cases Enabled
SAM Audio brings support for content creators in music, film, podcasting, and television, while also offering value in scientific research and accessibility development. The model allows users to focus on their creative vision instead of navigating fragmented audio tools, and it supports natural workflows for editing and analysis.
Meta is already using SAM Audio internally to develop the next generation of AI-powered media tools. The company designed it to handle complex, real-world scenarios, such as isolating bird sounds in nature recordings or removing ambient distractions from speech. The model is now downloadable, making it easier for developers and researchers to integrate it into existing pipelines or offline environments.
It does not rely on studio-quality audio to work and delivers results on mixed, noisy recordings typical of field or real-world use. According to Meta, the goal is to make sound editing simpler, faster, and more aligned with how people hear and describe sound in daily life. SAM Audio is live on Meta’s Segment Anything Playground for exploration and full-featured use.


