FAQ
Questions people ask before trying SelectVoice.
A practical overview of video-guided voice isolation, what to expect, and how uploaded videos are handled.
How is SelectVoice different from noise reduction?
Noise reduction tries to clean up the entire audio track. SelectVoice is built for a different job: isolating the voice of one chosen, visible speaker from a mixed scene. It uses visual context from the video, including facial movements and other cues, to guide extraction toward the selected voice.
How does video help isolate a voice?
Mixed audio can be ambiguous when voices overlap. Video adds a second signal: facial movements and other visual cues help the system focus on the intended visible speaker's voice instead of treating every voice in the mix equally.
Does the speaker need to be visible?
SelectVoice works best when the speaker you want is visible during the section you care about. Visual context is part of how the system isolates and extracts that speaker's audio, so results are typically weaker for stretches where the speaker is not on screen.
What happens when multiple people are speaking?
SelectVoice is designed for scenes where multiple voices or background sounds compete with the speaker you care about. Visual context helps focus the result on the intended visible speaker rather than the whole audio mix.
Do I need to pick the speaker manually?
Typically, yes. In the interactive workflow, you choose the visible speaker you want to hear. API customers can also use automatic face detection workflows for speaker selection. Contact us for details.
What kinds of clips work best?
The best fit is spoken video where the intended speaker is visible and their voice is present but hard to hear. Examples include interviews, phone videos, field recordings, event footage, crowded rooms, and documentary clips with background noise or competing voices.
What kinds of clips are hardest?
The hardest clips are ones where the voice is barely present, heavily distorted, clipped, covered by loud music, or not paired with a visible speaker. Video can guide the system, but it cannot fully recover speech that is not meaningfully available in the source.
What do I get back after processing?
You get back a processed video along with the extracted audio. The video serves as an input signal to guide isolation, and the returned result lets you review the selected speaker's cleaned-up voice in context.
Can I preview the result before buying credits?
Yes. SelectVoice offers a free preview trial so you can hear whether the workflow helps before buying credits. Credits cover processing, which runs in the cloud on the GPUs and other compute resources that video-guided voice isolation requires.
What file types can I upload?
SelectVoice supports common web video formats, including MOV, MP4, and WebM. If your footage is in another format, export it to one of those formats before uploading.
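If you prepare clips in bulk, a small pre-upload check can flag files that need exporting first. The sketch below is a local helper, not part of SelectVoice: the function names `is_supported` and `ffmpeg_export_args` are hypothetical, it assumes the file extension reflects the actual format (it does not inspect codecs), and it assumes you have the open-source `ffmpeg` tool installed. The H.264/AAC MP4 settings are a common, widely compatible choice, not a documented SelectVoice requirement.

```python
from pathlib import Path

# Formats named in this FAQ. Checking the extension is a rough heuristic:
# it does not inspect the container or codecs inside the file.
SUPPORTED_EXTENSIONS = {".mov", ".mp4", ".webm"}


def is_supported(path: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def ffmpeg_export_args(path: str) -> list[str]:
    """Build an ffmpeg command that re-encodes a clip to MP4.

    Assumption: H.264 video (libx264) and AAC audio are used here as a
    common default; SelectVoice may accept other codecs inside MP4.
    """
    src = Path(path)
    dst = src.with_suffix(".mp4")
    return ["ffmpeg", "-i", str(src), "-c:v", "libx264", "-c:a", "aac", str(dst)]


if __name__ == "__main__":
    clip = "interview.avi"  # hypothetical input file
    if not is_supported(clip):
        # Print the command instead of running it, so nothing is overwritten.
        print(" ".join(ffmpeg_export_args(clip)))
```

Printing the command rather than invoking it keeps the check side-effect free; you can review it and run it yourself, or pass the list to `subprocess.run` once you are satisfied with the output path.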
How are uploaded videos handled?
Uploaded videos are stored and processed on our private cloud servers. We do not send uploaded video or audio to third-party AI services for processing. Files are used only to run the job you request, then removed according to the retention schedule in the privacy policy.
Is there an API?
Yes. API workflows are available for higher-volume or integrated use cases, including automatic face detection options. Contact us to discuss API access and production workflows.