Features & Pricing
Bring truly lifelike AI voice to your projects
Start with a custom synthetic voice clone, securely built, and create personalized content in multiple languages using verified AI voices that sound human. Great for everything from film to podcasts.
- Clone voices using text-to-speech or speech-to-speech
- Create custom models securely and with consent
- Manage end-to-end voice needs and licensing in one place
- Protect your model with inaudible watermarks and traceability
- Generate new audio clips seamlessly once your model is built
- Translate content into multiple languages to reach new audiences
- Create your own lexicon for custom terminology recognition
- Monetize custom voices for your podcast via Veritone Voice Network
How it works
Input audio, build a model, and create content
Step 1: Secure consent
As ethical cloning pioneers, we never build a voice model without approval. The individual whose voice will be used must provide their explicit consent. If the talent is deceased or in the public domain, the estate or IP owner must sign off.
Step 2: Input pre-existing or newly recorded audio content
Next, we need about three hours of high-fidelity, isolated audio recordings, which we’ll use to train the model. We can use pre-existing audio or provide scripts for new recordings. The content should match the desired output style, and multiple models can be built to accommodate different styles and languages.
Step 3: Customize voice content
Once the model is built, you can use the self-serve app for both text-to-speech and speech-to-speech content creation in near real time, or work with our experts to manage your output needs. Additional models in new languages can be built in about two days.
Real-world AI voice success
The Veritone Voice Network allows us to not only expand into new markets, but to authentically engage with our audience and build out those communities in ways that were not previously possible.
Doug Ellin, Emmy Award-winning writer, producer, and creator of the HBO hit series Entourage
Veritone Voice has opened a whole new door for us. We have an answer to our core challenge—how can we get this content in front of a global audience at scale and with minimal cost in both time and resources? Veritone removes the barrier of language fluency to maximize the reach of my voice and message, and build communities outside of English-speaking markets.
David Meltzer, The Playbook podcast host and public speaker
There’s only so much time I can devote to endorsements in my role, and my brand recognition is at its highest demand during hockey season, when I have the least amount of time to support local businesses and charities due to my schedule. Veritone Voice provides me with such a wide range of possibilities for my personal brand and endorsements because of its ease of use, minimal time commitment and control over the final voice file.
Randy Hahn, NHL Sports play-by-play commentator and on-air personality
Veritone Voice Custom Voice FAQ
How are AI voices made?
Once consent is received, we need about three hours of high-fidelity, isolated audio recordings, which we’ll use to train the model. We can use pre-existing audio or provide scripts for new recordings. The content should match the desired output style, and multiple models can be built to accommodate different styles and languages.
How long does it take to create an AI voice?
A custom voice model takes about two weeks once all of the required training data is received.
How do I create the most realistic-sounding AI voice?
For a lifelike voice model, make sure the training data matches the intended use case. For example, if you or the talent will be using the model for advertisements, the training data should consist of relevant ad reads. Next, make sure the training data meets all audio requirements; together, these steps produce the best output. From there, you can use the Veritone Voice application to adjust tone, pitch, style, speed, intonation, and more. Text-to-speech can produce a very realistic voice, and speech-to-speech is also an option when you need output that sounds indistinguishable from the talent. For broadcast-quality productions, an audio engineer may assist but is not always required.
What audio samples can be used to generate an AI voice?
As long as it meets the audio requirements, we can use pre-existing audio, for example from an approved film or podcast, or we can arrange studio time to record the samples. For studio recordings, the talent is welcome to use our scripts or their own.