New AI Model CSM-1B Powers Realistic Voice Assistant Maya


Sesame, the AI startup behind the viral virtual assistant Maya, has officially released its base AI model, CSM-1B. The model, which boasts 1 billion parameters, is now available under an Apache 2.0 license, allowing developers to use it for commercial applications with minimal restrictions.

CSM-1B is designed to generate “RVQ audio codes” from both text and audio inputs, according to Sesame’s listing on the AI development platform Hugging Face. RVQ, or Residual Vector Quantization, is a technique that encodes audio into discrete tokens. The same technique is used in AI audio codecs such as Google’s SoundStream and Meta’s EnCodec.
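To make the idea of residual vector quantization concrete, here is a minimal toy sketch in Python. It is not Sesame's codec or any library's API: the function names and the tiny hand-picked codebooks are assumptions for illustration, and real systems such as SoundStream or EnCodec learn their codebooks and apply them to neural-network features rather than raw 2-D points. Each stage picks the codeword nearest to what the previous stages left unexplained, so a vector becomes a short list of integer codes.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]

# Hypothetical, hand-picked codebooks (real codecs learn these from data).
# Stage 1 is coarse; stage 2 refines whatever error stage 1 leaves behind.
CODEBOOKS: List[Codebook] = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25), (-0.25, 0.0), (0.0, -0.25)],
]


def nearest(codebook: Codebook, v: Vector) -> int:
    """Return the index of the codeword closest to v (squared Euclidean)."""
    def dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, v))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(x: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize x stage by stage, each stage coding the remaining residual."""
    residual = list(x)
    codes: List[int] = []
    for cb in codebooks:
        i = nearest(cb, residual)
        codes.append(i)
        # Subtract the chosen codeword; the next stage sees only the leftover.
        residual = [r - c for r, c in zip(residual, cb[i])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct an approximation by summing the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out


def sq_error(a: Vector, b: Vector) -> float:
    """Squared reconstruction error between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


x = (0.9, 0.3)
codes = rvq_encode(x, CODEBOOKS)
print(codes, rvq_decode(codes, CODEBOOKS))  # [1, 2] [1.0, 0.25]
```

In this toy case the vector (0.9, 0.3) is represented by the two discrete tokens 1 and 2, and adding the second stage lowers the reconstruction error compared with using the first stage alone. That coarse-to-fine refinement is the property that makes RVQ codes a compact representation for audio.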

Built on Meta’s Llama architecture, CSM-1B incorporates an advanced audio decoder to enhance voice synthesis. While the model itself is a base version capable of producing various voices, a fine-tuned variant specifically powers Maya, Sesame’s hyper-realistic AI assistant.

Open-Source Potential with Limited Language Support

Sesame has made it clear that CSM-1B is not fine-tuned on any specific voice but can generate a wide range of speech outputs. It has some capacity for non-English languages, which Sesame attributes to data contamination in its training data, but it is unlikely to perform well in them.

The company has not disclosed details about the dataset used to train CSM-1B, leaving questions about its data sources and potential biases.

Ethical Concerns and Lack of Built-in Safeguards

One of the most notable aspects of CSM-1B is its lack of strict safeguards. Sesame operates on an honor system, advising users not to misuse the model for unethical purposes. The company urges developers to avoid cloning voices without consent, spreading misinformation, or engaging in harmful activities.

However, concerns persist. A quick test of the demo on Hugging Face revealed that cloning a voice took less than a minute, and generating speech on sensitive topics such as elections and geopolitical issues was alarmingly easy.

Consumer Reports has previously warned that many AI-powered voice cloning tools lack meaningful safeguards against fraud, raising further ethical concerns about the use of models like CSM-1B.

The Rise of Hyper-Realistic AI Assistants

Sesame, co-founded by Oculus co-creator Brendan Iribe, gained widespread attention in February for its AI assistants, Maya and Miles. These virtual assistants stand out by incorporating human-like pauses, breaths, and disfluencies, making their speech patterns more natural. Users can even interrupt them mid-sentence, much like OpenAI’s Voice Mode.

Sesame’s Future Plans: AI Glasses and Beyond

Backed by investors such as Andreessen Horowitz, Spark Capital, and Matrix Partners, Sesame is expanding its AI ambitions beyond voice assistants. The company has revealed that it is working on AI-powered smart glasses designed for all-day wear, leveraging its proprietary AI models to enhance user experiences.

With CSM-1B now openly available, developers can build on and fine-tune the model directly, which is likely to accelerate work on AI-generated voice. However, questions about ethical use and security remain critical as these tools become more powerful and accessible.
