Artificial Intelligence

TTS still sucks (duarteocarmo.com)

submitted 1 week ago by PlanterTree@discuss.tchncs.de to c/ai_@lemmy.world

A couple of years ago I decided to turn this blog into a podcast. At the time, I decided to make up a stupid rule: whatever model I use to clone my voice and generate article transcripts needs to be an open model.

[–] PlanterTree@discuss.tchncs.de 3 points 1 week ago (2 children)

The guy has pretty strict constraints on his setup: voice cloning AND open source …

At the top of the leaderboard is Kokoro. Kokoro is an amazing model! Especially for a modest 82 Million (!) parameters and a mere 360 MB (!). However, like many models in this leaderboard - I can’t use it - since it doesn’t support voice cloning.

[–] BananaTrifleViolin@lemmy.world 2 points 1 week ago

To be fair, there are quite a few open-source TTS engines that support voice cloning. Coqui TTS springs to mind. They do take some configuring and training to get right, especially for voice cloning, but it's definitely doable with open-source tools.

Kokoro is designed to turn EPUBs into audiobooks and to be lightweight. I think he's looking at the wrong tools for his use case.

[–] brucethemoose@lemmy.world 1 points 1 week ago

And also a leaderboard that doesn't even have all the TTS models? There are a TON of them.

[–] domi@lemmy.secnd.me 2 points 1 week ago

They left out VibeVoice, the leading model for voice cloning that is intended exactly for the described use case?