this post was submitted on 27 Aug 2025
I've really enjoyed using Kokoro for generating audiobooks:
Be sure to first try using this convenient API wrapper:
Note that not all of the modelled voices in Kokoro-82M are of equal quality, given disparities in the limited training data from reference speakers. However, what's cool is that you can prescribe polynomial weights to multiple voice tags, enabling you to synthesize different variants weighted more heavily toward the highest-quality voices.
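To make the weighting idea concrete, here is a minimal sketch of one plausible reading of it: a normalized weighted average over per-voice style vectors. The spec syntax `name(weight)+name(weight)`, the helper names `parse_voice_spec` and `blend`, and the two-element toy vectors are all assumptions for illustration, not the wrapper's or the model's actual implementation.

```python
import re

# Hypothetical term syntax: "voice_name" or "voice_name(weight)".
TERM = re.compile(r"\s*([A-Za-z0-9_]+)\s*(?:\(\s*([0-9]*\.?[0-9]+)\s*\))?\s*")


def parse_voice_spec(spec):
    """Parse 'name(weight)+name(weight)' into weights that sum to 1."""
    weights = {}
    for part in spec.split("+"):
        match = TERM.fullmatch(part)
        if match is None:
            raise ValueError(f"bad voice term: {part!r}")
        name, raw = match.group(1), match.group(2)
        # A term without an explicit weight counts as weight 1.
        weights[name] = weights.get(name, 0.0) + (float(raw) if raw else 1.0)
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


def blend(embeddings, weights):
    """Return the weighted average of equal-length style vectors."""
    dim = len(next(iter(embeddings.values())))
    mixed = [0.0] * dim
    for name, w in weights.items():
        vec = embeddings[name]
        for i in range(dim):
            mixed[i] += w * vec[i]
    return mixed


if __name__ == "__main__":
    # Toy stand-ins for real voice embeddings (assumed, not real data).
    toy = {"af_bella": [1.0, 0.0], "af_sky": [0.0, 1.0]}
    weights = parse_voice_spec("af_bella(2)+af_sky(1)")
    print(weights)
    print(blend(toy, weights))
```

Normalizing the weights keeps the blended vector on the same scale as the inputs, so leaning harder on a better voice only shifts the mix rather than amplifying it.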
One current limitation of Kokoro is that there's no way to prescribe emotion or intonation procedurally using markup tags like SSML in the source text, unlike other models such as Orpheus. But Orpheus sometimes generates weird hallucinations, like repeating sentences, injecting new phrases, appending radio silence or filler words, and generally increasing the tempo in words per minute as a sentence progresses. Still, it may be of interest if you want to add emotion like fear or urgency to your generated dispatches, and can manage to tune the model's input temperature to your liking.
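If you want to cut down on manual listening, a rough sketch of an automated check for repeated or injected sentences could look like the following. It assumes you already have a text transcript of the generated audio from some speech-to-text tool (that step is not shown), and the function names here are hypothetical. It won't catch silence padding or tempo drift.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split on sentence-ending punctuation and normalize for comparison."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Lowercase and drop punctuation so minor transcript differences match.
    return [re.sub(r"[^a-z0-9 ]", "", p.lower()).strip() for p in parts if p.strip()]


def find_extra_sentences(source, transcript):
    """Map each sentence to how many more times it occurs in the transcript
    than in the source text (covers both repeats and injected phrases)."""
    src = Counter(split_sentences(source))
    out = Counter(split_sentences(transcript))
    return {s: n - src[s] for s, n in out.items() if s and n > src[s]}


if __name__ == "__main__":
    source = "The engine failed. Send help now."
    transcript = "The engine failed. The engine failed. Send help now. Stay tuned."
    for sentence, extra in find_extra_sentences(source, transcript).items():
        print(f"{extra} extra: {sentence}")
```

Exact matching after normalization is deliberately crude; transcription errors will produce false positives, so treat the output as a list of spots to re-listen to rather than a verdict.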
However, Kokoro is a lot more compute-efficient and audibly consistent, requiring less scrutiny or manual supervision. The author behind Kokoro also looks to be working towards an emotional variant:
Reference project I've been following for audiobook generation:
wow. thanks!