Linux

A community for everything relating to the GNU/Linux operating system (except the memes!)

Better TTS on Linux (shkspr.mobi)

submitted 1 week ago by mesamunefire@piefed.social to c/linux@programming.dev


The venerable eSpeak is a mainstay of Linux distributions. It is a clever Text-To-Speech (TTS) program which will read aloud the written word using a phenomenally wide variety of languages and accents. The only problem is that it sounds robotic. It has the same vocal fidelity as a 1980s Speak 'n' Spell toy. Monotonous, clipped, and painful to listen to. For some people, this is a feature, not a…


[–] insomniac_lemon@lemmy.cafe 13 points 1 week ago (3 children)

Honestly, a lot of newer TTS is worse than the 80s/90s stuff like DECtalk or PlainTalk (/MacinTalk). Both of those, while not exactly human-sounding, actually sounded better (at least in a sort of aesthetic way). For example, Microsoft Sam (and whatever the default voice for espeak is) is such a downgrade IMO.

I'm not sure how heavy Piper models are (data or running), but I'm sure TTS could be better without neural anything.

[–] moonpiedumplings@programming.dev 12 points 1 week ago (1 children)

Many people who are visually impaired and rely on TTS don't want it to sound "better". The ultra-robotic voices have extremely consistent sounds, which makes it possible to make out what they are saying at speeds many times faster than normal. It does seem to take some practice, though.

On the other hand, "more natural" sounding voices slur into each other at high speeds and aren't comprehensible. They are only listenable at slower speeds.

example: https://web.archive.org/web/20220525081607/https://www.vincit.fi/en/software-development-450-words-per-minute/

The original site seems to be down, but the audio files work for me. It sounds like gibberish to me, but the author comprehends it easily.
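For anyone who wants to hear what that kind of speed sounds like locally, here is a minimal sketch, assuming `espeak-ng` is installed and on the PATH. Its `-s` option sets the speed in words per minute; the helper name `espeak_command` and the 450 default (taken from the linked article's title) are just illustrative choices, not anything from the article itself.

```python
def espeak_command(text, wpm=450, binary="espeak-ng"):
    """Build an espeak-ng command line; -s sets the speed in words per minute."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    # Returned as an argument list so it can be passed to subprocess.run()
    # without going through a shell.
    return [binary, "-s", str(wpm), text]


# Only build and show the command here; nothing is spoken.
print(espeak_command("The quick brown fox jumps over the lazy dog."))
```

Passing the returned list to `subprocess.run(cmd, check=True)` should speak the text, provided the binary exists and an audio device is available.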

[–] insomniac_lemon@lemmy.cafe 3 points 1 week ago* (last edited 1 week ago)

I mean, consistent sound is fully in line with what I'm saying. I am fine with a robotic sound; the issue I have is that it can be grating in newer voices, which I just assumed was something about how samples are used (compared to older speech synthesis). Is the sound actually part of the design to allow such high speeds?

Even if it were, older-style synthesis could likely have that as a parameter or option (or just... a dedicated voice).

I've seen some videos of screen readers with a somewhat fast voice (not quite as fast as your link) that does sound better, with voices similar to DECtalk Paul. They don't always give the voice name, but I've seen some mention of IBMTTS, so it might be related (though current search results give AI service stuff that I'm not sure would trace back to those old videos from 2016; either way, it might be some Paul derivative). EDIT: It might be ETI Eloquence?

It seems ETI Eloquence is both beloved in the blind community and something that has had support issues (proprietary abandonware). And I've seen one person say this on the subject:

It's frustrating to say the least. Eloquence haters are like, what's the big deal, but I'm like, show me a voice that is fast and responsive, and doesn't make me wanna claw my eyes out like eSpeak does. I don't like concatenative voices because you hear where the splices take place and it's just weird and off-putting. They are also not as snappy.

The problem I have with Dectalk is that it slurs like a drunk as you speed it up.

[–] 30p87@feddit.org 3 points 1 week ago

Only Piper provides GLaDOS as a voice option tho

[–] mesamunefire@piefed.social 3 points 1 week ago (1 children)

Piper is VERY lightweight. Kinda like espeak. I got it working on a Pi 3 once. And it's good enough for my phone.

There's more human-sounding stuff, but it uses very intensive modeling.

[–] insomniac_lemon@lemmy.cafe 6 points 1 week ago* (last edited 1 week ago)

I just listened to the samples and it seems a bit hit-or-miss. Some of them still stumble over words, have stilted pacing, or just sound off in some other way (raspiness, speed). It seems to vary more voice-to-voice than by the quality setting.

I mean, I'm sure some of these voices are fine and probably better than other AI models in terms of performance... though they are a bit uncanny valley, and I still think a voice meant to sound robotic (while still having personality) is probably an easier target. I didn't notice anything like that in the samples, though I did see a couple of YT videos with a GLaDOS voice (sounding fairly accurate) that mention Piper (though I know such a thing likely wouldn't be front-and-center due to licensing).

[–] who@feddit.org 6 points 1 week ago (2 children)

https://github.com/KittenML/KittenTTS

[–] refalo@programming.dev 1 points 1 week ago

Seems broken for me, and the author is ignoring awesome PRs.

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /home/user/.cache/huggingface/hub/models--KittenML--kitten-tts-mini-0.8/snapshots/c02725660cea441db4c383af69f1f26f5cd00947/kitten_tts_mini_v0_8.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model_load_utils.h:46 void onnxruntime::model_load_utils::ValidateOpsetForDomain(const std::unordered_map<std::__cxx11::basic_string<char>, int>&, const onnxruntime::logging::Logger&, bool, const std::string&, int) ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 5 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx.ml is till opset 4.
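Reading that error: the model is stamped with opset 5 for the `ai.onnx.ml` domain, while the installed ONNX Runtime only officially supports that domain up to opset 4. That suggests (but doesn't prove) that a newer onnxruntime release would load it. A rough sketch that pulls those numbers out of the message, using only the standard library; `parse_opset_mismatch` is a made-up helper name and `ERROR_TEXT` is an abbreviated excerpt of the log above:

```python
import re

# Abbreviated excerpt of the error quoted above (only the sentences we parse).
ERROR_TEXT = (
    "Opset 5 is under development and support for this is limited. "
    "Current official support for domain ai.onnx.ml is till opset 4."
)


def parse_opset_mismatch(message):
    """Return (domain, model_opset, supported_opset), or None if not found."""
    wanted = re.search(r"Opset (\d+) is under development", message)
    supported = re.search(r"support for domain (\S+) is till opset (\d+)", message)
    if not (wanted and supported):
        return None
    return supported.group(1), int(wanted.group(1)), int(supported.group(2))


result = parse_opset_mismatch(ERROR_TEXT)
if result is not None:
    domain, model_opset, supported_opset = result
    print(f"{domain}: model uses opset {model_opset}, "
          f"runtime supports up to {supported_opset}")
```

If the `onnx` Python package is installed, the same information can be read straight from the model file via the `opset_import` field of the loaded model, which avoids scraping error text.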

[–] SkybreakerEngineer@lemmy.world 1 points 1 week ago

Not to be confused with Kitten from TTS

[–] onlinepersona@programming.dev 2 points 1 week ago

Looking for Maintainers

The Open Home Foundation is looking for maintainers for Piper! If you're interested, please send a message to voice@openhomefoundation.org

Well, that's a pity. But if it continues working, that should be good enough.