That's not quite how this works. For local TTS you need specific boundaries and pre-trained models that can load, run, and interface with the local API of whatever TTS system you use. You can see a bunch here.
Then there are some, like this one, that run on espeak-ng, which should be a drop-in TTS option for Firefox.
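
If you want to check that espeak-ng itself works locally before pointing anything else at it, something like the rough Python sketch below should do. It assumes the `espeak-ng` binary is installed and on your PATH. The helper names `build_espeak_cmd` and `speak` are made up for illustration. `-v` (voice) and `-w` (write a WAV file instead of playing audio) are standard espeak-ng command-line options. This only exercises the command-line tool; it says nothing about how Firefox talks to it.

```python
import shutil
import subprocess


def build_espeak_cmd(text, voice="en", wav_path=None):
    """Build the argument list for an espeak-ng call (hypothetical helper)."""
    cmd = ["espeak-ng", "-v", voice]
    if wav_path is not None:
        # -w writes the audio to a WAV file instead of playing it
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd


def speak(text, voice="en", wav_path=None):
    """Run espeak-ng if it is installed; return False if it is missing."""
    if shutil.which("espeak-ng") is None:
        return False
    subprocess.run(build_espeak_cmd(text, voice, wav_path), check=True)
    return True


if __name__ == "__main__":
    if not speak("Local text to speech is working."):
        print("espeak-ng not found on PATH")
```

If that speaks (or writes a file when you pass `wav_path`), the engine side is fine and any remaining problem is in whatever sits between it and the browser.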