How can I convert a blog to EPUB on Manjaro Linux or in Firefox? (programming.dev)

submitted 2 days ago by CoderSupreme@programming.dev to c/programming@programming.dev

I’m trying to convert a blog into an EPUB and keep running into issues with existing tools.

I first tried blog2epub, but it fails during parsing with:

lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: meta line 10 and head, line 17, column 8

I then tried WebToEpub on Firefox, providing:

Content selector: .article-content
Chapter title selector: .title

It generated an EPUB, but the file wouldn’t open in any reader.

What I’m looking for is a tool where I can point to a blog’s base URL, define CSS selectors for the article title and body, and have it automatically fetch all entries and create one chapter per post. Or something similar.

Does anyone know of a reliable tool, script, or workflow that does this well on Linux?
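For reference, one frequent reason a generated EPUB refuses to open is a malformed container: the format requires that the ZIP archive's first entry be a file named mimetype, stored without compression. Whether that is what went wrong here is unknown. The sketch below shows the packaging step of the workflow described above, one chapter per post, using only Python's standard library. The function names, file layout under OEBPS/, and the plain-text chapter input are assumptions made for illustration, and fetching the posts is left out.

```python
import io
import zipfile
from datetime import datetime, timezone
from html import escape
from uuid import uuid4

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def xhtml_page(title, body):
    """Wrap an XHTML body fragment in a minimal, well-formed document."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:epub="http://www.idpf.org/2007/ops">'
        f"<head><title>{escape(title)}</title></head><body>{body}</body></html>"
    )


def build_epub(book_title, chapters, language="en"):
    """Return EPUB bytes. chapters is a list of (title, plain_text) pairs."""
    names = [f"chap{i:03d}.xhtml" for i in range(1, len(chapters) + 1)]
    manifest = "".join(
        f'<item id="c{i}" href="{name}" media-type="application/xhtml+xml"/>'
        for i, name in enumerate(names, 1)
    )
    spine = "".join(f'<itemref idref="c{i}"/>' for i in range(1, len(names) + 1))
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<package xmlns="http://www.idpf.org/2007/opf" version="3.0" '
        'unique-identifier="book-id">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f'<dc:identifier id="book-id">urn:uuid:{uuid4()}</dc:identifier>'
        f"<dc:title>{escape(book_title)}</dc:title>"
        f"<dc:language>{language}</dc:language>"
        f'<meta property="dcterms:modified">{modified}</meta>'
        "</metadata>"
        '<manifest><item id="nav" href="nav.xhtml" '
        'media-type="application/xhtml+xml" properties="nav"/>'
        f"{manifest}</manifest>"
        f"<spine>{spine}</spine></package>"
    )
    nav_items = "".join(
        f'<li><a href="{name}">{escape(title)}</a></li>'
        for name, (title, _) in zip(names, chapters)
    )
    nav = xhtml_page(book_title, f'<nav epub:type="toc"><ol>{nav_items}</ol></nav>')

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must come first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/nav.xhtml", nav)
        for name, (title, text) in zip(names, chapters):
            # Blank lines in the plain text separate paragraphs.
            paragraphs = "".join(
                f"<p>{escape(p.strip())}</p>"
                for p in text.split("\n\n") if p.strip()
            )
            body = f"<h1>{escape(title)}</h1>{paragraphs}"
            zf.writestr(f"OEBPS/{name}", xhtml_page(title, body))
    return buf.getvalue()
```

Writing the returned bytes to a file, for example `open("blog.epub", "wb").write(build_epub("My blog", posts))`, should give a file that can then be checked with an EPUB validator before loading it into a reader.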

[–] mjr@infosec.pub 3 points 2 days ago

Not that I know of, but could you run the page with the mismatched tags through something like tidy first? That error looks like the parser is expecting XHTML and getting HTML. Maybe the site declares one and then serves the other. Lots of software won't care because it's a pretty common error, but some programs panic.
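To illustrate the XHTML-versus-HTML point, here is a minimal sketch using only Python's standard library. The sample page is made up, and `text_by_class` is a hypothetical helper, not part of blog2epub or any other tool mentioned in this thread. An unclosed `<meta>` is valid HTML but not well-formed XML, so a strict XML parser rejects the page while a lenient HTML parser can still pull out the text for a given class, roughly what the `.title` and `.article-content` selectors ask for.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Made-up page: <meta> is never closed, which HTML allows and XML does not.
PAGE = """<html>
<head>
<meta charset="utf-8">
<title>Example</title>
</head>
<body>
<h1 class="title">First post</h1>
<div class="article-content"><p>Hello, <b>world</b>.</p></div>
</body>
</html>"""

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first element carrying a given class.

    Assumes non-void tags inside that element are balanced.
    """

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # greater than 0 while inside the wanted element
        self.done = False   # True once the first match has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.done:
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_class(html, wanted_class):
    """Return the text of the first element with the given class."""
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def is_well_formed_xml(markup):
    """Report whether a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

With the sample page, `is_well_formed_xml(PAGE)` is false while `text_by_class(PAGE, "title")` still returns the heading text. Running real pages through tidy first, as suggested, would be the more robust route for messy markup.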

[–] MonkderVierte@lemmy.zip 1 points 2 days ago* (last edited 1 day ago)

HTML 5 in actual production use is only partially convertible (it's lossy). You need to get handsy with it. *

But one way around: get a Markdown editor that can convert copy-and-paste from the web (I know of Typora, which fetches and optionally saves images too) and then run the result through pandoc.

* div#main, a.h1, div with naked text, I've seen things...
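The pandoc step of that workaround could look roughly like the sketch below, which only builds and runs the command line. It assumes one Markdown file per post, each starting with a level-1 heading, since pandoc starts a new EPUB chapter at level-1 headings by default. The file names, output name, and helper names are placeholders.

```python
import shutil
import subprocess


def pandoc_epub_command(markdown_files, output="blog.epub", title="Blog archive"):
    """Build a pandoc command line that merges Markdown files into one EPUB."""
    if not markdown_files:
        raise ValueError("need at least one Markdown file")
    return [
        "pandoc",
        "--from", "markdown",
        "--toc",                          # include a table of contents
        "--metadata", f"title={title}",   # book title shown by readers
        "-o", output,                     # the .epub extension selects EPUB output
        *markdown_files,                  # inputs are concatenated in this order
    ]


def build_epub_with_pandoc(markdown_files, output="blog.epub", title="Blog archive"):
    """Run pandoc if it is installed and return the command that was used."""
    cmd = pandoc_epub_command(markdown_files, output, title)
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(cmd, check=True)
    return cmd
```

Calling `build_epub_with_pandoc(["post-001.md", "post-002.md"])` would then produce blog.epub in the current directory, with the posts in the order given.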

[–] CoderSupreme@programming.dev 1 points 2 days ago* (last edited 2 days ago)

I recently learned about abogen and audiblez, and what I actually want to do is turn a blog into an audiobook, but I'm still stuck on getting the book from the blog.

I'm now thinking maybe c/linux would have been a better place to ask since I'm not trying to program anything. Let me know if I should move it there.