this post was submitted on 16 Dec 2024
122 points (96.2% liked)

Opensource

top 7 comments
[–] refalo@programming.dev 48 points 1 year ago* (last edited 1 year ago) (2 children)

If, like me, you were wondering whether MS actually provided their own parsers for their Office file formats... they did not.

It seems to just be a bunch of random pyxyz 3rd-party support libraries all mashed together.

[–] mormund@feddit.org 10 points 1 year ago

What do you mean by parser? Office docs are just zipped XML files. They are trivial to parse. The hard part is all the quirks the document renderers have, which make it impossible to perfectly match the output. But Markdown can't handle any complex formatting anyway.
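
To illustrate the "zipped XML" point, here is a minimal sketch using only the Python standard library. It assumes the usual .docx layout, where the main body lives in `word/document.xml`; the helper names `docx_paragraphs` and `make_demo_docx` are made up for this example, and the demo archive is a stripped-down stand-in rather than a complete .docx. It only pulls out plain paragraph text and ignores all the formatting that makes faithful rendering hard.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml in .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the plain text of each paragraph in a .docx file.

    `source` is a path or a binary file-like object. Formatting, tables,
    images, and everything else a real renderer handles are ignored.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; each run holds <w:t> nodes.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_demo_docx():
    """Build a tiny in-memory archive holding only word/document.xml.

    Hypothetical demo data: a real .docx carries more parts
    (content types, relationships, styles, and so on).
    """
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(docx_paragraphs(make_demo_docx()))
```

Passing a path to a real .docx instead of the demo buffer should work the same way, as long as the file follows that layout.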

[–] Sibbo@sopuli.xyz 6 points 1 year ago (1 children)

Maybe the people who wrote their parser have left the company? Typical big software corp problem.

[–] GissaMittJobb@lemmy.ml 4 points 1 year ago (1 children)

I mean, the parser would still be there even if the people left the company, right? The source code remains.

[–] Creat@discuss.tchncs.de 3 points 1 year ago

It might also be somewhere, but nobody knows where.

[–] Phoenix3875@lemmy.world 9 points 1 year ago

Reading through the source code, it's more of a repackaging of other open source libraries, probably for MS's AI effort.

[–] Chais@sh.itjust.works 2 points 1 year ago