Not really. All the features of that tool are basic functions we’ve had since LibreOffice was still OpenOffice.
Since this converts to Markdown, it’s inherently a very lossy conversion. What’s hard to pull off is preserving the full formatting when converting to ODT or something similar.
I like LibreOffice, but converting audio files to Markdown must be a pretty recent feature, because I’ve never heard of that being part of LibreOffice.
Quite curious… does it actually do that, and if so, how? STT to get a plaintext file or a subtitle file (so with timing) has been available quite efficiently for a while now, e.g. via Whisper. If this does more than that, though, e.g. recovering structure (differentiating a title, a list, etc.), I’d like to learn how.
There is nothing special going on. This whole project is just a bunch of Python libraries glued together into a CLI tool. It uses the SpeechRecognition package to connect to the Google speech recognition API: https://github.com/microsoft/markitdown/blob/main/src/markitdown/_markitdown.py#L691
Pretty uninteresting and a bit disappointing. Pandoc is a lot more interesting.
Thanks for the clarification. I checked the code you linked and noticed recognize_google, which seems to come from https://github.com/Uberi/speech_recognition which in turn seems to rely on https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/recognizers/google.py so basically, are they using an API and sending all the audio data to Google servers?
Yes, this is how I read it as well. The library supports using a local model, but they decided to just send the audio data to Google.
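To make the difference between the two paths concrete, here is a minimal sketch, not markitdown’s actual code. It assumes the SpeechRecognition package is installed, and for the offline path also the openai-whisper package. The helper names pick_recognizer and transcribe are made up for illustration; recognize_google and recognize_whisper are methods on the library’s Recognizer class.

```python
def pick_recognizer(allow_network: bool) -> str:
    """Return the name of the Recognizer method to call (hypothetical helper)."""
    # recognize_google uploads the audio to Google's web speech API.
    # recognize_whisper runs a Whisper model on the local machine instead.
    return "recognize_google" if allow_network else "recognize_whisper"


def transcribe(audio_path: str, allow_network: bool = False) -> str:
    """Transcribe a WAV/AIFF/FLAC file to plain text (hypothetical helper)."""
    # Imported lazily so this module loads even when the package is missing.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    # Look up the chosen method by name and run it on the recorded audio.
    method = getattr(recognizer, pick_recognizer(allow_network))
    return method(audio)
```

Calling transcribe("talk.wav") would stay offline under these assumptions, while transcribe("talk.wav", allow_network=True) would send the audio to Google. Defaulting to the local path is a design choice of this sketch, so that the upload only happens when the caller explicitly opts in.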
That might open up a GDPR-related issue. I don’t think people using such a library assume that they need connectivity, or that their data would be sent to a third party.
In saying this isn’t useful, you’re making a lot of assumptions about how someone might want to use it.
I convert from docx to md specifically to get rid of Microsoft formatting, i.e. almost converting to plaintext while preserving at least some structure.
soffice works as a CLI, can be called from Python, and has plenty of related tooling, e.g. https://pypi.org/project/unoserver/ so I agree: I’m confused about what’s actually novel here and better than that, or than dedicated, long-standing FLOSS projects like pandoc.
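For anyone who wants to try the soffice or pandoc route from Python, here is a rough sketch. It assumes the soffice and pandoc binaries are on PATH; the helper names (soffice_cmd, pandoc_cmd, run_if_available) and the file names in the usage note are made up for illustration. The command-line flags themselves are the standard ones for headless LibreOffice conversion and for pandoc’s GitHub-flavoured Markdown writer.

```python
import shutil
import subprocess


def soffice_cmd(src: str, fmt: str = "odt", outdir: str = ".") -> list[str]:
    """Build a headless LibreOffice conversion command (hypothetical helper)."""
    return ["soffice", "--headless", "--convert-to", fmt, "--outdir", outdir, src]


def pandoc_cmd(src: str, dest: str, to: str = "gfm") -> list[str]:
    """Build a pandoc command; 'gfm' is GitHub-flavoured Markdown."""
    return ["pandoc", src, "-t", to, "-o", dest]


def run_if_available(cmd: list[str]) -> bool:
    """Run cmd if its binary is on PATH; return False if it is not installed."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True
```

For example, run_if_available(pandoc_cmd("report.docx", "report.md")) would cover the docx to md case mentioned above, and run_if_available(soffice_cmd("report.docx")) would cover the formatting-preserving conversion to ODT. Both run entirely locally.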