DeepSpeech 0.6

pionerkotik | 278 points

I do not understand how to use DeepSpeech, even in the simplest use case.

1. I want to teach it ten words. How do I do this?

2. I want to speak into my microphone (available as a PulseAudio device), recognise the words, and output them as a text stream on stdout. How do I do this?

This is the documentation:

https://deepspeech.readthedocs.io/en/v0.6.0/Python-Examples....
https://deepspeech.readthedocs.io/en/v0.6.0/Python-API.html

It does not answer the questions I have.

https://deepspeech.readthedocs.io/en/v0.6.0/DeepSpeech.html

The introduction page is full of incomprehensible jargon.

bmn__ | 4 years ago

> It achieves a 7.5% word error rate on the LibriSpeech test clean benchmark

Does anyone have a comparison of how good or bad that is relative to other solutions, and what it means for practical use, if that can be guessed from a single number?
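For reference, word error rate is conventionally computed as WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the transcript into the reference, and N is the number of words in the reference. A 7.5% WER therefore means roughly 7.5 word errors per 100 reference words. Below is a minimal Python sketch using word-level edit distance; the function name is hypothetical, and the lowercasing and whitespace tokenization are assumptions (official benchmark scripts may normalize text differently).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N via word-level Levenshtein distance.

    Assumption: text is compared case-insensitively and split on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr

    # Total edits (S + D + I) divided by reference length N.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("a b c d", "a x c"))  # one substitution + one deletion
```

Because insertions count against the score, WER can exceed 100% when a transcript contains many extra words.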

detaro | 4 years ago

I just found https://voice.mozilla.org/

Besides being a great resource for speech analysis, this could be a real game changer for acquiring listening comprehension in a foreign language.

I feel that even after a few years of learning a new language, I still have trouble with listening. Part of the problem is that it's often all or nothing: even one or two unknown words in a sentence mean I can't understand it. Worse still, most language-teaching materials use a very small set of native speakers, which deprives the learner's brain of the variety it needs to generalize.

bayesian_horse | 4 years ago

Perhaps there's an opportunity to create a demo web page using TensorFlow.js, especially with the new TFLite support, to raise visibility and awareness. The PoseNet webcam demo [1], for example, seems to have ~2k links.

[1] https://storage.googleapis.com/tfjs-models/demos/posenet/cam...

mncharity | 4 years ago

Congrats to the team on the new release! I've been following their progress for a while. It's an important project, and I'm very happy that they reduced the model size while still delivering a WER improvement over the last release. Amazing!

est31 | 4 years ago

This reminds me of something I would love to see happen, but I don't have the skills to put it all together. I really think there's some potential merit to a reading-coach app(lication) that listens to someone read and looks for weaknesses, disorders, etc. compared to a trained model. It could provide those diagnostics to an educator, steer the content toward those areas, coach the reader directly, etc.

It all seems very doable based on what I see in the technology today; I just don't have the skills to do it.

jcims | 4 years ago

I was very excited about this, but then I tried the pre-trained model: I recorded "Hello this is a test message" three times, and it was transcribed as "a sassanian", "he is a paris", and "he states that"...

z3t4 | 4 years ago

I can't seem to find pre-trained models for other languages (French). How long did your training take for the English model? Do you think it makes sense to start from the English one? Thank you!

testbed | 4 years ago

I'm starting to look into MycroftAI. It sounds like with this release I could use Mycroft + DeepSpeech on a new Raspberry Pi for a completely offline smart speaker. Do I understand that right?

tobylane | 4 years ago

> DeepSpeech v0.6 with TensorFlow Lite runs faster than real time on a single core of a Raspberry Pi 4

This is great news.
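"Faster than real time" is usually expressed as a real-time factor (RTF): wall-clock processing time divided by the duration of the audio processed, so RTF < 1 means the model keeps up with live speech. A minimal Python sketch; the helper names are hypothetical, the 16 kHz default sample rate is an assumption about the input audio, and the timing numbers in the example are made up for illustration.

```python
def audio_duration_seconds(num_samples: int, sample_rate_hz: int = 16_000) -> float:
    """Duration of a mono clip in seconds (16 kHz default is an assumption)."""
    return num_samples / sample_rate_hz


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Wall-clock processing time divided by audio duration; < 1 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical numbers for illustration only: 10 s of 16 kHz audio
    # (160,000 samples) transcribed in 6 s of wall-clock time.
    duration = audio_duration_seconds(160_000)
    rtf = real_time_factor(6.0, duration)
    print(f"RTF = {rtf:.2f} ({'faster' if rtf < 1 else 'slower'} than real time)")
```

In practice you would time the transcription call with `time.perf_counter()` and pass the elapsed seconds as `processing_seconds`.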

I'm not very familiar with the deep learning framework ecosystem. Does anyone know what the simplest way would be to incorporate a DeepSpeech model into a WebAssembly project?

For instance, is it straightforward to compile TensorFlow Lite with Emscripten? If not, can this TensorFlow model be run with TensorFlow.js? Or can it be converted to some other format that is easier to use with WASM?

maxbrunsfeld | 4 years ago

TensorFlow Lite, interesting. Has anyone tried this with the Google Coral USB Accelerator? I have one lying around but very little experience with ML. After much messing around I got the USB Accelerator working with the pretrained PoseNet model, but my chances with DeepSpeech are slim to none. This seems like the perfect fit, though.

franciscop | 4 years ago

I wonder whether it is any good at out-of-vocabulary words. Is it hard to teach it new things like medical terms?

bayesian_horse | 4 years ago

Looks great. Is it related to any of Mozilla's products?

kbumsik | 4 years ago

So who will be developing an open source assistant like Google Home / Alexa? :)

therealmarv | 4 years ago

Any recommendations for a good Raspberry Pi HAT or microphone to test this?

travisporter | 4 years ago