Recently, I had a project where I needed to convert some audio to text. Finding the right code took more googling than I expected, so I whipped up a small project that demonstrates its usage, to make it easier for others to find.
This code uses the .NET System.Speech namespace and demonstrates how to transcribe audio in C# from either a microphone or a previously recorded .wav file.
The code can be divided into two main parts:
- configuring the SpeechRecognitionEngine object (and its required elements)
- handling the SpeechRecognitionEngine events
Step 1: Configuring the SpeechRecognitionEngine
_speechRecognitionEngine = new SpeechRecognitionEngine();
_dictationGrammar = new DictationGrammar();
_speechRecognitionEngine.LoadGrammar(_dictationGrammar);
_speechRecognitionEngine.SetInputToDefaultAudioDevice();
At this point your object is ready to start transcribing audio from the microphone. You still need to handle a couple of events, though, in order to actually get access to the results.
Step 2: Handling the SpeechRecognitionEngine Events
// Unsubscribe first so the handlers are never attached twice,
// then subscribe to the hypothesized (in-progress) and recognized (final) events.
_speechRecognitionEngine.SpeechRecognized -= new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
_speechRecognitionEngine.SpeechHypothesized -= new EventHandler<SpeechHypothesizedEventArgs>(SpeechHypothesizing);
_speechRecognitionEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
_speechRecognitionEngine.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(SpeechHypothesizing);
// Start listening; RecognizeMode.Multiple keeps recognizing until explicitly stopped.
_speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
private void SpeechHypothesizing(object sender, SpeechHypothesizedEventArgs e)
{
    // real-time, in-progress results from the engine
    string realTimeResults = e.Result.Text;
}
private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // final answer from the engine
    string finalAnswer = e.Result.Text;
}
That’s it. If you want to use a pre-recorded .wav file instead of a microphone, you would use
_speechRecognitionEngine.SetInputToWaveFile(pathToTargetWavFile);
instead of
_speechRecognitionEngine.SetInputToDefaultAudioDevice();
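With a .wav file the input is finite, so a synchronous loop is a natural fit: Recognize() returns one phrase at a time and returns null once the stream is exhausted. Here is a minimal sketch of that pattern. The class and method names are illustrative (not from the attached sample), pathToTargetWavFile follows the variable name above, and System.Speech requires Windows with a reference to the System.Speech assembly.

```csharp
using System;
using System.Speech.Recognition; // Windows-only; reference the System.Speech assembly
using System.Text;

class WavTranscriber
{
    // Illustrative helper: transcribe an entire .wav file synchronously.
    static string TranscribeWavFile(string pathToTargetWavFile)
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveFile(pathToTargetWavFile);

            var transcript = new StringBuilder();
            while (true)
            {
                // Recognize() blocks until the next phrase is recognized,
                // and returns null once the input stream is exhausted.
                RecognitionResult result = engine.Recognize();
                if (result == null) break;
                transcript.AppendLine(result.Text);
            }
            return transcript.ToString();
        }
    }
}
```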
These classes offer a bunch of other options that are worth exploring in more detail; this covers the bare essentials for a prototype. I have attached a full example and encapsulation here.
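To see how the two steps fit together, here is a minimal console sketch of the microphone path, assembled from the snippets above. The class name and console messages are my own illustration, not the attached sample, and it only runs on Windows with a reference to the System.Speech assembly.

```csharp
using System;
using System.Speech.Recognition; // Windows-only; reference the System.Speech assembly

class MicrophoneTranscriber
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // Step 1: configure the engine (dictation grammar + microphone input).
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToDefaultAudioDevice();

            // Step 2: handle the events (lambdas here instead of named methods).
            engine.SpeechHypothesized += (s, e) =>
                Console.WriteLine("Guess so far: " + e.Result.Text);
            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine("Final: " + e.Result.Text);

            // RecognizeMode.Multiple keeps the engine listening
            // until RecognizeAsyncStop() is called.
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Speak into the microphone; press Enter to stop.");
            Console.ReadLine();
            engine.RecognizeAsyncStop();
        }
    }
}
```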