Converting or Transcribing audio to text using C# and .NET System.Speech

Recently, I had a project where I needed to convert some audio to text. It took a bit more googling than I was used to in order to find the code, so I went ahead and whipped up a project that demonstrates its usage, so people can more easily find it.

This code uses the .NET System.Speech namespace and demonstrates how to transcribe audio using either a microphone or a previously created .wav file using C#.

The code can be divided into 2 main parts:

Step 1: Configuring the SpeechRecognitionEngine

_speechRecognitionEngine = new SpeechRecognitionEngine();
_speechRecognitionEngine.SetInputToDefaultAudioDevice();
_dictationGrammar = new DictationGrammar();
_speechRecognitionEngine.LoadGrammar(_dictationGrammar);
_speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);

At this point the engine is ready to start transcribing audio from the microphone. To actually get at the results, though, you need to handle a couple of its events. In real code, attach the handlers before calling RecognizeAsync so that no early results are missed.

Step 2: Handling the SpeechRecognitionEngine Events

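// Detach first so the handlers are never registered twice.
_speechRecognitionEngine.SpeechRecognized -= new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
_speechRecognitionEngine.SpeechHypothesized -= new EventHandler<SpeechHypothesizedEventArgs>(SpeechHypothesizing);

_speechRecognitionEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
_speechRecognitionEngine.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(SpeechHypothesizing);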

private void SpeechHypothesizing(object sender, SpeechHypothesizedEventArgs e)
{
    // real-time (interim) results from the engine
    string realTimeResults = e.Result.Text;
}

private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // final answer from the engine
    string finalAnswer = e.Result.Text;
}

That’s it. If you want to use a pre-recorded .wav file instead of a microphone, you would use _speechRecognitionEngine.SetInputToWaveFile(pathToTargetWavFile); instead of _speechRecognitionEngine.SetInputToDefaultAudioDevice();.
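For a one-off file transcription, a blocking loop can be simpler than wiring up events. The following is a minimal sketch, assuming a project reference to System.Speech and an installed Windows speech recognizer; the class and method names (WavTranscriber, Transcribe) are made up for illustration and are not part of the attached example. Note that the loop stops at the first null result, which normally means the end of the file but may also follow a rejected phrase or a long silence, so the event-based approach above is probably the safer choice for long recordings.

```csharp
using System;
using System.IO;
using System.Speech.Recognition;
using System.Text;

// Hypothetical helper class, named for illustration only.
public static class WavTranscriber
{
    // Returns every phrase recognized in the file, joined with single spaces.
    public static string Transcribe(string pathToTargetWavFile)
    {
        // Fail early with a clear error before touching the speech engine.
        if (!File.Exists(pathToTargetWavFile))
            throw new FileNotFoundException("WAV file not found.", pathToTargetWavFile);

        return TranscribeExisting(pathToTargetWavFile);
    }

    private static string TranscribeExisting(string path)
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToWaveFile(path);

            var transcript = new StringBuilder();
            RecognitionResult result;

            // Recognize() blocks for one phrase at a time and returns null
            // when nothing (more) could be recognized.
            while ((result = engine.Recognize()) != null)
            {
                transcript.Append(result.Text).Append(' ');
            }

            return transcript.ToString().Trim();
        }
    }
}
```

Calling it would look something like Console.WriteLine(WavTranscriber.Transcribe(pathToTargetWavFile)); where pathToTargetWavFile points at your own recording.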

There are a bunch of different options in these classes and they are worth exploring in more detail. This covers the bare essentials for a prototype. I have attached a full example and encapsulation here.

Converting Embedded PowerPoint Audio to .mp3

In my continuing quest to get a Master’s in Software Engineering I am enrolled in SENG 6270: Software Verification and Validation at East Carolina University. As a distance student, one of the most useful study techniques I have found is converting lectures to regular compressed audio, so I can listen to them during my daily commute. This particular class posts its base lectures as PowerPoint files. In this post I will describe how to take a PowerPoint presentation and convert its audio to .mp3 (or whichever audio format you desire).

Step 1: Extract WAV audio from PowerPoint

The easiest way to access the audio from a PowerPoint file is to save it as a webpage. Unfortunately, Microsoft hid this option in PowerPoint 2010, so here is the workaround. Load your PowerPoint file in Microsoft PowerPoint and complete the following:

  1. Press ALT+F11 to open the Visual Basic editor.
  2. Press CTRL+G to open the Immediate window.
  3. In the Immediate window type the following:

ActivePresentation.SaveAs "<Drive>:\users\<username>\desktop\<filename>.htm", ppSaveAsHTML, msoFalse

  4. Press ENTER.

You should now have the saved .htm file along with its supporting folder(s) in the location you specified. Inside the folders you will find the .wav files.

Step 2: Merge WAV files and convert to MP3 using foobar2000

  1. Download and install foobar2000.
  2. Download and install the LAME binary. This is used to convert .wav to .mp3 using foobar.
  3. Fire up foobar2000 and drag all of the .wav files into it.

  4. Select all the files using Ctrl+A. Right-click the selected files and choose “Convert”. Select the ““.

You will be presented with a window where you can configure the conversion settings.

Make sure to click and configure “Output format” and “Destination”. “Output format” lets you choose your destination audio format. If you choose .mp3, it will ask for the location of the LAME binary you downloaded earlier. In “Destination”, make sure that “merge all tracks into one file” is selected.

Click “Convert”.  BAM.  Done.

Using C# .NET to detect .ogg vorbis file properties

Recently I needed to read the audio properties of a .ogg Vorbis file for further processing, and I wanted to be able to read them from managed C# code. I was not able to find a clear example online for doing this (even StackOverflow failed me). After a bit of scrounging, I eventually arrived at a library that could do it: Vorbis .NET. Unfortunately, its documentation was dead and its example program errored out when I ran it, so I went ahead and started exploring it myself.

The relevant line of code (after adding a reference to Vorbis .NET, of course) was:
var x = OggVorbisDecoder.OggVorbisMemoryStream.LoadFromFile(_fullOggFilePath);
This provides access to a VorbisInfo property with the following attributes:

  • BitrateLower
  • BitrateNominal
  • BitrateUpper
  • Channels
  • Rate
  • Version
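If you would rather not depend on the library, the same six values can be read straight from the Vorbis identification header, which is the first packet in a plain Ogg Vorbis file. The following is a minimal sketch of that alternative, not how Vorbis .NET does it internally. The type and member names (VorbisIdHeader, VorbisHeaderReader) are made up for illustration, and I am assuming that BitrateUpper and BitrateLower correspond to the header's maximum and minimum bitrate fields. It also assumes the file starts with the Vorbis stream, so it will not work on files where Vorbis is multiplexed with other streams.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical container; member names mirror the list above.
public sealed class VorbisIdHeader
{
    public uint Version;
    public int Channels;
    public uint Rate;
    public int BitrateUpper;
    public int BitrateNominal;
    public int BitrateLower;
}

// Hypothetical reader, named for illustration only.
public static class VorbisHeaderReader
{
    public static VorbisIdHeader Read(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            // Ogg page header: "OggS", then 22 bytes of fixed fields
            // (version, flags, granule position, serial, sequence, CRC).
            if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "OggS")
                throw new InvalidDataException("Not an Ogg stream.");
            reader.ReadBytes(22);

            // Skip the segment (lacing) table that precedes the packet data.
            int segmentCount = reader.ReadByte();
            reader.ReadBytes(segmentCount);

            // Identification packet: type byte 0x01 followed by "vorbis".
            byte packetType = reader.ReadByte();
            string codecName = Encoding.ASCII.GetString(reader.ReadBytes(6));
            if (packetType != 1 || codecName != "vorbis")
                throw new InvalidDataException("No Vorbis identification header found.");

            // All fields are little-endian, which is what BinaryReader reads.
            var header = new VorbisIdHeader();
            header.Version = reader.ReadUInt32();
            header.Channels = reader.ReadByte();
            header.Rate = reader.ReadUInt32();
            header.BitrateUpper = reader.ReadInt32();
            header.BitrateNominal = reader.ReadInt32();
            header.BitrateLower = reader.ReadInt32();
            return header;
        }
    }
}
```

To use it, open the file with File.OpenRead(_fullOggFilePath) and pass the resulting stream to VorbisHeaderReader.Read. A bitrate value of 0 in the header generally means the encoder did not set that field.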

I have attached my source and an example program.
Ogg Vorbis Info Demo Source Only
Ogg Vorbis Info Demo Binary Only