
Reading and processing WAV file data in C/C++


I'm currently working on a very important school project. I need to extract the information stored in a WAVE file in C/C++ and use it to obtain the LPC (linear predictive coding) coefficients of a voice signal. Before that, I need to pre-process the signal with zero-crossing and energy analysis, among other things, which means I need each sample as a signed, real value. The problem is that I don't know how to get useful information out of the file or what format it should be in. I have already read every field in the file, but I'm not sure I am doing it right. Any suggestions?

This is the way I read the file at the moment:

    readI = fread(&bps, 1, 2, audio);
    printf("bits per sample = %d \n", bps);

Thanks in advance.


Solution

  • My first recommendation would be to use a library to help you out. Most full-featured audio frameworks are overkill for this, so a small library such as libsndfile (recommended in a comment on your question) should do the trick.

    If you just want to know how to read WAV files so you can write your own reader (your school might turn its nose up at you using a library like any other regular person), a quick Google search will turn up the format specification and plenty of tutorials that others have already written on reading the .wav format.

    If you still don't get it, here is some of my own code, which reads the RIFF header and every other chunk of the WAV file until it reaches the data chunk. It's based solely on the WAV format specification. Extracting the actual sound data is not very hard: you can either read it raw and use it as-is, or convert it to a format you're more comfortable with internally (for example, uncompressed 32-bit PCM, or floating-point samples if you need real values).

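    When reading the code below, replace reader.Read...( ... ) with equivalent fread calls that read an integer of the indicated size (ReadInt32 reads 4 bytes, ReadInt16 reads 2 bytes, and all WAV header fields are little-endian), and replace reader.Seek( n, SeekOrigin::Current ) with fseek( file, n, SEEK_CUR ). WavChunks is an enum of the four-character chunk IDs ("RIFF", "WAVE", "fmt ", "data", ...) expressed as the 32-bit values you get when reading those four bytes as a little-endian integer, and the format variable holds one of the format type codes that can appear in a WAV file's fmt chunk: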

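    // Each value is the four ASCII bytes of the chunk ID read as a little-endian int32.
    enum class WavChunks {
        RiffHeader = 0x46464952,      // "RIFF"
        WavRiff = 0x45564157,         // "WAVE"
        Format = 0x20746d66,          // "fmt "
        LabeledText = 0x7478746c,     // "ltxt"
        Instrumentation = 0x74736e69, // "inst"
        Sample = 0x6c706d73,          // "smpl"
        Fact = 0x74636166,            // "fact"
        Data = 0x61746164,            // "data"
        Junk = 0x4b4e554a,            // "JUNK"
    };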
    
    enum class WavFormat {
        PulseCodeModulation = 0x01,
        IEEEFloatingPoint = 0x03,
        ALaw = 0x06,
        MuLaw = 0x07,
        IMAADPCM = 0x11,
        YamahaITUG723ADPCM = 0x14,
        GSM610 = 0x31,
        ITUG721ADPCM = 0x40,
        MPEG = 0x50,
        Extensible = 0xFFFE
    };
    
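    // int32_t/int16_t come from <cstdint>; all fields are little-endian.
    // NOTE: also stop at end of file, or a file without a data chunk loops forever.
    int32_t chunkid = 0;
    bool datachunk = false;
    while ( !datachunk ) {
        chunkid = reader.ReadInt32( );
        switch ( (WavChunks)chunkid ) {
        case WavChunks::Format: {
            formatsize = reader.ReadInt32( );
            format = (WavFormat)reader.ReadInt16( );
            channelcount = reader.ReadInt16( );
            samplerate = reader.ReadInt32( );
            bytespersecond = reader.ReadInt32( );   // average data rate in bytes, not bits
            formatblockalign = reader.ReadInt16( ); // bytes per frame (all channels)
            bitdepth = reader.ReadInt16( );         // bits per sample
            // 16 bytes read so far. Skip cbSize and any extension (e.g. 18 bytes
            // when cbSize is present, 40 for WAVE_FORMAT_EXTENSIBLE) plus the
            // pad byte of an odd-sized chunk.
            int32_t remaining = formatsize - 16 + ( formatsize & 1 );
            if ( remaining > 0 )
                reader.Seek( remaining, SeekOrigin::Current );
            break;
        }
        case WavChunks::RiffHeader:
            headerid = chunkid;
            memsize = reader.ReadInt32( );   // file size minus 8 bytes
            riffstyle = reader.ReadInt32( ); // should equal WavChunks::WavRiff ("WAVE")
            break;
        case WavChunks::Data:
            datachunk = true;
            datasize = reader.ReadInt32( );  // sample bytes start right after this field
            break;
        default: {
            // Any other chunk (LIST, fact, JUNK, ...): skip its body. Chunk
            // bodies are padded to an even number of bytes.
            int32_t skipsize = reader.ReadInt32( );
            reader.Seek( skipsize + ( skipsize & 1 ), SeekOrigin::Current );
            break;
        }
        }
    }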
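Following the advice above to swap reader.Read... for fread, here is a minimal, self-contained sketch of the same chunk walk using only <cstdio>. The names WavInfo, read_u16_le, read_u32_le and parse_wav_header are hypothetical, chosen for illustration. The integers are assembled byte by byte, so the result does not depend on the host's endianness, which also sidesteps the problem of fread-ing 2 bytes straight into a wider int whose upper bytes are left untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical result type; field names follow the fmt chunk layout.
struct WavInfo {
    uint16_t format = 0;        // 1 = PCM, 3 = IEEE float, 0xFFFE = extensible
    uint16_t channels = 0;
    uint32_t sampleRate = 0;    // samples per second per channel
    uint32_t byteRate = 0;      // average bytes per second
    uint16_t blockAlign = 0;    // bytes per frame (all channels)
    uint16_t bitsPerSample = 0;
    uint32_t dataSize = 0;      // size of the sample data in bytes
    long dataOffset = 0;        // file offset of the first sample byte
};

// Assemble little-endian integers byte by byte, independent of host endianness.
static bool read_u16_le(std::FILE* f, uint16_t& out) {
    unsigned char b[2];
    if (std::fread(b, 1, 2, f) != 2) return false;
    out = static_cast<uint16_t>(b[0] | (b[1] << 8));
    return true;
}

static bool read_u32_le(std::FILE* f, uint32_t& out) {
    unsigned char b[4];
    if (std::fread(b, 1, 4, f) != 4) return false;
    out = static_cast<uint32_t>(b[0]) | (static_cast<uint32_t>(b[1]) << 8) |
          (static_cast<uint32_t>(b[2]) << 16) | (static_cast<uint32_t>(b[3]) << 24);
    return true;
}

// Walks the RIFF chunks until "data". On success, f is positioned at the
// first sample byte and info is filled in.
static bool parse_wav_header(std::FILE* f, WavInfo& info) {
    assert(f != nullptr);
    uint32_t id = 0, size = 0, wave = 0;
    if (!read_u32_le(f, id) || id != 0x46464952u) return false;        // "RIFF"
    if (!read_u32_le(f, size) || !read_u32_le(f, wave) || wave != 0x45564157u)
        return false;                                                   // "WAVE"
    bool haveFmt = false;
    while (read_u32_le(f, id) && read_u32_le(f, size)) {
        if (id == 0x20746d66u) {                                        // "fmt "
            if (size < 16) return false;
            if (!read_u16_le(f, info.format) || !read_u16_le(f, info.channels) ||
                !read_u32_le(f, info.sampleRate) || !read_u32_le(f, info.byteRate) ||
                !read_u16_le(f, info.blockAlign) || !read_u16_le(f, info.bitsPerSample))
                return false;
            // Skip cbSize/extension bytes and the pad byte of an odd-sized chunk.
            long rest = static_cast<long>(size - 16) + static_cast<long>(size & 1);
            if (rest > 0 && std::fseek(f, rest, SEEK_CUR) != 0) return false;
            haveFmt = true;
        } else if (id == 0x61746164u) {                                 // "data"
            info.dataSize = size;
            info.dataOffset = std::ftell(f);
            return haveFmt;
        } else {
            // LIST, fact, JUNK, ...: skip the body plus its pad byte.
            long skip = static_cast<long>(size) + static_cast<long>(size & 1);
            if (std::fseek(f, skip, SEEK_CUR) != 0) return false;
        }
    }
    return false;  // end of file reached without a data chunk
}
```

After a successful call, info.bitsPerSample answers the "bits per sample" question directly, and reading info.dataSize bytes from the current position gives the raw samples.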
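The question's pre-processing step needs each sample as a signed real number. Assuming the fmt chunk reports plain PCM (format 1) with 16 bits per sample, the datasize bytes that follow the data chunk header could be converted and analysed roughly as below. This is a sketch: pcm16_to_double, short_time_energy and zero_crossings are hypothetical helper names, and dividing by 32768 is one common normalization, not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts raw 16-bit signed little-endian PCM bytes into doubles in
// [-1.0, 1.0). Multi-channel data stays interleaved (L, R, L, R, ...).
std::vector<double> pcm16_to_double(const std::vector<unsigned char>& bytes) {
    assert(bytes.size() % 2 == 0);  // each sample is exactly 2 bytes
    std::vector<double> out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Rebuild the little-endian word, then map 0x8000..0xFFFF to negatives
        // (two's complement) without relying on implementation-defined casts.
        unsigned u = static_cast<unsigned>(bytes[i]) |
                     (static_cast<unsigned>(bytes[i + 1]) << 8);
        int s = u >= 0x8000u ? static_cast<int>(u) - 0x10000 : static_cast<int>(u);
        out.push_back(s / 32768.0);
    }
    return out;
}

// Short-time energy: sum of squared samples in x[start, start + len).
double short_time_energy(const std::vector<double>& x, std::size_t start, std::size_t len) {
    double e = 0.0;
    for (std::size_t i = start; i < start + len && i < x.size(); ++i)
        e += x[i] * x[i];
    return e;
}

// Number of sign changes between consecutive samples in x[start, start + len).
// Zero is treated as positive.
std::size_t zero_crossings(const std::vector<double>& x, std::size_t start, std::size_t len) {
    std::size_t count = 0;
    for (std::size_t i = start + 1; i < start + len && i < x.size(); ++i)
        if ((x[i - 1] >= 0.0) != (x[i] >= 0.0))
            ++count;
    return count;
}
```

For example, at a sample rate of 8000 Hz a 20 ms frame is 160 samples; calling both functions on successive frames yields per-frame energy and zero-crossing counts, the kind of features commonly used to separate voiced speech, unvoiced speech and silence before LPC analysis. For stereo files, de-interleave or pick one channel first.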