c++ audio ms-media-foundation multimedia

WMAV2 MFT encoder


I'm trying to use the WMA8 encoder MFT to encode audio data. The audio is 10 seconds long, and ProcessInput and ProcessOutput both succeed. The time stamps of the encoded audio data are also correct. The problem is that the encoded audio, once written to a file with my own muxer, seems incorrect.

There's one thing I've noticed: in the mftrace output, the output type seems to have a strange block alignment and bytes-per-second value. Below are the encoder setup code and the mftrace output.

CLSID* pCLSIDs = NULL;   // Pointer to an array of CLSIDs.
UINT32 nCount = 0;
MFT_REGISTER_TYPE_INFO encoderInfo;
encoderInfo.guidMajorType = MFMediaType_Audio;
encoderInfo.guidSubtype = MFAudioFormat_WMAudioV8;
// Enumerate audio encoders whose output type is WMAudioV8.
HRESULT hr = fpMFTEnum(MFT_CATEGORY_AUDIO_ENCODER, 0, NULL, &encoderInfo, NULL, &pCLSIDs, &nCount);
if (FAILED(hr) || nCount == 0) {
    LFTRACE("Can't enumerate Audio encoders, returned encoder amount = 0");
    break;  // leave the enclosing block (not shown); pCLSIDs[0] would be invalid
}
ciEncoder.CreateObject(pCLSIDs[0], IID_IMFTransform);
CoTaskMemFree(pCLSIDs);  // MFTEnum allocates the array; the caller frees it
if (ciEncoder.IsInvalid()) {
    LFDEBUG("ciEncoder.CreateObject failed");
    break;
}
// The encoder is created now; set its input type.
LComObject<IMFMediaType> ciInputType;  // Input media type of the encoder
hr = fpMFCreateMediaType((IMFMediaType**)(ciInputType.GetAssignablePtrRef()));
hr = ciInputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
hr = ciInputType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM);
hr = ciInputType->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 16); // bits per sample
hr = ciInputType->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, 22050); // input sample rate
hr = ciInputType->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, 2); // channels
//alignment = uint16_t(pwfx->nChannels * uint16_t((BitsPerSample <= 8) ? 1 : ((BitsPerSample <= 16) ? 2 : 4)));
hr = ciInputType->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, 4); // channels * bytes per sample
hr = ciInputType->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 22050 * 4); // sample rate * alignment
hr = ciEncoder->SetInputType(0, ciInputType.get(), 0);
if (FAILED(hr)) {
    break;
}
// Take the output type the encoder offers at index 1 and select it.
LComInterface<IMFMediaType> ciOutPutType;
hr = ciEncoder->GetOutputAvailableType(0, 1, (IMFMediaType**)ciOutPutType.GetAssignablePtrRef());
if (FAILED(hr)) {
    break;
}
hr = ciEncoder->SetOutputType(0, ciOutPutType.get(), 0);
if (FAILED(hr)) {
    break;
}
hr = ciEncoder->ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, NULL);
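
For reference, the two hard-coded input values (block alignment 4 and 22050 * 4 bytes per second) follow from the channel count and bit depth. Here is a minimal sketch of that arithmetic; PcmLayout, BytesPerSample and ComputePcmLayout are hypothetical helper names of mine, not Media Foundation APIs, and the bit depth is rounded up to whole bytes (so 24-bit gives 3 bytes, unlike the commented-out rule above, which uses 4).

```cpp
#include <cstdint>

// Derived fields of an uncompressed PCM format (illustrative helper, not an MF type).
struct PcmLayout {
    std::uint32_t blockAlign;      // bytes per frame: one sample for every channel
    std::uint32_t avgBytesPerSec;  // bytes consumed per second of audio
};

// Bytes needed for one sample, rounding the bit depth up to whole bytes.
constexpr std::uint32_t BytesPerSample(std::uint32_t bitsPerSample) {
    return (bitsPerSample + 7) / 8;
}

// blockAlign = channels * bytes per sample; avgBytesPerSec = sampleRate * blockAlign.
constexpr PcmLayout ComputePcmLayout(std::uint32_t sampleRate,
                                     std::uint32_t channels,
                                     std::uint32_t bitsPerSample) {
    return PcmLayout{channels * BytesPerSample(bitsPerSample),
                     sampleRate * channels * BytesPerSample(bitsPerSample)};
}
```

For 22050 Hz, 2 channels and 16 bits this gives a block alignment of 4 and 88200 bytes per second, the same values that show up in the SetInputType trace line.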


4276,B78 04:27:33.76791 CMFTransformDetours::SetInputType @09A36E70 Succeeded MT: MF_MT_AUDIO_AVG_BYTES_PER_SECOND=88200;MF_MT_AUDIO_BLOCK_ALIGNMENT=4;MF_MT_AUDIO_NUM_CHANNELS=2;MF_MT_MAJOR_TYPE=MEDIATYPE_Audio;MF_MT_AUDIO_SAMPLES_PER_SECOND=22050;MF_MT_AUDIO_BITS_PER_SAMPLE=16;MF_MT_SUBTYPE=MFAudioFormat_PCM
4276,B78 04:27:33.76795 CMFTransformDetours::SetOutputType @09A36E70 Succeeded MT: MF_MT_AUDIO_AVG_BYTES_PER_SECOND=2751;MF_MT_AUDIO_BLOCK_ALIGNMENT=1022;MF_MT_AUDIO_NUM_CHANNELS=2;MF_MT_MAJOR_TYPE=MEDIATYPE_Audio;MF_MT_AUDIO_SAMPLES_PER_SECOND=22050;MF_MT_AUDIO_PREFER_WAVEFORMATEX=1;MF_MT_USER_DATA=00 44 00 00 0f 00 00 00 00 00 ;MF_MT_FIXED_SIZE_SAMPLES=1;MF_MT_ALL_SAMPLES_INDEPENDENT=1;MF_MT_AUDIO_BITS_PER_SAMPLE=16;MF_MT_SUBTYPE=MFAudioFormat_WMAudioV8
4276,B78 04:27:33.76796 CMFTransformDetours::ProcessMessage @09A36E70 Message type=0x10000000 MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, param=00000000

From the trace you can see that the audio output type has a block alignment of 1022 and 2751 bytes per second. The output sample rate is 22050, so does that mean each sample takes only about 0.1 byte?

From the trace, the encoded audio samples have correct time stamps. Can anyone give a hint about where the problem is? (By the way, do I need to feed the encoder a fixed number of samples every time I call ProcessInput()?)
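
As far as I can tell, these numbers are at least self-consistent for a compressed stream: 2751 bytes per second is roughly a 22 kbit/s bit rate, and 1022 would then be the size of one encoded packet rather than the size of one sample frame. A rough sketch of that arithmetic follows; the function names are my own, and reading the block alignment as a packet size is my assumption.

```cpp
#include <cstdint>

// Nominal bit rate in bits per second, from MF_MT_AUDIO_AVG_BYTES_PER_SECOND.
constexpr std::uint32_t BitsPerSecond(std::uint32_t avgBytesPerSec) {
    return avgBytesPerSec * 8;
}

// Nominal playback time covered by one block of blockAlign bytes,
// in milliseconds, rounded down.
constexpr std::uint32_t BlockDurationMs(std::uint32_t blockAlign,
                                        std::uint32_t avgBytesPerSec) {
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(blockAlign) * 1000 / avgBytesPerSec);
}

// How many times smaller the encoded stream is than the PCM input, rounded down.
constexpr std::uint32_t CompressionRatio(std::uint32_t pcmBytesPerSec,
                                         std::uint32_t encodedBytesPerSec) {
    return pcmBytesPerSec / encodedBytesPerSec;
}
```

With the traced values, BitsPerSecond(2751) is 22008, BlockDurationMs(1022, 2751) is 371, and CompressionRatio(88200, 2751) is 32. The 371 ms figure matches many of the durations reported by ProcessOutput.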

4276,B78 04:27:33.80126 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 0ms, Duration 232ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1 
4276,B78 04:27:33.80130 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 232ms, Duration 232ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1   
4276,B78 04:27:33.80134 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 464ms, Duration 278ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1   
4276,B78 04:27:33.81006 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 743ms, Duration 278ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1   
4276,B78 04:27:33.81010 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 1021ms, Duration 278ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.81012 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 1300ms, Duration 278ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.82536 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 1578ms, Duration 325ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.82540 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 1904ms, Duration 278ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.82541 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 2182ms, Duration 278ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.83379 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 2461ms, Duration 325ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.83383 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 2786ms, Duration 336ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.84831 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 3123ms, Duration 313ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.84835 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 3436ms, Duration 371ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.85838 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 3808ms, Duration 371ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.85842 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 4179ms, Duration 371ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.87418 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 4551ms, Duration 371ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.87425 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 4922ms, Duration 325ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.88375 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 5247ms, Duration 371ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.88379 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 5619ms, Duration 371ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.89730 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 5990ms, Duration 417ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.89734 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 6408ms, Duration 371ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.90466 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 6780ms, Duration 417ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.90470 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 7198ms, Duration 371ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.91802 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 7569ms, Duration 325ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.91806 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 7894ms, Duration 371ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.93442 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 8266ms, Duration 464ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.93447 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 8730ms, Duration 417ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.93449 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 9148ms, Duration 417ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.93451 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 9566ms, Duration 278ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1  
4276,B78 04:27:33.93453 CMFTransformDetours::ProcessOutput @09A36E70 Stream ID 0, Sample @085C0568, Time 9845ms, Duration 150ms, Buffers 1, Size 1022B, MFSampleExtension_CleanPoint=1
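
On the ProcessInput() question, my working assumption (not verified against the WMA encoder documentation) is that the input buffers may vary in size as long as each holds whole PCM frames, and that what has to stay consistent is the time stamp and duration set on each input sample. Media Foundation sample times are in 100-nanosecond units. This is how I would compute them from a running frame count; ChunkTiming, FramesToHns, BytesToFrames and TimeChunk are hypothetical helper names.

```cpp
#include <cstdint>

constexpr std::int64_t kHnsPerSecond = 10000000;  // 100-ns units in one second

// Presentation time of a frame index, in 100-ns units (rounded down).
constexpr std::int64_t FramesToHns(std::int64_t frames, std::int64_t sampleRate) {
    return frames * kHnsPerSecond / sampleRate;
}

// Number of whole PCM frames contained in byteCount bytes.
constexpr std::int64_t BytesToFrames(std::int64_t byteCount, std::int64_t blockAlign) {
    return byteCount / blockAlign;
}

struct ChunkTiming {
    std::int64_t startHns;     // value for the input sample's time
    std::int64_t durationHns;  // value for the input sample's duration
};

// Timing for a chunk that begins after framesSoFar frames. The duration is
// taken as end minus start so rounding errors do not accumulate across chunks.
constexpr ChunkTiming TimeChunk(std::int64_t framesSoFar,
                                std::int64_t chunkFrames,
                                std::int64_t sampleRate) {
    return ChunkTiming{
        FramesToHns(framesSoFar, sampleRate),
        FramesToHns(framesSoFar + chunkFrames, sampleRate) -
            FramesToHns(framesSoFar, sampleRate)};
}
```

For example, a 4096-byte buffer with block alignment 4 holds 1024 frames, and the full 10 seconds at 22050 Hz (220500 frames) ends at 100000000 units.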

Thanks


Solution

  • I've fixed this problem.

    The changes I made include:

    With all these changes, the encoded ASF file plays correctly.