完美解码 settings: loop playback

Background
Since the Windows 10 1803 release, the built-in H.265 video decoder has been gone. You can install a plug-in to get such files to play, but on an NVIDIA card capable of hardware-decoding 8K video they were still being played with software decoding, which upended my assumptions about DXVA (DirectX Video Acceleration) on the Windows platform. Left with no alternative, I turned to NVIDIA's SDK to hardware-decode the video myself, using ffmpeg along the way. That turned out to be a good starting point: I learned a lot about audio and video during development, and I am sharing it here.
Playing video
The essence of video playback is displaying a sequence of images, one after another, at the frame rate. The frame rate determines when to switch to the next image, and the images here are ARGB pixel arrays, not compressed formats such as PNG or JPEG.
Suppose a video has an fps (frame rate) of 30, a resolution of 1920x1080 and a length of 30 seconds. The raw data would be 1920x1080x30x30x4, roughly 7 GB, yet the actual file is nowhere near that large, a few tens of MB at most. Playing a video therefore amounts to decompressing the data back into ARGB pixel arrays and displaying them one by one according to the frame rate.
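The size arithmetic can be checked with a couple of lines of C++ (the numbers are this example's assumptions, not values read from a real file):

#include <cstdint>

// one 1920x1080 ARGB frame is width * height * 4 bytes; multiply by fps and duration
const uint64_t rawBytes = 1920ULL * 1080 * 4 * 30 * 30;
// rawBytes == 7464960000 bytes, i.e. about 6.95 GB, the "7 GB" quoted above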
Video decoding
Video decoding really comes down to two steps:
1. Decode each frame into a picture, according to the video's codec.
2. Convert each frame's picture from its native color space to RGB.
Decoding each frame
There are many video codecs, for example h.264, hevc and vp9; when using ffmpeg you can specify the video codec with -c:v.
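For reference, here is a minimal sketch of both steps using the ffmpeg libraries (libavcodec for step 1, libswscale for step 2). It assumes the ffmpeg 3.x/4.x-era C API used elsewhere in this article; the function decodeToRgb and the lack of error handling are for illustration only, and the GPU paths discussed later replace parts of it.

extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>
#include <libavutil/imgutils.h>
}

void decodeToRgb(const char* path)
{
    AVFormatContext* fmtc = NULL;
    avformat_open_input(&fmtc, path, NULL, NULL);
    avformat_find_stream_info(fmtc, NULL);
    int videoIndex = av_find_best_stream(fmtc, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);

    AVCodecContext* avctx = avcodec_alloc_context3(NULL);
    avcodec_parameters_to_context(avctx, fmtc->streams[videoIndex]->codecpar);
    AVCodec* codec = avcodec_find_decoder(avctx->codec_id);   // e.g. h264, hevc, vp9
    avcodec_open2(avctx, codec, NULL);

    // destination BGRA buffer for one frame
    uint8_t* rgb[4] = { NULL };
    int rgbLinesize[4] = { 0 };
    av_image_alloc(rgb, rgbLinesize, avctx->width, avctx->height, AV_PIX_FMT_BGRA, 1);
    SwsContext* sws = sws_getContext(avctx->width, avctx->height, avctx->pix_fmt,
                                     avctx->width, avctx->height, AV_PIX_FMT_BGRA,
                                     SWS_BILINEAR, NULL, NULL, NULL);

    AVFrame* frame = av_frame_alloc();
    AVPacket pkt;
    av_init_packet(&pkt);
    while (av_read_frame(fmtc, &pkt) >= 0) {
        if (pkt.stream_index == videoIndex) {
            avcodec_send_packet(avctx, &pkt);                  // step 1: decode one packet
            while (avcodec_receive_frame(avctx, frame) == 0) {
                sws_scale(sws, frame->data, frame->linesize,   // step 2: YUV -> BGRA
                          0, avctx->height, rgb, rgbLinesize);
                // rgb[0] now holds one uncompressed image, ready to be displayed
            }
        }
        av_packet_unref(&pkt);
    }

    av_freep(&rgb[0]);
    sws_freeContext(sws);
    av_frame_free(&frame);
    avcodec_free_context(&avctx);
    avformat_close_input(&fmtc);
}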

Video playback
Once we have the RGB pixel values, we can display the video according to the frame rate.
// Current machine timer value, in milliseconds
double getTime()
{
    __int64 freq = 0;
    __int64 count = 0;
    if (QueryPerformanceFrequency((LARGE_INTEGER*)&freq) && freq > 0 &&
        QueryPerformanceCounter((LARGE_INTEGER*)&count))
    {
        return (double)count / (double)freq * 1000.0;
    }
    return 0.0;
}

double interval = 1000.0 / av_q2d(fmtc->streams[videoIndex]->r_frame_rate); // ms per frame
double estimateTime = frameIndex * interval;      // expected display time
double actualTime = (getTime() - startTime);      // actual elapsed time

In the code above, the expected time is derived from the frame rate, and the actual time is the current time minus the time playback started. If the actual time is less than the expected time, we need to sleep for a while and show the next frame when its expected time arrives; otherwise the next frame should be shown as soon as possible.
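Putting that rule into a loop might look like the sketch below; decodeNextFrame() and presentFrame() are placeholders for the decoding and rendering calls, not real APIs.

double startTime = getTime();
int frameIndex = 0;
while (decodeNextFrame())                          // produces the next RGB image
{
    double estimateTime = frameIndex * interval;   // when this frame should appear
    double actualTime = getTime() - startTime;     // how long we have actually been playing
    if (actualTime < estimateTime)
        Sleep((DWORD)(estimateTime - actualTime)); // early: wait until the expected time
    presentFrame();                                // on time or late: show it right away
    frameIndex++;
}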
Hardware decoding
In the code above, a frame rate of 30 frames per second means the frame switches every 33 ms. Here is the problem: if the next frame cannot be decoded within those 33 ms, playback lags or drops frames. With hardware decoding the odds of that happening drop sharply; when it comes to processing images across many cores, the CPU is simply no match for the GPU.
The principle of hardware decoding is to submit the AVPacket directly to the GPU; the GPU decodes it and hands back a surface for the application to use. That surface lives in video memory. DirectX 9 is used here, so on the CPU side it is accessed indirectly as an IDirect3DTexture9 and rendered directly as a texture. Note that for h264 and hevc streams the AVPacket has the PPS and related parameter-set information stripped out; it has to be added back before the packet can be handed to the GPU, as follows:
// Hardware decoding
// (SPS) Sequence Parameter Set, (PPS) Picture Parameter Set
// Convert an H.264/HEVC bitstream from length-prefixed mode to start-code-prefixed mode
// (as defined in Annex B of the ITU-T H.264 specification).
AVPacket pktFiltered;
AVBSFContext* bsfc = NULL;
av_init_packet(&pktFiltered);
pktFiltered.data = 0;
pktFiltered.size = 0;
const AVBitStreamFilter* bsf =
    av_bsf_get_by_name("h264_mp4toannexb");   // or "hevc_mp4toannexb" for HEVC streams
av_bsf_alloc(bsf, &bsfc);
avcodec_parameters_copy(bsfc->par_in, fmtc->streams[videoIndex]->codecpar);
av_bsf_init(bsfc);

av_bsf_send_packet(bsfc, &pkt);               // pkt: the packet read with av_read_frame
av_bsf_receive_packet(bsfc, &pktFiltered);

For a complete hardware-decoding example, see the samples in NVIDIA's SDK.
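In practice the filter is driven once per packet read from the file, roughly as in the sketch below; submitToGpuDecoder() is a placeholder for the NVDEC submission done in the NVIDIA SDK samples, not a real API, and pkt/videoIndex are the packet and stream index from the demuxing loop shown earlier.

while (av_read_frame(fmtc, &pkt) >= 0)
{
    if (pkt.stream_index == videoIndex)
    {
        av_bsf_send_packet(bsfc, &pkt);                  // in: length-prefixed packet
        while (av_bsf_receive_packet(bsfc, &pktFiltered) == 0)
        {
            // out: Annex B (start-code prefixed) data the GPU decoder accepts
            submitToGpuDecoder(pktFiltered.data, pktFiltered.size);
            av_packet_unref(&pktFiltered);
        }
    }
    av_packet_unref(&pkt);
}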
Half-hardware decoding
There are countless video encodings, and the GPU is strict about which codecs and resolutions it can decode. When the hardware cannot handle a stream we still have to decode on the CPU, but we can at least move the YUV-to-RGB conversion onto the GPU to relieve the CPU. ffmpeg contains YUV-to-RGB source code, but it is of little use as a reference here because it is optimized for the CPU with integer arithmetic, while the GPU is best at float math. The example below converts yuv420p to RGB.
// GPU code
// Color-space conversion, YUV420P to RGB
float4x4 colormtx;
texture tex0; texture tex1; texture tex2;
sampler sam0 = sampler_state { Texture = <tex0>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };
sampler sam1 = sampler_state { Texture = <tex1>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };
sampler sam2 = sampler_state { Texture = <tex2>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };

// in the pixel shader (uv is the texture coordinate, color the output; the planes are
// D3DFMT_A8 textures, so each sample is read from the alpha channel):
float4 c = float4(tex2D(sam0, uv).a, tex2D(sam1, uv).a, tex2D(sam2, uv).a, 1);
color = mul(c, colormtx);

// CPU code
D3DXMATRIXA16 yuv2rgbMatrix()
{
    /*
    FLOAT r = (1.164 * (Y - 16) + 1.596 * (V - 128));
    FLOAT g = (1.164 * (Y - 16) - 0.813 * (V - 128) - 0.391 * (U - 128));
    FLOAT b = (1.164 * (Y - 16) + 2.018 * (U - 128));
    // the same formulas with Y/U/V normalized to 0..1:
    FLOAT r = 1.164 * Y + 1.596*V - 1.596*128.0/255.0 - 1.164*16.0/255.0;
    FLOAT g = 1.164 * Y - 0.391*U - 0.813*V - 1.164*16.0/255.0 + 0.813*128.0/255.0 + 0.391*128.0/255.0;
    FLOAT b = 1.164 * Y + 2.018*U - 1.164*16.0/255.0 - 2.018*128.0/255.0;
    */
    D3DXMATRIXA16 m(
        1.164,  0,      1.596, -1.596*128.0 / 255.0 - 1.164*16.0 / 255.0,
        1.164, -0.391, -0.813, -1.164*16.0 / 255.0 + 0.813*128.0 / 255.0 + 0.391*128.0 / 255.0,
        1.164,  2.018,  0,     -1.164*16.0 / 255.0 - 2.018*128.0 / 255.0,
        0,      0,      0,      1);
    D3DXMatrixTranspose(&m, &m);   // mul(vector, matrix) in HLSL treats the vector as a row
    return m;
}

void update(AVFrame* frame)
{
    int w = ctx_->textureWidth();
    int h = ctx_->textureHeight();
    int w2 = w / 2;                        // U and V planes of yuv420p are half size
    int h2 = h / 2;
    auto device = ctx_->getDevice3D();     // IDirect3DDevice9Ex
    if (!texY_)                            // IDirect3DTexture9*
    {
        auto effect = render_->effect();   // ID3DXEffect*
        device->CreateTexture(w,  h,  1, D3DUSAGE_DYNAMIC, D3DFMT_A8, D3DPOOL_DEFAULT, &texY_, NULL);
        device->CreateTexture(w2, h2, 1, D3DUSAGE_DYNAMIC, D3DFMT_A8, D3DPOOL_DEFAULT, &texU_, NULL);
        device->CreateTexture(w2, h2, 1, D3DUSAGE_DYNAMIC, D3DFMT_A8, D3DPOOL_DEFAULT, &texV_, NULL);
        effect->SetTexture("tex0", texY_);
        effect->SetTexture("tex1", texU_);
        effect->SetTexture("tex2", texV_);
        D3DXMATRIXA16 m = yuv2rgbMatrix();
        effect->SetMatrix("colormtx", &m);
    }
    upload(frame->data[0], frame->linesize[0], h,  texY_);
    upload(frame->data[1], frame->linesize[1], h2, texU_);
    upload(frame->data[2], frame->linesize[2], h2, texV_);
}

void upload(uint8_t* data, int linesize, int h, IDirect3DTexture9* tex)
{
    D3DLOCKED_RECT locked = { 0 };
    HRESULT hr = tex->LockRect(0, &locked, NULL, D3DLOCK_DISCARD);
    if (SUCCEEDED(hr))
    {
        uint8_t* dst = (uint8_t*)locked.pBits;
        int size = linesize < locked.Pitch ? linesize : locked.Pitch;
        for (INT y = 0; y < h; y++)
        {
            CopyMemory(dst, data, size);
            dst += locked.Pitch;
            data += linesize;
        }
        tex->UnlockRect(0);
    }
}
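Once update() has filled the plane textures, each frame is drawn as a full-screen quad through the effect. A rough sketch of that per-frame path follows; drawQuad() and the ctx_/render_ accessors are this article's own wrapper objects and stand in for whatever vertex setup you already have.

void present(AVFrame* frame)
{
    update(frame);                       // upload the Y/U/V planes (see above)
    auto device = ctx_->getDevice3D();
    auto effect = render_->effect();

    device->BeginScene();
    UINT passes = 0;
    effect->Begin(&passes, 0);
    for (UINT i = 0; i < passes; i++)
    {
        effect->BeginPass(i);
        render_->drawQuad();             // placeholder: draw a full-screen textured quad
        effect->EndPass();
    }
    effect->End();
    device->EndScene();
    device->Present(NULL, NULL, NULL, NULL);
}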
Here is another example, converting a planar YUV format with a 12-bit alpha channel (YUVA444P12LE) to ARGB. ffmpeg stores the four components in separate planes, each sample occupying 12 significant bits inside a two-byte word, which is why the textures below are created with D3DFMT_L16.

// GPU code
///< planar YUV 4:4:4,36bpp, (1 Cr & Cb sample per 1x1 Y samples), 12b alpha, little-endian
// YUVA444P12LE to ARGB
float4x4 colormtx;
texture tex0; texture tex1; texture tex2; texture tex3;
sampler sam0 = sampler_state { Texture = <tex0>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };
sampler sam1 = sampler_state { Texture = <tex1>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };
sampler sam2 = sampler_state { Texture = <tex2>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };
sampler sam3 = sampler_state { Texture = <tex3>; MipFilter = LINEAR; MinFilter = LINEAR; MagFilter = LINEAR; };

// in the pixel shader (uv is the texture coordinate, color the output):
float4 c = float4(tex2D(sam0, uv).x, tex2D(sam1, uv).x, tex2D(sam2, uv).x,
                  0.06248569466697185);   // 0xfff/0xffff, becomes 1.0 after the scale below
c = c * 16.003663003663004;               // 0xffff/0xfff, rescales 12-bit samples stored as 16-bit
color = mul(c, colormtx);
color.a = tex2D(sam3, uv).x * 16.003663003663004;

// CPU code
int w = ctx_->textureWidth();
int h = ctx_->textureHeight();
auto device = ctx_->getDevice3D();   // IDirect3DDevice9Ex
if (!texY_)                          // IDirect3DTexture9*
{
    auto effect = render_->effect(); // ID3DXEffect*
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texY_, NULL)); // 12b Y
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texU_, NULL)); // 12b U
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texV_, NULL)); // 12b V
    check_hr(device->CreateTexture(w, h, 1, D3DUSAGE_DYNAMIC, D3DFMT_L16, D3DPOOL_DEFAULT, &texA_, NULL)); // 12b A
    effect->SetTexture("tex0", texY_);
    effect->SetTexture("tex1", texU_);
    effect->SetTexture("tex2", texV_);
    effect->SetTexture("tex3", texA_);
    D3DXMATRIXA16 m = yuv2rgbMatrix();
    effect->SetMatrix("colormtx", &m);
}
upload(frame->data[0], frame->linesize[0], h, texY_);
upload(frame->data[1], frame->linesize[1], h, texU_);
upload(frame->data[2], frame->linesize[2], h, texV_);
upload(frame->data[3], frame->linesize[3], h, texA_);

Playing audio
Raw audio data has a few important parameters: the sample rate (samples per second, sps), the number of channels, and the number of bits used by each sample (bits per sample, bps).
Playing audio really just means continuously sending the audio data to the sound card, which turns it into sound according to the sps, channel count and bps. For example, take 4 MB of audio data with a sample rate of 44100, 2 channels and 16 bits per sample: once the data has been handed to the sound card, it will report that playback has finished after (4x1024x1024x8)/(44100x2x16) seconds.
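The same arithmetic as a small helper, with the 4 MB figure as the example value:

#include <cstdint>

// bits of data divided by bits the device consumes per second
double audioDurationSeconds(uint64_t dataBytes, int sps, int channels, int bps)
{
    return (dataBytes * 8.0) / ((double)sps * channels * bps);
}

// audioDurationSeconds(4 * 1024 * 1024, 44100, 2, 16) is roughly 23.8 seconds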
Playing audio with the wave API
On Windows, audio can be played with the wave API; the steps are open, write, close. You can use the GoldWave application to export raw audio data and save it as an .snd file; when exporting, pay attention to the channel, sample-rate and bps settings. Below is code that plays raw audio data with an sps of 44100, 2 channels and 16 bits per sample.
// Play audio using the wave API
#include <windows.h>     // mmsystem.h needs windows.h; link against winmm.lib
#include <mmsystem.h>

const byte* pcmData = ....   // the raw audio data to play
int pcmSize = ....           // and its size
openAudio();
writeAudio(pcmData, pcmSize);
closeAudio();

////////////////////////////
#define AUDIO_DEV_BLOCK_SIZE  8192
#define AUDIO_DEV_BLOCK_COUNT 4

HWAVEOUT dev = 0;
int available = 0;
WAVEHDR* blocks = 0;
int index = 0;
Mutex mtx;   // the author's own class, built on EnterCriticalSection / LeaveCriticalSection

void CALLBACK waveOutProc(HWAVEOUT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR);   // defined below

void openAudio()
{
    WAVEFORMATEX wfx = { 0 };
    wfx.nSamplesPerSec = 44100;
    wfx.wBitsPerSample = 16;
    wfx.nChannels = 2;
    wfx.cbSize = 0;
    wfx.wFormatTag = WAVE_FORMAT_PCM;
    wfx.nBlockAlign = (wfx.wBitsPerSample * wfx.nChannels) >> 3;
    wfx.nAvgBytesPerSec = wfx.nBlockAlign * wfx.nSamplesPerSec;
    waveOutOpen(&dev, WAVE_MAPPER, &wfx, (DWORD_PTR)waveOutProc, (DWORD_PTR)0, CALLBACK_FUNCTION);

    blocks = new WAVEHDR[AUDIO_DEV_BLOCK_COUNT];
    memset(blocks, 0, sizeof(WAVEHDR) * AUDIO_DEV_BLOCK_COUNT);
    for (int i = 0; i < AUDIO_DEV_BLOCK_COUNT; i++)
    {
        blocks[i].lpData = new char[AUDIO_DEV_BLOCK_SIZE];
        blocks[i].dwBufferLength = AUDIO_DEV_BLOCK_SIZE;
    }
    available = AUDIO_DEV_BLOCK_COUNT;   // all blocks are free at the start
}

void closeAudio()
{
    for (int i = 0; i < AUDIO_DEV_BLOCK_COUNT; i++)
    {
        if (blocks[i].dwFlags & WHDR_PREPARED)
        {
            waveOutUnprepareHeader(dev, &blocks[i], sizeof(WAVEHDR));
        }
        delete[] blocks[i].lpData;
    }
    delete[] blocks;
    waveOutClose(dev);
}

void writeAudio(const byte* data, int size)
{
    WAVEHDR* current;
    int remain;
    current = &blocks[index];
    while (size > 0)
    {
        if (current->dwFlags & WHDR_PREPARED)
        {
            waveOutUnprepareHeader(dev, current, sizeof(WAVEHDR));
        }
        if (size < (int)(AUDIO_DEV_BLOCK_SIZE - current->dwUser))
        {
            memcpy(current->lpData + current->dwUser, data, size);
            current->dwUser += size;
            break;
        }
        remain = AUDIO_DEV_BLOCK_SIZE - current->dwUser;
        memcpy(current->lpData + current->dwUser, data, remain);
        size -= remain;
        data += remain;
        current->dwBufferLength = AUDIO_DEV_BLOCK_SIZE;
        waveOutPrepareHeader(dev, current, sizeof(WAVEHDR));
        waveOutWrite(dev, current, sizeof(WAVEHDR));
        mtx.lock();
        available--;
        mtx.unlock();
        while (!available)
        {
            Sleep(10);   // wait until waveOutProc reports a finished block
        }
        index++;
        index %= AUDIO_DEV_BLOCK_COUNT;
        current = &blocks[index];
        current->dwUser = 0;
    }
}

In the code above, the callback waveOutProc is registered when the device is opened; every time it fires, one 8192-byte block of audio data has finished playing. writeAudio keeps looping over the four 8192-byte blocks, writing them out ahead of time with waveOutWrite; whenever waveOutProc reports a finished block there is a free block to fill again, so the sound plays back continuously.
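The callback itself is what makes the block recycling work. A minimal version consistent with the available counter and mtx lock above might look like this (a fuller version appears in the synchronization section later):

void CALLBACK waveOutProc(HWAVEOUT hWaveOut, UINT uMsg, DWORD_PTR dwInstance,
                          DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg != WOM_DONE)       // we only care about "a block finished playing"
        return;
    mtx.lock();
    available++;                // one more block can be reused by writeAudio()
    mtx.unlock();
}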
Decompressing audio with ffmpeg
Likewise, the audio inside a video file is compressed, frame by frame. The complete code for decoding the audio is as follows:
// Open the video file
AVFormatContext* fmtc = NULL;
avformat_network_init();
avformat_open_input(&fmtc, "video file path", NULL, NULL);
avformat_find_stream_info(fmtc, NULL);
int audioIndex = av_find_best_stream(fmtc, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);

// Create the audio decoder
AVCodecContext* avctx = avcodec_alloc_context3(NULL);
auto st = fmtc->streams[audioIndex];
avcodec_parameters_to_context(avctx, st->codecpar);
AVCodec* codec = avcodec_find_decoder(avctx->codec_id);
avcodec_open2(avctx, codec, NULL);

// Decode the audio
AVFrame* frame = av_frame_alloc();
AVPacket pkt;
av_init_packet(&pkt);
pkt.data = NULL;
pkt.size = 0;
while (av_read_frame(fmtc, &pkt) >= 0)
{
    if (pkt.stream_index == audioIndex)
    {
        int gotFrame = 0;
        if (avcodec_decode_audio4(avctx, frame, &gotFrame, &pkt) < 0)
        {
            // fprintf(stderr, "Error decoding audio frame\n");
            break;
        }
        if (gotFrame)
        {
            // size of the decoded samples; assumes a packed format such as S16,
            // planar formats are handled by the conversion in the next section
            int linesize = av_samples_get_buffer_size(NULL, avctx->channels,
                                                      frame->nb_samples, avctx->sample_fmt, 1);
            writeAudio(frame->extended_data[0], linesize);
        }
    }
    av_packet_unref(&pkt);
}

// Clean up
avcodec_free_context(&avctx);
av_frame_free(&frame);
avformat_close_input(&fmtc);
return 0;

It is easy to see that this mirrors video decoding almost exactly; the decoded audio data ends up in an AVFrame.
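avcodec_decode_audio4 used above is the older decoding call; since ffmpeg 3.1 the same loop body can be written with the send/receive pair instead (a sketch using the same variables as above):

avcodec_send_packet(avctx, &pkt);
while (avcodec_receive_frame(avctx, frame) == 0)
{
    int dataSize = av_samples_get_buffer_size(NULL, avctx->channels,
                                              frame->nb_samples, avctx->sample_fmt, 1);
    writeAudio(frame->extended_data[0], dataSize);
}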
Audio conversion
Although the example above decodes the audio and plays it, there is a precondition: the audio in the video file must have an sps of 44100, a bps of 16 and 2 channels, otherwise the played sound is wrong.
We know that the sps, bps and channel count inside a video file are not fixed, so the audio has to be converted to a sample rate we can actually play. The conversion code is as follows:
// Audio conversion
// Create the resampler (devSampleFormat is the format the output device expects)
auto devSampleFormat = dev.bitsPerSample() == 8 ? AV_SAMPLE_FMT_U8 : AV_SAMPLE_FMT_S16;
SwrContext* swrc = swr_alloc();
av_opt_set_int(swrc, "in_channel_layout", av_get_default_channel_layout(avctx->channels), 0);
av_opt_set_int(swrc, "in_sample_rate", avctx->sample_rate, 0);
av_opt_set_sample_fmt(swrc, "in_sample_fmt", avctx->sample_fmt, 0);
av_opt_set_int(swrc, "out_channel_layout", av_get_default_channel_layout(2), 0);
av_opt_set_int(swrc, "out_sample_rate", 44100, 0);
av_opt_set_sample_fmt(swrc, "out_sample_fmt", devSampleFormat, 0);
swr_init(swrc);

struct SwrBuffer
{
    int samplesPerSec;
    int numSamples, maxNumSamples;
    uint8_t** data;
    int channels;
    int linesize;
};
SwrBuffer dst = { 0 };
dst.samplesPerSec = dev.samplesPerSec();   // dev: the audio-device wrapper (44100 / 2 here)
dst.channels = dev.channels();
// numSamples: nb_samples of the first decoded frame
dst.numSamples = dst.maxNumSamples =
    av_rescale_rnd(numSamples, dst.samplesPerSec, avctx->sample_rate, AV_ROUND_UP);
av_samples_alloc_array_and_samples(&dst.data, &dst.linesize, dst.channels,
                                   dst.numSamples, devSampleFormat, 0);

// Convert one decoded frame
dst.numSamples = av_rescale_rnd(swr_get_delay(swrc, avctx->sample_rate) + frame->nb_samples,
                                dst.samplesPerSec, avctx->sample_rate, AV_ROUND_UP);
if (dst.numSamples > dst.maxNumSamples)
{
    av_freep(&dst.data[0]);
    av_samples_alloc(dst.data, &dst.linesize, dst.channels, dst.numSamples, devSampleFormat, 1);
    dst.maxNumSamples = dst.numSamples;
}
/* convert to destination format */
int ret = swr_convert(swrc, dst.data, dst.numSamples,
                      (const uint8_t**)frame->data, frame->nb_samples);
if (ret < 0)
{
    // error
}
int bufsize = av_samples_get_buffer_size(&dst.linesize, dst.channels, ret, devSampleFormat, 1);
if (bufsize < 0)
{
    // fprintf(stderr, "Could not get sample buffer size\n");
}
writeAudio(dst.data[0], bufsize);

Checking the parameters the audio device supports
The examples above always played sound with 44100, 16 and 2; these are indeed the parameters used to open the audio device. If the sound card does not support them, waveOutOpen fails. The following code shows how to find out which parameters the device does support:
WAVEINCAPS caps = { 0 };
// (for the playback side, waveOutGetDevCaps / WAVEOUTCAPS expose the same dwFormats field)
if (waveInGetDevCaps(0, &caps, sizeof(caps)) == MMSYSERR_NOERROR)
{
    // checkCaps(caps.dwFormats, WAVE_FORMAT_96S16, 96000, 2, 16);
    // checkCaps(caps.dwFormats, WAVE_FORMAT_96S08, 96000, 2, 8);
}

// bps_/channels_/sps_ are the members that store the chosen device format
void checkCaps(DWORD devfmt, DWORD fmt, int sps, int channels, int bps)
{
    if (bps_)            // a supported format has already been picked
        return;
    if (devfmt & fmt)
    {
        bps_ = bps;
        channels_ = channels;
        sps_ = sps;
    }
}

Audio/video synchronization
Video and audio playback have to be synchronized rather than each running at its own pace; otherwise the likely symptom is lips that do not match the voice. Here we synchronize by slaving the video to the audio.
From the sample rate and related parameters we can work out exactly how long the audio has been playing, and that time can be used to keep the video in step. Pseudocode:
int audioFrameIndex = 0;
int videoFrameIndex = 0;

// Thread 1: decode audio and keep feeding the sound card
while (true)
{
    decodeAudioData();
    writeAudio(...);
}

void CALLBACK waveOutProc(HWAVEOUT hWaveOut, UINT uMsg, DWORD_PTR dwInstance,
                          DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg != WOM_DONE)
        return;
    ...
    audioFrameIndex++;
}

// Thread 2: decode video and present frames against the audio clock
double audioBitsPerSec = audioDev->bitsPerSample() * audioDev->samplesPerSec() * audioDev->channels();
double interval = 1000.0 / av_q2d(fmtc->streams[videoIndex]->r_frame_rate);
while (true)
{
    if (!decodeVideoFrame())
        continue;
    videoFrameIndex++;
    while (true)
    {
        double bits = audioFrameIndex * AUDIO_DEV_BLOCK_SIZE * 8.0;
        double ms = bits / audioBitsPerSec * 1000.0;   // actual playback time (the audio clock)
        double to = videoFrameIndex * interval;        // expected display time of this frame
        if (ms < to)   // the audio has not reached this frame yet: wait
        {
            Sleep(1);
            continue;
        }
        presentVideoFrame();
        break;
    }
}

Thread locking is glossed over in this multi-threaded sketch; proceed with care.
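If you want to close that gap with minimal effort, the shared counter can simply be made atomic (a sketch; a critical section or the Win32 interlocked functions would do the same job):

#include <atomic>

std::atomic<int> audioFrameIndex{ 0 };   // incremented in waveOutProc, read by the video thread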
Afterword
These are a few observations from my own development work; where they fall short, I hope readers will set me straight.
