mcerveny
Posts: 7
Joined: Sat Aug 04, 2012 7:22 pm

video_decode performance&observability

Mon May 16, 2016 3:45 pm

Hello.

I am decoding rtp/udp realtime h264 or mjpeg video streams (video_decode -> video_scheduler (without clock attached) -> video_render) for a remote NVidia Grid VDI with low latency (https://gridforums.nvidia.com/default/t ... pberry-pi/). All works OK, but I need more observability into the video processing pipeline, and maybe some hints about performance optimization.

How to get more observability?
  • The headers mention "OMX_IndexParamBrcmSetCodecPerformanceMonitoring" with the structure DECODE_PROGRESS_REPORT_T, but it appears to be unimplemented.
    Is this a bug or a feature?
  • I am measuring the decoding pipeline latency between the buffer being pushed to the decoder (OMX_EmptyThisBuffer() with OMX_BUFFERFLAG_TIME_UNKNOWN and OMX_BUFFERFLAG_ENDOFFRAME) and nFrameCount being incremented in video_render (OMX_IndexConfigBrcmPortStats). I am using active polling to detect the "increment" event :( :

    Code: Select all

    	while (1) {
    		OMX_CONFIG_BRCMPORTSTATSTYPE st;
    		OMX_INIT_STRUCTURE(st);
    		st.nPortIndex = videoRenderer_inPort;
    		/* OMX_IndexConfigBrcmPortStats is a config index, so use OMX_GetConfig */
    		OMX_GetConfig(videoRenderer_handle, OMX_IndexConfigBrcmPortStats, &st);
    		if (last_frame_count != st.nFrameCount) {
    			struct timespec t;
    			clock_gettime(CLOCK_MONOTONIC, &t);
    			/* elapsed ms between the buffer push and the frame-count increment */
    			Log("frame increment %u %li\n", st.nFrameCount,
    			    (t.tv_sec - time_of_buffer_push.tv_sec) * 1000l +
    			    (t.tv_nsec - time_of_buffer_push.tv_nsec) / 1000000);
    			last_frame_count = st.nFrameCount;
    		}
    		usleep(1000);
    	}
    
    The output seems correct (~12-22ms for decoding 1280x1024@30, ~15-29ms for 1080p@30).
    When is nFrameCount incremented (at the input to video_render, or when the frame is pushed to the output (hdmi))?
    How can I measure the hdmi vsync wait time?
    Is there a better way to get the video processing delay?
How to get more performance?
I need to optimize the (C) performance point (see https://gridforums.nvidia.com/default/t ... 2689/#2689)
  • RPI overclocking?
  • I am pushing the buffer to the decoder only after full reassembly from the udp/rtp packets. Would it be better/useful to push a buffer with every received udp/rtp packet (i.e. does video_decode start decoding with any buffer in the pipeline, or only after the buffer with OMX_BUFFERFLAG_ENDOFFRAME)? See the sketch below this list.
  • Any other hints?
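To make the per-packet variant concrete, here is a sketch of what I mean (get_free_input_buffer() and the handle name are placeholders for my own buffer management; only the OMX IL calls, fields and flags are real):

Code: Select all

 	#include <stdint.h>
 	#include <string.h>
 	#include <IL/OMX_Core.h>

 	extern OMX_HANDLETYPE videoDecoder_handle;
 	/* placeholder: returns an input buffer the component has handed back */
 	extern OMX_BUFFERHEADERTYPE *get_free_input_buffer(void);

 	/* push one rtp payload; set ENDOFFRAME only on the fragment
 	   that completes the access unit (assumes len <= nAllocLen) */
 	void push_rtp_payload(const uint8_t *payload, size_t len, int frame_complete)
 	{
 		OMX_BUFFERHEADERTYPE *buf = get_free_input_buffer();
 		memcpy(buf->pBuffer, payload, len);
 		buf->nOffset = 0;
 		buf->nFilledLen = len;
 		buf->nFlags = OMX_BUFFERFLAG_TIME_UNKNOWN;
 		if (frame_complete)
 			buf->nFlags |= OMX_BUFFERFLAG_ENDOFFRAME;
 		OMX_EmptyThisBuffer(videoDecoder_handle, buf);
 	}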
Thanks for any answers, Martin

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5182
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: video_decode performance&observability

Mon May 16, 2016 4:22 pm

Overclocking should help h264 decode time.

Code: Select all

gpu_freq=500
over_voltage=6
force_turbo=1
You will need force_turbo, as the arm may not be busy when video decode is in progress.
Note: overclocking is not guaranteed, but those settings work on a high percentage of boards.

You may be able to optimise the time to vsync. You can get vsync callbacks with vc_dispmanx_vsync_callback which allows you to see exactly where the vsyncs occur.
You can influence that with "vcgencmd adjust_hdmi_clock"

Examples of both calls here: https://github.com/popcornmix/xbmc/blob ... ux/RBP.cpp
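
For reference, a minimal standalone sketch of registering the vsync callback (assuming the usual userland setup: include bcm_host.h and link with -lbcm_host; display 0 is the main display):

Code: Select all

 	#include <stdio.h>
 	#include <time.h>
 	#include <unistd.h>
 	#include "bcm_host.h"

 	/* called by the dispmanx service on every vsync */
 	static void vsync_cb(DISPMANX_UPDATE_HANDLE_T u, void *arg)
 	{
 		(void)u; (void)arg;
 		struct timespec t;
 		clock_gettime(CLOCK_MONOTONIC, &t);
 		/* compare against your presentation timestamp to get the vsync wait */
 		printf("vsync at %ld.%09ld\n", (long)t.tv_sec, t.tv_nsec);
 	}

 	int main(void)
 	{
 		bcm_host_init();
 		DISPMANX_DISPLAY_HANDLE_T display = vc_dispmanx_display_open(0);
 		vc_dispmanx_vsync_callback(display, vsync_cb, NULL);  /* register */
 		sleep(5);                                             /* observe a few vsyncs */
 		vc_dispmanx_vsync_callback(display, NULL, NULL);      /* unregister */
 		vc_dispmanx_display_close(display);
 		bcm_host_deinit();
 		return 0;
 	}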

So, if you find vsync is actually occurring 12ms after the point in time you are presenting, then use:

Code: Select all

vcgencmd adjust_hdmi_clock 1.001
and the HDMI will run a little (0.1%) faster than default. That should still be in spec, but you will find the 12ms delay gradually decreases.
When you get to the desired value (e.g. 2ms), you can reduce the speed adjustment. You will want to continuously adjust this to keep the delay within the desired range; a rough sketch of such a loop is below.
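
Something like this (measure_vsync_delay_ms() is a hypothetical placeholder for your own measurement, e.g. derived from the vsync callback timestamps; the thresholds and ratios are just illustrative):

Code: Select all

 	#include <stdlib.h>
 	#include <unistd.h>

 	/* hypothetical: current presentation-to-vsync delay in ms */
 	extern double measure_vsync_delay_ms(void);

 	int main(void)
 	{
 		const double target_ms = 2.0;
 		for (;;) {
 			double delay = measure_vsync_delay_ms();
 			/* run HDMI slightly fast while the delay is above target,
 			   slightly slow below it, nominal once it is close */
 			const char *cmd = "vcgencmd adjust_hdmi_clock 1.0";
 			if (delay > target_ms + 1.0)
 				cmd = "vcgencmd adjust_hdmi_clock 1.001";
 			else if (delay < target_ms - 1.0)
 				cmd = "vcgencmd adjust_hdmi_clock 0.999";
 			system(cmd);
 			sleep(1);
 		}
 	}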

mcerveny
Posts: 7
Joined: Sat Aug 04, 2012 7:22 pm

Re: video_decode performance&observability

Wed May 18, 2016 12:21 am

Thanks for the hints. I did some new measurements (https://gridforums.nvidia.com/default/t ... 2891/#2891).

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5182
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: video_decode performance&observability

Wed May 18, 2016 11:40 am

HDMI vsync wait is measured with RPI vc_dispmanx_vsync_callback() "vsync latency" (13). The saw-waveform (f) comes from different clock domains (vGPU @ ~30 FPS and HDMI @ ~60 FPS).
I think this is avoidable with the adjust_hdmi_clock command. You can ensure that every second vsync occurs a fixed period after the vGPU capture by subtle adjustments to the HDMI clock.
The synchronisation will also improve the quality of the remote stream (i.e. no duplicated/skipped vsync frames when rendering occurs close to a vsync event).
