chrisd1100
Posts: 2
Joined: Tue Jun 18, 2013 5:08 am
Location: New York, NY

OMX video_decode / video_render Questions

Wed May 18, 2016 10:48 pm

Hello everyone, this is (basically) my first post; sorry, but it's a long one. I've developed a cross-platform C app for low-latency desktop streaming via H.264, and I'm trying to bring it to the Pi. I'm using the OpenMAX IL and ilclient (<--thanks for this!) libraries provided with Raspbian. Most of what I've done is based on the /opt/vc/src/hello_pi/hello_video/video.c file that ships with Raspbian.

My server program sends a stream of encoded H.264 frames at precisely 60 FPS over TCP to the client. In my Windows and Mac OS versions of the client, I use platform-specific hardware decoder libraries, then draw the raw frames with the SDL2 library (which uses OpenGL under the hood).

Based on the video.c example mentioned above, I have a working version on the Pi, but not without some issues. FYI, I have removed the "clock" and "video_scheduler" components, as they seemed to cause choppier video compared to simply tunneling the "video_decode" component's output port directly to the "video_render" input port. All testing was done on a Pi 3 with the current, unmodified version (2016-05-10) of Raspbian Jessie.

My questions / issues are as follows:

1. The main issue: If I set the H.264 quality low, say to a QP of 36 for 1080p@60, the Pi handles the incoming data smoothly. The average bitrate in this scenario is about 2-3 Mbps. If I raise the quality to around QP 30, things start breaking. The resulting bitrate increases to about 5-6 Mbps, with occasional spikes of around 10 Mbps. The strange thing is, it is not the ilclient or OMX libraries returning errors here; I am actually seeing what looks like packet loss over TCP. The video is sporadically choppy, with occasional, noticeable delays between certain frames. Eventually, when trying to read the size header that I send with every frame, the value is clearly incorrect (i.e. a negative value), indicating packet loss, and the program exits.

This may be more of a networking issue than an OpenMAX issue, but I know the Pi can support much higher bandwidth than 10 Mbps out of the box (I am using the built-in wired connection). I have tinkered with almost every sysctl net.core and net.ipv4.tcp* setting you can imagine, to no avail. My only theory is that there is some bottleneck on a shared bus that is affecting the TCP reads. CPU usage during the issue sits at a low 15-25% (of a single core). Overclocking the Pi with gpu_freq, force_turbo, and over_voltage did not seem to help either. The TCP reads occur in a separate thread and are not blocked by any other processing. My voltage is stable and I am not getting any voltage or temperature warnings. Any ideas here?

UPDATE: I tested the program without any OpenMAX decoding or rendering, and the issue remains. So it must be a networking bottleneck of some kind...
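For reference, each frame goes over the wire as a fixed-size length prefix followed by the encoded data, and the receive thread is essentially a read-exactly loop like the simplified sketch below (the uint32_t header and the function names are illustrative, not my exact code):

    #include <stdint.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>

    /* Read exactly n bytes from a blocking TCP socket. */
    static int read_exact(int fd, void *dst, size_t n)
    {
        uint8_t *p = dst;
        while (n > 0) {
            ssize_t r = recv(fd, p, n, 0);
            if (r <= 0)
                return -1;              /* error or peer closed the connection */
            p += r;
            n -= (size_t)r;
        }
        return 0;
    }

    /* Read one length-prefixed frame; returns the payload length, or -1 on error. */
    static ssize_t recv_frame(int fd, uint8_t *buf, size_t cap)
    {
        uint32_t len_be;
        if (read_exact(fd, &len_be, sizeof(len_be)) < 0)
            return -1;
        uint32_t len = ntohl(len_be);
        if (len > cap || read_exact(fd, buf, len) < 0)
            return -1;
        return (ssize_t)len;
    }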

2. Will the "video_render" OMX component always be the best choice for presenting the frames, or would it be possible to use SDL2 w/ OpenGL to achieve similar performance? If SDL2 is an option, what is the default raw pixel format for the frames that come out of the "video_decode" component, and can I modify this pixel format?

3. Is there a way to turn vsync off for the "video_render" component?

4. Is there a way to increase the buffer size obtained through ilclient_get_input_buffer? The buffers seem to default to 80KB.

5. I'm interested in setting OMX_DataUnitCodedPicture to potentially reduce latency, since I read frames from the network one at a time, but I'm not sure how to set it via OMX_IndexParamBrcmDataUnit. Any examples out there?

6. Any other tips for reducing potential buffering / latency in the decoding & rendering process?

Thanks in advance!

Chris

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 6995
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: OMX video_decode / video_render Questions

Thu May 19, 2016 7:54 am

1 - TCP should never lose data, but it may insert long delays if it has to retransmit. UDP is not a guaranteed service, so the application layer needs to handle reordered packets and any retries itself. That certainly sounds like a networking issue.

2 - If you're dealing with multimedia buffers from video_decode or similar, then video_render is the best bet. GLES has to convert to RGB before processing the buffers, and I suspect SDL will have to compose on the ARM.

3 - What are you thinking that would achieve other than tearing? video_render effectively has an input FIFO of depth 1. If a newer buffer is presented before it has submitted the old one to dispmanX, then the older one will be dropped.

4 - You should be able to just increase it by doing an OMX_SetParameter((*comp)->comp, OMX_IndexParamPortDefinition, &portdef) with nBufferSize set to some larger value before you enable the port. 80kB is just the default. Likewise you can increase the number of buffers (nBufferCountActual) if that helps unblock your app.
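For example (untested sketch, using the ilclient helpers from hello_pi; 256 kB and 20 buffers are just illustrative values):

    #include <string.h>
    #include "ilclient.h"

    /* Enlarge video_decode's input buffers; port 130 is its input port. */
    static OMX_ERRORTYPE set_input_buffer_size(COMPONENT_T *video_decode)
    {
        OMX_PARAM_PORTDEFINITIONTYPE portdef;
        memset(&portdef, 0, sizeof(portdef));
        portdef.nSize = sizeof(portdef);
        portdef.nVersion.nVersion = OMX_VERSION;
        portdef.nPortIndex = 130;

        OMX_ERRORTYPE err = OMX_GetParameter(ILC_GET_HANDLE(video_decode),
                                             OMX_IndexParamPortDefinition, &portdef);
        if (err != OMX_ErrorNone)
            return err;

        portdef.nBufferSize = 256 * 1024;      /* instead of the default ~80 kB */
        portdef.nBufferCountActual = 20;       /* and/or more buffers */

        return OMX_SetParameter(ILC_GET_HANDLE(video_decode),
                                OMX_IndexParamPortDefinition, &portdef);
    }

Call it while the port is still disabled, i.e. before ilclient_enable_port_buffers allocates the buffers.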

5 - Can't remember the detail.
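If it follows the standard IL pattern it would just be the OMX_PARAM_DATAUNITTYPE structure passed to that index; an untested sketch (the pairing of that structure with OMX_IndexParamBrcmDataUnit is an assumption on my part):

    /* Untested: hint that each input buffer carries exactly one coded picture.
       video_decode is the COMPONENT_T * from ilclient; 130 is its input port. */
    OMX_PARAM_DATAUNITTYPE unit;
    memset(&unit, 0, sizeof(unit));
    unit.nSize = sizeof(unit);
    unit.nVersion.nVersion = OMX_VERSION;
    unit.nPortIndex = 130;
    unit.eUnitType = OMX_DataUnitCodedPicture;
    unit.eEncapsulationType = OMX_DataEncapElementaryStream;

    OMX_SetParameter(ILC_GET_HANDLE(video_decode),
                     OMX_IndexParamBrcmDataUnit, &unit);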

6 - You say that you submit encoded frames one per buffer, therefore ensure that you set nFlags appropriately with at least OMX_BUFFERFLAG_ENDOFFRAME. That saves the codec searching the stream for start codes.
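i.e. something along these lines when feeding each complete frame (following the hello_video pattern; frame_data and frame_len stand for whatever your network thread produced):

    /* One complete encoded frame per buffer, flagged as such.
       frame_len must not exceed buf->nAllocLen (see point 4 above). */
    OMX_BUFFERHEADERTYPE *buf = ilclient_get_input_buffer(video_decode, 130, 1 /* block */);
    memcpy(buf->pBuffer, frame_data, frame_len);
    buf->nFilledLen = frame_len;
    buf->nOffset = 0;
    buf->nFlags = OMX_BUFFERFLAG_ENDOFFRAME;   /* buffer ends on a frame boundary */
    OMX_EmptyThisBuffer(ILC_GET_HANDLE(video_decode), buf);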

Adding a clock and video_scheduler should result in smoother playback, but you'll need to insert a delay to avoid almost all frames appearing to be late. There is a distinct possibility that your source clock will drift, though, and with no sensible way to sync the two, it may be more hassle than it's worth.

1080p60 is above the Level 4 or 4.1 (I forget which) that the codec is specified to support. It's there on a "best efforts" basis and will actually overclock slightly in an attempt to achieve it.
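For reference, the tunnel layout with the clock and scheduler in the path is the one from hello_pi/hello_video/video.c:

    set_tunnel(tunnel,     video_decode,    131, video_scheduler, 10);
    set_tunnel(tunnel + 1, video_scheduler, 11,  video_render,    90);
    set_tunnel(tunnel + 2, clock,           80,  video_scheduler, 12);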
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.
