0julian
Posts: 28
Joined: Sun Apr 21, 2013 3:59 pm

video_decode dpb/reference frames

Sat Aug 24, 2013 11:07 am

Hi,

While working on RPi/OMX support in VLC, a question came up that I have not been able to find an answer to yet. We are using a combination of AllocateBuffer/UseBuffer to allow direct rendering between the decoder and the video output module without using an OMX tunnel. What I am wondering is how the OMX module deals with the decoded picture buffer (DPB)/reference frame list. For a real direct rendering approach I would expect the buffers allocated with AllocateBuffer to be used for the DPB, but from the behaviour it looks as if the decoder module keeps internal copies of the reference frames, so that the application side does not have to take care of the DPB.
Maybe someone with deeper knowledge could give some details?

Thanks,
Julian

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5343
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: video_decode dpb/reference frames

Sat Aug 24, 2013 11:09 am

The decoder will own the DPB in GPU memory space.
Any buffers returned to the ARM will be copies.

0julian
Posts: 28
Joined: Sun Apr 21, 2013 3:59 pm

Re: video_decode dpb/reference frames

Sat Aug 24, 2013 11:40 am

dom wrote: The decoder will own the DPB in GPU memory space.
Any buffers returned to the ARM will be copies.
When using AllocateBuffer, the actual pictures will still be in GPU memory space and only pointers to them are returned to the ARM. Or do I misunderstand the spec here?
So the GPU buffers that are pointed to are copies, then?
Is this the same when using OMX tunnels? Or does AllocateBuffer/UseBuffer require additional copies on the GPU?

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5343
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: video_decode dpb/reference frames

Sat Aug 24, 2013 11:45 am

0julian wrote:
dom wrote: The decoder will own the DPB in GPU memory space.
Any buffers returned to the ARM will be copies.
When using AllocateBuffer, the actual pictures will still be in GPU memory space and only pointers to them are returned to the ARM. Or do I misunderstand the spec here?
So the GPU buffers that are pointed to are copies, then?
Is this the same when using OMX tunnels? Or does AllocateBuffer/UseBuffer require additional copies on the GPU?
OpenMAX has a "copy buffer" interface. The ARM never points at a GPU buffer; it always has its own copy. Similarly, if you send a buffer to the GPU, the GPU will get a copy.

For non-tunnelled components, the AllocateBuffer functions will allocate both an ARM and a GPU buffer.

0julian
Posts: 28
Joined: Sun Apr 21, 2013 3:59 pm

Re: video_decode dpb/reference frames

Sat Aug 24, 2013 12:07 pm

Ok, I am slowly beginning to understand, thanks. So for non-tunneled components, both a GPU and an ARM buffer are allocated for each AllocateBuffer call, and video_decode will fill both of them. The video_render module can still use the GPU buffer directly, so there will not be a GPU->CPU->GPU copy per frame, only a parallel copy to the CPU?
The copy to ARM memory is presumably not done when running in tunneled mode, is it? How much will this affect performance? 1080p decode+render seems to be fine in my case so far, but I have not yet tested deinterlacing...

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5343
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: video_decode dpb/reference frames

Sat Aug 24, 2013 1:33 pm

0julian wrote: Ok, I am slowly beginning to understand, thanks. So for non-tunneled components, both a GPU and an ARM buffer are allocated for each AllocateBuffer call, and video_decode will fill both of them. The video_render module can still use the GPU buffer directly, so there will not be a GPU->CPU->GPU copy per frame, only a parallel copy to the CPU?
The copy to ARM memory is presumably not done when running in tunneled mode, is it? How much will this affect performance? 1080p decode+render seems to be fine in my case so far, but I have not yet tested deinterlacing...
My understanding is:
For non-tunnelled components, AllocateBuffer allocates a buffer on both the ARM and the GPU. EmptyThisBuffer will copy the buffer from ARM to GPU. FillThisBuffer will copy the buffer from GPU to ARM.
For tunnelled components, everything is done with references to existing buffers, so there is no duplication or copying.
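
To picture the two cases, here is a toy model in plain C. It is only a sketch of the behaviour described above: the PairedBuffer type and the model_* functions are made up for illustration and are not the OpenMAX IL API, and the tiny FRAME_BYTES value stands in for a real frame size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, none of it is OpenMAX IL. */

enum { FRAME_BYTES = 16 };  /* tiny stand-in for one decoded frame */

/* What AllocateBuffer is described as creating for a non-tunnelled port:
 * one buffer on the ARM side and a twin on the GPU side. */
typedef struct {
    unsigned char arm[FRAME_BYTES];
    unsigned char gpu[FRAME_BYTES];
} PairedBuffer;

static size_t arm_gpu_copies = 0;  /* counts copies between ARM and GPU */

/* Output port, non-tunnelled: GPU data is copied into the ARM buffer. */
static void model_fill_this_buffer(PairedBuffer *b)
{
    assert(b != NULL);
    memcpy(b->arm, b->gpu, FRAME_BYTES);
    arm_gpu_copies++;
}

/* Input port, non-tunnelled: ARM data is copied into the GPU buffer. */
static void model_empty_this_buffer(PairedBuffer *b)
{
    assert(b != NULL);
    memcpy(b->gpu, b->arm, FRAME_BYTES);
    arm_gpu_copies++;
}

/* Non-tunnelled decode -> application -> render.
 * Returns how many ARM<->GPU copies this one frame needed. */
static size_t pass_frame_non_tunnelled(PairedBuffer *dec_out,
                                       PairedBuffer *render_in)
{
    size_t before = arm_gpu_copies;
    model_fill_this_buffer(dec_out);                    /* GPU -> ARM */
    /* The application hands the frame on; a real application might share
     * this ARM memory instead, so it is not counted here. */
    memcpy(render_in->arm, dec_out->arm, FRAME_BYTES);
    model_empty_this_buffer(render_in);                 /* ARM -> GPU */
    return arm_gpu_copies - before;
}

/* Tunnelled decode -> render: the renderer is handed a reference to the
 * decoder's GPU buffer, so nothing is copied. */
static const unsigned char *pass_frame_tunnelled(const PairedBuffer *dec_out)
{
    assert(dec_out != NULL);
    return dec_out->gpu;
}
```

In this model, pass_frame_non_tunnelled() reports two ARM<->GPU copies per frame, while pass_frame_tunnelled() just returns a pointer to the existing GPU data.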

If video_decode and video_render are not tunnelled, then I believe the whole decoded frame will be copied to the ARM and back.
This is done with DMA, so the cost may not be overwhelming, but it's certainly a lot more expensive than tunnelling.
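
For a rough feel of how much data that round trip moves, here is a back-of-the-envelope calculation in C. It is a sketch under assumptions that are not stated in this thread: 8-bit YUV 4:2:0 output (1.5 bytes per pixel), no stride or height padding, and a frame rate you pick yourself. Both function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one 8-bit YUV 4:2:0 frame: a full-resolution luma plane plus
 * two quarter-resolution chroma planes, i.e. 1.5 bytes per pixel.
 * Assumption: no stride or height padding is added. */
static uint64_t yuv420_frame_bytes(uint32_t width, uint32_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return (uint64_t)width * height * 3 / 2;
}

/* Bytes per second moved if every frame is copied GPU -> ARM and then
 * ARM -> GPU again (two copies per frame). */
static uint64_t round_trip_bytes_per_second(uint32_t width, uint32_t height,
                                            uint32_t fps)
{
    return 2 * yuv420_frame_bytes(width, height) * fps;
}
```

Under these assumptions a 1920x1080 frame is 3,110,400 bytes, and at an assumed 30 fps the round trip moves 186,624,000 bytes per second, roughly 187 MB/s. Real buffers may be larger if the decoder pads the stride or height.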

0julian
Posts: 28
Joined: Sun Apr 21, 2013 3:59 pm

Re: video_decode dpb/reference frames

Sat Aug 24, 2013 2:44 pm

Alright, thanks for your explanation. I will go on implementing the missing features and see how it performs. Unfortunately, tunneling does not really fit into the plugin architecture of VLC, so we cannot really use it there.
