[MPlayer-users] ssa/ass rendering uses lots of cpu with -vo gl

Reimar Döffinger Reimar.Doeffinger at gmx.de
Wed Jan 26 22:03:33 CET 2011


On Mon, Jan 24, 2011 at 05:11:36AM +0300, Vladimir Mosgalin wrote:
>  On 2011.01.23 at 20:21:42 +0100, Reimar Döffinger wrote next:
> > > for each EOSD image element
> > >   find some room for it in one of the textures
> > >   copy the bitmap to a memory buffer associated to that texture
> > > upload the necessary parts of the memory buffers
> > > 
> > > Or maybe you already tried that.
> > 
> > Yes, I meant to say that I tried exactly that, and depending on
> > how exactly I did it (MapBuffer vs. BufferSubData) it was the
> > same speed or slower. That's why I blame the driver.
> > There are other things that could be tried, but it tends to be quite a bit
> > of effort and not that likely to help.
> > For this specific case, checking for each part separately whether
> > it was actually changed would help.
> > In general it could help for libass to merge non-overlapping glyphs,
> > but to a degree these all optimize only some special-cases.
> 
> Is it some unavoidable nature of libass rendering that one has to upload
> glyphs many times, one by one to the video card? Or you want to say that
> merely trying to render glyphs one by one into texture that's yet to be
> uploaded causes slowdown already?
> 
> Could it be possible to "compact" all the glyphs into yet to be uploaded
> texture as a mere image, without using opengl texture functions loading
> texture into vram? So to say, "-vf ass" like rendering into transparent
> high resolution texture, which is then layered on the main video texture
> in one pass? Or it's too complicated..

Depends on what you mean. If you mean a texture of the same size as the
image: in principle that's possible, but it has several issues: huge
GPU memory usage, problems with rendering overlapping parts, probably
higher CPU usage for simple cases, and it simply can't work with PCI
video cards (at 1080p the current OSD needs maybe 60 MB/s of bandwidth,
whereas that approach would need almost 200 MB/s, which even without
video is already more than PCI can do).
-vo direct3d does it that way if I remember right.
A similar but better-working approach is what I mentioned as libass
merging glyphs, but that takes even more effort to implement.
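
As a rough sanity check of the "almost 200 MB/s" figure, here is a small
back-of-the-envelope sketch. The 4 bytes per pixel (RGBA), the 24 fps
frame rate and the assumption that the whole overlay is re-uploaded on
every frame are my own guesses for illustration and are not stated above;
the PCI number is the theoretical peak of a 32-bit, 33 MHz bus.

```python
# Back-of-the-envelope estimate of the upload bandwidth for a
# full-frame overlay texture. All constants below are assumptions
# chosen for illustration, not values taken from MPlayer.

WIDTH, HEIGHT = 1920, 1080      # 1080p video frame
BYTES_PER_PIXEL = 4             # assumed RGBA, 8 bits per channel
FPS = 24                        # assumed frame rate

# Theoretical peak of classic PCI: 32-bit (4-byte) bus at ~33.33 MHz.
PCI_PEAK_BYTES_PER_S = 4 * 33_333_333


def full_frame_upload_rate(width, height, bytes_per_pixel, fps):
    """Bytes per second needed to re-upload the whole overlay each frame."""
    return width * height * bytes_per_pixel * fps


rate = full_frame_upload_rate(WIDTH, HEIGHT, BYTES_PER_PIXEL, FPS)
print(f"full-frame overlay:   {rate / 1e6:.1f} MB/s")
print(f"PCI theoretical peak: {PCI_PEAK_BYTES_PER_S / 1e6:.1f} MB/s")
print("exceeds PCI peak:", rate > PCI_PEAK_BYTES_PER_S)
```

Under these assumptions the full-frame upload comes out at roughly
199 MB/s against a PCI peak of about 133 MB/s, so the bus would be
saturated before any video data is transferred.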

> (I know I'm shooting a bit at random here, sorry :). By the way I'm
> using the open source radeon driver; even though it lags far behind
> fglrx in performance, it has far fewer issues and problems, so I'd
> rather use this "slow" driver than go back to fglrx)

I don't think it has worse performance for this use case.
It actually has the advantage that it could be optimized to also handle
this kind of workload well if someone cared enough.

