[FFmpeg-devel] QTRLE encode performance

Sun Feb 10 11:40:07 CET 2013

On Fri, Feb 08, 2013 at 05:10:09PM -0500, Malcolm Bechard wrote:
> On Fri, Feb 8, 2013 at 3:51 PM, Alexis Ballier <alexis.ballier at gmail.com> wrote:
> 
> > 2013/2/8 Alex Beregszaszi <alex at rtfs.hu>:
> > >> I don't know the quicktime encoder, but you should probably check the
> > >> filesize for the same color depth: ffmpeg's qtrle encoder is optimal
> > >> (in terms of output size), so it's likely a non-optimal heuristic is
> > >> much faster.
> > >
> > > If it is technically possible it would be nice to change qtrleenc to use
> > > ff_rle_encode and optimise/multithread the latter to benefit other
> > > encoders using RLE.
> >
> > It's not exactly rle as in the ff_rle_encode sense: you can repeat n
> > times a pixel or have n raw pixels (which is what ff_rle_encode does
> > afaik) but also skip n pixels, meaning the pixels will be the same as
> > those from the previous frame.
> >
> > Also, it's probably much harder to parallelize the rle encoder itself
> > than to cut each frame into (# of threads) parts of (# of lines / # of
> > threads) lines and encode them in parallel.
> >
> > Alexis.
> >
> 
> Great info for all, thanks.
> I think it does make sense to just parallelize the for loop that calls
> qtrle_encode_line() to get started. Any suggestion for the best file to use
> as a template for how encoding threading is done in ffmpeg?

It depends on how you want to parallelize.
If you want to split each frame into parts and encode each part
independently, have a look at avctx->execute2 (used e.g. in the DNxHD
encoder).
Note that I have some doubts about how well this will work here.
Even though the RLE processing of each line is independent, the output
must still be merged into one single buffer, and since the compressed size
of each part is not known in advance, that might be problematic.
The alternative is frame multithreading, but I don't know whether any of
our encoders currently use that.
I'm also not convinced that threading is the best optimization here.
For screen capture it is actually better to make the encoder do less work,
so that other applications get more of the CPU, than to make it use all
CPUs.
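
To make the merge concern with slice threading concrete, here is a rough
sketch of one way it could be handled. Everything in it is an assumption
for illustration: plain pthreads stand in for execute2, toy_rle_line() is
a made-up (count, value) format and not the real QTRLE bitstream, and
SliceJob, encode_slice() and encode_frame_sliced() are hypothetical names.
Each thread encodes its lines into a private worst-case-sized buffer, and
the parts are copied into the single output buffer only after the joins.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy format (hypothetical, NOT real QTRLE): a line becomes
 * (count, value) byte pairs, with runs capped at 255 pixels. */
static size_t toy_rle_line(const uint8_t *src, int width, uint8_t *dst)
{
    size_t n = 0;
    int x = 0;
    while (x < width) {
        int run = 1;
        while (x + run < width && run < 255 && src[x + run] == src[x])
            run++;
        dst[n++] = (uint8_t)run;
        dst[n++] = src[x];
        x += run;
    }
    return n;
}

typedef struct SliceJob {
    const uint8_t *pixels;     /* whole frame, 1 byte per pixel */
    int width;
    int first_line, last_line; /* half-open range [first, last) */
    uint8_t *scratch;          /* private buffer, worst-case sized */
    size_t used;               /* bytes written to scratch */
} SliceJob;

static void *encode_slice(void *arg)
{
    SliceJob *job = arg;
    for (int y = job->first_line; y < job->last_line; y++)
        job->used += toy_rle_line(job->pixels + (size_t)y * job->width,
                                  job->width, job->scratch + job->used);
    return NULL;
}

/* Encodes the frame with nb_threads slices. out must hold at least
 * 2 * width * height bytes. Returns bytes written, or 0 on failure. */
size_t encode_frame_sliced(const uint8_t *pixels, int width, int height,
                           int nb_threads, uint8_t *out)
{
    enum { MAX_THREADS = 16 };
    pthread_t tid[MAX_THREADS];
    SliceJob job[MAX_THREADS];
    size_t total = 0;
    int started = 0, ok = 1;

    if (width <= 0 || height <= 0)
        return 0;
    if (nb_threads < 1)           nb_threads = 1;
    if (nb_threads > MAX_THREADS) nb_threads = MAX_THREADS;
    if (nb_threads > height)      nb_threads = height;

    for (int i = 0; i < nb_threads; i++) {
        SliceJob *j = &job[i];
        j->pixels     = pixels;
        j->width      = width;
        j->first_line = (int)((long long)height * i / nb_threads);
        j->last_line  = (int)((long long)height * (i + 1) / nb_threads);
        j->used       = 0;
        /* Worst case: every pixel is its own run, 2 bytes per pixel. */
        j->scratch = malloc((size_t)2 * width * (j->last_line - j->first_line));
        if (!j->scratch || pthread_create(&tid[i], NULL, encode_slice, j)) {
            free(j->scratch);
            ok = 0;
            break;
        }
        started++;
    }

    /* Serial merge: sizes are only known once each thread has finished. */
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        if (ok) {
            memcpy(out + total, job[i].scratch, job[i].used);
            total += job[i].used;
        }
        free(job[i].scratch);
    }
    return ok ? total : 0;
}
```

The price is one extra memcpy of the compressed data plus a scratch
allocation per slice, and the merge itself stays serial. Whether that is
acceptable for qtrle would have to be measured. The output is byte for
byte the same as with a single thread, since slices are concatenated in
line order.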