[FFmpeg-devel] [RFC/PATCH] bitpacked_dec: Optimization for bitpacked_dec decoder performance

Marton Balint cus at passwd.hu
Fri May 12 01:20:02 EEST 2023



On Wed, 10 May 2023, Lance Wang wrote:

> On Sat, May 6, 2023 at 8:41 PM Devin Heitmueller <
> devin.heitmueller at ltnglobal.com> wrote:
>
>> On Sat, May 6, 2023 at 8:16 AM James Almer <jamrial at gmail.com> wrote:
>> > Can you bench with the START_TIMER and STOP_TIMER macros in timer.h?
>> > Also, define CACHED_BITSTREAM_READER in bitpacked_dec.c before including
>> > get_bits.h and test the actual implementation again, to see if it makes
>> > any difference.
>>
>> Original code:
>> 671661910 decicycles in bitpacked_dec,       1 runs,      0 skips
>> 669736380 decicycles in bitpacked_dec,       1 runs,      0 skips
>> 669370700 decicycles in bitpacked_dec,       1 runs,      0 skips
>>
>> Original code with CACHED_BITSTREAM_READER defined
>> 352599030 decicycles in bitpacked_dec,       1 runs,      0 skips
>> 336163810 decicycles in bitpacked_dec,       1 runs,      0 skips
>> 344628350 decicycles in bitpacked_dec,       1 runs,      0 skips
>>
>> My proposed version:
>> 257353330 decicycles in bitpacked_dec,       1 runs,      0 skips
>> 271527000 decicycles in bitpacked_dec,       1 runs,      0 skips
>> 252701500 decicycles in bitpacked_dec,       1 runs,      0 skips
>>
>>
> Yes, it shows better performance, so LGTM if nobody has plans to optimize
> the bitstream function.

Actually the cached bitstream reader was faster here than the manual 
approach:

./ffmpeg -stream_loop 128 -threads 1 -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le -f null none -loglevel error

Old code:

821050920 decicycles in bitpacked,       1 runs,      0 skips
815402160 decicycles in bitpacked,       2 runs,      0 skips
814108410 decicycles in bitpacked,       4 runs,      0 skips
814213800 decicycles in bitpacked,       8 runs,      0 skips
815048325 decicycles in bitpacked,      16 runs,      0 skips
812866713 decicycles in bitpacked,      32 runs,      0 skips
809186523 decicycles in bitpacked,      64 runs,      0 skips
808317601 decicycles in bitpacked,     128 runs,      0 skips

With the patch:

379879920 decicycles in bitpacked,       1 runs,      0 skips
387491580 decicycles in bitpacked,       2 runs,      0 skips
397720260 decicycles in bitpacked,       4 runs,      0 skips
389581560 decicycles in bitpacked,       8 runs,      0 skips
381820635 decicycles in bitpacked,      16 runs,      0 skips
379791675 decicycles in bitpacked,      32 runs,      0 skips
379246303 decicycles in bitpacked,      64 runs,      0 skips
379221671 decicycles in bitpacked,     128 runs,      0 skips
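
For readers unfamiliar with what a "manual approach" looks like for 10-bit packed data, here is a minimal sketch. It is not the code from the patch. It assumes the manual approach means unpacking each 5-byte group into four 10-bit samples (MSB first) with shifts and masks instead of calling the bitstream reader; the function name unpack_group10 is hypothetical.

```c
#include <stdint.h>

/*
 * Illustrative only, not taken from the patch: unpack one 5-byte group
 * holding four 10-bit samples, most significant bit first.
 * For yuv422p10 the four samples would be Cb, Y0, Cr, Y1 in that order.
 */
static void unpack_group10(const uint8_t src[5], uint16_t out[4])
{
    /* sample 0: all 8 bits of byte 0, then the top 2 bits of byte 1 */
    out[0] = (uint16_t)((src[0] << 2) | (src[1] >> 6));
    /* sample 1: low 6 bits of byte 1, then the top 4 bits of byte 2 */
    out[1] = (uint16_t)(((src[1] & 0x3F) << 4) | (src[2] >> 4));
    /* sample 2: low 4 bits of byte 2, then the top 6 bits of byte 3 */
    out[2] = (uint16_t)(((src[2] & 0x0F) << 6) | (src[3] >> 2));
    /* sample 3: low 2 bits of byte 3, then all 8 bits of byte 4 */
    out[3] = (uint16_t)(((src[3] & 0x03) << 8) | src[4]);
}
```

A decoder loop would call this once per pixel pair and advance the source pointer by 5 bytes each time.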

Old code with #define CACHED_BITSTREAM_READER 1:

345122280 decicycles in bitpacked,       1 runs,      0 skips
343663020 decicycles in bitpacked,       2 runs,      0 skips
343372680 decicycles in bitpacked,       4 runs,      0 skips
342554535 decicycles in bitpacked,       8 runs,      0 skips
340816522 decicycles in bitpacked,      16 runs,      0 skips
340225672 decicycles in bitpacked,      32 runs,      0 skips
340283520 decicycles in bitpacked,      64 runs,      0 skips
339643105 decicycles in bitpacked,     128 runs,      0 skips
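
To put the three 128-run averages side by side, the ratios can be worked out as in the sketch below. The constants are copied from the output above; the names speedup, OLD_CODE, PATCHED and CACHED are only for illustration.

```c
#include <stdio.h>

/* 128-run averages in decicycles, copied from the three result blocks. */
static const double OLD_CODE = 808317601.0; /* old code */
static const double PATCHED  = 379221671.0; /* with the patch */
static const double CACHED   = 339643105.0; /* old code + CACHED_BITSTREAM_READER */

/* Ratio of two cycle counts; a value above 1 means `candidate` is faster. */
static double speedup(double baseline, double candidate)
{
    return baseline / candidate;
}

/* Print the three pairwise ratios with two decimals. */
static void print_speedups(void)
{
    printf("patch  vs old:   %.2fx\n", speedup(OLD_CODE, PATCHED));
    printf("cached vs old:   %.2fx\n", speedup(OLD_CODE, CACHED));
    printf("cached vs patch: %.2fx\n", speedup(PATCHED, CACHED));
}
```

On these numbers the patch comes out at roughly 2.13x over the old code, the cached reader at roughly 2.38x, and the cached reader at roughly 1.12x over the patch.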

Regards,
Marton

