[FFmpeg-devel] [PATCH 3/3] lavc/cbrt_tablegen: unroll table generation loop
Ganesh Ajjanagadde
gajjanagadde at gmail.com
Fri Jan 1 00:59:43 CET 2016
On Thu, Dec 31, 2015 at 3:53 PM, Ganesh Ajjanagadde
<gajjanagadde at gmail.com> wrote:
> On Thu, Dec 31, 2015 at 8:46 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> Hi,
>>
>> On Thu, Dec 31, 2015 at 11:39 AM, Ganesh Ajjanagadde
>> <gajjanagadde at gmail.com> wrote:
>>>
>>> This patch does not seem to have a measurable impact, at least on x86-64,
>>> though there could be benefits on CPUs with less-than-stellar branch predictors.
>>
>> [..]
>>>
>>> - for (i = 0; i < 1<<13; i++) {
>>> - if (!(i & 7))
>>> - cbrt_tab[i].f = 16 * cbrt_tab[i>>3].f;
>>> - else
>>> - cbrt_tab[i].f = i * cbrt(i);
>>> + for (i = 0; i < 1<<13; i+=8) {
>>> + cbrt_tab[i].f = 16 * cbrt_tab[i>>3].f;
>>> + cbrt_tab[i+1].f = (i+1) * cbrt(i+1);
>>> + cbrt_tab[i+2].f = (i+2) * cbrt(i+2);
>>> + cbrt_tab[i+3].f = (i+3) * cbrt(i+3);
>>> + cbrt_tab[i+4].f = (i+4) * cbrt(i+4);
>>> + cbrt_tab[i+5].f = (i+5) * cbrt(i+5);
>>> + cbrt_tab[i+6].f = (i+6) * cbrt(i+6);
>>> + cbrt_tab[i+7].f = (i+7) * cbrt(i+7);
>>
>>
>> gcc (and most other compilers) will unroll the loop automatically, I
>> suspect. Check disassembly to confirm?
>>
>> (That doesn't mean the patch shouldn't go in, I'm just trying to help you
>> explain the result. I have no comment on the patch itself.)
>
> Patch series dropped: I have a superior approach that brings it down to
> ~400k cycles (as opposed to the original 750k and the proposed 660k).
> Currently at work, seeing if there is anything more I can easily squeeze out.
Sorry, that should be ~300k cycles, not 400k.
[...]