[FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation
Ganesh Ajjanagadde
gajjanag at mit.edu
Mon Jan 4 06:11:28 CET 2016
On Sun, Jan 3, 2016 at 7:32 PM, Michael Niedermayer
<michael at niedermayer.cc> wrote:
> On Mon, Jan 04, 2016 at 04:04:02AM +0100, Michael Niedermayer wrote:
>> On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
>> > This gets rid of some branches to speed up table generation slightly
>> > (impact higher on mulaw than alaw). Tables are identical to before,
>> > tested with FATE.
>> >
>> > Sample benchmark (Haswell, GNU/Linux+gcc):
>> > old:
>> > 313494 decicycles in build_alaw_table, 4094 runs, 2 skips
>> > 315959 decicycles in build_alaw_table, 8190 runs, 2 skips
>> >
>> > 323599 decicycles in build_ulaw_table, 4095 runs, 1 skips
>> > 318849 decicycles in build_ulaw_table, 8188 runs, 4 skips
>> >
>> > new:
>> > 261902 decicycles in build_alaw_table, 4096 runs, 0 skips
>> > 266519 decicycles in build_alaw_table, 8192 runs, 0 skips
>> >
>> > 209657 decicycles in build_ulaw_table, 4096 runs, 0 skips
>> > 232656 decicycles in build_ulaw_table, 8192 runs, 0 skips
>> >
>> > Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> > ---
>> > libavcodec/pcm_tablegen.h | 24 ++++++++++++------------
>> > 1 file changed, 12 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
>> > index 1387210..7269977 100644
>> > --- a/libavcodec/pcm_tablegen.h
>> > +++ b/libavcodec/pcm_tablegen.h
>> > @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t *linear_to_xlaw,
>> >  {
>> >      int i, j, v, v1, v2;
>> >
>> > -    j = 0;
>> > -    for(i=0;i<128;i++) {
>> > -        if (i != 127) {
>> > -            v1 = xlaw2linear(i ^ mask);
>> > -            v2 = xlaw2linear((i + 1) ^ mask);
>> > -            v = (v1 + v2 + 4) >> 3;
>> > -        } else {
>> > -            v = 8192;
>> > -        }
>> > -        for(;j<v;j++) {
>> > +    j = 1;
>> > +    linear_to_xlaw[8192] = mask;
>> > +    for(i=0;i<127;i++) {
>> > +        v1 = xlaw2linear(i ^ mask);
>> > +        v2 = xlaw2linear((i + 1) ^ mask);
>> > +        v = (v1 + v2 + 4) >> 3;
>> > +        for(;j<v;j+=1) {
>> > +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>> >              linear_to_xlaw[8192 + j] = (i ^ mask);
>> > -            if (j > 0)
>> > -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>> >          }
>> >      }
>> > +    for(;j<8192;j++) {
>> > +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
>> > +        linear_to_xlaw[8192 + j] = (127 ^ mask);
>> > +    }
>> >      linear_to_xlaw[0] = linear_to_xlaw[1];
>>
>> i think you can make the tables 8 times smaller
>
> forget this, i should have checked the whole table or looked when i
> am awake ...
Ha ha. By the way, both changes are needed to get this level of
speedup; with only the j change, which you acked, the speedup is much
smaller. Note, however, that the other parts of the patch also
increase the binary size more.
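
For anyone who wants to check the "tables are identical" claim outside
of FATE, here is a rough standalone sketch of the two loop shapes from
the diff, side by side. Everything outside the loop bodies is an
assumption made for illustration: the decoder ulaw2linear_sketch() is
the textbook G.711 mu-law expansion rather than a copy of the one in
pcm_tablegen.h, the 16384-entry table size and the 0xff mask are
assumed, and the names build_old(), build_new() and tables_match() are
hypothetical. The asserts only guard the assumption that v never
exceeds 8192, so all writes stay inside the table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 16384 /* assumed: indices 0..16383, zero level at 8192 */

/* Stand-in decoder (assumption): textbook G.711 mu-law expansion. */
static int ulaw2linear_sketch(unsigned char u)
{
    int t;
    u = (unsigned char)~u;
    t = ((u & 0x0f) << 3) + 0x84;
    t <<= (u & 0x70) >> 4;
    return (u & 0x80) ? (0x84 - t) : (t - 0x84);
}

/* Loop shape before the patch: one branch on i, one branch on j. */
static void build_old(uint8_t *tab, int (*decode)(unsigned char), int mask)
{
    int i, j = 0, v, v1, v2;

    for (i = 0; i < 128; i++) {
        if (i != 127) {
            v1 = decode(i ^ mask);
            v2 = decode((i + 1) ^ mask);
            v  = (v1 + v2 + 4) >> 3;
        } else {
            v = 8192;
        }
        assert(v <= 8192);
        for (; j < v; j++) {
            tab[8192 + j] = (i ^ mask);
            if (j > 0)
                tab[8192 - j] = (i ^ (mask ^ 0x80));
        }
    }
    tab[0] = tab[1];
}

/* Loop shape after the patch: j == 0 and i == 127 are peeled out. */
static void build_new(uint8_t *tab, int (*decode)(unsigned char), int mask)
{
    int i, j = 1, v, v1, v2;

    tab[8192] = mask;               /* the old j == 0 write, done once */
    for (i = 0; i < 127; i++) {
        v1 = decode(i ^ mask);
        v2 = decode((i + 1) ^ mask);
        v  = (v1 + v2 + 4) >> 3;
        assert(v <= 8192);
        for (; j < v; j++) {
            tab[8192 - j] = (i ^ (mask ^ 0x80));
            tab[8192 + j] = (i ^ mask);
        }
    }
    for (; j < 8192; j++) {         /* the old i == 127 case */
        tab[8192 - j] = (127 ^ (mask ^ 0x80));
        tab[8192 + j] = (127 ^ mask);
    }
    tab[0] = tab[1];
}

/* Returns 1 if both loop shapes fill the table identically. */
static int tables_match(void)
{
    static uint8_t a[TABLE_SIZE], b[TABLE_SIZE];

    build_old(a, ulaw2linear_sketch, 0xff);
    build_new(b, ulaw2linear_sketch, 0xff);
    return memcmp(a, b, TABLE_SIZE) == 0;
}
```

Calling tables_match() from a small main() is enough to run the
comparison. Note that the unconditional linear_to_xlaw[8192] = mask
write only matches the old code if v is at least 1 for i = 0, which
holds for this stand-in decoder; I have not re-derived it here for any
other decoder.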
>
> [...]
>
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> I do not agree with what you have to say, but I'll defend to the death your
> right to say it. -- Voltaire
>