[FFmpeg-devel] [PATCH 1/4] ssim: refactor a weird double loop.
Paul B Mahol
onemda at gmail.com
Mon Jul 13 00:07:16 CEST 2015
On 7/12/15, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Sun, Jul 12, 2015 at 10:29 AM, Paul B Mahol <onemda at gmail.com> wrote:
>
>> On 12 Jul 2015 at 14:18, "Ronald S. Bultje" <rsbultje at gmail.com>
>> wrote:
>> >
>> > Hi,
>> >
>> > On Sun, Jul 12, 2015 at 6:48 AM, Paul B Mahol <onemda at gmail.com> wrote:
>> >
>> > > On 12 Jul 2015 at 01:56, "Ronald S. Bultje" <rsbultje at gmail.com>
>> > > wrote:
>> > > >
>> > > > ---
>> > > > libavfilter/vf_ssim.c | 5 ++---
>> > > > 1 file changed, 2 insertions(+), 3 deletions(-)
>> > > >
>> > > > diff --git a/libavfilter/vf_ssim.c b/libavfilter/vf_ssim.c
>> > > > index 0721ddd..3ef122f 100644
>> > > > --- a/libavfilter/vf_ssim.c
>> > > > +++ b/libavfilter/vf_ssim.c
>> > > > @@ -134,7 +134,7 @@ static float ssim_end1(int s1, int s2, int ss, int s12)
>> > > > / ((float)(fs1 * fs1 + fs2 * fs2 + ssim_c1) * (float)(vars + ssim_c2));
>> > > > }
>> > > >
>> > > > -static float ssim_end4(int sum0[5][4], int sum1[5][4], int width)
>> > > > +static float ssim_endn(int (*sum0)[4], int (*sum1)[4], int width)
>> > > > {
>> > > > float ssim = 0.0;
>> > > > int i;
>> > > > @@ -169,8 +169,7 @@ static float ssim_plane(uint8_t *main, int main_stride,
>> > > > &sum0[x]);
>> > > > }
>> > > >
>> > > > - for (x = 0; x < width - 1; x += 4)
>> > > > - ssim += ssim_end4(sum0 + x, sum1 + x, FFMIN(4, width - x - 1));
>> > > > + ssim += ssim_endn(sum0, sum1, width - 1);
>> > > > }
>> > > >
>> > > > return ssim / ((height - 1) * (width - 1));
>> > > > --
>> > > > 2.1.2
>> > > >
>> > > >
>> > >
>> > > Why? There was reason behind this code I guess.
>> > >
>> >
>> > I think it's for SIMD code simplification. See, I'm guessing the code you
>> > took from libvpx had an extra condition to do only 4-sized chunks through a
>> > function pointer, and then the odd tail in C code. If you do this, the SIMD
>> > code has a fixed size (always 4), which makes the implementation much more
>> > trivial: four 16-byte loads, add, transpose4x4d, and then ssim_end1 to get 4
>> > results, which you horizontal-add and return.
>> >
>>
>> I took this from tiny_ssim.c, as pengvado said it's OK to relicense to LGPL.
>
>
> I think the same reasoning still applies - this will get better
> performance, particularly if we consider avx2.
OK, patch lgtm.
>
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>