[FFmpeg-devel] [PATCH v2 2/3] avcodec/h274: add film grain synthesis routine

Wed Aug 18 21:35:44 EEST 2021

18 Aug 2021, 17:41 by jamrial at gmail.com:

> On 8/17/2021 4:25 PM, Niklas Haas wrote:
>
>> From: Niklas Haas <git at haasn.dev>
>>
>> This could arguably also be a vf, but I decided to put it here since
>> decoders are technically required to apply film grain during the output
>> step, and I would rather avoid requiring users to insert the
>> correct film grain synthesis filter on their own.
>>
>> The code, while in C, is written in a way that unrolls/vectorizes fairly
>> well under -O3, and is reasonably cache friendly. On my CPU, a single
>> thread pushes about 400 FPS at 1080p.
>>
>> Apart from hand-written assembly, one possible avenue of improvement
>> would be to change the access order to compute the grain row-by-row
>> rather than in 8x8 blocks. This requires some redundant PRNG calls, but
>> would make the algorithm more cache-oblivious.
>>
>> The implementation has been written to the wording of SMPTE RDD 5-2006
>> as faithfully as I can manage. However, apart from passing a visual
>> inspection, no guarantee of correctness can be made due to the lack of
>> any publicly available reference implementation against which to
>> compare it.
>>
>> Signed-off-by: Niklas Haas <git at haasn.dev>
>> ---
>>  libavcodec/Makefile |   1 +
>>  libavcodec/h274.c   | 811 ++++++++++++++++++++++++++++++++++++++++++++
>>  libavcodec/h274.h   |  52 +++
>>  3 files changed, 864 insertions(+)
>>  create mode 100644 libavcodec/h274.c
>>  create mode 100644 libavcodec/h274.h
>>
>> diff --git a/libavcodec/Makefile b/libavcodec/Makefile
>> index 9a6adb9903..21739b4064 100644
>> --- a/libavcodec/Makefile
>> +++ b/libavcodec/Makefile
>> @@ -42,6 +42,7 @@ OBJS = ac3_parser.o                                                     \
>>  dirac.o                                                          \
>>  dv_profile.o                                                     \
>>  encode.o                                                         \
>> +       h274.o                                                           \
>>  imgconvert.o                                                     \
>>  jni.o                                                            \
>>  mathtables.o                                                     \
>> diff --git a/libavcodec/h274.c b/libavcodec/h274.c
>> new file mode 100644
>> index 0000000000..0efc00ca1d
>> --- /dev/null
>> +++ b/libavcodec/h274.c
>> @@ -0,0 +1,811 @@
>> +/*
>> + * H.274 film grain synthesis
>> + * Copyright (c) 2021 Niklas Haas <ffmpeg at haasn.xyz>
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +/**
>> + * @file
>> + * H.274 film grain synthesis.
>> + * @author Niklas Haas <ffmpeg at haasn.xyz>
>> + */
>> +
>> +#include "libavutil/avassert.h"
>> +#include "libavutil/imgutils.h"
>> +
>> +#include "h274.h"
>> +
>> +// The code in this file has a lot of loops that vectorize very well, this is
>> +// about a 40% speedup for no obvious downside.
>> +#pragma GCC optimize("tree-vectorize")
>>
>
> Will this not break compilation with MSVC and such?
>
> Also, tree vectorization is known to cause issues in old GCC versions, and even recent ones. I don't know if this is worth the potential problems it could introduce, but I guess it can be done until someone writes SIMD.
>

I really, really would rather not have any compiler hints at all. It's not like
the function is incredibly slow without SIMD, and by comparison a 40% speedup
would be a failing grade for a handwritten SIMD function, so I think we should
leave the pragma out.
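
For reference only, in case the hint is kept after all: a minimal sketch of how
the portability concern could be handled by restricting the pragma to real GCC.
It assumes that Clang and ICC define __GNUC__ without implementing
"#pragma GCC optimize". GRAIN_VECTORIZE_HINT and add_grain_row() are
hypothetical names used for illustration; neither is part of the patch or of
FFmpeg.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard, not from the patch: only emit the hint on real GCC.
 * Assumption: Clang and ICC define __GNUC__ but ignore this pragma (with a
 * warning), and MSVC does not define __GNUC__ at all. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
#    define GRAIN_VECTORIZE_HINT 1
#    pragma GCC optimize("tree-vectorize")
#else
#    define GRAIN_VECTORIZE_HINT 0
#endif

/* Illustrative stand-in for a loop the vectorizer could handle: add signed
 * grain to one row of 8-bit samples, clipping the result to [0, 255].
 * This is not the H.274 blending formula. */
static void add_grain_row(uint8_t *dst, const uint8_t *src,
                          const int8_t *grain, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        int v = src[x] + grain[x];
        dst[x] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t) v;
    }
}
```

With a guard like this, other compilers never see the pragma, so builds that
treat unknown pragmas as errors are unaffected. It does nothing about the
vectorizer bugs mentioned above, which is one more reason to drop the hint.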