[FFmpeg-devel] [PATCH v2 2/4] swscale/x86: add sse4 {lum, chr}ConvertRange
Ramiro Polla
ramiro.polla at gmail.com
Fri Jun 14 18:46:22 EEST 2024
On Wed, Jun 12, 2024 at 4:54 PM Ramiro Polla <ramiro.polla at gmail.com> wrote:
>
> Hi,
>
> On Tue, Jun 11, 2024 at 8:42 PM James Almer <jamrial at gmail.com> wrote:
> >
> > On 6/11/2024 3:26 PM, Michael Niedermayer wrote:
> > > On Tue, Jun 11, 2024 at 02:28:56PM +0200, Ramiro Polla wrote:
> > >> chrRangeFromJpeg_8_c: 28.7
> > >> chrRangeFromJpeg_8_sse4: 16.2
> > >> chrRangeFromJpeg_24_c: 152.7
> > >> chrRangeFromJpeg_24_sse4: 29.7
> > >> chrRangeFromJpeg_128_c: 366.5
> > >> chrRangeFromJpeg_128_sse4: 233.0
> > >> chrRangeFromJpeg_144_c: 408.0
> > >> chrRangeFromJpeg_144_sse4: 182.5
> > >> chrRangeFromJpeg_256_c: 698.7
> > >> chrRangeFromJpeg_256_sse4: 325.5
> > >> chrRangeFromJpeg_512_c: 1348.7
> > >> chrRangeFromJpeg_512_sse4: 660.2
> > >> chrRangeToJpeg_8_c: 37.7
> > >> chrRangeToJpeg_8_sse4: 16.2
> > >> chrRangeToJpeg_24_c: 115.7
> > >> chrRangeToJpeg_24_sse4: 36.2
> > >> chrRangeToJpeg_128_c: 631.2
> > >> chrRangeToJpeg_128_sse4: 163.7
> > >> chrRangeToJpeg_144_c: 710.7
> > >> chrRangeToJpeg_144_sse4: 183.0
> > >> chrRangeToJpeg_256_c: 1253.0
> > >> chrRangeToJpeg_256_sse4: 343.5
> > >> chrRangeToJpeg_512_c: 2491.2
> > >> chrRangeToJpeg_512_sse4: 654.2
> > >> lumRangeFromJpeg_8_c: 11.7
> > >> lumRangeFromJpeg_8_sse4: 10.5
> > >> lumRangeFromJpeg_24_c: 38.5
> > >> lumRangeFromJpeg_24_sse4: 19.0
> > >> lumRangeFromJpeg_128_c: 237.5
> > >> lumRangeFromJpeg_128_sse4: 79.2
> > >> lumRangeFromJpeg_144_c: 255.7
> > >> lumRangeFromJpeg_144_sse4: 90.5
> > >> lumRangeFromJpeg_256_c: 441.5
> > >> lumRangeFromJpeg_256_sse4: 161.7
> > >> lumRangeFromJpeg_512_c: 879.0
> > >> lumRangeFromJpeg_512_sse4: 333.2
> > >> lumRangeToJpeg_8_c: 20.0
> > >> lumRangeToJpeg_8_sse4: 11.7
> > >> lumRangeToJpeg_24_c: 61.5
> > >> lumRangeToJpeg_24_sse4: 17.7
> > >> lumRangeToJpeg_128_c: 357.5
> > >> lumRangeToJpeg_128_sse4: 80.0
> > >> lumRangeToJpeg_144_c: 371.5
> > >> lumRangeToJpeg_144_sse4: 93.2
> > >> lumRangeToJpeg_256_c: 651.5
> > >> lumRangeToJpeg_256_sse4: 164.5
> > >> lumRangeToJpeg_512_c: 1279.0
> > >> lumRangeToJpeg_512_sse4: 333.7
> > >> ---
> > >> libswscale/swscale_internal.h | 1 +
> > >> libswscale/utils.c | 2 +
> > >> libswscale/x86/Makefile | 1 +
> > >> libswscale/x86/range_convert.asm | 130 +++++++++++++++++++++++++++++++
> > >> libswscale/x86/swscale.c | 36 +++++++++
> > >> 5 files changed, 170 insertions(+)
> > >> create mode 100644 libswscale/x86/range_convert.asm
> > >
> > > breaks x86-32 build
> > >
> > > LD ffmpeg_g
> > > /usr/lib/gcc-cross/i686-linux-gnu/7/../../../../i686-linux-gnu/bin/ld: libswscale/libswscale.a(utils.o): in function `sws_setColorspaceDetails':
> > > ffmpeg/linux32/src/libswscale/utils.c:1086: undefined reference to `ff_sws_init_range_convert_x86'
> > > collect2: error: ld returned 1 exit status
> > > make: *** [Makefile:139: ffmpeg_g] Error 1
> > >
> > > thx
> >
> > The functions are wrapped in ARCH_X86_64 checks for seemingly no reason,
> > so those checks should be removed in the next iteration.
>
> Fixed.
>
> James walked me through optimizing and improving the functions on IRC
> so that they work with both sse2 and avx2. New patch attached.
I'll apply tomorrow if there are no more comments.
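
For anyone reading the archive later, here is a minimal sketch of the guard mismatch behind the x86-32 link failure quoted above. It is not the code from the patch: ARCH_X86 and ARCH_X86_64 are defined by hand here to mimic a 32-bit build, and init_range_convert_x86() and set_colorspace_details() are hypothetical stand-ins for ff_sws_init_range_convert_x86() and sws_setColorspaceDetails().

```c
#include <assert.h>

/* Hypothetical stand-ins for the ARCH_* macros from FFmpeg's config.h,
 * set here as they would be for a 32-bit x86 build. */
#define ARCH_X86    1
#define ARCH_X86_64 0

/* Definition side. If this guard were ARCH_X86_64 instead, the function
 * would be compiled out on x86-32 while the caller below still referenced
 * it, which is the "undefined reference" pattern in the linker output. */
#if ARCH_X86
int init_range_convert_x86(void)
{
    return 1; /* pretend the x86 function pointers were installed */
}
#endif

/* Caller side, standing in for sws_setColorspaceDetails(). */
int set_colorspace_details(void)
{
#if ARCH_X86
    return init_range_convert_x86();
#else
    return 0; /* no x86-specific init on other architectures */
#endif
}

/* Self-check: with matching guards, the x86 init runs on this build. */
void self_check(void)
{
    assert(set_colorspace_details() == 1);
}
```

The point of the sketch is only that the definition and its call site need the same architecture condition, so that both disappear or both remain for any given target.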