[FFmpeg-devel] Subject: [PATCH 3/3] lavc/dnxhdenc: R-V V get_pixels_8x4_sym

Sun Feb 18 14:27:56 EET 2024

ping

flow gg <hlefthleft at gmail.com> wrote on Tue, Jan 30, 2024 at 00:22:

> > I expect that it would be faster to make one large load, and then 4 small
> > stores, but that might work only for exactly 128-bit vectors?
>
> This seems to require vle128, so I didn't modify it.
>
> > That's not needed. You can use immediate values.
> > You can reorder to avoid immediate data dependencies on the addresses.
> > In any case, you need to check the vector length in init.
>
> Okay, I've updated it in the reply.
>
> Rémi Denis-Courmont <remi at remlab.net> wrote on Mon, Jan 29, 2024 at 23:41:
>
>> Hi,
>>
>> +/*
>> + * Copyright (c) 2023 Institute of Software Chinese Academy of Sciences (ISCAS).
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#include "libavutil/riscv/asm.S"
>> +
>> +func ff_get_pixels_8x4_sym_rvv, zve64x
>> +        vsetivli    zero, 8, e8, mf2, ta, ma
>> +        vlse64.v    v16, (a1), a2
>> +        li          t0, 8 * 8
>> +        vsetvli     zero, t0, e16, m4, ta, ma
>> +        vzext.vf2   v8, v16
>> +        vse16.v     v8, (a0)
>> +        li          a2, 8*2
>>
>> That's not needed. You can use immediate values.
>>
>> +        vsetivli    zero, 2, e8, mf8, ta, ma
>> +        addi        a1, a0, 48
>> +        addi        a0, a0, 32*2
>> +        vle64.v     v0, (a1)
>> +        vse64.v     v0, (a0)
>> +        sub         a1, a1, a2
>> +        vle64.v     v0, (a1)
>> +        add         a0, a0, a2
>> +        vse64.v     v0, (a0)
>> +        sub         a1, a1, a2
>> +        vle64.v     v0, (a1)
>> +        add         a0, a0, a2
>> +        vse64.v     v0, (a0)
>> +        sub         a1, a1, a2
>> +        vle64.v     v0, (a1)
>> +        add         a0, a0, a2
>> +        vse64.v     v0, (a0)
>>
>> You can reorder to avoid immediate data dependencies on the addresses.
>>
>> I expect that it would be faster to make one large load, and then 4 small
>> stores, but that might work only for exactly 128-bit vectors?
>>
>> In any case, you need to check the vector length in init.
>>
>> +
>> +        ret
>> +endfunc
>>
>> --
>> 雷米‧德尼-库尔蒙
>> http://www.remlab.net/
>>
>
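
For readers following the review, the following is a minimal scalar sketch in C of what the quoted assembly appears to compute, inferred only from the loads and stores in the patch: four rows of eight 8-bit pixels are widened to 16 bits, then rows 4..7 of the block are filled with copies of rows 3..0. The function name get_pixels_8x4_sym_ref is hypothetical and this is not the FFmpeg implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference routine (name is an assumption, not FFmpeg API).
 * block:     64 int16_t outputs, laid out as 8 rows of 8
 * pixels:    top-left of an 8x4 region of 8-bit samples
 * line_size: distance in bytes between consecutive source rows */
static void get_pixels_8x4_sym_ref(int16_t *block, const uint8_t *pixels,
                                   ptrdiff_t line_size)
{
    /* Zero-extend 4 rows of 8 pixels into rows 0..3 of the block. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = pixels[y * line_size + x];

    /* Mirror vertically: row 4 = row 3, row 5 = row 2, row 6 = row 1,
     * row 7 = row 0. Each row is 8 * sizeof(int16_t) = 16 bytes, which
     * matches the 16-byte steps in the quoted assembly. */
    for (int y = 0; y < 4; y++)
        memcpy(block + (4 + y) * 8, block + (3 - y) * 8, 8 * sizeof(*block));
}
```

A routine like this could serve as a comparison baseline when checking a vector version on hardware with different vector lengths, since its result does not depend on VLEN.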