[FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations
Ben Avison
bavison at riscosopen.org
Mon Mar 21 19:37:18 EET 2022
Hi Martin,
Thanks very much for taking a look.
On 19/03/2022 23:06, Martin Storsjö wrote:
> As you are writing assembly for these functions, I would very much
> appreciate if you could add checkasm tests for all the functions you're
> implementing. I see that there exists a test for the blockdsp functions,
> but all the other ones are missing a test.
I think I'd have a bit of a learning curve ahead of me there! I did
write my own fuzz testers to check the validity of my assembly
implementations, and I could share them (they'd probably need a bit of
tidying up since I wasn't intending them for public consumption) but
they were written in ignorance of the checkasm framework, so probably
wouldn't slot in neatly.
Is there any writeup of checkasm anywhere, discussing how it's used,
what sorts of things it tests, any speed/memory limits that tests should
try to adhere to - that sort of thing?
> The other main issue I'd like to request is to indent the assembly
> similarly to the rest of the existing assembly. For the 32 bit assembly,
> your patches do match the surrounding code, but for the 64 bit assembly,
> your patches align the operands column differently than the rest.
Since I was creating new source files for the 64-bit stuff, I assumed I
had a bit of leeway in indentation style - but I can easily change it.
For what it's worth, the opcodes in AArch64 are significantly shorter
than in AArch32, since the vector element size qualifiers go on the
operands instead of the opcodes, so there's less need for extra indentation.
> Finally, the 32 bit assembly fails to build for me both with (recent)
> clang and old binutils, with errors like these:
>
> src/libavcodec/arm/vc1dsp_neon.S: Assembler messages:
> src/libavcodec/arm/vc1dsp_neon.S:1579: Error: bad type for scalar --
> `vmov r0,d4[1]'
Thanks - the Armv8-A ARM says (section F6.1.139) that the data type can
be omitted here, and in that case it is equivalent to '32', so that's a
bug in clang. But easy to work around.
> Oh, sidenote - I do see that the last patch in the set uses much more
> inconsistent indentation, with varying indentation between lines. Is
> this intentional to signify some structure in the code, or just
> accidental?
That was deliberate! The inner loop there is unrolled x2, and then
adjacent iterations are overlapped 180 degrees out of phase. This is
because each iteration starts off busy, with lots of instructions to
execute, keeping pipelines full, and towards the end, it thins out,
meaning we can benefit by using what would otherwise be stalls to
speculatively start to process the next iteration before we've completed
the current one.
Effectively, if you only read a series of instructions with matching
indentation, you get one logical iteration of the loop - for example, in
the AArch32 version, you can follow through the process from loading the
source buffer into q10 (line 1849) until we store from it to the
destination buffer, having determined that it doesn't contain the start
of any escape sequences (line 1890).
It's a trick I've seen used a few times elsewhere, which is why I didn't
bother explaining it in a comment. I could add one, or if you still
don't like it once you've understood what it means, I'd be happy to take
it out.
Ben
More information about the ffmpeg-devel
mailing list