[FFmpeg-devel] [PATCH] lavc/aarch64: h264qpel, add lowpass_8 based functions
Martin Storsjö
martin at martin.st
Fri Sep 3 14:26:15 EEST 2021
On Fri, 3 Sep 2021, Martin Storsjö wrote:
>> +function \type\()_h264_qpel8_v_lowpass_neon_10
>> + ld1 {v16.8H}, [x1], x3
>> + ld1 {v18.8H}, [x1], x3
>> + ld1 {v20.8H}, [x1], x3
>> + ld1 {v22.8H}, [x1], x3
>> + ld1 {v24.8H}, [x1], x3
>> + ld1 {v26.8H}, [x1], x3
>> + ld1 {v28.8H}, [x1], x3
>> + ld1 {v30.8H}, [x1], x3
>> + ld1 {v17.8H}, [x1], x3
>> + ld1 {v19.8H}, [x1], x3
>> + ld1 {v21.8H}, [x1], x3
>> + ld1 {v23.8H}, [x1], x3
>> + ld1 {v25.8H}, [x1]
>> +
>> + transpose_8x8H v16, v18, v20, v22, v24, v26, v28, v30, v0, v1
>> + transpose_8x8H v17, v19, v21, v23, v25, v27, v29, v31, v0, v1
>> + lowpass_8_10 v16, v17, v18, v19, v16, v17
>> + lowpass_8_10 v20, v21, v22, v23, v18, v19
>> + lowpass_8_10 v24, v25, v26, v27, v20, v21
>> + lowpass_8_10 v28, v29, v30, v31, v22, v23
>> + transpose_8x8H v16, v17, v18, v19, v20, v21, v22, v23, v0, v1
>
> I'm a bit surprised to see this kind of vertical filtering done by
> transposing and filtering horizontally, when vertical filtering can be done
> so efficiently as-is, without needing any extra 'ext' instructions and such.
> But I see that the existing code does it this way. I'll try making a PoC
> that rewrites the existing code for some case, to see how it performs
> without the transposes.

The potential speedups for the vertical filters are actually huge; I've
sent a patch that, IMO, simplifies this by getting rid of all the
transposes. I'd appreciate it if you could rework your patch accordingly.
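To illustrate why no transposes are needed, here is a scalar C model of the vertical filter. It is a hypothetical sketch, not code from either patch, and the names `v_lowpass8_10` and `round_clip10` are made up. It assumes the standard H.264 6-tap taps (1, -5, 20, 20, -5, 1) with `(sum + 16) >> 5` rounding, clipped to 10 bits. Each output row is a lane-wise weighted sum of six consecutive input rows. If one 8-pixel row sits in one `.8H` register, the filter maps directly onto vector multiply-accumulates across registers, with no `ext` or transposes involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round with (sum + 16) >> 5 and clip to [0, 1023]. */
static uint16_t round_clip10(int sum)
{
    int v = sum + 16;
    if (v < 0)      /* negative results clip to 0; this also avoids */
        return 0;   /* right-shifting a negative int                */
    v >>= 5;
    return v > 1023 ? 1023 : (uint16_t)v;
}

/* Hypothetical scalar model of an 8-wide vertical 6-tap lowpass.
 * 'src' points two rows above the first output row, so h output rows
 * read h + 5 input rows (13 rows for h == 8, matching the 13 ld1 loads
 * in the quoted code). */
static void v_lowpass8_10(uint16_t *dst, ptrdiff_t dst_stride,
                          const uint16_t *src, ptrdiff_t src_stride, int h)
{
    assert(h > 0);
    for (int y = 0; y < h; y++) {
        /* Six consecutive input rows. In NEON each would be one .8H
         * register, and the x loop below maps onto the 8 lanes. */
        const uint16_t *r0 = src + y * src_stride;
        const uint16_t *r1 = r0 + src_stride;
        const uint16_t *r2 = r1 + src_stride;
        const uint16_t *r3 = r2 + src_stride;
        const uint16_t *r4 = r3 + src_stride;
        const uint16_t *r5 = r4 + src_stride;
        for (int x = 0; x < 8; x++) {
            int sum = r0[x] + r5[x]
                    - 5 * (r1[x] + r4[x])
                    + 20 * (r2[x] + r3[x]);
            dst[y * dst_stride + x] = round_clip10(sum);
        }
    }
}
```

Moving to the next output row only drops the oldest row register and brings in one new row. That is why the vertical direction needs no data shuffling at all, unlike the horizontal one.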
// Martin