[FFmpeg-devel] [PATCH] avcodec/scpr: optimize shift loop.
Brian Matherly
brian.matherly at yahoo.com
Sun Sep 10 00:37:52 EEST 2017
On 9/9/2017 1:27 PM, Michael Niedermayer wrote:
> + // If the image is sufficiently aligned, compute 8 samples at once
> + if (!(((uintptr_t)dst) & 7)) {
> + uint64_t *dst64 = (uint64_t *)dst;
> + int w = avctx->width>>1;
> + for (x = 0; x < w; x++) {
> + dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
> + }
> + x *= 8;
> + } else
> + x = 0;
> + for (; x < avctx->width * 4; x++) {
> dst[x] = dst[x] << 3;
> }
Forgive me if I'm not understanding the code correctly, but couldn't you
always apply the optimization if you align the first (up to) 7 samples?
Pseudocode:
uint64_t *dst64 = (uint64_t *)dst;
int w = avctx->width>>1;
x=0
// compute un-aligned beginning samples
for (; x < (avctx->width * 4) && (((uintptr_t)dst) & 7); x++) {
dst[x] = dst[x] << 3;
}
// compute aligned samples
for (; x < w; x+=8) {
dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
}
x -= 8;
// compute un-aligned ending samples
for (; x < avctx->width * 4; x++) {
dst[x] = dst[x] << 3;
}
More information about the ffmpeg-devel
mailing list