[MPlayer-dev-eng] Help with MMX asm code
Billy Biggs
vektor at dumbterm.net
Thu Oct 23 19:07:29 CEST 2003
Jason Tackaberry (tack at auc.ca):
> Representation of the image was taken from the original bmovl filter.
> The image is stored in YUVA where each channel is a separate array: Y
> and A channels being size width*height, and U and V channels being
> size width*height/4. This is also how mplayer represents the image
> (except that there is no alpha channel).
If your Cb/Cr channels (U and V) are only width*height/4, then you
have a 4:2:0 image, not a 4:2:2 image (in 4:2:2 each chroma plane
would be width*height/2). For an explanation, see:
http://www.poynton.com/PDFs/Chroma_subsampling_notation.pdf
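
To make the size difference concrete, here is a minimal sketch of the chroma plane sizes for the two layouts. The helper names are hypothetical, and rounding odd dimensions up is an assumption, not something stated in this thread.

```c
#include <stddef.h>

/* Hypothetical helpers: number of bytes in ONE chroma plane (Cb or Cr).
   Odd dimensions are rounded up; that rounding is an assumption. */

/* 4:2:0 halves chroma resolution horizontally and vertically. */
static size_t chroma_plane_size_420(size_t width, size_t height)
{
    return ((width + 1) / 2) * ((height + 1) / 2);
}

/* 4:2:2 halves chroma resolution horizontally only. */
static size_t chroma_plane_size_422(size_t width, size_t height)
{
    return ((width + 1) / 2) * height;
}
```

For even dimensions this gives width*height/4 for 4:2:0 and width*height/2 for 4:2:2, which matches the sizes discussed above.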
> So computation is done rather straightforwardly byte for byte between
> corresponding elements of the src and dst arrays. Where mpimg is the
> video frame, and img is the image stored as described above (to be
> overlaid), the process is roughly this, if we assume mpimg and img are
> the same dimensions:
>
> foreach y in height:
>     foreach x in width:
>         pos = y * width + x
>         a = layer_alpha/255 * img.A[pos]
I would recommend instead: a = layer_alpha/256 * img.A[pos], since
division by 255 is expensive, while division by 256 is just a shift.
It's cheap to keep your layer alpha in 4 bytes instead of only 1 byte.
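
One way to read this suggestion in integer arithmetic is sketched below. The function name is hypothetical, and the assumption that layer_alpha runs from 0 to 256 (which is why it no longer fits in one byte) is an interpretation, not something spelled out above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. layer_alpha is ASSUMED to be in 0..256, so it is
   stored in a wider integer; pixel_alpha is the 0..255 value from the
   A plane. Multiplying first and shifting by 8 replaces the division. */
static uint32_t effective_alpha(uint32_t layer_alpha, uint8_t pixel_alpha)
{
    assert(layer_alpha <= 256);
    return (layer_alpha * pixel_alpha) >> 8;
}
```

With layer_alpha = 256 the per-pixel alpha passes through unchanged, and with layer_alpha = 0 the overlay becomes fully transparent.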
>         mpimg.Y[pos] = blend(mpimg.Y[pos], img.Y[pos], a)
>         if y % 2 and x % 2:
>             pos = y/2 * width/2 + x/2
>             mpimg.U[pos] = blend(mpimg.U[pos], img.U[pos], a)
>             mpimg.V[pos] = blend(mpimg.V[pos], img.V[pos], a)
First, this seems wrong. If we look at a block of four pixels:
A B
C D
You're using the alpha from pixel D to apply to the Cb/Cr components.
For MPEG2, the chroma samples are positioned halfway between A and C, so
if you want to be really correct, you should filter the alpha channel,
for example by taking the average alpha value between A and C. If this
is expensive, at least use the alpha of pixel A and not pixel D.
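
The averaging described above could look roughly like this. It is only a sketch: the function name and the (cx, cy) chroma coordinates are hypothetical, and repeating the last row when the height is odd is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: alpha to use for the chroma sample at (cx, cy) in
   the half-resolution plane. alpha is the full-resolution A plane of
   width*height bytes. Averages pixel A (top-left of the 2x2 block) and
   pixel C (directly below it), rounding to nearest. */
static uint8_t chroma_alpha(const uint8_t *alpha, size_t width, size_t height,
                            size_t cx, size_t cy)
{
    size_t x  = 2 * cx;
    size_t y0 = 2 * cy;
    /* Assumption: for an odd height, reuse the last row instead of
       reading past the end of the plane. */
    size_t y1 = (y0 + 1 < height) ? y0 + 1 : y0;
    unsigned a = alpha[y0 * width + x];  /* pixel A */
    unsigned c = alpha[y1 * width + x];  /* pixel C */
    return (uint8_t)((a + c + 1) >> 1);
}
```

The cheaper fallback mentioned above is simply alpha[y0 * width + x], i.e. the alpha of pixel A alone.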
> def blend(p1, p2, a):
>     # Which you pointed out is wrong ...
>     return ( (255-a)*p1 + a*p2 ) >> 8
Yeah, you should fix that :)
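
The problem with the quoted blend is that the weights sum to 255 while the shift divides by 256, so even a = 0 darkens the pixel slightly (255*255 >> 8 is 254). One common fix, sketched below, is to let alpha run from 0 to 256 so the weights sum to exactly 256. This is not necessarily the fix suggested earlier in the thread; the 0..256 range and the rounding term are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a corrected blend. a is ASSUMED to be in 0..256, where 0 keeps
   p1 and 256 gives p2. The weights (256 - a) and a sum to 256, so the
   shift by 8 is an exact normalization; +128 rounds to nearest. */
static uint8_t blend(uint8_t p1, uint8_t p2, uint32_t a)
{
    assert(a <= 256);
    return (uint8_t)((p1 * (256 - a) + p2 * a + 128) >> 8);
}
```

If alpha is only available as 0..255, a + (a >> 7) is one cheap way to stretch it to 0..256 before calling blend.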
> My thoughts were to use MMX to parallelize the blend computation
> several bytes at once. But maybe for now I should go back to the
> beginning and rework the above approach?
This memory layout is fine, and you can optimize it as it is. My
code might help as a starting point ... I can definitely edit any code
you come up with too :)
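
For illustration only, here is a rough sketch of blending four bytes at a time on one plane with MMX intrinsics. This is not the code referred to above; blend_row and its parameters are hypothetical. It assumes per-pixel alpha in 0..255 that is stretched to 0..256 before blending, and it falls back to a scalar loop when the compiler does not define __MMX__ (GCC and Clang define it when MMX is enabled).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: blend n bytes of src over dst in place, using
   per-pixel alpha in 0..255 (255 means "fully src"). */
static void blend_row(uint8_t *dst, const uint8_t *src,
                      const uint8_t *alpha, size_t n)
{
    size_t i = 0;
#if defined(__MMX__)
    const __m64 zero = _mm_setzero_si64();
    const __m64 c256 = _mm_set1_pi16(256);
    const __m64 half = _mm_set1_pi16(128);
    for (; i + 4 <= n; i += 4) {
        int32_t d32, s32, a32, out;
        memcpy(&d32, dst + i, 4);
        memcpy(&s32, src + i, 4);
        memcpy(&a32, alpha + i, 4);
        /* Zero-extend four bytes into four 16-bit lanes. */
        __m64 d = _mm_unpacklo_pi8(_mm_cvtsi32_si64(d32), zero);
        __m64 s = _mm_unpacklo_pi8(_mm_cvtsi32_si64(s32), zero);
        __m64 a = _mm_unpacklo_pi8(_mm_cvtsi32_si64(a32), zero);
        /* Stretch alpha from 0..255 to 0..256 so 255 is fully opaque. */
        a = _mm_add_pi16(a, _mm_srli_pi16(a, 7));
        /* (d*(256-a) + s*a + 128) >> 8; the sum stays below 65536. */
        __m64 r = _mm_add_pi16(_mm_mullo_pi16(d, _mm_sub_pi16(c256, a)),
                               _mm_mullo_pi16(s, a));
        r = _mm_srli_pi16(_mm_add_pi16(r, half), 8);
        /* Pack back to bytes; the low four bytes hold the result. */
        out = _mm_cvtsi64_si32(_mm_packs_pu16(r, zero));
        memcpy(dst + i, &out, 4);
    }
    _mm_empty(); /* clear MMX state so later x87 code is safe */
#endif
    /* Scalar tail (and full fallback without MMX): same arithmetic. */
    for (; i < n; i++) {
        uint32_t a = alpha[i] + (uint32_t)(alpha[i] >> 7);
        dst[i] = (uint8_t)((dst[i] * (256 - a) + src[i] * a + 128) >> 8);
    }
}
```

The same routine can be run once per row on the Y plane with the A plane as alpha, and on the U and V planes with a half-resolution alpha derived from the A plane.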
If you want help let me know.
-Billy