[FFmpeg-devel] [RFC] SSE3/4 implementation of flac_encode_residual_lpc

Mon May 4 11:51:16 CEST 2009

On Sun, 3 May 2009, Jason Garrett-Glaser wrote:

>> pmaddwd is your 16x16->32 signed multiply instruction. It will do
>> just as much work as pmulld in the case where the data is limited to
>> 16 bits--except at twice the speed.
>
> Also note about this: if you know that adding the results of any two
> multiplies won't overflow 32 bits, pmaddwd will do twice as much work
> as pmulld, and for a bonus, it even adds each pair of values together,
> finishing part of your horizontal sum.
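
To make the quoted point concrete, here is a scalar C model of what one
32-bit output lane of pmaddwd computes. It is only a sketch for
illustration: the function name is made up, and it is plain C rather than
the SIMD instruction itself.

```c
#include <stdint.h>

/* Scalar model of one 32-bit output lane of pmaddwd (hypothetical helper,
 * not FFmpeg code): two signed 16x16->32 multiplies, then the two
 * products are added together. */
static int32_t model_pmaddwd_lane(int16_t a0, int16_t b0,
                                  int16_t a1, int16_t b1)
{
    /* Each product has magnitude at most 2^30, so it fits in 32 bits. */
    uint32_t p0 = (uint32_t)((int32_t)a0 * b0);
    uint32_t p1 = (uint32_t)((int32_t)a1 * b1);

    /* The pair is summed modulo 2^32. The only inputs that wrap are all
     * four operands equal to -32768. The cast back to int32_t assumes the
     * usual two's-complement conversion done by common compilers. */
    return (int32_t)(p0 + p1);
}
```

With 16-bit inputs this gives two multiplies and one add per 32-bit lane,
which is the pairwise part of the horizontal sum mentioned above.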

AFAICT the flac spec doesn't say what arithmetic precision to use. It also 
doesn't say whether coded samples are allowed to overflow the output 
bitdepth, and if so, what to do with them.
But flac is supposed to be lossless, so unless proven otherwise I'll 
assume that streams which don't losslessly represent _anything_ are 
invalid.

The maximum lpc rightshift is 15. If you're coding 16-bit or 17-bit samples,
the result of lpc after the horizontal sum will fit in 32 bits. Intermediate
values might not, but if you add them with wraparound (i.e. modulo 2^32)
you'll still get the right result.
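
A small scalar sketch of that last point, in plain C rather than SSE. The
function names and the test values are made up for illustration. The
partial sum goes past the signed 32-bit range, yet the wrapped total
matches the exact one because the final value fits in 32 bits.

```c
#include <stdint.h>

/* Dot product accumulated modulo 2^32, the way packed 32-bit adds behave.
 * Unsigned arithmetic is used so that wraparound is well defined in C. */
static int32_t dot_wrap32(const int32_t *coefs, const int32_t *samples,
                          int order)
{
    uint32_t acc = 0;
    for (int i = 0; i < order; i++)
        acc += (uint32_t)coefs[i] * (uint32_t)samples[i];
    /* Assumes the usual two's-complement conversion back to signed. */
    return (int32_t)acc;
}

/* Reference: the same dot product in 64 bits, which cannot overflow for
 * a handful of 32-bit terms. */
static int64_t dot_exact64(const int32_t *coefs, const int32_t *samples,
                           int order)
{
    int64_t acc = 0;
    for (int i = 0; i < order; i++)
        acc += (int64_t)coefs[i] * samples[i];
    return acc;
}
```

For example, with coefs = {30000, 30000, -30000, -30000} and
samples = {60000, 60000, 60000, 59999}, the sum of the first two products
is 3600000000, which does not fit in a signed 32-bit value, but both
functions return 30000.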

--Loren Merritt