On Tue, Oct 6, 2015 at 9:59 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> +cglobal vp9_idct_idct_4x4_add_12, 4, 4, 6, dst, stride, block, eob
[...]
> +    movd                  m0, coefd
> +    punpcklwd             m0, m0
> +    pshufd                m0, m0, q0000

pshuflw + punpcklqdq is faster on some older CPUs, such as Conroe.
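
For illustration, the broadcast could look something like the following.
This is an untested sketch, assuming x86inc syntax, that m0 is a 16-byte
xmm register, and that only the low word of coefd needs to be splatted
across all eight words:

    movd                  m0, coefd
    pshuflw               m0, m0, q0000    ; copy word 0 into words 0-3
    punpcklqdq            m0, m0           ; duplicate the low qword into the high qword

Under those assumptions, both sequences leave the same value in every
word of m0.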