[FFmpeg-devel] [patch v4, v5] libpostproc: mmx code uses stack below %esp
Michael Niedermayer
michaelni
Tue Feb 2 20:20:59 CET 2010
On Tue, Feb 02, 2010 at 04:47:44PM +0300, Yuriy Kaminskiy wrote:
> On 30.01.2010 15:48, Michael Niedermayer wrote:
> > On Fri, Jan 29, 2010 at 10:24:11PM +0300, Yuriy Kaminskiy wrote:
> >> ==32414==
> >> ==32414== Invalid write of size 8
> >> ==32414== at 0x874CF44: postProcess_MMX2 (in /path/to/mplayer)
> >> ==32414== Address 0xbeffa9d0 is just below the stack ptr. To suppress, use:
> >> --workaround-gcc296-bugs=yes
> >> I, of course, don't use gcc-2.96 ;-)
> >> I've looked into libpostproc/postprocess_template.c, and, indeed, it uses memory
> >> below %esp:
> >> === cut ===
> >> static inline void RENAME(doVertDefFilter)(uint8_t src[], int stride, PPContext *c)
> >> {
> >> [...]
> >> __asm__ volatile(
> >> "pxor %%mm7, %%mm7 \n\t"
> >> "lea -40(%%"REG_SP"), %%"REG_c" \n\t" // make space for 4 8-byte
> >> vars
> >> "and "ALIGN_MASK", %%"REG_c" \n\t" // align
> >> static inline void RENAME(dering)(uint8_t src[], int stride, PPContext *c)
> >> static av_always_inline void RENAME(do_a_deblock)(uint8_t *src, int step, int
> >> stride, PPContext *c){
> >> === cut ===
> >> Not sure if this *must* be fixed, but it feels unsafe, so...
> >> Patch attached; doVertDefFilter and do_a_deblock changes should not affect
> >> speed, not sure about dering one.
> >
> > please put a START/STOP TIMER around things to make sure
>
> tried, failed to acquire conclusive results; original/v3/v4 variants sometimes
> [tiny bit] slower, sometimes [even less] faster (for all functions; at least, my
> assumption that doVertDef and do_a changes "safer", and dering "more
> problematic" seems wrong; btw, doVertDef should be checked with MMX2 and 3DNOW
> disabled, dering - with MMX2 or 3DNOW enabled [do_a_deblock - either way]).
>
> >> + DECLARE_ALIGNED(8, uint64_t, tmp)[4]; // make space for 4 8-byte vars
> >
> > tabs are forbidden in our svn
>
> och; i know; sorry
>
> >> - "movq (%%"REG_c"), %%mm2 \n\t" // 2L0 - 5L1 + 5L2 - 2L3
> >> - "movq 8(%%"REG_c"), %%mm3 \n\t" // 2H0 - 5H1 + 5H2 - 2H3
> >> + "movq %3, %%mm2 \n\t" // 2L0 - 5L1 + 5L2 - 2L3
> >> + "movq 8+%3, %%mm3 \n\t" // 2H0 - 5H1 + 5H2 - 2H3
> >
> > i have a bad feeling about this, what makes you belive this will work
> > better than the last time such syntax was tried
> > also my gcc replaces "o" with 4+%0 happily by 4+(%edx)
>
> works for me (tm), and it should work with gas, because it is supposed to be
> used in this way; well, and for cc not to use 1234(%esp) here would be very
> strange (not that we can rely on that). anyway, alternative version [v4], that
> does not use 8+%3/"o" attached;
> or v5, that does not attempt to save one register in dering (seems, that was not
> needed even for -fPIC -fomit-frame-pointer, and -fPIC -fno-omit-frame-pointer
> was broken before
> [plus postproc asm is not PIC anyway];
there are much more people who insist on -fPIC because they are confused than
there are people actually need code free of relocations
I think after reading over this again the best solution would be to use the
context as temporary space, we have a "m"(c->pQPb) anyway so if we put a
pointer into a register to the context we could address pQPb and the temp
aligned space easily.
That way we dont need to create any aligned space on the stack ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The educated differ from the uneducated as much as the living from the
dead. -- Aristotle
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100202/10f5fa45/attachment.pgp>
More information about the ffmpeg-devel
mailing list