[Ffmpeg-devel] [RFC] Addition of JIT accelerated scaler for ARM into libswscale
Michael Niedermayer
michaelni
Wed Jan 24 00:31:16 CET 2007
Hi
On Wed, Jan 24, 2007 at 12:39:00AM +0200, Siarhei Siamashka wrote:
> On Tuesday 23 January 2007 14:30, Reimar Doeffinger wrote:
>
> > > A natural solution for getting good scaler performance is to use JIT
> > > style dynamic code generation. I spent two full days last weekend
> > > and got some initial scaler implementation working (it is quite simple
> > > and straightforward and uses less than 300 lines of code):
> > > https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer
> >
> > What is the point of those four mprotects?? AFAICT at most you would
> > want to do one mprotect at the end to remove the write permission, but
> > if that is worth the extra dependency...
>
> One thing that is a bit different on ARM is that instruction cache coherency
> is not guaranteed automatically for self-modifying code, so an explicit cache
> flush is required. The cache flush is performed by privileged instructions and
> can't be done in user mode, so the operating system has to provide some
> API for cache flushing. The "Instruction Memory Barriers" part of the ARM
> Architecture Reference Manual [1] recommends that operating systems use the
> 'SWI 0xF00000' instruction as the syscall providing this functionality. I
> searched the web for ARM, dynamic code generation and cache flushing and
> found a chunk of code that the mono virtual machine uses to do the cache
> flush [2]:
>
> void
> mono_arch_flush_icache (guint8 *code, gint size)
> {
>         __asm __volatile ("mov r0, %0\n"
>                           "mov r1, %1\n"
>                           "mov r2, %2\n"
>                           "swi 0x9f0002 @ sys_cacheflush"
>                           : /* no outputs */
>                           : "r" (code), "r" (code + size), "r" (0)
>                           : "r0", "r1", "r3" );
> }
>
> So the syscall number on Linux is actually different from what ARM
> recommends, and apparently this code is not portable (systems other than
> Linux may use something different).
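
For comparison, GCC and Clang offer a builtin, __builtin___clear_cache, that
hides the per-OS mechanism behind one call. The following is only a minimal
sketch, assuming one of those compilers; flush_icache is a made-up name for
illustration, and this is not what the mono code quoted above does:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from mono or libswscale): ask the compiler
 * runtime to make code written to [start, start + len) visible to
 * instruction fetch. Assumes GCC or Clang, which provide the
 * __builtin___clear_cache builtin.
 */
static void flush_icache(void *start, size_t len)
{
    char *begin = (char *)start;

    assert(start != NULL);
    __builtin___clear_cache(begin, begin + len);
}
```

On targets with coherent instruction caches the builtin may compile to
nothing, so calling it unconditionally after writing code should be harmless.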
>
> It would be reasonable to assume that when we mmap an executable block of
> memory, the instruction cache has already been flushed for this area.
> Unfortunately there seem to be some issues, probably because of bugs:
> http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2006-January/033367.html
>
> So all these mprotect calls in my code are there to ensure that cache
> flushing works correctly. We do the following:
> * mmap a memory block with permission to execute code from it
> * generate simple code for a function that should return 0
> * call this code and check that it really returned 0
> * do mprotect to disable and reenable code execution (and hope that it does
> a cache flush)
> * generate simple code for a function that should return 1
> * call this code and check that it really returned 1 (without the mprotect
> calls it would still return 0)
I do not like this redundant mess, especially because a simple task switch
at the right moment will make it look as if everything is ok.
> * finally do mprotect to disable and reenable code execution to have the
> instruction cache flushed again
>
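
The sequence listed above could be sketched in C roughly as follows. This is
a sketch under assumptions, not the code from the libswscale_nokia770 tree:
it flushes with the GCC/Clang builtin __builtin___clear_cache instead of
relying on a side effect of mprotect, it maps the buffer writable first and
switches it to executable before each call, and the names emit_return_const,
write_and_run and jit_selftest are made up for illustration. The instruction
encodings assume a little-endian target.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std= */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*stub_fn)(void);

/* Write machine code for "return value" (0 or 1) to dst.
 * Returns the code length in bytes, or 0 if this sketch has no
 * encoding for the target architecture. */
static size_t emit_return_const(uint8_t *dst, uint8_t value)
{
    assert(value <= 1);
#if defined(__x86_64__) || defined(__i386__)
    const uint8_t code[] = { 0xB8, value, 0, 0, 0, 0xC3 }; /* mov eax, value; ret */
#elif defined(__aarch64__)
    const uint32_t code[] = { 0x52800000u | ((uint32_t)value << 5), /* mov w0, #value */
                              0xD65F03C0u };                        /* ret */
#elif defined(__arm__)
    const uint32_t code[] = { 0xE3A00000u | value,  /* mov r0, #value */
                              0xE12FFF1Eu };        /* bx lr */
#else
    (void)dst;
    return 0;
#endif
#if defined(__x86_64__) || defined(__i386__) || defined(__aarch64__) || defined(__arm__)
    memcpy(dst, code, sizeof code);
    return sizeof code;
#endif
}

/* Make buf writable, generate the stub, flush, make buf executable, call it.
 * Returns the stub's result (0 or 1), or -1 on any failure. */
static int write_and_run(void *buf, size_t buf_len, uint8_t value)
{
    size_t n;

    if (mprotect(buf, buf_len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    n = emit_return_const((uint8_t *)buf, value);
    if (n == 0)
        return -1;
    __builtin___clear_cache((char *)buf, (char *)buf + n);
    if (mprotect(buf, buf_len, PROT_READ | PROT_EXEC) != 0)
        return -1;
    return ((stub_fn)(uintptr_t)buf)();
}

/* Returns 1 if both stubs returned the expected values, 0 if stale code
 * was executed, -1 if the check could not be run at all. */
static int jit_selftest(void)
{
    const size_t len = 4096;
    int first, second;
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return -1;
    first  = write_and_run(buf, len, 0);
    second = write_and_run(buf, len, 1);
    munmap(buf, len);
    if (first < 0 || second < 0)
        return -1;
    return first == 0 && second == 1;
}
```

A caller would use it as "if (jit_selftest() != 1) fall back to the C
scaler". A passing result only shows that nothing stale was fetched on this
run; the stale cache lines may simply have been evicted in between.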
> After all these steps have completed successfully, we can be sure that
> everything works as expected. The only possible reason for this code to break
> is when the original mmapped buffer already contains some cached instructions,
> so that the third step results in a crash. But we can't do anything to
> prevent this anyway (and the probability of a crash in this situation should
> be extremely low). I just want to be sure that a broken mmap (one which does
> not flush the cache) will not result in the following pattern:
> * we do mmap and generate some scaling code inside of this buffer
> * we need to change video resolution, buffer is unmapped and we do mmap
> again getting buffer at the same address with already cached instructions
> * we generate new scaling code
> * an attempt to call the generated code executes the old scaler code
> because it is fetched from the cache, resulting in undefined behaviour
just take a contiguous piece of code which is larger than the code cache and
execute it 100 times; after that the code cache will be flushed, and if not,
your hardware is very odd.
I am wondering if a call or two to a random glibc function would do ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Those who are too smart to engage in politics are punished by being
governed by those who are dumber. -- Plato