[MPlayer-dev-eng] [PATCH] vf_osd updates - fully baked?

Jason Tackaberry tack at sault.org
Mon Sep 12 17:59:56 CEST 2005


Hi Reimar,

On Mon, 2005-09-12 at 10:30 +0200, Reimar Döffinger wrote: 
> I don't like that two much since they usually tend to end up being
> declared multiple time. 

True :)

[tack at viper main]$ find -name '*.h' | xargs grep -i "min("
./libaf/af.h:#define min(a,b)(((a)>(b))?(b):(a))
./loader/wine/windef.h:#define __min(a,b) (((a) < (b)) ? (a) : (b))
./loader/wine/windef.h:#define min(a,b)   (((a) < (b)) ? (a) : (b))
./tremor/os.h:#  define min(x,y)  ((x)>(y)?(y):(x))
./libfaad2/common.h:#define min(a, b) (((a) < (b)) ? (a) : (b))
./libmpdemux/aviheader.h:#define MIN(a,b) (((a)<(b))?(a):(b))
./libmpdemux/aviheader.h:#define min(a,b) (((a)<(b))?(a):(b))
./libmpdemux/asf.h:#define MIN(a,b) (((a)<(b))?(a):(b))
./libavcodec/common.h:#define FFMIN(a,b) ((a) > (b) ? (b) : (a))

> For consistency at least use MIN and MAX I'd say...

Ok.

> I somehow missed to comment on the place where you used them, it looked
> extremely obfuscated to me. Maybe the good old chain of ifs isn't that
> bad...

I mainly use it for clipping regions.  A chain of if/else would work, of
course, but I find myself thinking of the problem in terms of max/mins.
But I can see how a line like

  slice_h = min(priv->h - slice_y, max(0, min(priv->slice_h, src_mpi->height-slice_y)));

might be termed obfuscated.  I'll see about expanding that into if/else.

> > +#define FALSE 0
> > +#define TRUE !FALSE
> 
> I really _hate_ it when people use defines for things like that. Why
> isn't just using 0 and 1 good enough? I think at least every C programmer
> knows that 1 is true and 0 is false...

If the return type is int, and I'm returning 1, am I returning a
true/false value?  Or maybe the number of items I've processed, in which
case if I process 2 items, should I return 2?

I like to remove this kind of ambiguity.  I could document the return
value in the function's doc, but returning a TRUE/FALSE value is
something I can see quite explicitly in the code that "ok, this
function's return is a bool" without having to page up to the comment
that describes the return value.

So it's not about me forgetting whether TRUE is 1 or 42 or 0, but
eliminating that moment needed to consider if the return type of a
function I'm debugging and don't look at very often is a boolean or not.

Now, those being my arguments, for 50k of code to maintain, it's not a
big deal.  If the preference among the core hackers is 1/0 instead of
TRUE/FALSE defines, that's fine with me. :)


> > +    /** Lockbyte points to the first byte of the shared memory buffer which
> > +     *  is used for synchronization. See \a BUFFER_LOCKED and \a
> > +     *  BUFFER_UNLOCKED  flags above.
> > +     */
> > +    unsigned char *lockbyte;
> 
> I guess I should have a closer look at this code somewhen. Most of the
> "locking" code I have seen just didn't work...

Yes, it's very easy to making broken locking code.  The semantics of the
OSD locking are made simple, because otherwise there are race conditions
and it simply can't work.  In a nutshell, it works because MPlayer is
only allowed to set the flag UNLOCKED, and the controlling app is only
allowed to set the flag LOCKED.  (The controlling app sets the flag
locked, essentially locking the buffer from itself.  It can't write to
it again until MPlayer sets the flag unlocked, which means MPlayer is
done reading it.)

I stress test the OSD code (one of my tests is using another mplayer
instance to play a movie on the OSD of another running movie) and am
pretty comfortable in stating that it works.  I also think the locking
semantics are correct because they're pretty simple.

> > +    /** Points to the beginning of the BGRA image in shared memory (which
> > +     *  is simply (lockbyte+1).
> 
> I do not like this. Doesn't this mean that the bgra samples are
> unaligned? That can create serious problems, esp. on Sparc...
> Preferably any larger data should be 16 byte aligned, since that is the
> only values that is sure to create no problems - even when using SSE.

You're right.  I suppose I should allocate a few extra bytes and begin
the BGRA data on a 16 byte aligned boundary.

Is it safe to assume shmat() will return a page-aligned address?  It
doesn't explicitly say, but rather says "the system chooses a suitable
(unused) address at which to attach the segment."  Still, I'd be
surprised if it wasn't aligned.

Given that, it should be safe to specify that the BGRA data begins at
shmptr+16.


> I read the reason why you don't use swscaler, but I don't really agree,
> since not using it will make it more difficult to support more formats
> later on (in case somebody cares).

A long time ago I had tried using the swscaler for the colorspace
conversion, and the only way I could make it work was to convert the
entire image (even if only a small region was changed).  Updating a
slice caused problems (which I forget now).  But maybe that was because
the BGRA buffer wasn't aligned properly.

Also, swscaler only seems to support slices, not arbitrary rectangular
regions.  In practice, this might not matter as swscaler will probably
still be faster converting an entire slice.

> Also after extracting the alpha to a seperate plane you should be able
> to scale it by using the scaler for IMGFMT_Y800.

This is an interesting idea, but I'm not sure if it would work for the
chroma alpha plane.  Given pixels of the luma alpha plane:

     A   B
     C   D

The chroma alpha is (A+C)>>1.  (In other words, this is just how chroma
is subsampled for YV12: the chroma alpha is done the same way.)  I could
take the luma alpha plane (which is equivalent the the alpha channel of
the BGRA image) and scale it down to w/2 by h/2, and the resulting plane
might be suitable for use as the chroma alpha plane, but I'm not sure if
it's technically correct given the above definition.  Would any of the
scaler algorithms yield this?


> > +#ifdef STOPWATCH
> > +/**
> > + * \brief Simple timer for profiling and debugging.
> 
> Hmm... at least for x86 libavutil/common.h has the much nocer and
> precise START_TIMER and STOP_TIME macros. I think they would be
> preferable if they work for you.

One problem with these macros is that you can't nest them.  For example,
I have a timer on all of put_image, which is broken down into several
subtimers (colorspace conversion, alpha premultiplication, and
blending).  Plus it has more flexibility in the output.  (You can use a
printf style formatting.)

Is it really more accurate?  At least, stopwatch() provides microsecond
precision (but it's dependent on the accuracy of gettimeofday).

At any rate, it will be ifdef'd out by default.  I left it there because
it's convenient for profiling changes.

> All I heard indicates that atexit creates a lot of problems. Also I do
> not see much point, the OS should free any associated resources on exit
> (or is that not the case for shared memory?)

Yes, the real reason is to remove the shared memory, which does not
happen automatically.

> > +    gettimeofday(&curtime, &tz);
> 
> You should definitly use the functions from osdep/timer.h

Ok.

> 
> > +    priv->shm_id = shmget(priv->shm_key, bufsize, IPC_CREAT | 0600);
> 
> In relation with the atexit problem, it would also be possible to let
> another application create the shared memory segment - if it's a good
> idea depends of course on what you want to use it for.

That would only work when the OSD size is fixed (i.e. specified by the
controlling application).  When it's not, the app can't know the size of
the OSD until the video starts playing.  vf_osd will allocate enough
memory to hold an OSD that fits over the video.

> > + * put_image), but the BGR32->BGR32 scaler doesn't scale the alpha channel.
> 
> As I mentioned before, since you have to seperate out the alpha in an
> extra plane anyway, you can first do that and then scale. I think. Btw.
> the swscaler can do the conversion and scaling in one step AFAIK.

It does.  I'll try to rework the code to use Swscaler.  I agree that
it's just a better design that way.  I may have to ask for help. :)

> > +static void
> > +invalidate_rect(struct vf_priv_s *priv, int x, int y, int w, int h)
> [...]
> > +    last = priv->invalid_rects;
> 
> Is priv->invalid_rects always != NULL?

At that point in the code it must be, yes.  Here is more context
surrounding the above quoted code:

            if (!priv->invalid_rects) {
                priv->invalid_rects = r;
                return;
            }
            last = priv->invalid_rects;
            while (last->next)
                last = last->next;
            last->next = r;

last will not be dereferenced unless it is not NULL.

> > +    while (last->next)
> > +        last = last->next;
> > +    last->next = r;
> 
> Since you don't seem to keep any specific order, why not just add it in
> the front with O(1) instead of at the end with O(n)?

True, good catch.

> "mov %4, %%"REG_a"\n\t"          // %eax = layer alpha
> "cmp $255, %%"REG_a"eax\n\t"        // don't apply layer alpha if it's 100% opaque
> 
> and include (I think) cpudetect.h. Otherwise it won't work on AMD64.

As Michael pointed out, I should be doing cmpl $255, %4 to begin with.
I'm not sure why I'm not.  I think at one point that code was in a loop
and I wanted to copy the value to a register since I would be comparing
it often.  Now I don't know if that really is much faster, but that was
my thinking. :)

> > +                "movl %4, %%ecx\n\t"
> 
> Same here with REG_c

Ok.

> > +static void **
> > +grow_array(void **old_array, int old_size, int diff)
> > +{
> > +    void **tmp = (void **)calloc(1, sizeof(void *) * (old_size + diff));
> > +    memcpy(tmp, old_array, sizeof(void *) * old_size);
> > +    if (old_array)
> > +        free(old_array);
> > +    return tmp;
> > +}
> 
> how about just using realloc??

I didn't think of it. :)  I'll change the code to use realloc.

> > +    if (cmd->id == MP_CMD_VF_OSD) {
> > +        /* This parsing code is not robust. Passing a malformed argument
> > +         * will probably result in a segfault. But this is not atypical
> > +         * for MPlayer. :)
> > +         */
> 
> Maybe you can use the subopt helper code? See subopt-helper.[ch]. Well,
> if you don't mind using ':' as seperator instead of ','.
> See e.g. vo_gl.c, preinit function for an example of how to use it.

I don't think this will work because it's possible to specify the same
command multiple times:

   osdcmd invalidate=0:0:200:200,invalidate=400:400:50:50

I think I can add some sanity checks to the existing parsing code to
make it more robust.


> > -      while( (cmd = mp_input_get_cmd(20,1,1)) == NULL) {
> > +      while( (cmd = mp_input_get_cmd(3,1,1)) == NULL) {
> 
> please keep the value as high as possible, I think esp. with gmplayer a
> change here can increase CPU load in paused mode a lot.

I did test with gmplayer and didn't observe any difference in load while
paused.

> > -             usec_sleep(20000);
> > +	     if (sh_video && sh_video->vfilter) 
> > +	       ((vf_instance_t*)sh_video->vfilter)->control(sh_video->vfilter, VFCTRL_PAUSE_UPDATE, video_out);
> > +	     usec_sleep(1000);
> 
> same here. Maybe you should check the return value, if it was not uses
> (e.g. returns IMPLEM), sleep longer. Not sure if it has a return value
> at all though...

I think it should return CONTROL_TRUE if the vfctrl was handled.
Sleeping longer if it wasn't handled is a good idea.

> > -      if (cmd && cmd->id == MP_CMD_PAUSE) {
> > -      cmd = mp_input_get_cmd(0,1,0);
> > -      mp_cmd_free(cmd);
> > +      switch (cmd->id) {
> > +            // Handle commands while paused.
> > +            case MP_CMD_LOADFILE:
> > +                cmd = mp_input_get_cmd(0,1,0);
> > +                mp_input_queue_cmd(cmd);
> > +                goto handle_cmd;
> 
> I don't like that goto here...
> > +            case MP_CMD_PAUSE:
> > +                cmd = mp_input_get_cmd(0,1,0);
> > +                mp_cmd_free(cmd);
> >        }
> > +
> 
> Shouldn't this be a seperate patch anyway?

Yes, I agree, this part should.  It implements the ability to load a
file while the video is paused.  It's a stripped version of another
patch I maintain, which implements seek-while-paused functionality.  But
you're right, for vf_osd, this part shouldn't be there.

The VFCTRL stuff in mplayer.c does need to be part of this patch,
though.  It's important functionality for vf_osd.

> remove the additional empty lines IMHO.

Done.

Thanks for your review, Reimar.  Between yours and Wanderer's
documentation comments I have a few hours work ahead of me.  I should
submit another patch by tomorrow.

Cheers,
Jason.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 229 bytes
Desc: This is a digitally signed message part
URL: <http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/attachments/20050912/170eae2e/attachment.pgp>


More information about the MPlayer-dev-eng mailing list