[FFmpeg-devel] Performances improvement in "image_copy_plane"

Marco Vianini marco_vianini at yahoo.it
Wed Jul 13 18:54:16 EEST 2022








On Wednesday, July 13, 2022 at 05:08:27 PM GMT+2, Paul B Mahol <onemda at gmail.com> wrote: 





On Wed, Jul 13, 2022 at 5:02 PM Marco Vianini <
marco_vianini-at-yahoo.it at ffmpeg.org> wrote:

> I did following tests on Windows 10 64bit. I compiled code in Release.
> I copied my pc camera frames 1000 times (resolution 1920x1080):
>
> With Coalesce (MY PATCH):
> copy_cnt=100  size=1920x1080 tot_time_copy(us)=36574 (average=365.74)
> copy_cnt=200  size=1920x1080 tot_time_copy(us)=78207 (average=391.035)
> copy_cnt=300  size=1920x1080 tot_time_copy(us)=122170(average=407.233)
> copy_cnt=400  size=1920x1080 tot_time_copy(us)=163678(average=409.195)
> copy_cnt=500  size=1920x1080 tot_time_copy(us)=201872(average=403.744)
> copy_cnt=600  size=1920x1080 tot_time_copy(us)=246174(average=410.29)
> copy_cnt=700  size=1920x1080 tot_time_copy(us)=287043(average=410.061)
> copy_cnt=800  size=1920x1080 tot_time_copy(us)=326462(average=408.077)
> copy_cnt=900  size=1920x1080 tot_time_copy(us)=356882(average=396.536)
> copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566(average=394.566)
>
> Without Coalesce:
> copy_cnt=100  size=1920x1080 tot_time_copy(us)=44303 (average=443.03)
> copy_cnt=200  size=1920x1080 tot_time_copy(us)=100501(average=502.505)
> copy_cnt=300  size=1920x1080 tot_time_copy(us)=150097(average=500.323)
> copy_cnt=400  size=1920x1080 tot_time_copy(us)=201010(average=502.525)
> copy_cnt=500  size=1920x1080 tot_time_copy(us)=256818(average=513.636)
> copy_cnt=600  size=1920x1080 tot_time_copy(us)=303273(average=505.455)
> copy_cnt=700  size=1920x1080 tot_time_copy(us)=359152(average=513.074)
> copy_cnt=800  size=1920x1080 tot_time_copy(us)=414413(average=518.016)
> copy_cnt=900  size=1920x1080 tot_time_copy(us)=465315(average=517.017)
> copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381(average=520.381)
>
> I think the results are very good. What do you think about?
> Thank You
>
>
First stop top posting.

Where is patch?


>
>    Il mercoledì 13 luglio 2022 11:52:23 CEST, Paul B Mahol <
> onemda at gmail.com> ha scritto:
>
>  On Wed, Jul 13, 2022 at 11:38 AM Marco Vianini <
> marco_vianini-at-yahoo.it at ffmpeg.org> wrote:
>
> >  You can get a very big improvement of performances in the special (but
> > very likely) case of: "(dst_linesize == bytewidth && src_linesize ==
> > bytewidth)"
> >
> > In this case in fact We can "Coalesce rows", that is using ONLY ONE
> > MEMCPY, instead of a smaller memcpy for every row (that is looping for
> > height times).
> >
> > Code:
> > "
> > static void image_copy_plane(uint8_t       *dst, ptrdiff_t dst_linesize,
> >                              const uint8_t *src, ptrdiff_t src_linesize,
> >                              ptrdiff_t bytewidth, int height)
> > {
> >     if (!dst || !src)
> >         return;
> >     av_assert0(abs(src_linesize) >= bytewidth);
> >     av_assert0(abs(dst_linesize) >= bytewidth);
> >
> >     // MY PATCH START
> >     // Coalesce rows.
> >     if (dst_linesize == bytewidth && src_linesize == bytewidth) {
> >       bytewidth *= height;
> >       height = 1;
> >       src_linesize = dst_linesize = 0;
> >     }
> >     // MY PATCH STOP
> >
> >     for (;height > 0; height--) {
> >         memcpy(dst, src, bytewidth);
> >         dst += dst_linesize;
> >         src += src_linesize;
> >     }
> > }
> > "
> > What do You think about?
> > Thank You
> >
>
> Show the benchmark numbers.
>
>
> > Marco Vianini
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel at ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email
> > ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
> >



Sorry, my mail client was sending HTML.
I hope this mail comes through correctly.


You can get a large performance improvement in the special (but very likely) case where "(dst_linesize == bytewidth && src_linesize == bytewidth)", that is, when neither buffer has padding between rows, so the whole plane is contiguous in memory.

In this case we can "coalesce rows": use a single memcpy of bytewidth * height bytes instead of one smaller memcpy per row (a loop of height iterations).

Code:
"
static void image_copy_plane(uint8_t       *dst, ptrdiff_t dst_linesize,
                             const uint8_t *src, ptrdiff_t src_linesize,
                             ptrdiff_t bytewidth, int height)
{
    if (!dst || !src)
        return;
    av_assert0(abs(src_linesize) >= bytewidth);
    av_assert0(abs(dst_linesize) >= bytewidth);

    // MY PATCH START
    // Coalesce rows.
    if (dst_linesize == bytewidth && src_linesize == bytewidth) {
        bytewidth *= height;
        height = 1;
        src_linesize = dst_linesize = 0;
    }
    // MY PATCH STOP

    for (;height > 0; height--) {
        memcpy(dst, src, bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}
"

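To check that the coalesced path copies exactly the same bytes as the per-row loop, something like the following standalone sketch could be used. It does not depend on FFmpeg: the names copy_plane_rows, copy_plane_coalesced and planes_match are hypothetical and exist only for this example, and it only covers positive linesizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reference path: one memcpy per row (hypothetical name). */
static void copy_plane_rows(uint8_t *dst, ptrdiff_t dst_linesize,
                            const uint8_t *src, ptrdiff_t src_linesize,
                            ptrdiff_t bytewidth, int height)
{
    for (; height > 0; height--) {
        memcpy(dst, src, (size_t)bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}

/* Proposed path: a single memcpy when both buffers have no row padding. */
static void copy_plane_coalesced(uint8_t *dst, ptrdiff_t dst_linesize,
                                 const uint8_t *src, ptrdiff_t src_linesize,
                                 ptrdiff_t bytewidth, int height)
{
    if (height > 0 && dst_linesize == bytewidth && src_linesize == bytewidth) {
        memcpy(dst, src, (size_t)bytewidth * (size_t)height);
        return;
    }
    copy_plane_rows(dst, dst_linesize, src, src_linesize, bytewidth, height);
}

/* Returns 1 if both paths produce identical buffers for this geometry,
 * 0 otherwise (including allocation failure). Both buffers use the same
 * positive linesize here, which is an assumption of this sketch. */
static int planes_match(ptrdiff_t linesize, ptrdiff_t bytewidth, int height)
{
    size_t size = (size_t)linesize * (size_t)height;
    uint8_t *src = malloc(size);
    uint8_t *a   = calloc(size, 1);
    uint8_t *b   = calloc(size, 1);
    int ok = 0;

    assert(linesize >= bytewidth && height > 0);
    if (src && a && b) {
        for (size_t i = 0; i < size; i++)
            src[i] = (uint8_t)(i * 31u + 7u);   /* arbitrary test pattern */
        copy_plane_rows(a, linesize, src, linesize, bytewidth, height);
        copy_plane_coalesced(b, linesize, src, linesize, bytewidth, height);
        ok = memcmp(a, b, size) == 0;
    }
    free(src);
    free(a);
    free(b);
    return ok;
}
```

planes_match(1920, 1920, 1080) exercises the coalesced branch, while planes_match(2048, 1920, 1080) has padded rows and falls back to the loop.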

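I ran the following tests on Windows 10 64-bit, with the code compiled in Release mode.
I copied frames from my PC camera 1000 times (resolution 1920x1080); tot_time_copy is the cumulative time in microseconds and average is tot_time_copy / copy_cnt: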

With Coalesce:
copy_cnt=100  size=1920x1080 tot_time_copy(us)=36574 (average=365.74)
copy_cnt=200  size=1920x1080 tot_time_copy(us)=78207 (average=391.035)
copy_cnt=300  size=1920x1080 tot_time_copy(us)=122170(average=407.233)
copy_cnt=400  size=1920x1080 tot_time_copy(us)=163678(average=409.195)
copy_cnt=500  size=1920x1080 tot_time_copy(us)=201872(average=403.744)
copy_cnt=600  size=1920x1080 tot_time_copy(us)=246174(average=410.29)
copy_cnt=700  size=1920x1080 tot_time_copy(us)=287043(average=410.061)
copy_cnt=800  size=1920x1080 tot_time_copy(us)=326462(average=408.077)
copy_cnt=900  size=1920x1080 tot_time_copy(us)=356882(average=396.536)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=394566(average=394.566)

Without Coalesce:
copy_cnt=100  size=1920x1080 tot_time_copy(us)=44303 (average=443.03)
copy_cnt=200  size=1920x1080 tot_time_copy(us)=100501(average=502.505)
copy_cnt=300  size=1920x1080 tot_time_copy(us)=150097(average=500.323)
copy_cnt=400  size=1920x1080 tot_time_copy(us)=201010(average=502.525)
copy_cnt=500  size=1920x1080 tot_time_copy(us)=256818(average=513.636)
copy_cnt=600  size=1920x1080 tot_time_copy(us)=303273(average=505.455)
copy_cnt=700  size=1920x1080 tot_time_copy(us)=359152(average=513.074)
copy_cnt=800  size=1920x1080 tot_time_copy(us)=414413(average=518.016)
copy_cnt=900  size=1920x1080 tot_time_copy(us)=465315(average=517.017)
copy_cnt=1000 size=1920x1080 tot_time_copy(us)=520381(average=520.381)
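
For anyone who wants to repeat this kind of measurement outside FFmpeg, a minimal timing sketch could look like the one below. It is not the program that produced the numbers above: copy_plane and avg_copy_us are hypothetical names, the buffer is assumed to be a single contiguous plane, and clock() is used only because it is portable (a higher-resolution timer would be preferable for real measurements).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Plane copy with the coalescing branch switchable at run time
 * (hypothetical helper for this sketch, not the FFmpeg function). */
static void copy_plane(uint8_t *dst, ptrdiff_t dst_linesize,
                       const uint8_t *src, ptrdiff_t src_linesize,
                       ptrdiff_t bytewidth, int height, int coalesce)
{
    if (coalesce && dst_linesize == bytewidth && src_linesize == bytewidth) {
        bytewidth *= height;
        height = 1;
        src_linesize = dst_linesize = 0;
    }
    for (; height > 0; height--) {
        memcpy(dst, src, (size_t)bytewidth);
        dst += dst_linesize;
        src += src_linesize;
    }
}

/* Average time per copy in microseconds for an unpadded plane,
 * or a negative value if the buffers could not be allocated. */
static double avg_copy_us(ptrdiff_t bytewidth, int height, int copies,
                          int coalesce)
{
    size_t size = (size_t)bytewidth * (size_t)height;
    uint8_t *src = malloc(size);
    uint8_t *dst = malloc(size);
    volatile uint8_t sink = 0;   /* read back so the copies are not elided */
    double us = -1.0;

    if (src && dst && size > 0 && copies > 0) {
        clock_t start, stop;

        memset(src, 0x5a, size);
        start = clock();
        for (int i = 0; i < copies; i++) {
            copy_plane(dst, bytewidth, src, bytewidth, bytewidth, height,
                       coalesce);
            sink ^= dst[(size_t)i % size];
        }
        stop = clock();
        us = (double)(stop - start) * 1e6 / CLOCKS_PER_SEC / copies;
    }
    (void)sink;
    free(src);
    free(dst);
    return us;
}
```

Comparing avg_copy_us(1920, 1080, 1000, 1) with avg_copy_us(1920, 1080, 1000, 0) gives the two averages. The 1920-byte row width assumes one byte per pixel, which may not match the layout of the camera frames measured above, and the result will depend on the compiler, the C library's memcpy and the machine.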


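I think the results are very good: over 1000 copies the average is 394.566 us per copy with coalescing versus 520.381 us without, about 24% less.
What do you think?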


Thank You



