[Ffmpeg-devel-irc] ffmpeg-devel.log.20190119

Sun Jan 20 03:05:03 EET 2019

[01:26:19 CET] <Compn> i'll upload the other arbc samples for durandal, but it's all the same source
[02:22:28 CET] <UukGoblin> hello, I was thinking to add a filter that could buffer decoded frames in order to smooth out playback (this is when software-decoding 10-bit HEVC on an Atom processor - it works most of the time, but has short periods where the codec is so intensive it eats up 100% CPU and frames get dropped)
[02:23:17 CET] <UukGoblin> I started off looking at vf_random.c, as it has a buffer and does quite a similar job (or at least so I thought)
[02:25:15 CET] <UukGoblin> but now I'm quite baffled whether I can do this at all: in particular, I'm wondering when filter_frame and request_frame get called... is it even possible that filter_frame will get called more often than request_frame?
[02:26:12 CET] <UukGoblin> as in, during times when CPU would not normally be utilized at 100%, will filter_frame be called as often as possible, thus filling CPU back to 100%?
[02:26:40 CET] <UukGoblin> or is mpv the only place where such buffering could be done?
[03:16:07 CET] <cone-953> ffmpeg 03Jun Zhao 07master:32fb83e43188: lavc/hls: Cosmetics: Fix indentation for free_segment_list
[03:50:59 CET] <Compn> UukGoblin : you know mpv has its own channel right ?
[03:51:08 CET] <Compn> atom wasn't built for 10bit hevc methinks :D
[03:52:46 CET] <UukGoblin> Compn, yeah, I know, but I suspect this kind of buffering should rather be done in ffmpeg. Someone there recommended writing a filter for it.
[03:53:30 CET] <UukGoblin> Compn, Atom Z8350 software-decodes 10-bit HEVC almost-fine for me :-) and when increasing the thread count to 32, it's actually smooth on this test video I'm using
[03:54:14 CET] <UukGoblin> the thread count increase seems to also increase the decoded-frame cache size, which does exactly what I want. I'm actually now searching whether it's possible to increase this thread via some option I've not yet found
[03:54:21 CET] <Compn> hmm
[03:54:31 CET] <UukGoblin> s/this thread/this cache/
[03:54:41 CET] <Compn> probably because the threads split up the frames against more buffers
[03:54:46 CET] <UukGoblin> yup
[03:55:46 CET] <UukGoblin> extra_hw_frames doesn't seem to do it
[03:56:10 CET] <UukGoblin> (probably because it's for hwdec)
[07:39:01 CET] <rcombs> so I've been putting together checkasm for yadif
[07:39:19 CET] <rcombs> and it turns out the existing ASM functions are not consistent with the C
[07:40:07 CET] <rcombs> at least for 10- and 16-bit
[07:41:31 CET] <rcombs> and the existing tests are broken
[07:42:29 CET] <rcombs> FATE uses "-pix_fmt [format]", which converts _after_ the filter, so it doesn't actually test the 10- and 16-bit cases
[07:44:15 CET] <rcombs> the SSE4 routine for 16-bit is actually fine, but the SSE2 version isn't, because it uses the PMINSD macro, which is lossy on SSE2
[07:47:39 CET] <rcombs> which should be no surprise, since it converts from 32-bit int to 32-bit float and back
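The loss described above can be reproduced in plain C. This is a minimal sketch, not the PMINSD macro itself: the function names are hypothetical, and it only assumes what the log states, namely that the inexact path goes from 32-bit int to 32-bit float and back. A 32-bit float has a 24-bit significand, so integers above 2^24 are not all representable.

```c
#include <stdint.h>

/* Round-trips a 32-bit integer through a 32-bit float (int -> float -> int).
 * Hypothetical helper for illustration; inputs must stay well inside the
 * int32_t range, since converting an out-of-range float back is undefined. */
static int32_t roundtrip_through_float(int32_t v)
{
    volatile float f = (float)v; /* volatile forces a real 32-bit float store */
    return (int32_t)f;
}

/* Hypothetical scalar model of a signed min computed in float. */
static int32_t min_via_float(int32_t a, int32_t b)
{
    volatile float fa = (float)a;
    volatile float fb = (float)b;
    float m = fa < fb ? fa : fb;
    return (int32_t)m;
}

/* Exact integer min, for comparison. */
static int32_t min_exact(int32_t a, int32_t b)
{
    return a < b ? a : b;
}
```

With these helpers, 16777216 (2^24) survives the round trip, while 16777217 comes back as 16777216, so min_via_float(16777217, 16777218) disagrees with min_exact by one.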
[10:47:14 CET] <nevcairiel> rcombs: interesting findings, but luckily high-bit-depth interlaced material is quite rare in consumer space :)
[10:47:37 CET] <rcombs> yeah it's not a huge deal, just making these test cases very frustrating to implement
[10:47:50 CET] <rcombs> in the field of "bigger deals", turns out VLC downloads updates over plaintext HTTP
[10:48:00 CET] <rcombs> and verifies them against a PGP key that it also downloads over plaintext HTTP
[10:48:46 CET] <nevcairiel> i dont suppose we have existing bitexact flags for filters eh
[10:48:58 CET] <nevcairiel> probably not
[10:50:21 CET] <rcombs> I'm just changing that particular macro to make the inexact path conditional on the caller requesting it
[10:50:29 CET] <rcombs> it only has 2 callers (yadif and swscale)
[10:50:52 CET] <nevcairiel> does it have an exact sse2 path even?
[10:50:53 CET] <rcombs> maybe swscale doesn't use any values that don't round-trip through float losslessly, idk
[10:51:13 CET] <rcombs> the SSE2 path is the inexact one; the MMX one is exact
[10:51:26 CET] <nevcairiel> that sounds weird
[10:51:31 CET] <nevcairiel> what did mmx have that sse2 didnt
[10:51:36 CET] <rcombs> and only a couple instructions longer
[10:51:50 CET] <rcombs> nothing, the SSE2 path just saves a couple instructions by being inexact
[10:52:04 CET] <rcombs> see PMINSD in x86util
[10:52:06 CET] <nevcairiel> i see
[10:52:16 CET] <nevcairiel> the "mmx" path is also valid sse2, just longer
[10:52:20 CET] <rcombs> yup
[10:52:52 CET] <rcombs> (I think that's true of most cases where there's an MMX path and an SSE2 path, just, usually the SSE2 path is also exact)
[10:53:02 CET] <rcombs> I'm kinda curious if it's actually even faster
[10:53:14 CET] <rcombs> since it involves a floating-point transition
[10:55:03 CET] <nevcairiel> looks like it was blindly stolen from swscale and put into the generic macro at some point, possibly without too much consideration of the limitations swscale might have on the code in the first place
[10:55:29 CET] <rcombs> yup
[10:55:37 CET] <rcombs> (I did look through the blame)
[10:56:05 CET] <rcombs> I'm still trying to work out a problem with the 10-bit variant, which is being really weird
[10:56:26 CET] <rcombs> the FATE test is passing whether I enable the ASM via -cpuflags or not
[10:56:39 CET] <rcombs> but my checkasm pass isn't, and I can't figure out why
[10:56:57 CET] <rcombs> the 8-bit variant was always fine, and the 16-bit variant is fine after fixing PMINSD
[10:57:36 CET] <atomnuker> I kinda doubt that on modern CPUs you save much at all by going inexact
[10:57:37 CET] <nevcairiel> I assume you fixed the fate pixfmt thing for that test
[10:59:47 CET] <rcombs> yeah
[11:00:18 CET] <rcombs> atomnuker: it's not so much saving by being inexact as saving by having a dedicated MIN instruction, instead of doing a compare and then some bitwise ops
[11:00:49 CET] <rcombs> but my guess would be that the compare and bitwise ops would be cheaper (at least on modern CPUs) than converting to float, doing the MIN, and converting back
[11:00:51 CET] <nevcairiel> I doubt it was even intentional to be inexact here
[11:00:53 CET] <rcombs> even if it's more instructions
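The "compare and then some bitwise ops" approach can be sketched in scalar C as follows. This is an illustration of the general technique only, under the assumption that the exact path builds a mask from a signed compare and then selects with XOR/AND; it is not a transcription of the macro in x86util, and min_mask is a hypothetical name.

```c
#include <stdint.h>

/* Exact signed min using a compare mask and bitwise ops, one lane at a time.
 * A packed SIMD version would do the same on several 32-bit lanes at once. */
static int32_t min_mask(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(b > a); /* all ones where b > a, else zero */
    int32_t r = a ^ b;                /* bits in which a and b differ */
    r &= mask;                        /* keep them only where a is smaller */
    r ^= b;                           /* yields a where a < b, otherwise b */
    return r;
}
```

Because it never leaves the integer domain, this gives the exact result for every pair of int32_t inputs, including values above 2^24, for example min_mask(16777217, 16777218) returns 16777217.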
[11:01:13 CET] <nevcairiel> just a lack of full understanding of the code
[11:01:35 CET] <rcombs> if it actually mattered, I don't think FATE tests depending on that swscale code would pass
[11:01:39 CET] <rcombs> (assuming there are any)
[11:02:07 CET] <rcombs> like, if the inputs are always representable exactly as floats then it doesn't matter
[11:02:21 CET] <rcombs> maybe that's true in the 1 place this is used in swscale, idk
[11:02:42 CET] <nevcairiel> just have to keep the absolute value below 24-bit and it's fine
[11:02:50 CET] <rcombs> yeah
[11:03:16 CET] <nevcairiel> when does sws deal with numbers over that, individual components typically end up 16-bit or lower in the end
[11:03:23 CET] <nevcairiel> so maybe the error also just rounds out eventually
[11:03:25 CET] <rcombs> probably never
[11:03:30 CET] <rcombs> I'm not actually sure why yadif _does_
[11:03:41 CET] <atomnuker> maybe we ought to fix yadif to make it bitexact
[11:03:57 CET] <rcombs> yes I'm working on that
[11:04:00 CET] <nevcairiel> it is, if it doesn't use this "broken" instruction emulation
[11:04:10 CET] <rcombs> well, for 16-bit
[11:04:14 CET] <rcombs> still not sure what's wrong with 10-bit
[11:04:26 CET] <nevcairiel> i would think 10-bit is just 16-bit with more zeros
[11:04:38 CET] <nevcairiel> but maybe there is some extra magic to speed it up
[11:04:39 CET] <rcombs> it's a whole different implementation
[11:05:06 CET] <rcombs> I'd imagine one's derived from the other but it's in a separate .asm
[11:05:36 CET] <rcombs> as opposed to just being macroized
[11:19:11 CET] <kurosu__> regarding swscale discussion on sse2 vs mmx, one thing is that sse2 is actually the extension of mmxext
[11:19:52 CET] <kurosu__> pavg(us)b needs several instructions and unpacking to be properly emulated
[11:20:03 CET] <kurosu__> so maybe the mmx takes some shortcuts
[11:20:32 CET] <kurosu__> no idea beyond that, I haven't looked at the rest of the discussion
[00:00:00 CET] --- Sun Jan 20 2019