[Ffmpeg-devel-irc] ffmpeg-devel.log.20160214
burek
burek021 at gmail.com
Mon Feb 15 02:05:02 CET 2016
[00:12:14 CET] <cone-621> ffmpeg FearThe1337 master:c33ffc7b21b9: libavdevice/dshow.c: Correct CoGetMalloc check
[00:38:57 CET] <cone-621> ffmpeg Neil Birkbeck master:3b0974d3ef7f: lavc/hevc Parse SEI_TYPE_MASTERING_DISPLAY_INFO and propagate content into the AVMasteringDisplayMetadata side data.
[00:39:03 CET] <J_Darnley> When did ffmpeg get a ridiculous tty demuxer and why can't I find it in the docs?
[00:41:07 CET] <wm4> it's rather old
[00:41:11 CET] <wm4> and indeed ridiculous
[00:41:29 CET] <wm4> I have it blacklisted because it tends to probe positively for text files or so
[00:41:52 CET] <J_Darnley> Is this what that big drama over data leaks was about a few weeks ago?
[00:42:10 CET] <nevcairiel> nah
[00:42:53 CET] <wm4> that was concat and hls
[00:44:06 CET] <Timothy_Gu> J_Darnley: i believe it was a relic of mplayer era
[00:44:28 CET] <J_Darnley> Damn that is *old*
[00:48:04 CET] <Timothy_Gu> durandal_1707: can you take a look at "checkasm: Add vf_blend tests" when you get a chance? thanks
[00:48:36 CET] <Timothy_Gu> i thought i sent it two days ago but apparently i forgot
[01:30:31 CET] <cone-621> ffmpeg Timothy Gu master:123ff81a45b5: avutil: Remove x86_cpu.h
[01:36:38 CET] <cone-621> ffmpeg Marton Balint master:0250fc2146b3: avformat/img2enc: return error if image rename fails
[01:36:39 CET] <cone-621> ffmpeg Marton Balint master:35890aaa653a: avformat/img2enc: disable atomic file creation by default
[01:40:42 CET] <J_Darnley> Timothy_Gu: are you looking for CVTPS2DQ (Convert Packed Single-Precision FP Values to Packed Dword Integers)?
[01:41:06 CET] <Timothy_Gu> J_Darnley: no, but rather to make sure that the value is correctly rounded/truncated when converting
[01:41:30 CET] <Timothy_Gu> plus, im already using that instruction :)
[01:41:39 CET] <J_Darnley> ah
[01:41:57 CET] Action: J_Darnley must consult the other half of the manual
[01:43:18 CET] <J_Darnley> oh that keeps float
[01:48:07 CET] <J_Darnley> I guess I don't have any other suggestions.
[01:48:26 CET] <J_Darnley> I haven't done any float simd yet.
[02:00:16 CET] <Timothy_Gu> the other option is to change the MXCSR register, but that is ugly and slow
[02:02:21 CET] <Timothy_Gu> what happened to the nvenc header bundling patch?
[02:03:57 CET] <Timothy_Gu> oh andreas blocked it
[02:08:20 CET] <michaelni> nevcairiel, is it ok to apply andreas hwaccel-mt error->warning patch (now after the vp9 fix) ?
[02:50:13 CET] <cone-621> ffmpeg Marton Balint master:3235241061d6: avutil/parseutils: use microsecond precision when parsing "now" in av_parse_time()
[02:50:14 CET] <cone-621> ffmpeg Marton Balint master:f834f0cab60f: avutil/parseutils: accept everything in av_parse_time that ff_iso8601_to_unix_time accepts
[02:50:15 CET] <cone-621> ffmpeg Marton Balint master:e942454daf05: avformat/utils: add ff_parse_creation_time_metadata
[02:50:16 CET] <cone-621> ffmpeg Marton Balint master:ea1bf08a4c5a: avformat/asfenc: use ff_parse_creation_time_metadata
[02:50:17 CET] <cone-621> ffmpeg Marton Balint master:bf0607b6dbca: avformat/dvenc: use ff_parse_creation_time_metadata
[02:50:18 CET] <cone-621> ffmpeg Marton Balint master:83b01ed21239: avformat/ffmenc: use ff_parse_creation_time_metadata
[02:50:19 CET] <cone-621> ffmpeg Marton Balint master:5c20bc8f4726: avformat/gxfenc: use ff_parse_creation_time_metadata
[02:50:20 CET] <cone-621> ffmpeg Marton Balint master:5f64f3d8cf06: avformat/movenc: use ff_parse_creation_time_metadata
[02:50:21 CET] <cone-621> ffmpeg Marton Balint master:ad17cc97446c: avformat/mxfenc: use ff_parse_creation_time_metadata
[02:50:23 CET] <cone-621> ffmpeg Marton Balint master:66e85a180ab3: avformat/matroskaenc: use ff_parse_creation_time_metadata
[02:50:24 CET] <cone-621> ffmpeg Marton Balint master:a573e6c10371: avformat/utils: remove ff_iso8601_to_unix_time
[03:01:57 CET] <jamrial> most of those could have been squashed into a single patch
[03:03:48 CET] <Gramner> Timothy_Gu: is that float conversion required? what happens if you multiply by 255 first, then divide, while keeping them as integers?
[03:04:22 CET] <Timothy_Gu> Gramner: hmm good idea. Why didn't I think of it?
[03:04:29 CET] <Timothy_Gu> ah the original C code uses ints
[03:04:32 CET] <Timothy_Gu> *floats
[03:04:45 CET] <Gramner> the c could be changed as well in that case
[03:04:49 CET] <Timothy_Gu> https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_blend.c#L243
[03:04:50 CET] <Timothy_Gu> yeah
[03:12:48 CET] <Gramner> hmm, is there even an integer SIMD divide though? never needed to use that
[03:13:43 CET] <atomnuker> for integer division? nope
[03:13:46 CET] <BBB> nope
[03:13:57 CET] <BBB> the integer simd divide is pmulhuw with the inverse
[03:14:00 CET] <BBB> it's not exact
[03:14:11 CET] <BBB> (see whatever your quantize() does in x264)
[03:15:35 CET] <atomnuker> you might as well use FASTDIV if precision's not important though
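
The multiply-by-inverse division BBB mentions can be sketched as follows. This is a minimal Python emulation, not code from x264 or FFmpeg: `pmulhuw` is modelled as the high 16 bits of an unsigned 16x16-bit product, and the helper names, the rounded-up inverse, and the divisor 3 are assumptions chosen for illustration. It shows why the result is not exact over the full 16-bit range.

```python
def pmulhuw(a, b):
    """Emulate one lane of pmulhuw: high 16 bits of an unsigned 16x16-bit product."""
    assert 0 <= a < 1 << 16 and 0 <= b < 1 << 16
    return (a * b) >> 16


def approx_div(x, d):
    """Approximate x // d by multiplying with a 16-bit fixed-point inverse of d.

    The inverse is rounded up (an assumption of this sketch); d must be >= 2
    so that the inverse still fits in 16 bits.
    """
    assert d >= 2
    inv = (1 << 16) // d + 1
    return pmulhuw(x, inv)


# Collect every 16-bit input where the approximation differs from true division by 3.
mismatches = [x for x in range(1 << 16) if approx_div(x, 3) != x // 3]
first_bad = mismatches[0] if mismatches else None
```

With this particular inverse, inputs below 32768 divide by 3 exactly, while some larger inputs come out one too high, so a caller either restricts the input range or corrects the result afterwards.
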
[03:37:30 CET] <Timothy_Gu> jamrial: what do you mean by "these two"?
[03:41:29 CET] <jamrial> roundps and minps
[03:43:35 CET] <Timothy_Gu> without roundps the cvtps2dq uses the wrong rounding
[03:44:25 CET] <Timothy_Gu> without minps the conversion does something weird with divide by 0 (can't remember what)
[03:45:09 CET] <Timothy_Gu> "If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned."
[03:46:03 CET] <Timothy_Gu> hah! found it: CVTTPS2DQ
[03:50:29 CET] <Timothy_Gu> also scratch on the minps comment. seems like it works fine now without it
[04:09:30 CET] <Timothy_Gu> hmm minps seems needed again lol
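
The difference between the two conversions discussed above, and the role of the clamp, can be sketched in Python. This is an emulation under stated assumptions, not the vf_blend assembly: it models a single lane, uses Python doubles rather than single-precision floats, assumes the default MXCSR rounding mode (round to nearest even), and all helper names are hypothetical.

```python
import math

INT32_INDEFINITE = -0x80000000  # the "indefinite integer value" (80000000H) as a signed int


def _to_int32(value):
    """Return value if it fits in a signed dword, else the indefinite value."""
    return value if -(1 << 31) <= value < (1 << 31) else INT32_INDEFINITE


def cvtps2dq(x):
    """One lane of CVTPS2DQ under the default MXCSR mode (round to nearest even)."""
    if not math.isfinite(x):
        return INT32_INDEFINITE
    return _to_int32(round(x))  # Python's round() also rounds halves to even


def cvttps2dq(x):
    """One lane of CVTTPS2DQ: always truncates toward zero."""
    if not math.isfinite(x):
        return INT32_INDEFINITE
    return _to_int32(math.trunc(x))


# A positive value divided by zero gives +inf in float math; clamping first
# (as minps would, with 255.0 as the bound) keeps the conversion in range.
unclamped = cvttps2dq(math.inf)
clamped = cvttps2dq(min(math.inf, 255.0))
```

Here `unclamped` is the indefinite value while `clamped` is 255, which is one plausible reading of why dropping the `minps` step misbehaves on a zero divisor.
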
[04:24:48 CET] <Timothy_Gu> im feeling my mail frequency is approaching Mats's...
[04:28:11 CET] <jamrial> regarding the speed of gcc autovectorization, did you try it again after your "Reduce number of arguments for kernel function" patch?
[04:31:46 CET] <Timothy_Gu> jamrial: not really
[05:04:20 CET] <Timothy_Gu> jamrial: just tested, doesn't make any difference
[05:04:38 CET] <jamrial> ok
[05:04:43 CET] <Timothy_Gu> probably because the function already has like 9 args
[05:05:45 CET] <jamrial> it was mostly the fact you changed the loop into a simpler one which may be less confusing to gcc's vectorizer
[05:07:16 CET] <Timothy_Gu> ah
[05:07:48 CET] <Timothy_Gu> gcc can't vectorize integer divide anyway
[05:07:51 CET] <Timothy_Gu> neither can I :)
[06:22:49 CET] <andrewrk> Timothy_Gu, are you the same one who just accepted my pull request on mxe?
[06:48:38 CET] <Timothy_Gu> andrewrk: yes
[06:48:49 CET] <andrewrk> heh. nice :)
[06:48:53 CET] <Timothy_Gu> :)
[09:29:12 CET] <mateo`> xyz: depending on the mediacodec implementations, you are likely to be spammed with this kind of message, does the decoder (h264_mediacodec) output buffers as expected ?
[10:30:53 CET] <xyz> mateo`: yes it works fine on videos I've tested
[10:36:10 CET] <wm4> mateo`: so is asynchronity a problem? do you have a link to your wip?
[10:56:49 CET] <mateo`> wm4: https://github.com/mbouron/FFmpeg/commits/stupeflix-devel
[10:57:49 CET] <mateo`> wm4: asynchronity is not a problem atm, the big missing piece of this wip is surface output / hwaccel
[11:05:33 CET] <wm4> hm ok
[11:06:01 CET] <mateo`> I'm wondering what users would expect from the hwaccel part, do they want to provide their own surface, if so, is it enough if the surface is provided as a jobject (android/view/Surface), as there is no way, afaik, to convert an ANativeWindow to a java Surface object (you can do the other way around).
[11:10:23 CET] <mateo`> About the asynchronity, the complexity will be at the user level, ie: if you want to read the texture for a frame you have just rendered onto the surface, you have to listen to a particular callback, and then call the appropriate function
[11:10:58 CET] <wm4> personally I'd be interested in gl interop, but no idea how that works, and I heard it's slower
[11:11:29 CET] <wm4> wat
[11:11:48 CET] <mateo`> you can do that with a fbo copy
[11:12:07 CET] <mateo`> so you get a GL_TEXTURE_2D to deal with it in the end
[11:12:16 CET] <mateo`> and not an OES texture
[11:13:08 CET] <wm4> what's the problem with an oes texture?
[11:13:39 CET] <mateo`> if your sink is able to deal with such texture, there is no problem
[11:15:55 CET] <fritsch> if it helps, you can have a look in kodi concerning mediacodec surface rendering
[11:16:18 CET] <fritsch> works nicely, besides all the shortcomings mediacodec has concerning postprocessing / colors and the like
[11:20:06 CET] <mateo`> fritsch: yes, i've looked at the code base in the past to see how the java listener was implemented, but i didn't look at how the display of the frames was handled.
[11:22:08 CET] <mateo`> i've implemented the surface output in gst also, wm4, i think a good start would be that i try to integrate this in libmpv while i develop it so i have a real use case
[11:26:17 CET] <wm4> personally I don't have a single android device, but maybe xyz is interested in going further
[11:27:11 CET] <fritsch> i implemented the "new" v23 passthrough api lately ...
[11:27:15 CET] <fritsch> :-( not fun
[11:27:22 CET] <fritsch> they pack themselves
[11:27:28 CET] <fritsch> and don't consume IEC frames
[11:27:43 CET] <fritsch> so half of the formats they don't recognize, stall the sink
[11:29:33 CET] <mateo`> fritsch: what a mess :(
[11:30:10 CET] <fritsch> yeah - I am in contact with their chief audio dev
[11:30:11 CET] <fritsch> :-)
[11:30:24 CET] <fritsch> chances are very good that next stable will have an IEC mode
[11:30:43 CET] <fritsch> e.g. 2 channels 192 khz or 2 channels 48 / 44.1 khz for EAC3, AC3, DTS
[11:30:49 CET] <fritsch> and 8 channels 192 khz for dtshd, truehd
[11:30:59 CET] <fritsch> so you can push those iec frames like you would "normally" do
[11:31:07 CET] <fritsch> on all available other normal apis
[11:31:34 CET] <fritsch> for now kodi will only use this new api on v23 and later (and on nvidia's shield v22, which backports some v23)
[11:31:48 CET] <fritsch> we will even use PCM/16bit iec passthrough on v21, v22
[11:31:55 CET] <fritsch> as this other api is useless and overly complicated
[11:35:04 CET] <mateo`> :(
[11:36:26 CET] <nevcairiel> michaelni: i think the warning could use some clearer wording that its use is discouraged and known to break
[11:36:28 CET] <mateo`> wm4: I meant I can help if there's an android project that uses libmpv to design how the hwaccel will work
[11:37:07 CET] <mateo`> fritsch: do you use the MediaCodec C API in kodi in some cases ?
[11:40:54 CET] <fritsch> https://github.com/xbmc/xbmc/blob/master/xbmc/cores/VideoPlayer/DVDCodecs/Video/DVDVideoCodecAndroidMediaCodec.cpp <- jni for the win :-)
[11:41:28 CET] <wm4> m_frameready->WaitMSec(50);
[11:41:50 CET] <fritsch> read ...
[11:41:56 CET] <fritsch> more properly :-)
[11:42:00 CET] <fritsch> it's a max wait
[11:42:10 CET] <fritsch> if it was an always wait, would be hard to render more than 20 fps
[11:42:41 CET] <fritsch> though I really must admit, that mediacodec doesn't really fit in kodi's rendermanager concept
[11:42:47 CET] <fritsch> those bypass renderers do whatever they want
[11:42:54 CET] <wm4> I've stared a lot at kodi's mmal wrapper, which has 2 timeouts... they're design faults of the mmal API though
[11:42:55 CET] <fritsch> so rendering subs and the like is always off
[11:43:10 CET] <fritsch> yeah - the kodi mmal writer has written the Pi firmware
[11:43:15 CET] <fritsch> :-)
[11:45:17 CET] <fritsch> I am currently rewriting kodi's AMLCodec (though no timeline - I don't use it at all and it's hard to motivate myself ... to do things I don't really care about)
[11:59:05 CET] <michaelni> nevcairiel, ok, what should the warning say? "Hardware accelerated decoding with frame threading does require drivers and hw acceleration APIs to be thread safe and/or requires complex locking to be done by the user application, otherwise it can result in artifacts or crashes. This combination is thus discouraged"
[12:00:10 CET] <michaelni> or something else ?, nevcairiel you are the expert on this ...
[12:13:11 CET] <xyz> mateo`: there's an android project here https://github.com/xyzz/mpv-android (and build scripts at https://github.com/xyzz/mpv-android-build); it's fairly basic and could be broken in some ways since I don't really have any experience with video, or android
[12:49:07 CET] <nevcairiel> michaelni: maybe not quite as long, how about "Hardware accelerated decoding with frame threading is known to be unstable and its use is discouraged"
[12:52:31 CET] <michaelni> sure, ok
[13:17:38 CET] <cone-139> ffmpeg Alex Agranovsky master:09b8e97ab62d: lavf/mpjpeg: Trim quotes on MIME boundary, if present.
[13:17:38 CET] <cone-139> ffmpeg Andreas Cadhalpun master:5edd1f62ca15: avcodec: only warn about hwaccel with frame threads
[13:34:23 CET] <Daemon404> the reign of mats continues
[13:34:34 CET] <Daemon404> i can't help but watch with morbid curiosity
[13:46:38 CET] <Daemon404> woah what
[13:46:45 CET] <Daemon404> did that massive flamethread end?
[13:49:18 CET] <nevcairiel> by giving into the stupid flamers, so not really
[13:49:29 CET] <Daemon404> i feared as much
[13:49:53 CET] <nevcairiel> reasonable people have no chance to win if there are people like that around
[13:50:10 CET] <nevcairiel> because you try to be reasonable and compromise, but they dont
[13:50:12 CET] <nevcairiel> so..
[13:50:21 CET] <Daemon404> 18:27 <+wm4> nevcairiel: well I'm leaving the debian guy to you
[13:50:21 CET] <Daemon404> 18:31 <@Daemon404> the worst part about these sorts of people is that they have infinite energy to argue their side
[13:50:24 CET] <Daemon404> 18:31 <@Daemon404> eventually everyone else gets tired / gives up
[13:50:26 CET] <Daemon404> 18:31 <@Daemon404> -> shit is pushed
[13:50:30 CET] <Daemon404> quoting myself.
[13:50:54 CET] <nevcairiel> personally I would just expel such people for the better of the project, but what can you do
[13:51:01 CET] <wm4> at least there's still the warning, reducing the probability that unsuspecting new API users run into this
[13:51:14 CET] <Daemon404> only if theyre looking at stdout.
[13:51:17 CET] <Daemon404> er stderr
[13:51:23 CET] <wm4> well they should
[13:51:35 CET] <Daemon404> using as a library? to write a plugin?
[13:51:37 CET] <Daemon404> seems unlikely
[13:52:05 CET] <wm4> if you use it as a lib you should overwrite the log callback (and pray nothing else in your process uses libav*)
[13:52:20 CET] <Daemon404> i wager next to no api users do this
[13:52:27 CET] <Daemon404> maybe even just you.
[13:52:44 CET] <nevcairiel> at least developers should look at the log to see if something is up, even if they dont expose it to users
[13:54:20 CET] <nevcairiel> all sane players used it properly already anyway
[14:53:49 CET] <cone-139> ffmpeg Alex Agranovsky master:ddda2cc43c85: lavf/mpjpeg: do not include CRLF preceding boundary as part of the returned frame
[15:10:38 CET] <jamrial> so, was that hwaccel revert actually accepted by every person blocking it?
[15:11:10 CET] <wm4> kind of
[15:13:04 CET] <jamrial> meaning?
[15:14:35 CET] <wm4> accepted but not too happy about it
[15:14:43 CET] <jamrial> alright
[15:22:18 CET] <cone-139> ffmpeg Paul B Mahol master:e167d4ebace0: avfilter/f_metadata: remove unused headers
[15:52:05 CET] <cone-139> ffmpeg Carl Eugen Hoyos master:593bb50e062f: MAINTAINERS: Add myself as libutvideo maintainer.
[15:52:24 CET] <durandal_1707> lol
[16:03:57 CET] <jamrial> wm4: why didn't you start the vote?
[16:04:15 CET] <jamrial> now carl came up with a bullshit excuse to keep that thing in place
[16:07:47 CET] <wm4> I actually dislike drama
[16:07:54 CET] <wm4> but maybe I or someone else should
[16:14:32 CET] <cone-139> ffmpeg Carl Eugen Hoyos master:4c44972f9929: avcodec: Fix a typo.
[17:16:04 CET] <J_Darnley> Why has vim started trying to remember the last cursor position in files?
[17:16:27 CET] <BtbN> it always did for me
[17:17:13 CET] <J_Darnley> I'm sure it's started doing that recently
[17:17:38 CET] <J_Darnley> Every git commit message now starts where I left the cursor in the previous one.
[17:31:34 CET] <c_14> J_Darnley: check /etc/vim/vimrc for something like g'\"" in an autocmd BufReadPost
[17:40:03 CET] <J_Darnley> Files in /etc shouldn't be used if I have a ~/.vimrc
[17:40:25 CET] <c_14> They are both sourced with options set in ~/.vimrc overriding those set in /etc/vim/vimrc
[17:40:53 CET] <J_Darnley> /rolleyes
[17:42:12 CET] <J_Darnley> Yes, there would appear to be some things in /etc
[17:44:39 CET] <J_Darnley> "When editing a file, always jump to the last cursor position" FUCK YOU /ETC!
[17:45:24 CET] <Daemon404> lol carl
[17:49:47 CET] <kierank> is there a guide to making and adding fate tests
[17:51:07 CET] <kierank> https://trac.ffmpeg.org/wiki/FATE/AddingATest
[17:51:12 CET] <kierank> but how do I make a reference value
[17:51:28 CET] <kurosu> GEN=1 ?
[17:55:18 CET] <Timothy_Gu> kierank: lol at reading my wip wiki page
[17:57:17 CET] <cone-139> ffmpeg Timothy Gu master:ba25936df589: vf_blend: Templatize identity function and use a better name
[17:57:18 CET] <cone-139> ffmpeg Timothy Gu master:ee281b884e2d: vf_blend: Use memcpy when opacity is 0
[17:59:21 CET] <cone-139> ffmpeg Timothy Gu master:45743239738b: vf_blend: Reduce number of arguments for kernel function
[18:00:21 CET] <Timothy_Gu> Gramner: is there a "magic number" for how many bytes one should process in one iteration?
[18:00:41 CET] <Timothy_Gu> if not, what factors does that number depend on?
[18:00:47 CET] <Gramner> not really, it depends on the circumstances
[18:01:01 CET] <Timothy_Gu> like, what circumstances?
[18:02:11 CET] <Gramner> can you avoid some overhead by doing more stuff at the same time, how efficient is your cpu at out-of-order executions and other µarch-specifics, etc.
[18:02:33 CET] <Timothy_Gu> ok
[18:02:44 CET] <cone-139> ffmpeg David Monro master:4b750104ea2b: lavf/spdifenc: Support MLP encapsulation.
[18:02:48 CET] <Timothy_Gu> so basically i'll have to check and see for every circumstance?
[18:02:54 CET] <Gramner> in this case for example, the unpacking and packing would be more efficient if you processed more data per iteration
[18:03:01 CET] <Gramner> pretty much, yes
[18:03:31 CET] <Timothy_Gu> ah
[18:03:36 CET] <Timothy_Gu> thanks
[18:05:23 CET] <Gramner> Timothy_Gu: another thing, it might be better to do the multiply before converting to float since you could do it with a shift and a sub that way
[18:05:40 CET] <Timothy_Gu> hmm true
[18:06:03 CET] <Timothy_Gu> and float mul is always slower than integer shifting/subtracting?
[18:08:54 CET] <Gramner> multiply is 4-5 clocks latency, basic int logic/arith are 1 clock
[18:09:56 CET] <Timothy_Gu> cool
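
Gramner's shift-and-subtract suggestion relies on the identity 255 * x = (x << 8) - x. The Python sketch below applies it in an integer-only divide blend. The `blend_divide` helper, its clamp, and its handling of a zero divisor are assumptions made for illustration and are not claimed to match the code in vf_blend.c.

```python
def mul255(x):
    """Multiply by 255 using a shift and a subtract instead of a multiply."""
    return (x << 8) - x


def blend_divide(a, b):
    """Integer-only divide blend of two 8-bit values, clamped to 0..255.

    Returning 255 for a zero divisor is an assumption of this sketch.
    """
    if b == 0:
        return 255
    return min(255, mul255(a) // b)
```

Doing the multiply on the integer input, before any widening or conversion, means it only has to be done once per loaded vector.
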
[18:12:46 CET] <cone-139> ffmpeg Timothy Gu master:a678d667816a: vf_blend: Use integers for divide mode
[18:14:52 CET] <kurosu> main reason for unrolling is a more parallel execution: when the same insn is executed twice, and its result used slightly later (happens often when there's packing near the end of a loop)
[18:15:15 CET] <kurosu> bonus reason: fewer end of loops checks
[18:15:42 CET] <kurosu> (ie more stuff done in the loop compared to the loop logic)
[18:17:45 CET] <Timothy_Gu> kurosu: ok
[18:18:19 CET] <Timothy_Gu> so the cpu doesn't actually execute instr-by-instr but uses some kind of asynchronous logic
[18:18:26 CET] <kurosu> usually, it's not that hard to unroll some of those loops, so I often test the 2 versions
[18:18:39 CET] <kurosu> yes, the out-of-order execution stuff
[18:19:07 CET] <Timothy_Gu> michaelni: you are supposed to apply "[PATCH 1/2] vf_blend: Move C dsp function mapping to separate function" first, although it needs some rebasing
[18:20:08 CET] <kurosu> it's somewhat disturbed by conditional flow, but usually it can reorder a notable amount of insn (is it like a hundred, or more?)
[18:20:16 CET] <Gramner> modern x86 cpus have multiple execution engines and execute multiple instructions at the same time, and they can reorder instructions if that means they can execute something instead of waiting
[18:20:33 CET] <michaelni> Timothy_Gu, yep i forgot to apply that one
[18:20:40 CET] <Timothy_Gu> michaelni: you could use https://github.com/TimothyGu/FFmpeg/tree/blend-checkasm
[18:21:16 CET] <Timothy_Gu> Gramner: so instructions using different parts of the CPU can be executed at the same time
[18:21:22 CET] <kurosu> http://www.realworldtech.com/haswell-cpu/3/
[18:21:39 CET] <Gramner> yes. also on skylake the reorder buffer has 224 entries so it can shuffle instructions around quite a lot
[18:22:08 CET] <Timothy_Gu> is a µop a cycle?
[18:22:36 CET] <Gramner> not really
[18:22:41 CET] <kurosu> nope, that's what the processor divides the insn into, if I'm not mistaken
[18:23:37 CET] <Timothy_Gu> ok
[18:24:59 CET] <kurosu> turtles^risc all the way down
[18:26:29 CET] <Gramner> not really true either. you can't really divide modern cpus into risc or cisc anymore since everything is kind of a hybrid
[18:27:15 CET] <Timothy_Gu> michaelni: crap, forgot to fix the checkasm func signature
[18:40:13 CET] <Timothy_Gu> Gramner: unrolling doesn't help with a 720p sample and a 256x256 sample
[18:40:59 CET] <kurosu> what are the cycle counts like? (using START/STOP_TIMER)
[18:41:25 CET] <Gramner> Timothy_Gu: can you post code?
[18:41:48 CET] <Timothy_Gu> one sec
[18:42:27 CET] <Timothy_Gu> http://sprunge.us/WbPI?diff
[18:42:40 CET] <Timothy_Gu> (incremental diff)
[18:42:47 CET] <Timothy_Gu> also I haven't made the * 255 change yet
[18:43:53 CET] <Gramner> you can use movq instead of 2x movd, both for loading and storing
[18:44:28 CET] <Gramner> the first set of punpcklbw can be reused for both registers
[18:44:51 CET] <Gramner> at the end use packssdw m0, m2 instead of doing them separately
[18:44:51 CET] <jamrial> and use a single packuswb with the two registers
[18:44:55 CET] <jamrial> lol
[18:46:09 CET] <kurosu> can't pshuflw or shufps be used here to save some unpacking ?
[18:46:55 CET] <Gramner> no, but you can use pmovzxbd with sse4.1
[18:47:42 CET] <jamrial> or pshufb with ssse3
[18:47:53 CET] <Gramner> ah yeah. that too
[18:48:55 CET] <Timothy_Gu> how would punpcklbw unpack to dword?
[18:50:02 CET] <jamrial> alongside a punpcklwd like you're already doing
[18:51:08 CET] <Timothy_Gu> im kind of confused now
[18:51:21 CET] <Timothy_Gu> so when i movq i'm loading 8 bytes
[18:51:47 CET] <Timothy_Gu> and punpcklbw make these bytes words
[18:51:54 CET] <Gramner> instead of 4x punpcklbw + 4x punpcklwd you can do 2x punpcklbw + 2x punpcklwd + 2x punpckhwd
[18:51:59 CET] <Timothy_Gu> oh
[18:52:01 CET] <Timothy_Gu> h
[18:52:12 CET] <Timothy_Gu> yeah I forgot about those :p
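
The unpack sequence Gramner describes for one input (one `punpcklbw`, then `punpcklwd` and `punpckhwd` on its result) can be emulated in Python to see where each byte ends up. This is an illustrative model only: a register is a list of 16 bytes, lowest byte first, the second operand is always an all-zero register, and the helper names are hypothetical.

```python
ZERO = [0] * 16  # an all-zero 128-bit register, one list entry per byte


def movq(data):
    """Load 8 bytes into the low half of a register and zero the high half."""
    return list(data[:8]) + [0] * 8


def punpck(dst, src, size, high=False):
    """Interleave size-byte elements from the low (or high) 8 bytes of dst and src."""
    d = dst[8:] if high else dst[:8]
    s = src[8:] if high else src[:8]
    out = []
    for i in range(0, 8, size):
        out += d[i:i + size] + s[i:i + size]
    return out


def as_dwords(reg):
    """Reinterpret a 16-byte register as four little-endian dwords."""
    return [int.from_bytes(bytes(reg[i:i + 4]), "little") for i in range(0, 16, 4)]


pixels = bytes([10, 20, 30, 40, 50, 60, 70, 80])
words = punpck(movq(pixels), ZERO, 1)                     # punpcklbw: 8 bytes -> 8 words
lo_dwords = as_dwords(punpck(words, ZERO, 2))             # punpcklwd: words 0..3 -> dwords
hi_dwords = as_dwords(punpck(words, ZERO, 2, high=True))  # punpckhwd: words 4..7 -> dwords
```

One `movq` plus three unpacks widens eight pixels, where the per-four-pixel approach needs two loads and four unpacks for the same data.
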
[19:08:13 CET] <Timothy_Gu> now i got http://sprunge.us/fOBK?diff
[19:08:25 CET] <Timothy_Gu> still doesn't make it *significantly* faster
[19:08:51 CET] <Timothy_Gu> i also couldn't get rid of the mova at line 36
[19:09:39 CET] <Timothy_Gu> or i could pxor m2 and m3
[19:10:51 CET] <Timothy_Gu> hmm maybe not
[19:12:17 CET] <Gramner> the division is probably the bottleneck
[19:12:28 CET] <Gramner> division is really slow
[19:12:53 CET] <Timothy_Gu> ok well I guess
[19:13:15 CET] <Timothy_Gu> I'll try the integer * 255 change
[19:14:03 CET] <Gramner> do it before converting to dwords, that way you only need to do it once
[19:15:22 CET] <Timothy_Gu> divps is 24%, mulps 11%
[19:24:05 CET] <Timothy_Gu> Gramner: doesn't make it faster either, though divps is now 36% according to perf
[19:24:14 CET] <Timothy_Gu> so yeah, it's probably divps's fault
[19:25:45 CET] <kurosu> is divps really needed, or rcpps' precision enough ?
[19:25:49 CET] <cone-139> ffmpeg Michael Niedermayer master:b65ea6ab4490: avfilter/vf_tinterlace: fix image alignment
[19:25:50 CET] <cone-139> ffmpeg KO Myung-Hun master:22a4046d66f7: compat/os2threads: Improve pthread_cond_xxx() functions
[19:25:51 CET] <cone-139> ffmpeg KO Myung-Hun master:6bf5e7d3e7d8: compat/os2threads: support the return value of joined thread
[19:25:52 CET] <cone-139> ffmpeg KO Myung-Hun master:b8bc6b14a556: compat/os2threads: split long lines
[19:25:53 CET] <cone-139> ffmpeg Mark Reid master:8395b6eeaa27: libavcodec/dnxhd_parser: add parser and probe support raw 444 and dnxhr formats
[19:26:00 CET] <Gramner> one round of rcpps is not enough, no
[19:26:17 CET] <Gramner> you'd need one iteration of newton-raphson as well
[19:26:41 CET] <Gramner> which probably makes it a wash
[19:27:45 CET] <Gramner> you also need to add 1/256 at the end too because we truncate and rcpps might give a result that's too low
[19:28:09 CET] <Timothy_Gu> now I have no idea what you guys are talking about :)
[19:28:27 CET] <Gramner> fancy floating point magic, basically
[19:28:28 CET] <kurosu> rcpps is an approximation of the reciprocal of the source operand
[19:29:01 CET] <kurosu> I'd thought the input might allow this, but I haven't even really looked at how it's used
[19:29:49 CET] <Timothy_Gu> kurosu: it has to be bit-exact
[19:29:59 CET] <Timothy_Gu> so reciprocal probably doesn't have enough precision
[19:31:05 CET] <Gramner> you can get it bit-exact, but then you need one iteration of newton-raphson to refine the reciprocal output and add a small positive bias at the end
[19:31:06 CET] <kurosu> yeah, but the output was byte-sized, so I thought there would still be enough precision
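
The refinement Gramner refers to is the standard Newton-Raphson step for a reciprocal, x1 = x0 * (2 - d * x0), which roughly squares the relative error of the estimate. The Python sketch below illustrates it under stated assumptions: `rcp_approx` is only a stand-in for `rcpps` (it truncates the mantissa to 13 bits, giving a relative error of at most 2^-12, in the same range as the 1.5 * 2^-12 bound Intel documents for the instruction), the arithmetic is done in doubles, and all names are hypothetical.

```python
import math


def rcp_approx(d, bits=13):
    """Stand-in for rcpps: reciprocal of d with its mantissa truncated to `bits` bits."""
    mantissa, exponent = math.frexp(1.0 / d)
    mantissa = math.floor(mantissa * (1 << bits)) / (1 << bits)
    return math.ldexp(mantissa, exponent)


def newton_refine(d, x0):
    """One Newton-Raphson step for 1/d: x1 = x0 * (2 - d * x0)."""
    return x0 * (2.0 - d * x0)


def rel_error(d, x):
    """Relative error of x as an estimate of 1/d."""
    return abs(1.0 - d * x)


# Worst-case relative error over all non-zero byte-sized divisors.
worst_raw = max(rel_error(d, rcp_approx(d)) for d in range(1, 256))
worst_refined = max(rel_error(d, newton_refine(d, rcp_approx(d))) for d in range(1, 256))
```

In exact arithmetic the refined estimate is (1 - e^2) / d for an initial relative error e, so it never exceeds the true reciprocal. That is consistent with the need for a small positive bias before truncating.
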
[19:31:21 CET] <jamrial> Timothy_Gu: is it bitexact as is between x86_64 (sse math) and x86_32 (x87 math) with gcc?
[19:31:38 CET] <Timothy_Gu> let me check
[19:32:08 CET] <jamrial> i guess that after "vf_blend: Use integers for divide mode" it should be
[19:32:59 CET] <Gramner> even before that it should be, since normal float ops have 24 bits of precision
[19:33:51 CET] <Timothy_Gu> damn, avdev doesn't have multilib
[19:34:23 CET] <kierank> i can install it
[19:34:46 CET] <Timothy_Gu> ok that'd be great
[19:41:24 CET] <BBB> michaelni: wrong ticket
[19:41:29 CET] <BBB> michaelni: he said 5215, not 5125
[19:41:46 CET] <michaelni> ohh ooops
[19:41:59 CET] <BBB> no problem :-p I mean, fix whatever issue you want, all's great
[19:42:05 CET] <BBB> but its just not the issue he pointed out ;)
[19:42:34 CET] <BBB> (theyre cfhd crashes)
[19:47:08 CET] <cone-139> ffmpeg Timothy Gu master:8c56a4a1ed7d: vf_blend: Move C dsp function mapping to separate function
[19:47:09 CET] <cone-139> ffmpeg Timothy Gu master:a953a2991e28: checkasm: Add vf_blend tests
[19:48:58 CET] <cone-139> ffmpeg Timothy Gu master:ebf648d49044: checkasm/vf_blend: Decrease iteration count
[19:55:47 CET] <jamrial> Timothy_Gu: couldn't you have waited for durandal_1707's approval before pushing the checkasm test?
[19:55:56 CET] <kierank> michaelni: so what do I do about this rgb_to_a magic that's missing?
[19:55:57 CET] <Timothy_Gu> jamrial: he did approve
[19:56:20 CET] <Timothy_Gu> or wait his mail isn't on the ML
[19:56:24 CET] <jamrial> where?
[19:57:26 CET] <Timothy_Gu> http://sprunge.us/gZSC
[19:57:53 CET] <jamrial> odd, the reply-to field in your email had ffmpeg-devel's address
[19:59:31 CET] <Timothy_Gu> /shrug
[20:01:36 CET] <michaelni> kierank, see planar_rgb12XX_to_y vs. planar_rgb_to_y and do the same change to planar_rgb_to_a, should be easy to see if that is correct or off in some way if you have a test case with some alpha
[20:02:17 CET] <kierank> I don't think that's the problem
[20:02:40 CET] <kierank> where does the shift of 6 come from in planar_rgb_to_a
[20:02:55 CET] <kierank> sr
[20:03:00 CET] <kierank> surely that is bit-depth dependent or something
[20:05:33 CET] <michaelni> yes, i think it should be 2 for 12bit input
[20:06:56 CET] <kierank> where does 14 come from?
[20:07:00 CET] <kierank> is it a magic number?
[20:07:17 CET] <Timothy_Gu> kurosu: the HAVE_MMX_INLINE macro is used to prevent asm being built that are not actually used
[20:08:03 CET] <nevcairiel> but inline is the wrong macro for yasm code, isnt it
[20:08:11 CET] <kierank> michaelni: and for 16 bit you shift right or what?
[20:08:19 CET] <kierank> all confusing and undocumented
[20:08:33 CET] <michaelni> for 16 there should be no shift right as that would lose precision
[20:10:02 CET] <michaelni> its a bit documented in doc/swscale.txt (the precision between h and v scalers) but this possibly predates later improvements that allow higher precision
[20:10:47 CET] <michaelni> also the text predates the changes done in last years GSoC
[20:11:32 CET] <cone-139> ffmpeg Timothy Gu master:bcc223523e68: x86/vc1dsp: Port vc1_*_hor_16b_shift2 to NASM format
[20:15:46 CET] <cone-139> ffmpeg Marton Balint master:ae51f9bd6c18: avutil/parseutils: remove 2112 date from fate test
[20:27:24 CET] <kurosu> Timothy_Gu, I see the point now, indeed
[20:33:12 CET] <kierank> michaelni: hmm strange crash
[20:39:57 CET] <kierank> is it possible to mix frame threading and use multiple threads to decode within a single frame?
[20:45:16 CET] <kurosu> kierank, no
[20:45:22 CET] <kurosu> (afaik)
[20:45:32 CET] <kierank> even with avctx->execute?
[20:45:47 CET] <kurosu> maybe, I don't know
[20:45:59 CET] <kurosu> I know openhevc has something like this, though
[20:46:50 CET] <kurosu> but that they introduced a lot of changes to threading
[20:48:57 CET] <kurosu> Timothy_Gu / nevcairiel: I understand Timothy_Gu's point now: if inline asm isn't activated then the callers aren't compiled, so what you build will end up being stripped anyway
[20:53:52 CET] <nevcairiel> kurosu: i see
[20:58:15 CET] <kierank> michaelni: are you joking?
[20:59:14 CET] <michaelni> kierank, about marking cfhd as experimental ?
[20:59:23 CET] <kierank> Yes
[21:01:36 CET] <michaelni> its a suggestion, i dont have the time to fix the bugs before the release. if you feel its minor and can be released as is then lets drop the patch
[21:03:23 CET] <kierank> It's no more broken than any other RE'd codec
[21:09:06 CET] <mateo`> speaking of the 2.9 release, when is there a release date ?
[21:09:30 CET] <jamrial> it will probably be called 3.0 instead
[21:20:40 CET] <michaelni> people wanted 3.0 when i asked IIRC, so 3.0, when well ASAP
[21:22:35 CET] <mateo`> i'm asking because I'm thinking whether or not it would be a good idea to include basic mediacodec support in this release (only h264 / no hwaccel), but it might be a bit late for that
[21:23:02 CET] <jamrial> yeah, leave that for the next release
[21:33:42 CET] <cone-139> ffmpeg Michael Niedermayer master:dcb6d5b831b3: avformat/genh: Mark coef_splitted as av_unused
[21:33:43 CET] <cone-139> ffmpeg Michael Niedermayer master:e5655a32bc74: avcodec/h264_cabac: Check decode_cabac_mb_mvd() for failure
[21:33:44 CET] <cone-139> ffmpeg Michael Niedermayer master:8352f5c80758: doc/protocols: document protocol_whitelist
[21:33:45 CET] <cone-139> ffmpeg Michael Niedermayer master:0eb4092c1bf4: avutil/imgutils: remove special case for aligning the palette
[21:33:46 CET] <cone-139> ffmpeg Michael Niedermayer master:b4018544fbbc: avformat/img2enc: remove unused variable
[21:38:44 CET] <michaelni> rcombs, carl says "The one-liner that changes dts calculation in mov.c fixes neither of the tickets." in 0214 21:34 Carl Eugen Hoyo ( 1)
[21:40:30 CET] <kierank> michaelni: turn off frame threading if you want
[21:40:49 CET] <kierank> the bugs are all resolution changes with frame threading which end up going wrong
[21:40:53 CET] <kierank> because frame threads clobber data
[21:42:45 CET] <durandal_1707> fosdem videos are available, yaj
[21:43:35 CET] <michaelni> kierank, ok, will disable frame-mt for cfhd on the release/3.0 branch, i think we can leave it enabled on master
[21:43:48 CET] <kierank> is there a proper way to synchronise resolution changes?
[21:44:02 CET] <kierank> because one thread clobbers the resolution of another thread
[21:45:15 CET] <rcombs> michaelni: yeah, it specifically fixes the issue from the ML, not the tickets
[21:45:44 CET] <michaelni> rcombs, ok, i thought it fixes all 3 :(
[21:45:54 CET] <rcombs> nope, I still need to poke at the others more
[21:46:24 CET] <michaelni> ok, please do, even if it doesnt make it into 3.0 ill backport it so its in 3.0.1
[21:46:57 CET] <michaelni> also dont hesitate to apply that one line dts fix if you are sure its correct
[21:47:07 CET] <kierank> michaelni: is this true still? https://trac.ffmpeg.org/ticket/1312
[21:47:49 CET] <kierank> i.e resolution changes not allowed for frame threads?
[21:48:11 CET] <michaelni> resolution changes should work for frame threads but it can be annoying implementation wise
[21:48:42 CET] <kierank> what do you have to do
[21:49:27 CET] <kierank> can we document this somewhere
[21:51:44 CET] <michaelni> if each frame and decoder is fully independent then it should just work i think. otherwise update_thread_context / ff_thread_finish_setup can be used to sync when the next thread starts and the previous finished doing its setup
[21:51:56 CET] <kierank> each frame is independent yes
[21:52:01 CET] <kierank> but I get data clobbered
[21:52:28 CET] <kierank> printf shows that I am allocating 720x352 but my s->alloc_width and alloc_height variables are 720x480
[21:52:32 CET] <kierank> which causes all the crashes
[21:53:16 CET] <kurosu> kierank, maybe also ask BBB what's needed, because he had to deal with this in ffvp9?
[21:58:40 CET] <michaelni> kierank are all pointers in each thread independent or does something end up being shared? also you might have to implement init_thread_copy()
[21:58:57 CET] <BBB> ff_set_dimensions
[21:59:03 CET] <BBB> you're not using ff_set_dimensions
[21:59:07 CET] <kierank> I do
[21:59:13 CET] <BBB> hm....
[21:59:18 CET] <BBB> well that was the issue in ffvp9
[21:59:31 CET] <kierank> but I store the allocated width and height elsewhere
[21:59:37 CET] <kierank> in my context
[21:59:39 CET] <kierank> and that gets clobbered
[22:00:50 CET] <BBB> durandal_1707: anything interesting about multimedia projects?
[22:01:41 CET] <durandal_1707> I'm watching daala one
[22:02:34 CET] <BBB> kierank: is it possible the decoding per-tag makes it go crazy?
[22:02:55 CET] <BBB> e.g. you could have a tag that makes you do get_buffer, be followed by a tag that changes w/h, then a tag that makes you update_size
[22:03:50 CET] <kierank> yes but it should be handled
[22:10:12 CET] <BBB> so, the frame-mt path copies all w/h variables from thread to thread
[22:10:29 CET] <BBB> is it possible the ones inside your priv_data and the (copied) generic ones got out of sync and you need to resync them?
[22:10:41 CET] <BBB> like, is it possible some frames do not have a w/h tag?
[22:10:44 CET] <BBB> or does each frame have them?
[22:12:23 CET] <kierank> the bug isn't that the frame lacks a w/h tag, the bug is that the stored value for the allocated width and height is wrong
[22:13:09 CET] <BBB> right, I'm just trying to understand from the source how it could happen
[22:13:09 CET] <kierank> the code reads the written w/h, compares it with the allocated and frees and alloc's if necessary
[22:13:23 CET] <BBB> right, so there's the bug then, no?
[22:13:30 CET] <kierank> https://www.irccloud.com/pastebin/hlWb8fet/
[22:13:48 CET] <BBB> if you have 2 threads, a reads 1x1 and allocs, then b reads 2x2 and reallocs, b copies that back to a
[22:13:57 CET] <BBB> now internal state of a is 1x1 but external state is 2x2
[22:14:17 CET] <kierank> yes, but how do you deal with that?
[22:14:45 CET] <BBB> if internal state size != external state size, call ff_set_dimensions
[22:14:46 CET] <BBB> that's all
[22:14:54 CET] <BBB> (no realloc necessary of internal buffers)
[22:15:17 CET] <kierank> internal state = s->avctx->width?
[22:15:27 CET] <BBB> internal state is priv_data->a_width/a_height
[22:15:32 CET] <BBB> external state is AVCodecContext->*
[22:16:13 CET] <BBB> (not that that matters :-p)
[22:17:26 CET] <kierank> hmm ok let's try that
[22:21:37 CET] <kierank> I get [avi @ 0x7f25500008c0] Could not find codec parameters for stream 0 (Video: cfhd (CFHD / 0x44484643), yuv422p10le): unspecified size
[22:21:42 CET] <kierank> which I don't understand
[22:22:58 CET] <BBB> maybe FW the bug to me, I'll look monday
[22:23:06 CET] <BBB> I think I have kind of a clue what's going on maybe
[22:23:17 CET] <BBB> but I'm taking a break today, I work hard enough on weekdays :-p
[22:24:35 CET] <kierank> I emailed it to you
[22:24:37 CET] <kierank> thanks
[22:28:16 CET] <cone-139> ffmpeg 03Michael Niedermayer 07master:bb9f7bf1a21d: Changelog/APIChanges Put 3.0 release marker
[22:29:24 CET] <atomnuker> ah, damn it, I should have made a changelog entry for the Dirac stuff
[22:29:31 CET] <atomnuker> too late now?
[22:30:18 CET] <atomnuker> michaelni: ^^?
[22:30:47 CET] <michaelni> atomnuker, not too late, you can still do it
[22:31:07 CET] <michaelni> i intend to make a branch first then wait a few hours and then make the release
[22:31:42 CET] <BBB> I really don't like that experimental tag
[22:32:03 CET] <BBB> well anyway
[22:32:06 CET] <BBB> I'll look tomorrow
[22:32:19 CET] <atomnuker> michaelni: alright, thanks, I'll append the entries at the bottom
[22:34:58 CET] <Timothy_Gu> atomnuker: you should add it where the changes are made
[22:35:03 CET] <Timothy_Gu> just to preserve consistency
[22:36:37 CET] <atomnuker> well, afaik the CFHD decoder was merged after the new DCA decoder, but the changelog lists CFHD first
[22:38:05 CET] <atomnuker> I can't see any chronological ordering tbh
[22:38:44 CET] <atomnuker> nnedi was merged after streamselect but the changelog is again showing it the other way around
[22:39:05 CET] <durandal_1707> huh?
[22:39:32 CET] <atomnuker> the changelog entreis are all seemingly out of order
[22:39:50 CET] <atomnuker> s/entreis/entries
[22:40:28 CET] <durandal_1707> the entries are in commit order
[22:40:48 CET] <durandal_1707> not in author order
[22:41:04 CET] <atomnuker> well, no, they're not
[22:41:48 CET] <atomnuker> CFHD was added a week after streamselect, but CFHD is first in the changelog
[22:44:55 CET] <durandal_1707> you are looking at author date, aren't you?
[22:45:25 CET] <atomnuker> I'm looking at commit date
[22:51:03 CET] <atomnuker> so do I add the VC-2 entries at the bottom or somewhere else?
[22:55:09 CET] <atomnuker> https://0x0.st/XGJ.diff <<- unless this doesn't look right I'll commit it in like half an hour
[22:56:39 CET] <jamrial> atomnuker: yeah, it's ok
[23:06:46 CET] <cone-139> ffmpeg 03Michael Niedermayer 07release/3.0:HEAD: Changelog/APIChanges Put 3.0 release marker
[23:10:45 CET] <cone-139> ffmpeg 03Rostislav Pehlivanov 07master:fbc96c50d72f: Changelog: add entries for the SMPTE VC-2 decoder and encoder
[23:12:34 CET] <atomnuker> michaelni: want me to push that commit to release/3.0?
[23:12:47 CET] <michaelni> atomnuker, yes, sure
[23:13:18 CET] <cone-139> ffmpeg 03Rostislav Pehlivanov 07release/3.0:380980e0d2b3: Changelog: add entries for the SMPTE VC-2 decoder and encoder
[00:00:00 CET] --- Mon Feb 15 2016