[Ffmpeg-devel-irc] ffmpeg-devel.log.20160605
burek
burek021 at gmail.com
Mon Jun 6 02:05:03 CEST 2016
[01:01:24 CEST] <CoJaBo> durandal_170: I've managed to locate the errant frame; it was the same issue, it's still reading a BMP header in the middle of the image data
[01:01:33 CEST] <CoJaBo> I've updated the bug
[01:10:46 CEST] <BtbN> CoJaBo, ah, so if the image data just so happens to look like a bmp header, it explodes?
[01:10:59 CEST] <CoJaBo> BtbN: Completely :/
[01:11:33 CEST] <CoJaBo> In 722GB of video (which is only like 15 minutes..), that's about as likely as the sun rising tomorrow.
[01:12:36 CEST] <CoJaBo> What I'm rendering is actually the downsampled version; the original is just shy of 4K res :/
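A rough back-of-the-envelope check of the likelihood CoJaBo describes, as a Python sketch. It assumes the image data behaves like uniformly random bytes and that only the two-byte "BM" signature is matched; the real parser may check further header fields, in which case the true rate would be lower. The function name is illustrative and not part of FFmpeg.

```python
def expected_signature_hits(n_bytes: int, signature_len: int = 2) -> float:
    """Expected number of positions where a fixed byte signature appears
    in n_bytes of uniformly random data (an assumption, not a measurement)."""
    # Number of starting offsets at which the whole signature still fits.
    positions = max(n_bytes - signature_len + 1, 0)
    # Each offset matches with probability 1 / 256**signature_len.
    return positions / (256 ** signature_len)


if __name__ == "__main__":
    total = 722 * 10**9  # "722GB" from the log, read here as decimal gigabytes
    print(f"{expected_signature_hits(total):.3g} expected 'BM' matches")
```

Under those assumptions the expected count is far above one, which is consistent with the remark that a false match is practically guaranteed at this data volume.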
[02:11:30 CEST] <cone-816> ffmpeg Michael Niedermayer master:39c0b22df420: avcodec/mpegvideo: Deallocate last/next picture earlier
[03:16:46 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:2fc7e5c1b553: avformat/ffmdec: Check pix_fmt
[03:16:47 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:9491f47035bf: avformat/options_table: Add missing identifier for very strict compliance
[03:16:48 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:ef2b8416d956: avcodec/mjpegdec: Do not try to detect last scan but apply idct after all scans for progressive jpeg
[03:16:49 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:46360e36d928: avformat/oggparseopus: Check that granule pos is within the supported range
[03:16:50 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:79181b97d477: avformat/utils: Check bps before using it in a shift in ff_get_pcm_codec_id()
[03:16:51 CEST] <cone-816> ffmpeg Chris Cunningham release/3.0:069eea16d975: libavformat/oggdec: Free stream private when header parsing fails.
[03:16:52 CEST] <cone-816> ffmpeg Will Kelleher release/3.0:7c43c48fda09: hevc: Fix memory leak related to a53_caption data
[03:16:53 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:f6586db165da: swresample/rematrix: Use error diffusion to avoid error in the DC component of the matrix
[03:16:54 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:1cd872a7d555: swresample/rematrix: Use clipping s16 rematrixing if overflows are possible
[03:16:55 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:d7ae13d47934: swresample/resample: Fix division by 0 with tap_count=1
[03:16:56 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:ed71759fd08b: ffmpeg: Check that r_frame_rate is set before attempting to use it
[03:16:57 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:e5d167149d82: avformat/utils: Do not compute the bitrate from duration == 0
[03:16:58 CEST] <cone-816> ffmpeg Chris Cunningham release/3.0:145b18ce9a27: avformat/utils: Check negative bps before shifting in ff_get_pcm_codec_id()
[03:16:59 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:b11900251fff: avformat/avidec: Detect index with too short entries
[03:17:00 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:4d9fdca05319: avcodec/diracdec: Fix potential integer overflow
[03:17:01 CEST] <cone-816> ffmpeg Gregor Riepl release/3.0:241f1e603f5c: ffserver: fixed deallocation bug in build_feed_streams
[03:17:02 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:69c3dfdd548f: doc/developer.texi: Add a code of conduct
[03:17:03 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:cc1e01d8b67f: avformat/utils: avoid overflow in update_stream_timings() with huge durations
[03:17:04 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:7f864badc01f: avformat/utils: avoid overflow in compute_chapters_end() with huge durations
[03:17:05 CEST] <cone-816> ffmpeg Thomas Guilbert release/3.0:dab82a2a7c90: avformat/oggparseopus: Fix Undefined behavior in oggparseopus.c and libavformat/utils.c
[03:17:06 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:e5942c143631: avcodec/bmp_parser: Fix state
[03:17:07 CEST] <cone-816> ffmpeg Michael Niedermayer release/3.0:c6470d81939c: avcodec/mpegvideo: Deallocate last/next picture earlier
[08:19:54 CEST] <omerjerk> Hi.
[08:20:18 CEST] <omerjerk> I added a new .c encoder file in the source tree.
[08:20:44 CEST] <omerjerk> I added this entry properly - OBJS-$(CONFIG_ALS_ENCODER)
[08:20:55 CEST] <omerjerk> ^^ inside the Makefile of libavcodec
[08:21:45 CEST] <omerjerk> what else do I need to do so that the new .c file is compiled properly ?
[08:21:52 CEST] <andrey_turkin_> add an entry to allcodecs.c
[08:22:04 CEST] <andrey_turkin_> and rerun configure
[08:22:29 CEST] <omerjerk> oh okay. thanks a lot!!
[10:30:42 CEST] <BtbN> andrey_turkin_, did you see my messages about the cuvid decoder yesterday? My ZNC trashed all messages i got over night...
[10:34:46 CEST] <andrey_turkin_> I saw you talked about it; didn't see any messages directed to me though
[10:35:07 CEST] <BtbN> Iirc you have maxwell gen2 hardware
[10:35:13 CEST] <andrey_turkin_> yes
[10:35:17 CEST] <BtbN> I'd like to know if decoding HEVC works
[10:35:39 CEST] <BtbN> https://github.com/BtbN/FFmpeg/tree/cuvid
[10:44:47 CEST] <andrey_turkin_> how do I run it?
[10:46:52 CEST] <BtbN> -c:v hevc_cuvid in front of some hevc input
[10:47:11 CEST] <BtbN> for zero-copy to nvenc, add -hwaccel cuvid
[10:49:32 CEST] <BtbN> It's not truly zero-copy though, but the copy that happens is GPU->GPU
[10:49:38 CEST] <BtbN> so it never hits system RAM
[10:50:39 CEST] <nevcairiel> on nvidia cards the speed difference should be marginal tho
[10:51:03 CEST] <BtbN> yeah, in my tests with h264 it's the exact same speed
[10:51:22 CEST] <BtbN> gets more interesting with Hardware-Filters
[10:51:38 CEST] <andrey_turkin_> as soon as you slap some scaling things get different
[10:52:14 CEST] <andrey_turkin_> I did a bit of benchmarking a few days ago and found out that the hardware decoder gives more CPU savings than the hardware encoder
[10:52:40 CEST] <andrey_turkin_> like a lot more. libx264 with ultrafast/superfast presets is fast
[10:52:57 CEST] <BtbN> No idea how well the current state of ffmpeg.c handles hardware-filters though.
[10:53:12 CEST] <BtbN> From a quick look, i'd guess it just purely disables all filtering once it detects a hw format?
[10:54:50 CEST] <andrey_turkin_> decoder seems to be working
[10:55:08 CEST] <andrey_turkin_> encoder OTOH doesn't. Even without cuvid stuff
[10:56:33 CEST] <BtbN> even for encoding h264?
[10:56:41 CEST] <BtbN> I didn't touch nvenc at all in those patches
[10:56:43 CEST] <andrey_turkin_> yes
[10:57:02 CEST] <andrey_turkin_> I built it on Windows; maybe something crept in and I didn't catch it earlier
[10:57:14 CEST] <BtbN> On Linux it works fine.
[10:57:18 CEST] <BtbN> for h264
[10:59:18 CEST] <andrey_turkin_> seems not everything got rebuilt; let me do a clean build
[11:06:50 CEST] <andrey_turkin_> that is interesting. Now SD encoding works, HD doesn't
[11:08:03 CEST] <BtbN> as in, h264 vs. HEVC, or resolution based?
[11:08:17 CEST] <andrey_turkin_> resolution based. I don't use cuvid at the moment
[11:08:32 CEST] <andrey_turkin_> although SD is H264
[11:08:44 CEST] <andrey_turkin_> but it shouldn't matter, right? encoder gets raw frames
[11:13:05 CEST] <andrey_turkin_> well, I get some spamming on console "Past duration 0.999992 is too large"; otherwise it seems to work fine with 720p and UHD
[11:13:44 CEST] <andrey_turkin_> 4x speed for UHD HEVC->H264 with default preset; 2x speed for UHD HEVC->HEVC
[11:14:26 CEST] <andrey_turkin_> for 2Mbit/s UHD stream resulting quality is surprisingly good
[11:15:56 CEST] <andrey_turkin_> actually on-GPU transcoding works with 1080p sample too
[11:16:12 CEST] <andrey_turkin_> but encoder fails when I use software decoder
[11:19:35 CEST] <andrey_turkin_> probably related to yuv420p issue I was talking about few days ago.
[11:20:44 CEST] <BtbN> weird
[11:20:58 CEST] <andrey_turkin_> it works if I force nv12
[11:22:55 CEST] <BtbN> Might be worth adding a check for the hw revision, to block 420p then
[11:26:10 CEST] <andrey_turkin_> I've yet to test if M4000 is affected by this
[11:27:44 CEST] <andrey_turkin_> and if passing CUDA frames to nvenc makes any difference
[12:30:16 CEST] <kierank> is it possible to disable probing in ffmpeg
[12:30:28 CEST] <kierank> and force a demux/mux
[12:49:59 CEST] <BtbN> kierank, -f ?
[12:50:08 CEST] <kierank> will that guarantee no probing?
[12:50:11 CEST] <kierank> for any of the codecs?
[12:50:27 CEST] <BtbN> it will make it use that demuxer/muxer, no matter what.
[12:50:48 CEST] <BtbN> At least i can get the mpegts demuxer to fail at opening my mkvs
[12:51:26 CEST] <BtbN> For codecs it's -c
[12:56:48 CEST] <BtbN> Freenode seems to be very broken lately
[12:57:08 CEST] <DHE> s/lately/for the last couple years/
[12:57:49 CEST] <BtbN> tons of strange ping timeouts from people who definitely have a good connection.
[13:10:10 CEST] <iive> BtbN: ping might be blocked by these people.
[13:10:17 CEST] <BtbN> nope
[13:10:35 CEST] <iive> some clients do that by default.
[13:10:47 CEST] <BtbN> I'm sharing a BNC with a friend of mine, and while connected to the same Freenode server, he had a ping timeout, while I didn't.
[13:10:53 CEST] <BtbN> That's a server-side issue for sure.
[13:11:28 CEST] <BtbN> in some channels there are hundreds of people having a ping timeout in that way, immediately re-connecting afterwards.
[13:13:30 CEST] <Shiz> no client blocks pings
[13:16:05 CEST] <DHE> there's CTCP (client-to-client) pings and there's server-to-client pings. blocking the latter would result in a completely non-functional client
[13:25:30 CEST] <BtbN> this mingw64 cross compiler is damn slow...
[13:45:42 CEST] <BtbN> https://bpaste.net/show/07b7b48cc926 uhm, what? Never seen those linking errors before.
[14:03:42 CEST] <andrey_turkin_> oh right, I saw those before
[14:04:01 CEST] <andrey_turkin_> apparently mingw and msvc linkers disagree on some things
[14:04:16 CEST] <andrey_turkin_> you might want to rebuild implibs using mingw toolchain
[14:04:54 CEST] <BtbN> rebuild implibs? For cuda? oO
[14:05:00 CEST] <andrey_turkin_> yep
[14:05:16 CEST] <BtbN> trying again with -mcmodel=medium at the moment
[14:05:23 CEST] <andrey_turkin_> or maybe that can help
[14:05:38 CEST] <andrey_turkin_> to fiddle with target ABI
[14:06:16 CEST] <BtbN> How do you rebuild the implibs without the library source?
[14:06:26 CEST] <andrey_turkin_> you only need dll
[14:06:47 CEST] <andrey_turkin_> first you dump all the exports from it and then ask binutils nicely to please do implib from that
[14:07:36 CEST] <BtbN> I'm not sure if that's such a good idea for CUDA. There seems to be quite a bit of logic in that .lib
[14:08:05 CEST] <andrey_turkin_> that's unusual
[14:08:28 CEST] <andrey_turkin_> if you see bunch of _imp_... stuff it usually means this is just an implib
[14:08:33 CEST] <BtbN> They are way larger than they'd be if it's just a plain import wrapper
[14:08:33 CEST] <andrey_turkin_> which doesn't have any real code
[14:09:34 CEST] <andrey_turkin_> I can't remember for sure but I think I actually did rebuild npp* implibs at some point in the past and it worked
[14:09:40 CEST] <andrey_turkin_> no idea about cuvid though
[14:10:32 CEST] <andrey_turkin_> and for added fun, same issue goes the other way around. E.g. I couldn't use zeranoe's shared build as is with msvc - I first had to rebuild implibs using MS toolchain from def files
[14:11:15 CEST] <nevcairiel> that's why my windows builds use the MS lib tool to create import libraries, but of course that won't work if you cross compile
[14:11:41 CEST] <BtbN> I don't even have those DLLs I'd need to create import libs from.
[14:12:09 CEST] <andrey_turkin_> they are in cuda toolkit in windows. Of course if you are on linux you are stuck
[14:12:21 CEST] <BtbN> The .lib ones are there
[14:12:25 CEST] <BtbN> Not the actual DLLs
[14:12:33 CEST] <andrey_turkin_> I mean dlls
[14:13:55 CEST] <andrey_turkin_> nevcairiel: I've set up cross compilation and then rebuilding implibs on Windows. I've tried to build ffmpeg with its dependencies on Windows but that is a nightmare to do in CI
[14:15:29 CEST] <andrey_turkin_> and all that mingw/native stuff with pathnames as an added fun!
[14:19:11 CEST] <BtbN> x86_64-w64-mingw32-dlltool: nppi64_75.dll: no symbols
[14:19:14 CEST] <BtbN> that didn't work too well
[14:19:37 CEST] <andrey_turkin_> it should've
[14:20:43 CEST] <andrey_turkin_> I mean there are exports in that dll
[14:21:33 CEST] <BtbN> x86_64-w64-mingw32-objdump: nppi64_75.dll: not a dynamic object
[14:21:50 CEST] <andrey_turkin_> are you sure this is a dll? )
[14:22:13 CEST] <BtbN> yes.
[14:22:24 CEST] <BtbN> It happens on all 4 involved DLLs
[14:22:29 CEST] <BtbN> they are not normal DLLs apparently
[14:26:57 CEST] <andrey_turkin_> ok I have no idea what is going on on your end but on my end objdump can see into that dll
[14:28:36 CEST] <BtbN> depends.exe can open it just fine
[14:28:40 CEST] <BtbN> objdump and dlltool can't.
[14:29:12 CEST] <andrey_turkin_> weird. has to be something with your toolchain
[14:30:25 CEST] <BtbN> Happens with the x86_64-w64 build via gentoo crossdev, cygwin native, cygwin mingw64
[14:30:29 CEST] <BtbN> It's not the toolchain.
[14:34:43 CEST] <BtbN> objdump has the symbols.
[14:47:56 CEST] <BtbN> Manually created the def files now.
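The manual step BtbN mentions, writing a module-definition file by hand, can be sketched in Python. This assumes the export names have already been obtained some other way (for instance copied from depends.exe); the helper name and the symbol names in the usage line are hypothetical placeholders, not real NPP exports.

```python
from typing import Iterable


def make_def(dll_name: str, exports: Iterable[str]) -> str:
    """Return the text of a minimal .def file: a LIBRARY line naming the DLL
    followed by an EXPORTS section with one symbol per line."""
    lines = [f'LIBRARY "{dll_name}"', "EXPORTS"]
    # De-duplicate and sort so the output is stable between runs.
    lines += [f"    {name}" for name in sorted(set(exports))]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder symbols; a real file would list the DLL's actual exports.
    print(make_def("nppi64_75.dll", ["exampleFuncB", "exampleFuncA"]), end="")
    # The result could then be passed to dlltool, e.g. with
    # -d <file.def> -D <name.dll> -l <libname.a>, to produce an import library.
```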
[15:30:19 CEST] <BtbN> hm, filtering with scale_npp is impossible.
[15:40:53 CEST] <BtbN> andrey_turkin_, any idea how to use scale_npp without modifying ffmpeg.c?
[15:42:56 CEST] <cone-446> ffmpeg Michael Niedermayer master:f90c9c306f4e: Check av_dup_packet() return code
[15:50:13 CEST] <andrey_turkin_> is it a problem?
[15:55:49 CEST] <BtbN> andrey_turkin, Impossible to convert between the formats supported by the filter 'graph 0 input from stream 0:0' and the filter 'auto-inserted scaler 0'
[15:55:53 CEST] <BtbN> I'd say so, yes.
[15:56:23 CEST] <BtbN> even with filter_complex that stuff comes up
[15:56:53 CEST] <BtbN> https://bpaste.net/show/c64692537a76
[15:57:13 CEST] <andrey_turkin> I don't really know how ffmpeg wires decoder and filters
[15:57:26 CEST] <andrey_turkin> filterchain itself works fine with input cuda frames
[15:57:45 CEST] <BtbN> well, there is only exactly one cuda aware filter
[15:57:55 CEST] <BtbN> And without that, it does not create a filter chain at all
[15:59:08 CEST] <andrey_turkin> log shows input frames are supposed to be in software format nv12
[15:59:33 CEST] <BtbN> yes, because the presence of the filter confuses something.
[16:01:00 CEST] <jkqxz> Try setting the global hw_device_ctx in ffmpeg. Or hack this line: <http://git.videolan.org/?p=ffmpeg.git;a=blob;f=ffmpeg_filter.c#l431>.
[16:01:33 CEST] <andrey_turkin> I remember recently some work was done in libav regarding filtering
[16:02:06 CEST] <BtbN> jkqxz, i do set that.
[16:02:39 CEST] <BtbN> https://github.com/BtbN/FFmpeg/blob/cuvid/ffmpeg_cuvid.c#L112
[16:02:46 CEST] <jkqxz> Where is your auto-inserted scaler coming from then?
[16:03:23 CEST] <andrey_turkin> it is at input side
[16:03:48 CEST] <BtbN> jkqxz, hm, according to gdb it's NULL
[16:03:48 CEST] <andrey_turkin> scale_npp is not compatible with nv12 pixelformat offered by buffersrc
[16:05:32 CEST] <BtbN> yeah, but those shouldn't exist.
[16:05:42 CEST] <BtbN> It seems like the cuvid_init_ stuff is never fully run
[16:06:31 CEST] <jkqxz> It was this change <http://git.videolan.org/?p=ffmpeg.git;a=commit;h=172d3568b38c6d0c872293bbffa947a43a8d86ec> that made the changes to ffmpeg to allow hardware transcode with vaapi.
[16:06:58 CEST] <jkqxz> (Including that line above to avoid the scale instance being added.)
[16:07:25 CEST] <BtbN> well, the vaapi stuff doesn't have native filters though
[16:07:42 CEST] <andrey_turkin> i think it does now
[16:07:47 CEST] <jkqxz> Like scale_vaapi?
[16:09:06 CEST] <BtbN> I'm not sure what's happening here though. Both cuvid_init and transcode_init are never called
[16:11:05 CEST] <andrey_turkin> that would break things for sure
[16:11:06 CEST] <jkqxz> "./ffmpeg_g -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -hwaccel_output_format vaapi -i in.mp4 -an -vf hwupload,scale_vaapi=w=1280:h=720:format=nv12 -c:v h264_vaapi out.mp4" gives you hardware decode-scale-encode, say.
[16:12:12 CEST] <BtbN> ./ffmpeg_g -v trace -hwaccel cuvid -c:v h264_cuvid -i /mnt/union/videos/whatislife-creditroll.mkv -an -sn -c:v nvenc -preset slow -global_quality 24 -y out.mkv
[16:12:16 CEST] <BtbN> that works as expected
[16:12:31 CEST] <BtbN> Adding -filter:v "scale_npp=1280:720" completely breaks it.
[16:13:04 CEST] <andrey_turkin> that is strange. You should at least see cuvid_transcode_init call
[16:13:23 CEST] <jkqxz> Are you sure it isn't actually running with the frames going via CPU memory in the middle?
[16:13:36 CEST] <BtbN> Yes.
[16:13:48 CEST] <BtbN> It's not smart enough to automatically download/upload them.
[16:13:50 CEST] <andrey_turkin> why is there
[16:14:03 CEST] <andrey_turkin> ... a check for null filter in cuvid_transcode_init?
[16:14:03 CEST] <BtbN> andrey_turkin, yes, cuvid_transcode_init is called
[16:14:22 CEST] <andrey_turkin> I'd say that can break things
[16:15:50 CEST] <kierank> is there any way to decode dvb_teletext in ffmpeg
[16:15:53 CEST] <kierank> so I can fuzz it
[16:15:55 CEST] <BtbN> You mean ist->nb_filters?
[16:16:54 CEST] <andrey_turkin> https://github.com/BtbN/FFmpeg/blob/cuvid/ffmpeg_cuvid.c#L87
[16:18:13 CEST] <andrey_turkin> kierank: libzvbi_teletextdec ?
[16:18:23 CEST] <kierank> yes but I had to write a program to decode it to something
[16:18:40 CEST] <BtbN> andrey_turkin, aren't those different filters? The interesting one should be associated with the InputStream
[16:19:12 CEST] <andrey_turkin> but why would you expect null filter there?
[16:23:26 CEST] <BtbN> guess the check for filters has to be a bit more intelligent, checking for cuda compatible filters.
[16:25:57 CEST] <andrey_turkin> depending on the time this check is done, supported pixel formats might already be filled in in the output filter
[16:26:01 CEST] <jkqxz> I suggest getting it working without checking anything. It's really user error if they try to use something which won't work there, so you are only checking in order to provide an error message anyway.
[16:26:25 CEST] <BtbN> I need those checks.
[16:26:40 CEST] <andrey_turkin> not really
[16:26:43 CEST] <jkqxz> Unless you actually want to make it work automatically, but that sounds like a recipe for confusion over whether stuff happens in hardware or not.
[16:26:56 CEST] <BtbN> Well, it needs some trigger to actually go for cuvid transcoding
[16:26:59 CEST] <andrey_turkin> you can check both encoder and decoder support CUDA frames.
[16:27:13 CEST] <BtbN> transcode_init is called before the hwaccel cli option is evaluated i think
[16:27:38 CEST] <jkqxz> Just hwaccel_output_format cuda? The user can add a hwdownload if they want later.
[16:28:11 CEST] <BtbN> hwaccel_output_format is entirely unused by cuvid
[16:28:16 CEST] <BtbN> it's not a classic hwaccel like vaapi
[16:28:18 CEST] <BtbN> more like QSV
[16:29:52 CEST] <BtbN> Isn't that to select the sw_format?
[16:30:08 CEST] <BtbN> That's kinda pointless for cuvid, there is only NV12
[16:31:01 CEST] <jkqxz> No, it selects the real format, to push a traditional hwaccel into doing the right thing in the retrieve_data call.
[16:31:26 CEST] <jkqxz> Though yes, not relevant here because your hwaccel is just a placeholder. I was forgetting that.
[16:35:54 CEST] <BtbN> Seems like removing that "null" check was enough.
[16:36:25 CEST] <BtbN> Also added another check, that aborts if those pre-checks fail but -hwaccel cuvid was given
[16:36:58 CEST] <BtbN> Not entirely sure if that's a good idea though, it might only be wanted for one of the streams?
[16:37:01 CEST] <cone-446> ffmpeg Michael Niedermayer master:5fb6e39dd1c3: avcodec/cfhd: clear idwt_buf on allocation
[16:37:11 CEST] <BtbN> Oh, wait. That hwaccel thing is per-input-stream.
[16:42:50 CEST] <BtbN> Yeah, that seems to work better.
[16:49:01 CEST] <BtbN> yep, looking good https://bpaste.net/show/c0f264f1671b vs. https://bpaste.net/show/47ce388d189a
[16:49:43 CEST] <cone-446> ffmpeg Michael Niedermayer master:2ccf9ae6ccc8: avformat/format: Print debug info when probe score is increased due to mime type
[16:51:56 CEST] <andrey_turkin> I'd expect bigger increase in performance
[16:52:39 CEST] <jkqxz> Get a slower CPU :P
[16:52:42 CEST] <nevcairiel> i'm surprised its even this big, something seems inefficient
[16:52:55 CEST] <nevcairiel> nvidia cards are pretty good at copying to sysmem
[16:53:21 CEST] <andrey_turkin> there should be non-linear effects though
[16:53:47 CEST] <BtbN> During sw scale one core is at 100%, so i'd guess that's what's limiting it.
[16:53:52 CEST] <andrey_turkin> if you try to run several transcoding sessions in parallel those gpu->sysmem transfers are going to serialize executions
[16:54:00 CEST] <nevcairiel> oh right, software scaling
[16:54:19 CEST] <nevcairiel> without scaling i dont expect any difference really
[16:54:29 CEST] <BtbN> without scaling it's the exact same speed
[16:54:35 CEST] <BtbN> the encoder performance is limiting it
[16:54:37 CEST] <andrey_turkin> which sounds strange to me
[16:54:59 CEST] <nevcairiel> like i said, nvidia cards are very efficient when copying data from/to the GPU
[16:55:05 CEST] <nevcairiel> much better than intel or amd
[16:56:01 CEST] <andrey_turkin> still, there should be some impact on dev->host copy at least; that part has to be synchronized before rest of the pipeline can go
[16:57:17 CEST] <andrey_turkin> also if your goal is not to transcode single file as soon as possible but to do N real-time transcodings in parallel, priorities shift. I don't really care if on-GPU transcoding speed is the same as on-CPU because I've got free CPU which can do some other work
[16:58:16 CEST] <nevcairiel> the sysmem intermediate step doesnt even use much cpu power
[17:00:19 CEST] <BtbN> both cuvid and nvenc in ffmpeg are using delays to streamline the copying process
[17:01:19 CEST] <andrey_turkin> yeah, I just realized that. As long as encoding runs asynchronously enough and slowly enough, everything else will be absorbed
[17:06:55 CEST] <cone-446> ffmpeg Michael Niedermayer master:134cba728bc6: Seperate x264rgb encoder and only enable when its actually supported
[17:30:16 CEST] <BtbN> andrey_turkin, btw., for your systems without cuda, have you tried just generating a delay-load implib with dlltool?
[17:30:49 CEST] <BtbN> Or are those linux systems?
[17:36:09 CEST] <andrey_turkin> Windows. I have not
[17:36:28 CEST] <andrey_turkin> it is an interesting idea
[17:37:15 CEST] <BtbN> Having such a feature on linux would be very nice, too.
[17:37:48 CEST] <BtbN> Some tool that generates a static lib from a shared one, that exposes the same symbols, but redirects through dlopen/dlsym on first call
[17:38:13 CEST] <BtbN> I wonder how hard that is to make, without knowledge about the parameters.
[17:38:40 CEST] <andrey_turkin> But I'd rather have an error if someone tried to use cuda when it's not present, and not a crash
[17:39:15 CEST] <BtbN> add some more intelligence to the cuInit function then
[17:40:48 CEST] <andrey_turkin> I am ok with the current state of things where I have to patch ffmpeg. It's a small patch (it is a bit different from the one I sent to the ML - function pointers are in CUDA's hw_device_ctx)
[17:43:12 CEST] <andrey_turkin> I'm not really familiar with delay load but IIRC there is a linker switch to mark a library as delay-load
[17:43:19 CEST] <BtbN> on windows.
[17:43:22 CEST] <andrey_turkin> yes
[17:43:23 CEST] <BtbN> for MSVC
[17:43:25 CEST] <BtbN> Not on linux
[17:43:39 CEST] <BtbN> You can have lazy symbols, but not lazy libraries.
[17:44:03 CEST] <BtbN> The same principle MSVC uses for its delay-load would work on linux though. Nobody has made it yet though.
[17:44:30 CEST] <andrey_turkin> We don't yet target Linux+Nvidia
[17:44:42 CEST] <andrey_turkin> or any hw acceleration for that matter
[17:44:44 CEST] <BtbN> It basically auto-generates a shim .lib, with all the symbols, but using LoadLibrary/GetProcAddress on call
[17:44:58 CEST] <BtbN> I don't see why that wouldn't work on linux.
[17:45:04 CEST] <andrey_turkin> it probably would
[17:45:15 CEST] <BtbN> The only issue I can think of is forwarding a function call without having a clue about the parameters/return type.
[17:45:32 CEST] <BtbN> Probably needs some assembler magic
[17:46:13 CEST] <BtbN> like, just jumping to the final address instead of doing a full call
[17:47:37 CEST] <andrey_turkin> you don't really need to know parameters or return type. Only thing you get from dlsym is an address; the shim just has to leave things on stack as they were and jump to the intended target
[17:47:55 CEST] <andrey_turkin> things become problematic if you need to return an error
[17:48:05 CEST] <BtbN> and then replace its own exported symbol with that address, so future calls don't cause indirection
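The resolve-on-first-call idea discussed above can be illustrated with a small Python analogy. It is only an analogy: importlib.import_module and getattr stand in for dlopen and dlsym, the class name is hypothetical, and a real shim for a C library would be an assembly trampoline as the discussion suggests. The wrapper forwards its arguments untouched and caches the resolved target so that later calls skip the lookup.

```python
import importlib
from typing import Any, Callable, Optional


class LazySymbol:
    """Callable placeholder that resolves its real target on the first call."""

    def __init__(self, library: str, symbol: str) -> None:
        self._library = library
        self._symbol = symbol
        self._target: Optional[Callable[..., Any]] = None

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._target is None:
            # Stand-in for dlopen(): load the "library" only when first needed.
            module = importlib.import_module(self._library)
            # Stand-in for dlsym(): look the symbol up once and cache it.
            self._target = getattr(module, self._symbol)
        # Forward the call without inspecting the arguments or return value.
        return self._target(*args, **kwargs)


if __name__ == "__main__":
    lazy_sqrt = LazySymbol("math", "sqrt")
    print(lazy_sqrt(9.0))
```

In this analogy a missing library surfaces as an ImportError on the first call, which is easy to report in Python; the equivalent in a parameter-agnostic native shim is the hard part raised in the conversation, since the shim does not know how to return an error to the caller.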
[17:48:30 CEST] <BtbN> Yes, if you want to enhance the functionality, you need knowledge about the parameters.
[17:48:47 CEST] <BtbN> So a delay-load CUDA would only be partially auto-generated.
[17:49:08 CEST] <andrey_turkin> so, patch is easier )
[17:49:53 CEST] <BtbN> The "fun" part with CUDA is, that the functions are stdcall
[17:50:07 CEST] <andrey_turkin> anyway, I am excited to see all the cuvid/nvenc/vaapi/qsv work being done. I started looking at HW accel options this January and things were so much worse
[17:50:08 CEST] <BtbN> But that also shouldn't matter if you just plain jmp to the original function
[17:50:40 CEST] <BtbN> just have to be careful what registers you mess with
[17:51:04 CEST] <cone-446> ffmpeg Michael Niedermayer master:3bc060f36676: doc/examples/transcoding: Use the decoders pixel format if the encoder does not list which are supported
[17:51:05 CEST] <cone-446> ffmpeg Michael Niedermayer master:13aae86a2b65: avutil/frame: Assert that width/height/channels is 0 for the destination of av_frame*_ref()
[17:58:39 CEST] <BtbN> Hm, I guess just exporting everything as cdecl should work? As the caller does all work.
[18:00:36 CEST] <andrey_turkin> you also have to modify headers for that
[18:00:44 CEST] <BtbN> hm?
[18:01:23 CEST] <andrey_turkin> I mean it would work if your target calls already were cdecl
[18:01:34 CEST] <BtbN> If the shim function itself is cdecl, and then just does a plain asm("jmp ..."); it shouldn't matter if the caller expects stdcall
[18:01:44 CEST] <andrey_turkin> it does
[18:01:59 CEST] <BtbN> cdecl doesn't insert any cleanup into the callee
[18:02:32 CEST] <BtbN> so just a plain forward should be fine, as the jumped to function does the stdcall cleanup
[18:02:48 CEST] <BtbN> You just have to be extremely careful not to mess with the stack or registers
[18:04:32 CEST] <andrey_turkin> it doesn't really matter whether wrapper is cdecl or stdcall - you can't really expect it not to mess with the stack.
[18:06:12 CEST] <BtbN> of course you can, that's the only way this works at all
[18:07:51 CEST] <andrey_turkin> the only sure way this works is if the "trampoline" is written in assembler. Otherwise you have to worry about the compiler saving some registers to the stack, or setting up a new stack frame, or just allocating space for something. And cdecl and stdcall are identical for a function without any arguments
[18:08:33 CEST] <andrey_turkin> under some circumstances you can even trick compiler into making that jmp for you in plain C, but that is really fragile
[18:09:09 CEST] <nevcairiel> There are some pragmas to tell it to do that
[18:09:20 CEST] <nevcairiel> No clue how widely supported
[18:09:26 CEST] <andrey_turkin> naked?
[18:11:37 CEST] <andrey_turkin> well maybe. If trampoline is declared naked, and all the processing is in separate function, and if you can be sure call to actual API will be tail-call optimized
[18:13:04 CEST] <andrey_turkin> or whip out clang and start parsing API declarations and generating compatible wrappers for all the functions
[18:16:40 CEST] <BtbN> Well, MSVC/dlltool is able to generate a perfect delay-load shim with just a def file.
[18:18:45 CEST] <andrey_turkin> sure
[18:19:15 CEST] <BtbN> It shouldn't be too hard to get it to do that for linux
[18:19:18 CEST] <andrey_turkin> they probably use that small assembly trampoline (which is a good idea to do)
[18:22:38 CEST] <BtbN> https://github.com/zerovm/binutils/blob/master/binutils/dlltool.c#L2900
[18:25:22 CEST] <andrey_turkin> look at lines 500-580
[18:25:59 CEST] <andrey_turkin> these are exactly those assembly thunks for standard import and for delayload import
[18:28:00 CEST] <andrey_turkin> and then there is delayimp.lib which does all the work on actually loading the library
[19:02:35 CEST] <omerjerk> I need to merge some old codebase into the new repository.
[19:02:51 CEST] <omerjerk> So, in the new one, there's no https://github.com/justinruggles/FFmpeg-alsenc/blob/alsenc/libavcodec/dsputil.h
[19:03:08 CEST] <omerjerk> I'm sure there will be some alternatives to this API.
[19:03:17 CEST] <omerjerk> Any idea whatsoever anyone ?
[19:08:17 CEST] <jamrial> omerjerk: dsputil was split into several different dsp modules. if what you need doesn't fit into any existing one, just create a new one for your codec
[19:46:00 CEST] <cone-446> ffmpeg Thomas Mundt master:2e395bbccffe: avfilter/vf_colormatrix: increase precision
[19:51:51 CEST] <omerjerk> jamrial: thanks. I just needed lpc_compute_autocorr, and grep tells that it's available inside lpc.c file.
[20:01:39 CEST] <cone-446> ffmpeg Thomas Mundt master:a0a4a4b37010: avfilter/vf_colormatrix: add bt.2020 colorspace
[20:22:10 CEST] <cone-446> ffmpeg Michael Niedermayer master:87c53e53545f: avcodec/mpeg4videodec: Print low_delay value with -debug 1 in decode_vol_header()
[20:25:04 CEST] <omerjerk> what is preferred between av_log and dprintf for normal logging ?
[20:25:57 CEST] <omerjerk> or should I go with av_dlog ?
[20:30:10 CEST] <BtbN> hm, i think those cuvid patches are somewhat ready now. Will send them to the ML for a first round of comments.
[20:35:34 CEST] <rcombs> omerjerk: never [fd]?printf in library code
[20:36:26 CEST] <omerjerk> I need to merge some old code from 2010 in to the current codebase.
[20:36:33 CEST] <omerjerk> So I was cleaning it up.
[20:36:37 CEST] <rcombs> use av_log
[20:36:43 CEST] <rcombs> (av_dlog is deprecated)
[20:36:44 CEST] <omerjerk> I replaced dprintf with av_dlog
[20:36:51 CEST] <omerjerk> oh okay.
[20:37:04 CEST] <omerjerk> what should be the argument of the log level ?
[20:37:12 CEST] <omerjerk> AV_LOG_DEBUG I guess ?
[20:37:41 CEST] <rcombs> whatever level is applicable to the log line in question
[20:37:49 CEST] <omerjerk> okay!!
[22:41:18 CEST] <cone-446> ffmpeg Michael Niedermayer master:f730367a60e3: avcodec/mpeg4videodec: Fix default low_delay flag value if not coded
[00:00:00 CEST] --- Mon Jun 6 2016