[Ffmpeg-devel-irc] ffmpeg-devel.log.20190902
burek
burek at teamnet.rs
Tue Sep 3 03:05:05 EEST 2019
[00:42:04 CEST] <xmichael> Hello all
[00:42:10 CEST] <xmichael> Hello all, I have run into an issue when using HLS and fragmentation, and also I think a bug with respect to the use_temp_file parameter
[00:42:14 CEST] <xmichael> I posted about it here on reddit https://www.reddit.com/r/ffmpeg/comments/cyahv0/fragmented_hls_outputting_as_vod_not_event/
[00:42:58 CEST] <xmichael> I've made a really budget code change to work around the use_temp_file issue. I do not think a temp file should be exclusively enforced if the output is to file. Many file systems support reading and writing at the same time
[00:43:29 CEST] <xmichael> The ability to write to disk and read the file being created is desirable for low latency goals
[00:48:45 CEST] <nicolas17> xmichael: event vs VOD is written into the playlist header, what does your playlist say?
[00:49:59 CEST] <xmichael> #EXTM3U#EXT-X-VERSION:7#EXT-X-TARGETDURATION:2#EXT-X-MEDIA-SEQUENCE:0#EXT-X-PLAYLIST-TYPE:EVENT#EXT-X-MAP:URI="init.mp4"#EXT-X-DISCONTINUITY
[00:50:23 CEST] <nicolas17> so it *doesn't* think you're trying to make a vod
[00:50:53 CEST] <xmichael> Correct. I've updated from git without any luck
[00:51:19 CEST] <xmichael> Err, sorry let me restate that. It gets that I want an event, yet no matter the flags the playlist only appends
[00:51:51 CEST] <xmichael> current command is ffmpeg -y -loglevel info -i rtsp://192.168.1.241 -g 60 -c libx264 -tune zerolatency -hls_segment_type fmp4 -hls_time 2 -hls_list_size 3 -hls_flags delete_segments+append_list+split_by_time -hls_playlist_type event /opt/www/east/east.m3u8
[00:52:11 CEST] <nicolas17> what do you mean "it only appends"? isn't that what append_list is for?
[00:53:35 CEST] <xmichael> with or without append_list as a flag, the playlist grows indefinitely
[00:53:54 CEST] <nicolas17> huh
[00:54:00 CEST] <nicolas17> "hls_playlist_type event" sets hls_list_size to 0
[00:54:21 CEST] <nevcairiel> EVENT type playlists are defined that way, nothing can ever be deleted from them
[00:54:29 CEST] <nevcairiel> if you want stuff to be removed, dont use event
[00:54:40 CEST] <nicolas17> so there's three types?
[00:54:43 CEST] <nicolas17> vod, event, and nothing?
[00:54:58 CEST] <xmichael> I am only aware of event and vod, according to the documentation. I don't see a 3rd type.
[00:55:22 CEST] <nevcairiel> a live sliding-window playlist has no type entry
[00:55:39 CEST] <xmichael> It is my understanding that VOD would have a playlist that grows until the entire transcode is created, and a EVENT would be a live stream that circles through a playlist using the specified number of segments defined by hls_list_size
[00:55:48 CEST] <nevcairiel> thats wrong
[00:55:53 CEST] <nicolas17> xmichael: looks like what you want is no type at all
[00:56:14 CEST] <xmichael> I see in the documentation what nevcairiel just noted, hls_playlist_type event sets hls_list_size to 0, but that doesn't seem to make sense with my understanding at least (:
[00:56:36 CEST] <nevcairiel> a VOD playlist has to be 100% complete before it's being served to the user, an EVENT type playlist can still be growing while you already serve it, but you can never remove from it
[00:56:49 CEST] <nevcairiel> a "live" playlist has no type indicator, and you can remove from it
[00:57:23 CEST] <xmichael> thank you folks, I did not see that in the documentation.
[00:57:36 CEST] <xmichael> I don't quite understand what an EVENT is, but fair enough, without that defined I am good
[00:57:40 CEST] <nevcairiel> serving HLS assumes that one already knows the basics of HLS playlists
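A minimal sketch of xmichael's earlier command adjusted for a sliding-window live playlist, i.e. simply dropping -hls_playlist_type so that -hls_list_size 3 and delete_segments actually take effect (untested, otherwise reusing his options verbatim):

    ffmpeg -y -loglevel info -i rtsp://192.168.1.241 -g 60 -c libx264 -tune zerolatency \
        -hls_segment_type fmp4 -hls_time 2 -hls_list_size 3 \
        -hls_flags delete_segments+append_list+split_by_time \
        /opt/www/east/east.m3u8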
[00:58:00 CEST] <xmichael> Ok, so my second discovery and my little hack with respect to the use_temp_file flag
[00:58:11 CEST] <xmichael> The evaluator says that if it is outputting to file, then always use temp file
[00:58:43 CEST] <xmichael> As I mentioned in my initial question as well, I don't think that is the most desirable way to operate. No reason I can think of that a temp file should be forced simply because it is writing to a file system???
[00:59:26 CEST] <nevcairiel> I'm not quite sure a HLS client would necessarily like getting a partial segment file
[00:59:32 CEST] <nicolas17> looks like it uses a temp file for non-VOD by default
[01:00:03 CEST] <nevcairiel> i believe the segment is only added to the playlist once it's complete, so it's moot anyway
[01:00:28 CEST] <xmichael> /int use_temp_file = is_file_proto && ((hls->flags & HLS_TEMP_FILE) || hls->master_publish_rate);
[01:00:36 CEST] <xmichael> is_file_proto is hit in all cases
[01:00:57 CEST] <nicolas17> that's && so it has to meet the second condition too
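Annotated, the condition as pasted above reads roughly as follows (field names taken from the paste, which may be from xmichael's locally modified tree rather than current hlsenc.c):

    /* a temp file is only used when the output is a local file (is_file_proto)
     * AND either -hls_flags temp_file was requested or master_publish_rate is
     * set; being a file output alone does not trigger it */
    int use_temp_file = is_file_proto &&
                        ((hls->flags & HLS_TEMP_FILE) || hls->master_publish_rate);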
[01:02:51 CEST] <xmichael> Sure. let me try once more without my modified code
[01:03:12 CEST] <xmichael> and of course without hls_playlist_type event :)
[01:05:01 CEST] <xmichael> It looks like even without hls_playlist_type defined
[01:05:04 CEST] <xmichael> [hls @ 0x5c56900] Opening '/opt/www/east/east2.m4s' for writingA dup=28 drop=5 speed=1.13x[hls @ 0x5c56900] Opening '/opt/www/east/east.m3u8.tmp' for writing
[01:05:23 CEST] <nicolas17> I don't get why you don't want temp files
[01:05:37 CEST] <nevcairiel> using it for the playlist itself is rather important, otherwise you might get a partial playlist as its being updated, and that would be very bad
[01:05:39 CEST] <xmichael> I don't want a temp file, because I want to be able to serve the file that is being written if necessary
[01:05:47 CEST] <nevcairiel> you can't anyway
[01:05:52 CEST] <xmichael> why not?
[01:05:55 CEST] <nicolas17> the playlist won't reference those files until they are done being written anyway
[01:06:07 CEST] <nevcairiel> because serving a partial segment would be illegal according to HLS
[01:06:33 CEST] <nevcairiel> segment-based streaming formats are not ideal for low latency due to that fact
[01:07:31 CEST] <xmichael> the file being written to disk is in the play list
[01:07:48 CEST] <xmichael> I've just made the segment size 30 seconds and I can see the file actively being written in the playlist
[01:08:16 CEST] <nicolas17> so the playlist references a file that doesn't exist?
[01:09:14 CEST] <xmichael> the play list appears to be updated to include the new file, and then the file is created and data is being written to it
[01:09:27 CEST] <xmichael> the file is 100% in the list while it is still being written to
[01:09:48 CEST] <nicolas17> you mean while its corresponding .tmp file is being written to?
[01:11:05 CEST] <xmichael> uhh
[01:12:09 CEST] <xmichael> Not using the code in github no. However if I force use_temp_file to 0, the file is in the playlist
[01:12:19 CEST] <nicolas17> if ffmpeg is using temp files, it should be creating a .tmp, writing to it, and then renaming it to remove the .tmp extension
[01:12:32 CEST] <nicolas17> you'd never see a partial .ts file
[01:12:54 CEST] <xmichael> but that is what I want to do :)
[01:13:08 CEST] <nevcairiel> you cant serve partial files for HLS, its simply not going to be valid. what if a client downloads it, is it supposed to check back later if it grew after it got it? that just doesnt make any sense
[01:13:14 CEST] <nevcairiel> its just not possible
[01:13:15 CEST] <nicolas17> my question is if you're seeing the .ts referenced in the playlist while it doesn't exist (because it didn't get renamed yet)
[01:13:37 CEST] <xmichael> nevcairiel; yes I serve the file using http chunk-encoding. So the client will read until EOF
[01:13:58 CEST] <nicolas17> you don't know if the webserver will hit EOF and terminate the transfer before ffmpeg finishes writing the file
[01:13:59 CEST] <nevcairiel> but neither the webserver nor the client can know if the segment is complete or not
[01:14:02 CEST] <nevcairiel> so that doesnt work.
[01:14:07 CEST] <xmichael> nicolas17: not in the code from github, only in my hacked code where I force use_temp_file = 0
[01:14:15 CEST] <nicolas17> so you broke it :P
[01:14:27 CEST] <xmichael> hahah
[01:15:07 CEST] <xmichael> You are both right and I am wrong, however I still want to do it. The logic I have is that serving a partially done file is still more desirable than the player having nothing to play
[01:15:20 CEST] <nicolas17> playlists don't reference partial files, or files that don't exist because they weren't renamed from .tmp yet, guess your half-change broke that invariant
[01:15:32 CEST] <xmichael> Since playback is in realtime, and the source is real time. The player can't really beat the encoder at creating the file.
[01:16:12 CEST] <nevcairiel> playing a partial file is absolutely worse than playing nothing, because if the player doesn't get a complete file, it might just miss content entirely
[01:16:46 CEST] <nevcairiel> downloads from the webserver are not in realtime, they go as fast as the browser can do it
[01:17:11 CEST] <nevcairiel> so if it creates a new file with one frame in it only, and a browser would grab it, it would not come back and see if there is more to grab later
[01:17:14 CEST] <nevcairiel> it would just bail
[01:17:16 CEST] <xmichael> umn, right that little detail the file will be downloaded as fast as possible
[01:17:43 CEST] <nicolas17> it would also think it's time to get the next one and it would *also* be a partial file
[01:17:52 CEST] <xmichael> I guess buffering my chunk-encoded transfer wouldn't be too stellar either hah
[01:18:08 CEST] <nicolas17> (only way for the client to get "in sync" and start receiving complete files is if it pauses for a second)
[01:18:40 CEST] <nevcairiel> one could do a very low-latency live streaming thing with HLS if one would control the encoder, segmenter and web server in one piece of software, so that you can make the webserver block as long as you know there is more content coming
[01:18:41 CEST] <xmichael> I guess my grand dream of providing slightly lower latency HLS by making that last file available if necessary might not be happening. I felt that chunk-encoding was going to be my answer as then I can deliver a file of unknown size
[01:18:47 CEST] <nevcairiel> but alas, you can't do that with a lot more work :p
[01:19:00 CEST] <nevcairiel> without*
[01:19:33 CEST] <xmichael> I did make the webserver, there are not really any off-the-shelf webservers that allow you to serve files already open using chunk-encoding
[01:19:45 CEST] <nicolas17> if you want lower latency, use a smaller segment; if you want <2s latency, HLS might not be the right solution ^^
[01:20:15 CEST] <nicolas17> man I remember when Twitch had a latency of like 30s
[01:20:30 CEST] <xmichael> See I am ok with ~4 seconds of latency, however three x 2 second segments don't always work. Simply because the 3rd file is not available
[01:20:42 CEST] <cone-529> ffmpeg 03Raphaël Zumer 07master:8821d1f56e18: avutil/pixfmt: Add EBU Tech. 3213-E AVColorPrimaries value
[01:20:42 CEST] <cone-529> ffmpeg 03Raphaël Zumer 07master:08dfd57fd83a: avfilter: Support EBU Tech. 3213-E primaries values
[01:20:42 CEST] <cone-529> ffmpeg 03Raphaël Zumer 07master:a12b629ae129: avcodec: Support EBU Tech. 3213-E primaries values
[01:20:53 CEST] <xmichael> This was my great idea, 3 x 2 second segments produced from ffmpeg, and use chunk encoding to serve the 3rd file as it is being written if necessary
[01:22:11 CEST] <xmichael> Actually, I would technically slow the file down on the transfer. Since I know the approximate size by evaluating the other files on disk.
[01:22:19 CEST] <xmichael> though that is getting pretty hacky (:
[01:23:31 CEST] <nicolas17> if you wrote your own webserver, you can make requests to 123.ts read from 123.ts.tmp instead, and you can wait for the rename as a reliable indication that it's complete
[01:23:43 CEST] <nicolas17> but I still think this is all a bad idea :P
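A minimal sketch of that suggestion, assuming the custom webserver xmichael describes: when a client requests seg.m4s, stream from seg.m4s.tmp and treat the rename (the .tmp name disappearing) as the end-of-segment signal. Purely illustrative; the HTTP/chunked-encoding layer and error handling are omitted:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Stream "name" (e.g. "east2.m4s") to a client: read from name + ".tmp"
     * while the segmenter is still writing, and stop once the file has been
     * renamed to its final name and everything written so far is drained. */
    static int stream_segment(const char *name, int client_fd)
    {
        char tmp[1024];
        char buf[64 * 1024];
        int fd, renamed = 0;

        snprintf(tmp, sizeof(tmp), "%s.tmp", name);
        fd = open(tmp, O_RDONLY);
        if (fd < 0) {
            fd = open(name, O_RDONLY);      /* segment already finished */
            renamed = 1;
            if (fd < 0)
                return -1;
        }

        for (;;) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n > 0) {
                write(client_fd, buf, n);   /* send as one HTTP chunk */
            } else if (n == 0 && !renamed) {
                /* EOF, but the writer may still be going: use the rename as
                 * the "complete" signal, otherwise wait for more data */
                renamed = access(tmp, F_OK) != 0;
                if (!renamed)
                    usleep(10000);
            } else {
                break;                      /* drained after rename, or error */
            }
        }
        close(fd);
        return 0;
    }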
[01:24:28 CEST] <xmichael> ohh, that is actually kind of brilliant
[01:25:41 CEST] <xmichael> I use a loop to read the files while (bytesRemaining > 0) currently
[01:26:20 CEST] <xmichael> thank you I might not have to abandon this crazy idea just yet! ;D
[01:26:20 CEST] <BradleyS> i'm not sure why you'd need to serve the file as it's being written, just keep writing chunks of a few seconds or more, and serve those as they become available and are needed
[01:27:01 CEST] <BradleyS> akamai chops up streams into multiple smaller .ts, not sure if it's mpeg-dash specifically
[01:27:13 CEST] <xmichael> bradleys: the issue is the latency. realistically to serve globally with any level of reliability you need 3 segments 3 seconds long
[01:27:38 CEST] <nicolas17> tbh hls_list_size 3 seems low
[01:28:07 CEST] <xmichael> right, I've experimented with 1 second segments and 5 segments total, but still not ideal
[01:28:17 CEST] <nicolas17> the reason to have few segments in the playlist is to avoid the playlist itself getting too large
[01:28:36 CEST] <nicolas17> pretty sure clients will try to start from the last segment anyway
[01:28:54 CEST] <xmichael> depending on the player, they often only play the last available file
[01:29:53 CEST] <nevcairiel> the spec at least says that in live playlists you should start at the end
[01:30:15 CEST] <xmichael> -hls_segment_type fmp4
[01:30:20 CEST] <nevcairiel> so making it bigger has no real downside except perhaps allow clients that have a bit of buffering to be a bit more graceful
[01:30:44 CEST] <xmichael> this type still seems to be governed by the key frame interval
[01:33:40 CEST] <nevcairiel> HLS mandates that every segment must start with a keyframe
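The practical consequence is that -hls_time can only be honored at keyframe boundaries, so the encoder's GOP has to line up with the segment length; a common way to do that (a sketch, not a command from this session) is to force keyframes at the segment interval:

    ffmpeg -i rtsp://192.168.1.241 -c:v libx264 -tune zerolatency \
        -force_key_frames "expr:gte(t,n_forced*2)" \
        -hls_segment_type fmp4 -hls_time 2 -hls_list_size 3 \
        -hls_flags delete_segments /opt/www/east/east.m3u8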
[01:34:31 CEST] <xmichael> Ok, so one last crazy idea then. Why can't my web server make a play list with a single file and a length of 1 million seconds. When a player comes along I feed using chunked encoding the latest data produced by ffmpeg into that session?
[01:35:11 CEST] <xmichael> open ffmpeg's playlist, grab the next file in the list and output to the player, append, append, append the same session using chunked-encoding. the player never disconnects
[01:35:15 CEST] <nicolas17> if you do that you don't need HLS
[01:35:18 CEST] <nicolas17> or a playlist
[01:35:28 CEST] <BradleyS> i meant have the entire payload be 3-second long .ts
[01:35:42 CEST] <nevcairiel> many HLS devices will try to cache segments, and will absolutely die on such a hack
[01:35:46 CEST] <BradleyS> or 2x3, then 10x10, then ...
[01:36:20 CEST] <BradleyS> so you're not reading bytes from an open file as they hit the disk
[01:36:55 CEST] <xmichael> the cache segments, that will be the burn... grr
[01:37:06 CEST] <xmichael> bradleys: I'm not sure what you mean?
[01:37:27 CEST] <nevcairiel> if you want low latency just use progressive mpegts streaming
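A minimal sketch of that approach, assuming ffmpeg's built-in HTTP server (the http protocol's listen option) rather than an external webserver; the client just keeps reading one growing MPEG-TS stream, so there are no segment boundaries to wait on, at the cost of losing the CDN-friendly properties of HLS:

    ffmpeg -i rtsp://192.168.1.241 -c:v libx264 -tune zerolatency \
        -f mpegts -listen 1 http://0.0.0.0:8080/live.ts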
[01:37:45 CEST] <BradleyS> "use chunk encoding to server the 3rd file as it is being written if necessary"
[01:37:54 CEST] <BradleyS> seems brittle
[01:37:56 CEST] <xmichael> nevcairiel: is there a way to do that without getting into the mess of WebRTC?
[01:38:00 CEST] <BtbN> Or copy what Twitch did to HLS. They do HLS with sub second streamer to viewer latency
[01:39:07 CEST] <BtbN> iirc Mixer just opens a WebSocket connection to the server, and the server sends some (fragmented) mp4 stream the Browser can feed directly to the Media API
[01:39:50 CEST] <nevcairiel> HLS and DASH had one primary design goal, easy CDN support - not low latency
[01:40:42 CEST] <BtbN> Twitch basically just adds 3 "future segments" to the playlist, with special tags for the player. And the player then goes ahead and tries to pre-load them early
[01:40:58 CEST] <xmichael> BtbN: That is pretty easy to do using jsmpeg, however with h264 that becomes next to impossible
[01:41:07 CEST] <BtbN> what?
[01:41:21 CEST] <xmichael> Twitch as I understand it, serves the last file as it is being written
[01:41:37 CEST] <BtbN> I doubt Twitch writes any files anywhere.
[01:41:48 CEST] <nevcairiel> if you control both server and client, you can get away with a bunch of extra nonsense
[01:41:50 CEST] <BtbN> They are doing some heavy magic in their CDN
[01:43:27 CEST] <xmichael> Thanks everyone for the comments, I'm sad now though :b
[01:44:18 CEST] <xmichael> It just feels there should be an easier way to serve relatively low latency video without so many hoops. The commercial options all tend to have some tie in with WebRTC but the RFC 's associated with it are just a mess
[01:44:26 CEST] <nicolas17> BtbN: I think they do write files, the player can get like 30 seconds behind
[01:44:48 CEST] <BtbN> nicolas17, I'd assume their server just has an in-memory buffer and serves from there
[01:44:53 CEST] <BtbN> there is simply no time to write to files
[01:45:16 CEST] <xmichael> why would there not be time to write to files?
[01:45:25 CEST] <nicolas17> actually longer, users can create clips from the last minute or two of the live stream
[01:45:59 CEST] <BtbN> Because they manage to have streamer to viewer latency of less than one second
[01:46:15 CEST] <BtbN> That means the server to browser latency must be closer to <0.2 seconds, which is insane
[01:47:31 CEST] <nicolas17> segments are 2s long
[01:47:43 CEST] <nicolas17> and yeah now I see they have an "EXT-X-TWITCH-PREFETCH" thing
[01:47:49 CEST] <nicolas17> with future segments
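Illustratively, such a playlist advertises segments that are not finished yet through a vendor tag so the player can open those requests early; the shape is roughly as below (segment names invented here, and the exact tag syntax is not verified):

    #EXTM3U
    #EXT-X-TARGETDURATION:2
    #EXT-X-MEDIA-SEQUENCE:1200
    #EXTINF:2.000,
    seg1200.ts
    #EXTINF:2.000,
    seg1201.ts
    #EXT-X-TWITCH-PREFETCH:seg1202.ts
    #EXT-X-TWITCH-PREFETCH:seg1203.ts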
[01:47:58 CEST] <xmichael> well Skype and the like do it without much effort. What Twitch does isn't ground breaking, but what I can't understand is how they use some variation of HLS to do it
[01:48:08 CEST] <BtbN> Skype does not use HLS lol
[01:48:16 CEST] <BtbN> That's WebRTC and custom protocols
[01:48:19 CEST] <nicolas17> stuff like Skype can even be p2p
[01:48:45 CEST] <nicolas17> you don't have to deal with CDNs for one-to-one video
[01:48:47 CEST] <BtbN> rabb.it also uses WebRTC, with quite impressive results
[01:48:55 CEST] <nicolas17> RIP rabb.it
[01:58:21 CEST] <xmichael> kast.gg eh
[01:59:08 CEST] <xmichael> that has me way off topic.. I just wanted to make a nice little open source solution to provide slightly lower latency hls
[01:59:30 CEST] <xmichael> I thought I had it all beat with the idea of making the last file available thanks to chunk encoding...
[07:34:47 CEST] <cone-046> ffmpeg 03Limin Wang 07master:75aea52a1051: lavf/hlsenc: refine the get_relative_url function to avoid extra malloc for relation path
[07:36:45 CEST] <cone-046> ffmpeg 03Steven Liu 07master:2183def1a54c: avfilter/vf_delogo: support expr in delogo filter
[07:42:20 CEST] <cone-046> ffmpeg 03Steven Liu 07master:2a21487b9ea1: avformat/dashdec: start from the root uri when baseURL is start with '/'
[10:53:31 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:40abff05d245: lavc/mjpegdec: Decode Huffman-coded lossless JPEGs embedded in DNGs
[10:53:32 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:c31c70892978: lavc/tiff: Decode embedded JPEGs in DNG images
[10:53:33 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:4c8c4f2d43d5: lavc/tiff: Convert DNGs to sRGB color space
[10:53:34 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:6763192cff8f: lavc/tiff: Apply color scaling to uncompressed DNGs
[10:53:35 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:03f95403eb11: lavc/jpegtables: Handle multiple mappings to the same value
[10:53:36 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:f98a8666de6c: lavc/tiff: Fix edge case with full-length/width tiles
[10:53:37 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:33b6752a708f: lavc/tiff: Don't apply strips-related logic to tiled images
[10:53:38 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:c510ed2ee8b3: lavc/tiff: Force DNG pixel data endianness on an edge case
[10:53:39 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:a75a9e8f64ec: lavc/mjpegdec: Enable decoding of single-component bayer images
[10:53:40 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:31acdf4351a1: lavc/tiff: Support decoding of DNGs with single-component JPEGs
[10:53:41 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:c44aa7f1761b: lavc/tiff: Decode 10-bit and 14-bit DNG images
[10:53:42 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:fcf0ebc4a95e: lavc/mjpegdec: Skip unknown APPx marker on bayer images
[10:53:43 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:9280e4b2918c: lavc/tiff: Support DNGs with striped (non-tiled) JPEGs images
[10:53:44 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:15776ca18298: lavc/tiff: Default-initialize WhiteLevel DNG tag value
[10:53:45 CEST] <cone-046> ffmpeg 03Nick Renieris 07master:63689b16ad72: lavc/tiff: Enable decoding of LinearRaw images
[10:53:46 CEST] <cone-046> ffmpeg 03Paul B Mahol 07master:d7529b03bac4: avcodec/tiff: set color_trc, remove sRGB conversion
[10:53:47 CEST] <cone-046> ffmpeg 03Paul B Mahol 07master:cae29820777f: avcodec/tiff: rewrite lut handling
[10:53:48 CEST] <cone-046> ffmpeg 03Paul B Mahol 07master:30f4464e220b: avfilter/vf_v360: rename fb format to barrel
[10:53:49 CEST] <cone-046> ffmpeg 03Paul B Mahol 07master:6037dfa47ad1: avfilter/vf_v360: extend description of eac format
[10:53:50 CEST] <cone-046> ffmpeg 03Paul B Mahol 07master:067e6323492b: avfilter/vf_v360: fix some small code style issues
[10:53:51 CEST] <cone-046> ffmpeg 03Paul B Mahol 07master:e0fab59624c6: avfilter/vf_v360: set much smaller limit to w/h
[12:04:25 CEST] <J_Darnley> What restrictions does h264 put on slice shape?
[12:04:59 CEST] <J_Darnley> Isn't there something about them being rectangualr?
[12:05:18 CEST] <J_Darnley> Whole rows of macroblocks?
[12:08:21 CEST] <jkqxz> In baseline profile they can be absolutely anything with no restrictions (any subset of macroblocks in the frame).
[12:08:34 CEST] <thardin> even non-contiguous?
[12:08:40 CEST] <jkqxz> Yes.
[12:08:48 CEST] <J_Darnley> lol
[12:08:52 CEST] <thardin> doesn't being that general waste bits?
[12:09:23 CEST] <J_Darnley> I'm sure this is constrained baseline or the usual supersets (main, high, etc)
[12:09:29 CEST] <jkqxz> In other profiles they have to be in raster scan order.
[12:10:08 CEST] <thardin> http://iphome.hhi.de/wiegand/assets/pdfs/DIC_H264_07.pdf slides 11-12
[12:10:13 CEST] <jkqxz> thardin: It's generally recommended to make your slices vaguely sane, but the standard lets you do stupid stuff if you want.
[12:10:31 CEST] <jkqxz> This is probably the main reason why noone supports baseline profile, of course.
[12:13:33 CEST] <kierank> J_Darnley: afaik no restrictions
[12:14:03 CEST] <kierank> hence why I can only support rectangular slices I think
[12:14:20 CEST] <kierank> which I think is fine, it never worked before
[12:15:29 CEST] <jkqxz> The DVD/Bluray and I think some broadcast standards require slices to be rectangular.
[12:16:43 CEST] <nevcairiel> baseline profile is full of so much insanity
[12:24:50 CEST] <J_Darnley> It must be restrictions from other sources that I am thinking of.
[12:24:57 CEST] <J_Darnley> not the spec
[12:57:17 CEST] <kierank> J_Darnley: I left my notebook at home but I drew out all the combinations that were possible
[12:57:28 CEST] <kierank> the only type we are sure to support is rectangular
[12:57:39 CEST] <kierank> for any other combination you can have a slice along the row that you don't know has finished yet
[13:05:16 CEST] <J_Darnley> sample with 4 bframes
[13:05:23 CEST] <J_Darnley> let us see how it decodes
[13:07:32 CEST] <J_Darnley> segfault
[13:07:33 CEST] <J_Darnley> lol
[13:08:14 CEST] <J_Darnley> memcpy in the test program
[13:08:38 CEST] <J_Darnley> oh, yes...
[13:08:50 CEST] <J_Darnley> I'd better find the cif file and use that
[13:08:56 CEST] <J_Darnley> not 4cif
[13:12:04 CEST] <J_Darnley> thanks media.xiph.org for sending that y4m as test/plain
[13:12:11 CEST] <J_Darnley> *text/plain
[13:15:47 CEST] <kierank> oh yeah that's so annoying
[13:16:05 CEST] <kierank> I have trolled xiph
[13:17:28 CEST] <J_Darnley> out of 10 frames I got 2 decoded
[13:18:38 CEST] <J_Darnley> frame 0 matches (as it should, being an iframe)
[13:18:55 CEST] <J_Darnley> frame 1 is blank green (zero filled I guess)
[13:18:57 CEST] <kierank> did you use --slice-threads?
[13:19:08 CEST] <J_Darnley> yes
[13:19:27 CEST] <kierank> we probably could just warn if it has bframes
[13:19:36 CEST] <kierank> and draw_horiz is enabled
[13:20:28 CEST] <J_Darnley> frame 2 has misplaced macroblocks and is out of order
[13:21:04 CEST] <J_Darnley> was frame 1 for the normal received frame
[13:21:53 CEST] <J_Darnley> and the md5 sum was the same
[13:27:12 CEST] Action: J_Darnley will check whether normal decoding works
[13:32:47 CEST] <kierank> I think non reordered worked
[13:33:41 CEST] <J_Darnley> I meant bframes with normal frame decoding, just to ensure nothing else broke
[13:34:06 CEST] <kierank> Ah
[13:34:27 CEST] <J_Darnley> so I just run fate in the end
[13:56:30 CEST] <vel0city> durandal_1707: nice, so it was possible (and easy) to auto apply colorspace conversion after all
[13:56:57 CEST] <vel0city> durandal_1707: about the LUT changes, why remove the size & offset checks?
[13:58:03 CEST] <durandal_1707> not needed now
[13:58:32 CEST] <vel0city> oh, right, you're not iterating with count
[13:58:36 CEST] <vel0city> cool
[13:59:27 CEST] <durandal_1707> and there is bytestream2 used
[14:01:45 CEST] <durandal_1707> note that most players ignore color trc from avframes
[14:02:03 CEST] <JEEB> "what? it's not gamma?! what is this blasphemy!"
[14:02:06 CEST] <durandal_1707> at least ffplay
[14:03:06 CEST] <JEEB> you could in theory utilize zscale or something to convert it
[14:04:07 CEST] <durandal_1707> swscale is so old and abandoned
[14:04:46 CEST] <JEEB> it was made in early 2000s :P also it has the stuff that nobody would probably want to reimplement in alternative libraries (the less common things like palette formats etc)
[14:08:23 CEST] <Lynne> I was wanting to write a swscale replacement that worked on hardware frames as well, but kind of gave up on it
[14:19:01 CEST] <durandal_1707> why you gave up on it?
[14:21:46 CEST] <BtbN> swscale to work on arbitrary hardware frames seems like quite a task, given that there's like half a dozen entirely different kinds
[14:22:10 CEST] <nevcairiel> software algorithms in mapped hardware frames would also be incredibly slow
[14:22:23 CEST] <nevcairiel> memory is not made equal afterall
[14:23:07 CEST] <BtbN> I wonder if for the pad/crop/transpose CUDA filters it would make sense to put them into the scale file.
[14:23:34 CEST] <BtbN> Cause otherwise, if each gets its own entire vf_*_cuda.c file, it will duplicate massive amounts of code
[14:25:13 CEST] <durandal_1707> file != filter
[14:25:35 CEST] <BtbN> Yes, it's more of a stylistic question really
[14:25:47 CEST] <BtbN> Potentially one could also write a CUDA-filter-helper, that abstracts away most of the boilerplate
[14:27:30 CEST] <Lynne> it would only work on opencl and vulkan frames, because correctness would be very important
[14:27:48 CEST] <Lynne> and it wouldn't have been fast either as I'd do everything in XYZ colorspace
[14:27:51 CEST] <BtbN> Are there even "OpenCL frames"?
[14:28:28 CEST] <Lynne> yeah? they're pretty normal, images with some memory bound to them
[14:28:47 CEST] <BtbN> I remember the OpenCL filter all doing upload->proc->download
[14:38:47 CEST] <Lynne> that must be the old opencl filters you're talking about, the ocl hwcontext does uploading/downloading/mapping, its normal
[14:48:26 CEST] <Lynne> the cuda->vulkan interop could work though, if the cuda context allocates all memory with vulkan and then stores a ref somewhere so it can map it back
[14:49:13 CEST] <Lynne> seems to be how nvidia expects you to do it with all interops, just import into cuda and then rely on memory aliasing
[14:49:19 CEST] <BtbN> I don't think there is any Vulken<->CUDA interop as of yet
[14:49:29 CEST] <BtbN> *a
[14:49:42 CEST] <BtbN> Also, CUDA/NVENC has pretty specific pixel format and alignment requirements
[14:53:42 CEST] <Lynne> there is, you can import vulkan into cuda, just not the other way around
[14:54:35 CEST] <Lynne> mpv uses it
[14:56:56 CEST] <BtbN> For decoding that is sadly not useful
[14:57:08 CEST] <BtbN> but if you software-decode and then hw-upload to a vulkan frame, you could map those to CUDA
[14:57:29 CEST] <BtbN> For CUDA to Vulkan you would be forced to copy
[14:57:59 CEST] <BtbN> We might be able to repurpose the strange dedicated hwupload_cuda to be able to take Vulkan frames as input
[14:59:03 CEST] <BtbN> It's impossible to decode straight to a Vulkan frame, since nvdec/cuvid just gives you a CUdevptr that's mapped to the frame
[15:00:34 CEST] <Lynne> oic, I thought you allocated a cuda frame and gave it that to decode into
[15:01:03 CEST] <BtbN> An on-gpu memcpy is pretty cheap though, and doing it once isn't a big issue
[15:01:37 CEST] <BtbN> The legacy cuviddec decoder actually does just that anyway, either to a frame in ram or another CUDA frame
[15:02:00 CEST] <BtbN> But the actual nvdec hwaccel constructs an AVFrame out of the mapped NVDEC output
[15:04:09 CEST] <Lynne> how would you do a gpu memcpy with vulkan and cuda though?
[15:04:56 CEST] <BtbN> You do whatever you need to do to get a CUdevptr that maps to a Vulkan frame, and then call memcpy?
[15:05:17 CEST] <Lynne> oh, right
[15:05:30 CEST] <BtbN> or cuMemcpy2D rather
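A rough sketch of such a per-plane copy with the CUDA driver API, assuming dst is a CUdeviceptr obtained by mapping the Vulkan image's memory into CUDA and src is the existing CUDA frame plane (names illustrative; inside libavutil this would go through the dynamically loaded function table rather than calling libcuda directly):

    #include <cuda.h>

    /* Copy one pitched plane on the GPU. width_bytes is the visible width of
     * the plane in bytes; src_pitch/dst_pitch are the respective line strides. */
    static CUresult copy_plane(CUdeviceptr dst, size_t dst_pitch,
                               CUdeviceptr src, size_t src_pitch,
                               size_t width_bytes, size_t height,
                               CUstream stream)
    {
        CUDA_MEMCPY2D cpy = {
            .srcMemoryType = CU_MEMORYTYPE_DEVICE,
            .srcDevice     = src,
            .srcPitch      = src_pitch,
            .dstMemoryType = CU_MEMORYTYPE_DEVICE,
            .dstDevice     = dst,
            .dstPitch      = dst_pitch,
            .WidthInBytes  = width_bytes,
            .Height        = height,
        };
        return cuMemcpy2DAsync(&cpy, stream);
    }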
[15:56:06 CEST] <BtbN> Hm, one interesting thing is... a potential CUDA transpose filter could operate in-place with no copy
[15:56:39 CEST] <BtbN> But the benefits of doing that seem not worth it, given the issues that would create with the entire rest of libavfilter
[15:58:24 CEST] <Lynne> does it? I've tested in-place filtering, I know it worked without issues
[16:01:58 CEST] <J_Darnley> Does ffmpeg have a typical style for multi-line comments which aren't doxy ones?
[16:10:06 CEST] <JEEB> seems to differ between even functions in a single module
[16:13:10 CEST] <J_Darnley> Okay, I'll just do something nest and compact here
[16:13:16 CEST] <J_Darnley> *neat
[16:14:36 CEST] <JEEB> (basically just look at the mix of /* */ and multi-line // in movenc.c for example :)
[16:19:39 CEST] <cone-083> ffmpeg 03Paul B Mahol 07master:6b0903075694: avfilter/vf_delogo: unbreak fate
[16:41:05 CEST] <BtbN> You'd end up with a frame where its dimensions potentially mismatch those of its hwframes_ctx. I don't think that's a good idea to do.
[16:48:37 CEST] <cone-083> ffmpeg 03Paul B Mahol 07master:fbaa395917e1: avfilter/vf_v360: remove not needed items from ThreadData
[16:56:04 CEST] <Lynne> BtbN: its fine as long as they dont change then, right?
[16:56:18 CEST] <Lynne> the speed gains are significant
[17:10:47 CEST] <durandal_1707> anybody wants to write SIMD assembly for me?
[17:32:50 CEST] <kierank> durandal_1707: no, learn yourself
[17:38:32 CEST] <Lynne> durandal_1707: if its interesting
[17:48:15 CEST] <J_Darnley> durandal_1707: for what do you want simd this time? the 360 filter?
[17:50:38 CEST] <durandal_1707> bilinear interpolation remap for slice
[17:51:55 CEST] <durandal_1707> i wonder whats max speedup one could expect
[17:52:53 CEST] <durandal_1707> i guess nearest function cant be made faster
[17:53:31 CEST] <durandal_1707> yes for v360 filter
[18:03:18 CEST] <Lynne> not interested. close to having the fastest ever avgblur shader written for reals this time once I refactor more
[18:03:30 CEST] <durandal_1707> lol
[18:07:12 CEST] <BtbN> Lynne, nvenv for example get the dimension information from the hwframes ctx
[18:07:16 CEST] <BtbN> *nvenc
[18:07:23 CEST] <philipl> BtbN: yeah. You'd write a filter that takes the cuda frame and gpu copies it to a vulkan frame using the interop as a kind of hwupload. The equivalent hwdownload would be easier, as you could actually zero-copy it
[18:08:49 CEST] <Lynne> anyone wants to try writing that?
[18:09:04 CEST] <philipl> I could try. I wrote interop for mpv.
[18:09:25 CEST] <philipl> Doing it properly requires a lot of annoying semaphore management though.
[18:10:13 CEST] <cone-083> ffmpeg 03Michael Niedermayer 07master:daf92cc074c5: avcodec/vp3: Check for end of input in 2 places of vp4_unpack_macroblocks()
[18:10:14 CEST] <cone-083> ffmpeg 03Michael Niedermayer 07master:b54031a6e93d: avcodec/bgmc: Check input space in ff_bgmc_decode_init()
[18:10:37 CEST] <philipl> Lynne: Do you have a git repo somewhere with your changes, or just those patches you linked to yesterday?
[18:10:56 CEST] <BtbN> ffmpeg would first need Vulkan hwcontext support before it even makes sense
[18:11:11 CEST] <philipl> BtbN: that's why it would have to go on top of Lynne's patches
[18:11:36 CEST] <Lynne> just a patch, https://0x0.st/z4IJ.patch
[18:11:54 CEST] <BtbN> Is that based on Atomnukers old patches?
[18:11:58 CEST] <Lynne> you'll need to add the download to vulkan_map_to, and the upload (to cuda) to vulkan_map_from
[18:12:02 CEST] <Lynne> yes
[18:12:23 CEST] <Lynne> oh, and it would be nice to add a derive_device case for cuda
[18:12:32 CEST] <philipl> Definitely needs derive_device
[18:13:13 CEST] <philipl> Lynne: what's the use-case for the non-vaapi drm interop? I don't know what other interesting drm source/sinks are.
[18:13:38 CEST] <Lynne> kmsgrab
[18:13:52 CEST] <philipl> ah
[18:14:03 CEST] <Lynne> for quick and easy derive_device you can just set the vendor_id in the search struct to nvidia
[18:14:29 CEST] <philipl> You're supposed to do UUID matching, which is what I put in mpv.
[18:14:44 CEST] <Lynne> is it identical between vulkan and cuda?
[18:14:59 CEST] <philipl> Yes. It's the only thing that nvidia guarantee
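On the Vulkan side the matching UUID comes from VkPhysicalDeviceIDProperties; a minimal sketch of querying it (core Vulkan 1.1, exact integration into the hwcontext left open):

    #include <string.h>
    #include <vulkan/vulkan.h>

    /* Fetch the device UUID that can be compared byte-for-byte against the
     * UUID returned by cuDeviceGetUuid() on the CUDA side. */
    static void get_vk_device_uuid(VkPhysicalDevice phys,
                                   uint8_t uuid[VK_UUID_SIZE])
    {
        VkPhysicalDeviceIDProperties id_props = {
            .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES,
        };
        VkPhysicalDeviceProperties2 props2 = {
            .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
            .pNext = &id_props,
        };
        vkGetPhysicalDeviceProperties2(phys, &props2);
        memcpy(uuid, id_props.deviceUUID, VK_UUID_SIZE);
    }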
[18:15:27 CEST] <philipl> I'll look at this in the next few days.
[18:15:49 CEST] <philipl> Fun isn't quite the right word, but I can't help myself.
[18:20:08 CEST] <Lynne> btw how do you guarantee that the gpu memcpy has finished?
[18:20:16 CEST] <Lynne> for cuda->vulkan
[18:22:04 CEST] <philipl> that is the semaphores.
[18:22:19 CEST] <philipl> you import a semaphore from vulkan and signal it.
[18:22:44 CEST] <philipl> so many lines of code.
[18:23:09 CEST] <Lynne> so imported semaphores are signalled correctly on both apis? nice
[18:23:25 CEST] <philipl> yes. and you can signal in both directions.
[18:24:14 CEST] <philipl> I needed all of that to make the mpv interop work as the vulkan images are reused. so have to sync write and then make sure the read completes before writing the next frame.
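For reference, the CUDA half of that dance looks roughly like the sketch below (driver-API external semaphore calls, available since CUDA 10; exporting the semaphore from Vulkan as an fd and the wait direction are omitted, and the names are illustrative):

    #include <cuda.h>

    /* Import a Vulkan semaphore that was exported as an opaque fd, then make it
     * fire once all work previously queued on `stream` has finished. */
    static CUresult import_and_signal(int sem_fd, CUstream stream,
                                      CUexternalSemaphore *out_sem)
    {
        CUDA_EXTERNAL_SEMAPHORE_HANDLE_DESC desc = {
            .type      = CU_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD,
            .handle.fd = sem_fd,
        };
        CUDA_EXTERNAL_SEMAPHORE_SIGNAL_PARAMS sig = { 0 };
        CUresult err = cuImportExternalSemaphore(out_sem, &desc);
        if (err != CUDA_SUCCESS)
            return err;
        return cuSignalExternalSemaphoresAsync(out_sem, &sig, 1, stream);
    }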
[18:26:59 CEST] <Lynne> at least nvidia care about sync, drm does not expose any semaphores used internally, and nor does vaapi
[18:27:33 CEST] <durandal_1707> kierank: do not be so mean
[18:27:41 CEST] <kierank> durandal_1707: lol
[18:36:18 CEST] <philipl> Lynne: I know you wrote all that code, but it might not be crazy to use libplacebo for the hwcontext. It wraps a bunch of the vulkan in a useful way.
[18:37:03 CEST] <philipl> haasn: paging haasn.
[18:43:27 CEST] <Lynne> that's just not happening.
[18:46:04 CEST] <durandal_1707> why?
[18:48:55 CEST] <Lynne> durandal_1707: many reasons. no one packages it, the abstractions are mostly rendering-side biased, probably slower, needs more code written for interops
[18:52:31 CEST] <durandal_1707> where are jamrial and mkver when you need them?
[18:53:17 CEST] <jamrial> ?
[18:53:56 CEST] <durandal_1707> see mkv bug?
[18:54:25 CEST] <durandal_1707> apparently slow seeking
[19:04:34 CEST] <nevcairiel> its probably the same bug that was already reported a few days ago
[19:08:34 CEST] <jamrial> unlikely, the commit they tested is after the latest fix
[19:08:39 CEST] <cone-083> ffmpeg 03Andrey Semashev 07master:7ea2710ec4d1: configure: Update libmysofa check with a new symbol.
[19:28:12 CEST] <durandal_1707> i can not reproduce this
[19:29:07 CEST] <cone-083> ffmpeg 03Pavel Koshevoy 07master:6b57a294a328: lavc/v4l2_m2m: don't close the file descriptor we don't own
[19:31:44 CEST] <jamrial> i get a delay of a couple seconds, but nowhere near the 20+ seconds mentioned in the ticket
[19:39:30 CEST] <jamrial> there are like 87k cuepoints in this output
[19:45:45 CEST] <durandal_1707> i get no delay at all
[20:28:40 CEST] <Lynne> avgblur_opencl = 480fps, avgblur_vulkan = 491 fps, I now for reals have the fastest ever avgblur shader, and I can make it faster still
[20:29:50 CEST] <Lynne> first I need to find a heisenbug where a third allocation makes descriptor templates explode though
[20:30:13 CEST] <JEEB> :)
[20:30:16 CEST] <JEEB> 'grats
[20:31:30 CEST] <Lynne> meh, I have the nvidia machine up, I'll try my luck at the interop, it is never not time to fast
[20:41:47 CEST] <philipl> Lynne: check out the mpv code if you want to chase down the cuda interop.
[20:42:45 CEST] <philipl> One thing I'll predict. The mapping API is expensive - they really want you to re-use images. So you'll want to map frames from the pool and keep them mapped and re-use them as much as possible.
[21:04:25 CEST] <durandal_1707> what's the point of putting gibberish text into fuzz fixing commits if such text is not freely available to a wider audience
[21:24:45 CEST] <Lynne> can I get a CUdevice from a CUcontext or CUstream?
[21:25:26 CEST] <philipl> Lynne: I don't think so. Never tried.
[21:26:12 CEST] <philipl> No, you can.
[21:26:14 CEST] <philipl> https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX
[21:27:31 CEST] <philipl> You'd need to add the function prototype to our headers.
[21:28:06 CEST] <Lynne> cuCtxGetDevice?
[21:28:33 CEST] <BtbN> Why would you need that though? We always carry the device around, don't we?
[21:28:42 CEST] <Lynne> cuCtxGetDevice(*device) so the context is a global state?
[21:28:51 CEST] <nevcairiel> its thread-local state
[21:29:02 CEST] <Lynne> yeah, I can add CUdevice to libavutil/hwcontext_cuda_internal.h
[21:29:17 CEST] <BtbN> Is it really not in there somewhere already?
[21:29:53 CEST] <BtbN> Doesn't look like it, no. What do you need it for?
[21:31:52 CEST] <Lynne> cuDeviceGetUuid
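A short sketch of that path with the driver API; cuCtxGetDevice reports the device of whatever context is current on the calling thread, so the relevant context has to be pushed first (whether these prototypes are already present in ffmpeg's loader headers is a separate question):

    #include <cuda.h>

    /* Assumes the CUDA context of interest is current on this thread. */
    static CUresult get_current_device_uuid(CUuuid *uuid)
    {
        CUdevice dev;
        CUresult err = cuCtxGetDevice(&dev);
        if (err != CUDA_SUCCESS)
            return err;
        return cuDeviceGetUuid(uuid, dev);
    }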
[21:32:31 CEST] <durandal_1707> if nobody is gonna help me do SIMD i will do it, so you all can be disappointed by not helping
[21:33:18 CEST] <kierank> durandal_1707: just learn
[21:33:22 CEST] <kierank> I can teach
[21:33:25 CEST] <kierank> I taught atomnuker
[21:33:45 CEST] <durandal_1707> you cant ride bike?
[21:34:07 CEST] <kierank> eh?
[21:35:30 CEST] <durandal_1707> i learned SIMD already, but i need much more experienced SIMD developer to show me little tricks of trade
[21:37:01 CEST] <kierank> ask Gramner
[21:37:06 CEST] <kierank> write code, ask Gramner to help improve
[21:38:12 CEST] <cone-083> ffmpeg 03Michael Niedermayer 07master:64ac8a6e697e: avcodec/apedec: Fix integer overflow in filter_fast_3320()
[21:38:14 CEST] <cone-083> ffmpeg 03Michael Niedermayer 07master:8ae5d2cbb254: vcodec/apedec: Fix integer overflow in filter_3800()
[21:38:14 CEST] <cone-083> ffmpeg 03Michael Niedermayer 07master:361b3c873ee0: avcodec/pngdec: Optimize has_trns code
[21:38:15 CEST] <cone-083> ffmpeg 03Michael Niedermayer 07master:0ee886988e75: avcodec/ralf: fix undefined shift
[21:38:16 CEST] <cone-083> ffmpeg 03Michael Niedermayer 07master:4778407ab3b5: avcodec/ralf: fix undefined shift in extend_code()
[21:38:19 CEST] <durandal_1707> he is busy with dav1d
[21:46:08 CEST] <tmm1> jkqxz: re: dummy drm device, for v4l2 the decoder support both drm and software formats, so using alloc/init to force a dummy drm device like rkmpp doesn't work. there's also v4l2 filters and encoders which would ideally share one global hwcontext instead of each creating their own dummy
[22:00:43 CEST] <cone-083> ffmpeg 03Marton Balint 07master:73e0035812cc: docs/formats: fix max_interleave_delta default
[22:00:44 CEST] <cone-083> ffmpeg 03Marton Balint 07master:f4eb7d84a7c2: avformat/mpegtsenc: fix flushing of audio packets
[22:01:38 CEST] <Lynne> do I need to call cuCtxPushCurrent/Pop before/after the cuMemcpy2DAsync?
[22:14:21 CEST] <philipl> Lynne: yes.
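i.e. the usual pattern around any driver-API call that needs the frame's context current on the calling thread; sketched here directly against libcuda for brevity (ffmpeg itself goes through its dynamically loaded function table):

    #include <cuda.h>

    static CUresult copy_with_context(CUcontext ctx, const CUDA_MEMCPY2D *cpy,
                                      CUstream stream)
    {
        CUcontext dummy;
        CUresult err = cuCtxPushCurrent(ctx);  /* make ctx current here */
        if (err != CUDA_SUCCESS)
            return err;
        err = cuMemcpy2DAsync(cpy, stream);
        cuCtxPopCurrent(&dummy);               /* restore the previous context */
        return err;
    }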
[22:22:36 CEST] <cone-083> ffmpeg 03Marton Balint 07release/4.2:3a17fe2bdd57: avformat/mpegts: fix teletext PTS when selecting teletext streams only
[22:22:37 CEST] <cone-083> ffmpeg 03Marton Balint 07release/4.2:b4e910370992: avformat/avidec: add support for recognizing HEVC fourcc when demuxing
[22:33:04 CEST] <tmm1> jkqxz: so for example you can decode without hwaccel to get pix_fmt=nv12, or with hwaccel for pix_fmt=drm_prime: https://paste.ubuntu.com/p/268XmYcMF9/. if you think it would be cleaner to add a new v4l2 hwaccel i can do that instead, but it doesn't seem like there's a standard api to map/hwupload frames since each implementation uses its own gpu memory allocator apis
[22:48:17 CEST] <Lynne> tmm1: own gpu allocation apis? as in proprietary ones?
[22:52:47 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:b022d9ba288a: avcodec/omx: fix xFramerate calculation
[22:53:34 CEST] <cone-083> ffmpeg 03Aman Gupta 07release/4.2:0f8e2a0b8644: avcodec/omx: fix xFramerate calculation
[22:56:07 CEST] <tmm1> Lynne: i believe so, for instance on android there's a new ION allocator that standardizes nvidia/nvmap ti/cmem qualcomm/pmem
[22:57:00 CEST] <tmm1> on rpi you need the CMA allocator
[22:57:42 CEST] <tmm1> my understanding of this stuff is still rudimentary, but it seems like there is no one way to get gpu memory that you can copy the frame into
[23:08:38 CEST] <Lynne> philipl: I need the WidthInBytes stride to do a memcpy, and there's no way to get it in vulkan
[23:09:14 CEST] <Lynne> vkGetImageSubresourceLayout only works for linear images and the tiling must be optimal to import into cuda
[23:11:05 CEST] <Lynne> er, srcPitch and dstPitch
[23:12:00 CEST] <cone-083> ffmpeg 03Andriy Gelman 07master:ef43a4d6b38d: avformat: Add ZeroMQ as a protocol
[23:21:35 CEST] <xmichael> https://developer.apple.com/videos/play/wwdc2019/502/
[23:21:50 CEST] <xmichael> https://developer.apple.com/documentation/http_live_streaming/protocol_extension_for_low-latency_hls_preliminary_specification
[23:22:42 CEST] <philipl> Lynne: so, you are doing a memcpy to a cuda array. That means you have no dstPitch, and the width in bytes is the width of the actual image (srcPitch captures the actual stride)
[23:23:40 CEST] <philipl> https://github.com/mpv-player/mpv/blob/master/video/out/hwdec/hwdec_cuda.c#L228
[23:26:28 CEST] <philipl> and just generally: https://github.com/mpv-player/mpv/blob/master/video/out/hwdec/hwdec_cuda_vk.c
[23:32:02 CEST] <Lynne> philipl: its not a cuda array though, its a standard CUdeviceptr I get from cuExternalMemoryGetMappedBuffer
[23:32:32 CEST] <Lynne> couldn't I just do a 1D memcpy?
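A sketch of that idea, assuming the Vulkan image's backing memory has already been imported as a CUexternalMemory: map it as a flat buffer and do a plain 1D device-to-device copy per plane. This only gives a correct image if source and destination share the same pitch and layout, which is exactly the catch above; the mapped pointer would eventually be released with cuMemFree (names illustrative):

    #include <cuda.h>

    /* Map `size` bytes at `offset` of imported Vulkan memory and copy a plane
     * into it with a 1D copy, assuming identical pitch/layout on both sides. */
    static CUresult copy_plane_1d(CUexternalMemory ext_mem, size_t offset,
                                  size_t size, CUdeviceptr src, CUstream stream)
    {
        CUdeviceptr dst;
        CUDA_EXTERNAL_MEMORY_BUFFER_DESC buf_desc = {
            .offset = offset,
            .size   = size,
        };
        CUresult err = cuExternalMemoryGetMappedBuffer(&dst, ext_mem, &buf_desc);
        if (err != CUDA_SUCCESS)
            return err;
        return cuMemcpyAsync(dst, src, size, stream);
    }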
[00:00:00 CEST] --- Tue Sep 3 2019