[FFmpeg-devel] [PATCH 4/4] avformat/libopenmpt: Probe file format from file data if possible
Jörn Heusipp
osmanx at problemloesungsmaschine.de
Sun Jan 7 16:40:45 EET 2018
On 01/06/2018 04:10 PM, Carl Eugen Hoyos wrote:
> 2018-01-06 11:07 GMT+01:00 Jörn Heusipp <osmanx at problemloesungsmaschine.de>:
>> libopenmpt file header probing is tested regularly against the FATE
>> suite and other diverse file collections by libopenmpt upstream in
>> order to avoid false positives.
>
> You could also test tools/probetest
I was not aware of that tool. Thanks for the suggestion.
It currently lists a failure related to libopenmpt:
Failure of libopenmpt probing code with score=76 type=0 p=FDC size=1024
Looking at tools/probetest.c, that would be a file with very few bits
set. libopenmpt detects the random data in question as M15 .MOD files
(original Amiga 15 samples .mod files), and there is sadly not much that
can be done about that. There are perfectly valid real-world M15 .MOD
files with only 73 bits set in the first 600 bytes
(untouchables-station.mod,
<https://modarchive.org/index.php?request=view_by_moduleid&query=104280>).
The following up to 64*4*4*63 bytes could even be completely 0 (empty
pattern data) for valid files (even without the file being totally
silent). The generated random data that tools/probetest complains about
here contains 46 bits set to 1 in the first 600 bytes. What follows are
various other examples with up to 96 bits set to 1. Completely loading a
file like that would very likely reject it (in particular, libopenmpt
can deduce the expected file size from the sample headers and, with some
slack to account for corrupted real-world examples, reject files with
non-matching size), however, that is not possible when only probing the
file header.
The libopenmpt API allows for the file header probing functions to
return false-positives, however false-negatives are not allowed.
Performance numbers shown by tools/probetest are what I had expected
(measured on an AMD A4-5000, during normal Xorg session (i.e. not 100%
idle)):
1110194971 cycles, aa
24986722468 cycles, aac
26418545168 cycles, ac3
1484717267 cycles, acm
1627888281 cycles, act
2109884646 cycles, adp
2316235992 cycles, ads
1244706028 cycles, adx
1132390431 cycles, aea
1729241354 cycles, aiff
1728288238 cycles, aix
2662531158 cycles, amr
16189546067 cycles, amrnb
10342883200 cycles, amrwb
1487752343 cycles, anm
2268900502 cycles, apc
1140814303 cycles, ape
2181170710 cycles, apng
18698762054 cycles, aqtitle
2656908730 cycles, asf
2402762967 cycles, asf_o
18148196647 cycles, ass
1392503829 cycles, ast
1774264703 cycles, au
1807159562 cycles, avi
1745391230 cycles, avr
1370939762 cycles, avs
1555620708 cycles, bethsoftvid
1459171160 cycles, bfi
2640635742 cycles, bink
2022320986 cycles, bit
1664933324 cycles, bfstm
1588023172 cycles, brstm
1769430536 cycles, boa
2294286860 cycles, c93
1022646071 cycles, caf
9063207678 cycles, cavsvideo
1898790300 cycles, cdxl
1037718383 cycles, cine
3358938768 cycles, concat
2367399953 cycles, dcstr
1795803759 cycles, dfa
1454750468 cycles, dirac
1167905836 cycles, dnxhd
2076678208 cycles, dsf
1226761232 cycles, dsicin
1157816261 cycles, dss
31466350564 cycles, dts
1357475606 cycles, dtshd
15626181281 cycles, dv
12227021709 cycles, dvbsub
1747998309 cycles, dvbtxt
1941371107 cycles, dxa
1988122049 cycles, ea
1395161162 cycles, ea_cdata
21013119067 cycles, eac3
1282697126 cycles, epaf
1658521102 cycles, ffm
2919787300 cycles, ffmetadata
3786264941 cycles, fits
2700385826 cycles, flac
1840776863 cycles, flic
1317695853 cycles, flv
1511756606 cycles, live_flv
1135064427 cycles, 4xm
1830871233 cycles, frm
3011403748 cycles, fsb
1462985803 cycles, gdv
1708440935 cycles, genh
3480430744 cycles, gif
2533542048 cycles, gsm
2412598563 cycles, gxf
21637989787 cycles, h261
22268834035 cycles, h263
22135718754 cycles, h264
13939886275 cycles, hevc
1979375582 cycles, hls,applehttp
1658646375 cycles, hnm
1507634977 cycles, ico
2534774499 cycles, idcin
1684324336 cycles, idf
1353664382 cycles, iff
2978779893 cycles, ilbc
1892353081 cycles, alias_pix
2456259645 cycles, brender_pix
2077466815 cycles, ingenient
11281657144 cycles, ipmovie
1840789384 cycles, ircam
2455541614 cycles, iss
1114518907 cycles, iv8
1750327098 cycles, ivf
3803895407 cycles, ivr
30510491919 cycles, jacosub
1271391143 cycles, jv
1504674165 cycles, lmlm4
28284647311 cycles, loas
2746771768 cycles, lrc
1630546444 cycles, lvf
2198871369 cycles, lxf
15210250791 cycles, m4v
2074024051 cycles, matroska,webm
1756348463 cycles, mgsts
13894318111 cycles, microdvd
15146276963 cycles, mjpeg
13215378411 cycles, mjpeg_2000
21505153187 cycles, mlp
1623684275 cycles, mlv
2009009898 cycles, mm
1401453493 cycles, mmf
3614852044 cycles, mov,mp4,m4a,3gp,3g2,mj2
37065167696 cycles, mp3
2003306237 cycles, mpc
1695842377 cycles, mpc8
20922947044 cycles, mpeg
26950626806 cycles, mpegts
12903395151 cycles, mpegvideo
1861191163 cycles, mpjpeg
11292546869 cycles, mpl2
10904909514 cycles, mpsub
2556705558 cycles, msf
14520727615 cycles, msnwctcp
1513345014 cycles, mtaf
1498181103 cycles, mtv
2100567692 cycles, musx
1398481833 cycles, mv
3839928046 cycles, mxf
1084340183 cycles, nc
2260039804 cycles, nistsphere
1557302811 cycles, nsp
14077588650 cycles, nsv
12804865958 cycles, nut
3498085105 cycles, nuv
2785399093 cycles, ogg
2800628120 cycles, oma
2241873172 cycles, paf
11630567717 cycles, pjs
1538360044 cycles, pmp
1966776985 cycles, pva
2051297210 cycles, pvf
1464824135 cycles, qcp
1395151376 cycles, r3d
13872717447 cycles, realtext
1648061451 cycles, redspark
1881530375 cycles, rl2
1865198787 cycles, rm
1848791502 cycles, roq
3141932957 cycles, rpl
2379252069 cycles, rsd
31146518791 cycles, s337m
7497815228 cycles, sami
24830800138 cycles, sbg
15351196732 cycles, scc
9758760073 cycles, sdp
2159674057 cycles, sdr2
1555316250 cycles, sds
1533405328 cycles, sdx
1681270049 cycles, film_cpk
2303851902 cycles, shn
1761647489 cycles, siff
1510520120 cycles, smk
2859907925 cycles, smjpeg
1643498999 cycles, smush
1545689291 cycles, sol
1912740702 cycles, sox
17486361594 cycles, spdif
20080502425 cycles, srt
2659637846 cycles, psxstr
17633213722 cycles, stl
8032855323 cycles, subviewer1
8572013351 cycles, subviewer
2043897951 cycles, sup
2980746200 cycles, svag
1617398584 cycles, swf
2842115745 cycles, tak
5320163051 cycles, tedcaptions
1884107745 cycles, thp
4320119922 cycles, 3dostr
2018755118 cycles, tiertexseq
1714617022 cycles, tmv
21456317423 cycles, truehd
1050826275 cycles, tta
2065773077 cycles, txd
1577829281 cycles, ty
3450802460 cycles, vag
19179500628 cycles, vc1
1860036853 cycles, vc1test
2035593194 cycles, vivo
1518758455 cycles, vmd
2696860615 cycles, vobsub
2762235280 cycles, voc
1957794567 cycles, vpk
15280000639 cycles, vplayer
1763355055 cycles, vqf
1879310121 cycles, w64
1717961542 cycles, wav
2095837026 cycles, wc3movie
2960188092 cycles, webvtt
1922356839 cycles, wsaud
1978715237 cycles, wsd
1468438585 cycles, wsvqa
2668937770 cycles, wtv
3193222838 cycles, wve
1744694735 cycles, wv
1677278541 cycles, xa
1759862474 cycles, xbin
2077217647 cycles, xmv
2161496331 cycles, xvag
2330794326 cycles, xwma
1103137131 cycles, yop
2154690280 cycles, yuv4mpegpipe
1842301899 cycles, bmp_pipe
2039875920 cycles, dds_pipe
1627504710 cycles, dpx_pipe
1463019740 cycles, exr_pipe
1539585051 cycles, j2k_pipe
1187861714 cycles, jpeg_pipe
1682815484 cycles, jpegls_pipe
1840465166 cycles, pam_pipe
1755858395 cycles, pbm_pipe
1211589601 cycles, pcx_pipe
2002446954 cycles, pgmyuv_pipe
1818965412 cycles, pgm_pipe
1654095834 cycles, pictor_pipe
1404252441 cycles, png_pipe
1211120882 cycles, ppm_pipe
1205883539 cycles, psd_pipe
1764091290 cycles, qdraw_pipe
1091809273 cycles, sgi_pipe
2994663150 cycles, svg_pipe
1348938514 cycles, sunrast_pipe
1464347337 cycles, tiff_pipe
1142572756 cycles, webp_pipe
1412715104 cycles, xpm_pipe
3550700989 cycles, libmodplug
109589637233 cycles, libopenmpt
2672917981 libopenmpt (per module format)
At first glance, libopenmpt looks huge here in comparison. However one
should consider that libopenmpt internally has to probe for (currently)
41 different module file formats, going through 41 separate probing
functions internally.
Dividing 109589637233 by 41 gives 2672917981, which is in the ballpark
of all other probing functions in ffmpeg.
>> +#if OPENMPT_API_VERSION_AT_LEAST(0,3,0)
>> + if (p->buf && p->buf_size > 0) {
>> + probe_result = openmpt_probe_file_header_without_filesize(
>> + OPENMPT_PROBE_FILE_HEADER_FLAGS_DEFAULT,
>> + p->buf, p->buf_size,
>> + &openmpt_logfunc, NULL, NULL, NULL, NULL, NULL);
>> + if (probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_FAILURE) {
>> + score = score_fail;
>
> What's wrong with return 0;?
Nothing. If preferred, I can get rid of all score_* constants and use 0
or AVPROBE_SCORE_* directly.
>> + } else if (probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS) {
>> + score = FFMAX(score, score_data);
>
> What does OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS mean?
It is documented as "OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS: The file
will most likely be supported by libopenmpt." (see
<https://lib.openmpt.org/doc/group__libopenmpt__c.html#ga92cdc66eb529a8a4a67987b659ed3c5e>).
An ultimately precise answer is never possible as that would require
actually trying to load the complete file in some cases:
* Not all module file formats store feature flags in the file header.
* Some module file formats provide very little file magic numbers,
and/or file magic numbers at strange offsets (like at 1080 for M.K. .MOD).
* Some formats store header-like information in the file footer, which
is not accessible during probing.
* The extreme case of M15 (original 15 samples Amiga .MOD files)
provides absolutely no true file header or magic numbers. libopenmpt
implements heuristics to reliably identify and probe even those, however
there is only so much it can do.
* Some container formats (Unreal Music .UMX, which can contain module
music files) theoretically potentially require seeking to arbitrary
locations in the file in order to determine the format.
> Why not return MAX?
For all the reasons listed above, even though libopenmpt tries to be as
pessimistic as possible, false positives fundamentally cannot be avoided
completely. As the libopenmpt probing logic is code outside of ffmpeg,
the effects of such a false positive could potentially cause
mis-detection of other formats supported by ffmpeg, which would not be
immediately or easily fixable by ffmpeg itself. I used the lowest
possible score that makes sense in order to reduce the risk of potential
impact.
The probing result in this case is deduced from looking at the actual
file data, as opposed to just trusting a mime-type which is external to
the file and could be inconsistent/wrong, which is why I used a score
higher than AVPROBE_SCORE_MIME.
I opted for AVPROBE_SCORE_MIME+1, which seemed reasonable to me.
Should I add a comment explaining the reasoning to the code?
>> + } else if (probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA) {
> > I believe this should return 0 but maybe you found that this is bad?
Would 0 be semantically right here?
OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA means that libopenmpt
requires more data to come to any usable conclusion, which is what I
thought AVPROBE_SCORE_RETRY would mean.
I do not see any particular problem with returning 0 in this case
either, given the probing logic in av_probe_input_format() (and it would
reduce the whole probe_result ==
OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA block to a single line).
However, if client code directly calls .read_probe() on AVInputFormat
ff_libopenmpt_demuxer, I think returning AVPROBE_SCORE_RETRY (or
similar) makes more sense.
>> + if (score > score_fail) {
>> + /* known file extension */
>> + score = FFMAX(score, score_ext_retry);
>> + } else {
>> + /* unknown file extension */
>> + if (p->buf_size >= openmpt_probe_file_header_get_recommended_size()) {
>> + /* We have already received the recommended amount of data
>> + * and still cannot decide. Return a rather low score.
>> + */
>> + score = FFMAX(score, score_retry);
>> + } else {
>> + /* The file extension is unknown and we have very few data
>> + * bytes available. libopenmpt cannot decide anything here,
>> + * and returning any score > 0 would result in successfull
>> + * probing of random data.
>> + */
>> + score = score_fail;
>
> This patch indicates that it may be a good idea to require libopenmpt 0.3,
The amount of #ifdef needed to support 0.2 and 0.3 is rather small, I think.
I understand that the current (and future libopenmpt 0.2) way of solely
relying on the file extension is far from optimal, but I do not see any
reason to drop libopenmpt 0.2 support right now; in particular,
continuing 0.2 support as is would be no regression. Additionally,
libopenmpt 0.2 can be built with C++03 compilers while libopenmpt 0.3
requires a C++11 compiler, thus, libopenmpt 0.3 cannot easily be built
on older platforms.
libopenmpt 0.2 also allows for file probing, however the API and code
path is very heavy-weight (goes through the normal file loader and
discards unneeded data), which I fear would be way too heavy
performance-wise for ffmpeg.
> when was it released, which distributions do not include it?
The first version of libopenmpt 0.3 was released 2017-09-28.
I am not aware of any stable, non-rolling distribution that ships
libopenmpt 0.3 as of now.
Debian 9 has libopenmpt 0.2.7386~beta20.3-3+deb9u2
Ubuntu 17.10 has libopenmpt 0.2.8760~beta27-1
Ubuntu 16.04 LTS has no libopenmpt at all
even openSUSE Tumbleweed only has libopenmpt 0.2.8461~beta26
Debian Testing and Ubuntu Bionic both have libopenmpt 0.3.4.
I do not think ffmpeg should drop libopenmpt 0.2 support at the moment.
Regards,
Jörn
More information about the ffmpeg-devel
mailing list