[FFmpeg-devel] Ideas to replace the options system
Nicolas George
george at nsup.org
Fri Dec 4 15:33:39 CET 2015
Hi.
This is a rather long explanation of ideas I have to replace the options
system with something better. I will not work on it before I have made
significant progress on de-recursiving lavfi, but I can still think of it
and mature its design while walking in the street or waiting for the subway.
And of course, if someone else wants to start working on something similar,
so much the better.
TL;DR: I have put small summaries marked by this tag at the end of most long
sections.
Why do we need a new options system?
Most importantly: escaping hell
What is escaping hell?
Read this as "escaping" the noun, not the verb: the task of adding a
backslash in front of every special character in a string. It becomes
escaping hell when you have to add backslashes in front of the
backslashes that protect the backslashes that protect the special
characters. See this bit in the documentation:
-vf "drawtext=text=this is a \\\\\\'string\\\\\\'\\\\: may contain one\\, or more\\, special characters"
And it does not even use the % expansion in drawtext. Using single
quotes can help a little, but it works only once.
Why do we have escaping hell?
Because we are using strings all over the place. This is somewhat of a
paradox since multimedia processing rarely uses strings at all. There
are reasons to use strings, and they drove the current options system.
A lot of our user-interface is based on strings. There is a bad reason
for that and a good one. But the real problem is that it does not stop
there.
Strings for the user interface
The bad reason we use strings everywhere for the user interface is
that most of the user interface was designed with the command-line
tool ffmpeg in mind rather than high-level applications using the
API. I am not very fond of Microsoft's products, but I think that if
ffmpeg had been designed with PowerShell in mind from the start, the
user interface would use far fewer strings (there would be a whole
lot of other problems, though).
The good reason is that strings are good. Reading and writing is a
universal skill among users who need options. So whenever there
is no specific way of showing or entering a value, a string will do:
the users can read it, understand it and write it.
For example, imagine a GUI application that shows all the options of
a codec: numbers are spin buttons or sliders, enumerated values are
drop-down menus, flags are check buttons, etc. But sometimes we add
new types; applications must be ready for unknown types. For
example, if we add a DATE type, the next version of the application
would probably use a calendar widget, but the current version can
not. For these fallback cases, free-form strings are the solution.
Two side notes on that example: first, the application has no way
of validating the syntax of a free-form string (think wireshark
showing invalid filters in red) without actually setting the option;
second, before Clément added type BOOL, boolean options were shown
as 0-1 spin buttons: not optimal, but good enough. After BOOL was
introduced, they became strings as a fallback until the application
implements BOOL: still functional, but not as good.
TL;DR: strings for the user interface are good.
Strings as intermediate storage for internal structures
Since most of the user interface is based on strings, we have a lot
of internal APIs to handle strings: string -> string maps
(AVDictionary), key-values parsers, etc. These APIs are robust: they
handle escaping, and thus allow users to type any string at any
level. But they are at the same time too generic and too low level.
For example, the key-value parser reads the value until the
delimiter, handling escaping, and returns a string. Then the string
is usually parsed according to the type of the option. This is
simple but inefficient. Consider "w=max(iw,1024),h=max(ih,768)":
even though there is no ambiguity, the parser requires escaping the
inner commas.
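To make the problem concrete, here is a minimal sketch in C of a type-unaware value reader. The name naive_value_len is hypothetical and this is not the actual lavu token parser; it only mimics the "read until the delimiter, honoring backslashes" behavior described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a type-unaware token reader: returns the length
 * of the value starting at s, stopping at the first unescaped delimiter.
 * It knows nothing about parentheses or about the type being parsed. */
static size_t naive_value_len(const char *s, char delim)
{
    size_t i = 0;

    assert(s);
    while (s[i] && s[i] != delim) {
        if (s[i] == '\\' && s[i + 1])
            i++;            /* skip the backslash, keep the escaped character */
        i++;
    }
    return i;
}
```

Called on "max(iw,1024),h=max(ih,768)" with ',' as the delimiter, this reader stops after "max(iw", in the middle of the expression. Only when the user writes "max(iw\,1024)" does it reach the closing parenthesis.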
The most prevalent case of this is the use of AV_OPT_TYPE_STRING:
the string field is almost always re-parsed and dispatched into
other fields. Sometimes, someone gets fed up with it and implements
a new AV_OPT_TYPE: IMAGE_SIZE, COLOR, CHANNEL_LAYOUT, etc. But that
can only be done with types that are generic enough to be present in
lavu; it can never happen for types that are only used in one codec
or filter for example.
(Note: some people suggested having the token parser check the
balancing of asymmetric delimiters in order to fix the example
above. I think this would be a very bad idea: it only fixes a few
cases and sacrifices uniformity (think: text="2) a) Prove that x is
in [0,5[."). The ideas I will propose below take care of this case
and more.)
TL;DR: we use AV_OPT_TYPE_STRING and parse later instead of using
parsers that are aware of the type being parsed.
Clumsy syntaxes
Remember that before Anton moved filter contexts to AVOption (which was a
good move), some filters used a specific parser, with a syntax similar to
the usual key=value:key=value but fine-tuned to the filter. When the
move was done, that parser had to be dropped, usually in favor of a single
string option and a different delimiter. For example, you could write
pan=stereo:L=L+FC:R=R+FC; now you have to use | to separate channels.
Similar cases can arise with filters with a variable number of inputs or
outputs: we would want to be able to write in0=...:in1=..., but it is
not possible to have an unlimited number of options like that.
Inextensible options sets
When a component wraps an external library, each option of the library
must have a corresponding AVOption. If there are many, that takes a lot
of work, and if the library frequently gains new options, the wrapper will
always lag behind. Many such libraries have an introspection
system. If they used AVOption itself, we could declare their objects as
child objects; that is what the scale filter does with the options for
libsws. But we can not easily wrap a different introspection system.
Instead, we use a string that we re-parse as key=value pairs, like
x264opts.
Inextensible API
The current API uses arrays of AVOption, making sizeof(AVOption) part of
the ABI.
A new options system
AVType
The AVOption system uses the AV_OPT_TYPE enum to describe the type of
options. Parsing and printing is done using big switch statements in
opt.c. That makes it impossible to define new types and parsers from
codecs or filters to handle specific types.
Instead, I suggest using (pointers to) an AVType structure (all names
are of course just proposals) that holds pointers to functions that do
the parsing and printing, and also initing and freeing.
Of course, lavu must provide AVType structures for all the basic types:
integers, floats, etc., anything that already has an AV_OPT_TYPE. But
lavc/lavf/lavfi can define their own types, and any codec/muxer/filter
can do too.
TL;DR: a structure with function pointers to parse and print a
particular type.
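As an illustration only, such a structure could look like the following C sketch. AVType is the name proposed above; the field names, the signatures and sketch_type_int are assumptions made for this example, not an existing lavu API.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layout: one function pointer per operation on a value. */
typedef struct AVType {
    const char *name;
    /* Parse the start of str into *dst; *endp receives the first unread char. */
    int  (*parse)(void *dst, const char *str, const char **endp);
    /* Print *src into buf; same return convention as snprintf(). */
    int  (*print)(const void *src, char *buf, size_t size);
    void (*init)(void *dst);
    void (*uninit)(void *dst);
} AVType;

static int int_parse(void *dst, const char *str, const char **endp)
{
    char *tail;
    long v;

    assert(dst && str);
    errno = 0;
    v = strtol(str, &tail, 10);
    if (tail == str || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;
    *(int *)dst = (int)v;
    if (endp)
        *endp = tail;
    return 0;
}

static int int_print(const void *src, char *buf, size_t size)
{
    return snprintf(buf, size, "%d", *(const int *)src);
}

static void int_init(void *dst)   { *(int *)dst = 0; }
static void int_uninit(void *dst) { (void)dst; /* nothing to free for an int */ }

/* A type that lavu would provide; a codec or filter could define its own
 * in exactly the same way, without touching a central switch statement. */
static const AVType sketch_type_int = {
    "int", int_parse, int_print, int_init, int_uninit
};
```

Generic code then only goes through the pointers, for example sketch_type_int.parse(&value, "1024,h=768", &tail), and never needs to know which concrete type it is handling.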
AVTypeTrait
It is not possible to use pointers in switch statements. But making
switch statements on the type is a bad idea. Remember the example with
boolean options: they used to be 0-1 integers, but now that Clément
introduced AV_OPT_TYPE_BOOL, until the applications are updated they
fall back to the default case of the switch statement. This is not
efficient: boolean options can still be treated as 0-1 integers.
Instead, I propose a system similar to Rust's trait system. For those
more familiar with Java, a Rust trait is similar to a Java interface
without the object-oriented sales-pitch.
AVTypeTrait is a structure whose main purpose is to let the linker and
libc create globally unique identifiers: its address.
AVTypeTraitImpl is a structure with a pointer to AVTypeTrait as the
only field.
AVTypeTraitImplSomething (Something = Int / Float / ...) is the same as
AVTypeTraitImpl, but with extra fields, mostly (only?) function
pointers. C guarantees (6.7.2.1 #12 or #13) that a pointer to
AVTypeTraitImplSomething can be cast to a pointer to AVTypeTraitImpl.
The first field of AVTypeTraitImplSomething must point to the unique
instance of AVTypeTrait identifying Something.
It is used like this: av_type_trait_something_check(ti) checks
the first field and returns true if it is the correct AVTypeTrait,
false otherwise. If it returns true, then I know that ti is actually a
pointer to AVTypeTraitImplSomething, and I can access its fields.
An AVType holds a (short) list of pointers to AVTypeTraitImpl that
provide the functions to handle an option of this type in a particular
way.
For example, the AVType for boolean options will have (at least) an
AVTypeTraitImplBoolean and an AVTypeTraitImplInteger.
AVTypeTraitImplInteger contains fields for set, get, get_range, etc.
This was quite complex to explain, but it is actually rather simple to
implement, and even easier to use from an application point of view:
test that an option / object behaves like an integer with
av_obj_is_int(), then use integer functions on it:
av_obj_int_get_range() for example. And always test the more specific
trait first: if boolean, create check box, else if number create spin
button, else create text entry box.
TL;DR: a structure with function pointers to handle a particular type
with a generic API, and pointer magic to make the API optional.
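The mechanism can be sketched in C as follows. The structure names come from the text above; everything else (field names, the sketch_ objects, type_find_trait, type_as_integer, pick_widget) is hypothetical. One assumption to note: the common header is embedded as the first member, so that the pointer conversion relies only on the first-member rule.

```c
#include <assert.h>
#include <stddef.h>

/* Only the address of an AVTypeTrait matters; the name is for debugging. */
typedef struct AVTypeTrait { const char *name; } AVTypeTrait;

/* Common header shared by every trait implementation. */
typedef struct AVTypeTraitImpl { const AVTypeTrait *trait; } AVTypeTraitImpl;

typedef struct AVTypeTraitImplInteger {
    AVTypeTraitImpl base;   /* base.trait must be &sketch_trait_integer */
    long (*get)(const void *obj);
    int  (*set)(void *obj, long v);
    void (*get_range)(const void *obj, long *min, long *max);
} AVTypeTraitImplInteger;

typedef struct AVTypeTraitImplBoolean {
    AVTypeTraitImpl base;   /* base.trait must be &sketch_trait_boolean */
    int (*get)(const void *obj);
    int (*set)(void *obj, int v);
} AVTypeTraitImplBoolean;

static const AVTypeTrait sketch_trait_integer = { "integer" };
static const AVTypeTrait sketch_trait_boolean = { "boolean" };

typedef struct AVType {
    const char *name;
    const AVTypeTraitImpl *const *traits;   /* NULL-terminated list */
} AVType;

static const AVTypeTraitImpl *type_find_trait(const AVType *t,
                                              const AVTypeTrait *trait)
{
    assert(t && t->traits && trait);
    for (size_t i = 0; t->traits[i]; i++)
        if (t->traits[i]->trait == trait)
            return t->traits[i];
    return NULL;
}

static const AVTypeTraitImplInteger *type_as_integer(const AVType *t)
{
    /* Valid because base is the first member of AVTypeTraitImplInteger. */
    return (const AVTypeTraitImplInteger *)
           type_find_trait(t, &sketch_trait_integer);
}

/* A boolean stored in an int, exposed as a boolean and as a 0-1 integer. */
static int  bool_get(const void *obj)     { return *(const int *)obj; }
static int  bool_set(void *obj, int v)    { *(int *)obj = !!v; return 0; }
static long bool_int_get(const void *obj) { return *(const int *)obj; }
static int  bool_int_set(void *obj, long v)
{
    if (v < 0 || v > 1)
        return -1;
    *(int *)obj = (int)v;
    return 0;
}
static void bool_int_range(const void *obj, long *min, long *max)
{
    (void)obj;
    *min = 0;
    *max = 1;
}

static const AVTypeTraitImplBoolean bool_as_boolean = {
    { &sketch_trait_boolean }, bool_get, bool_set
};
static const AVTypeTraitImplInteger bool_as_integer = {
    { &sketch_trait_integer }, bool_int_get, bool_int_set, bool_int_range
};
static const AVTypeTraitImpl *const bool_traits[] = {
    &bool_as_boolean.base, &bool_as_integer.base, NULL
};
static const AVType sketch_type_bool = { "bool", bool_traits };

/* A type with no trait at all: only the string fallback remains. */
static const AVTypeTraitImpl *const no_traits[] = { NULL };
static const AVType sketch_type_string = { "string", no_traits };

/* Application side: always test the most specific trait first. */
static const char *pick_widget(const AVType *t)
{
    if (type_find_trait(t, &sketch_trait_boolean)) return "check box";
    if (type_find_trait(t, &sketch_trait_integer)) return "spin button";
    return "text entry";
}
```

An application that predates the boolean trait would simply not test for it and would still get a working 0-1 spin button through the integer trait, which is the behavior lost with the current switch on AV_OPT_TYPE.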
AVTypeInstance
This is a structure holding an AVType pointer plus a few extra fields
that give information specific to an instance of the type, for example
initial value or range, plus opaque fields that are specific to the
type.
Giving types to context options
AVClass gets a new field: AVTypeInstance *get_type(void *obj). It
returns the AVTypeInstance of the corresponding context as a whole.
(Note: it can not be AVType* directly, as it may cause problems for
static initializers, especially with shared libraries.)
It allows initializing the context as a whole from a single string, which we
do not currently do, but that is not the point.
Most types used for contexts would implement the Fields trait. That
means there is an API to query them for named fields, each with its
corresponding type instance. This is very similar to what we have now,
except the type system allows more possibilities for the types and
parsers.
In particular, if the context's AVClass does not have a get_type()
callback but has an AVOption array, then the fallback function is a
wrapper around the AVOption system. Components that have not yet been
upgraded still work exactly as before.
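A rough C sketch of what querying named fields could look like, under heavy assumptions: SketchField, SketchScaleContext, fields_find and fields_set_int are hypothetical names, the AVTypeInstance layout is invented for the example, the range values are arbitrary, and every field is assumed to be an int.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented layout: a real AVTypeInstance would hold a pointer to an AVType
 * plus type-specific opaque data; a type name stands in for it here. */
typedef struct AVTypeInstance {
    const char *type_name;
    long min, max;
} AVTypeInstance;

/* One named field of a context, with its own type instance. */
typedef struct SketchField {
    const char *name;
    size_t offset;
    AVTypeInstance inst;
} SketchField;

typedef struct SketchScaleContext { int w, h; } SketchScaleContext;

static const SketchField sketch_scale_fields[] = {
    { "w", offsetof(SketchScaleContext, w), { "int", 1, 16384 } },
    { "h", offsetof(SketchScaleContext, h), { "int", 1, 16384 } },
    { NULL, 0, { NULL, 0, 0 } },
};

/* The "Fields" query: look a field up by name. */
static const SketchField *fields_find(const SketchField *fields,
                                      const char *name)
{
    assert(fields && name);
    for (; fields->name; fields++)
        if (!strcmp(fields->name, name))
            return fields;
    return NULL;
}

/* Set an int field after checking the range given by its type instance. */
static int fields_set_int(void *ctx, const SketchField *fields,
                          const char *name, long v)
{
    const SketchField *f = fields_find(fields, name);

    if (!f || v < f->inst.min || v > f->inst.max)
        return -1;
    *(int *)((char *)ctx + f->offset) = (int)v;
    return 0;
}
```

The table deliberately resembles an AVOption array, which is why a wrapper around existing AVOption arrays is a plausible fallback for components that have not been upgraded.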
Context-aware parsing
How does it solve escaping hell? Of course, it requires a bit of syntax
change. Getting rid of escaping hell is in itself a big syntax change,
hopefully so much for the better that people will not complain. It will
probably be possible to keep compatibility with the current syntax when
no escaping is used, i.e. most of the time.
Also, the parsers for base types need to be adapted to be able to work
nicely together.
Substring parsers
Parsers need to be able to operate on a substring, and stop when they
reach the delimiters for the surrounding syntax. This is, in fact,
rather easy to achieve.
Think how strtol() works: consume the string while there are digits
and return a pointer to the end of it. Then the surrounding parser can
continue parsing at this point. Actually, all parsers should behave
that way anyway, regardless of escaping hell, because it is more
convenient.
And while we are at it, we should change them to accept strings as
pointer+length or pointer+end instead of zero-terminated C strings.
TL;DR: all parsers must be designed to work in the middle of strings.
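A minimal sketch of this convention in C, using the pointer+end form. The names parse_int_span and parse_size_span are hypothetical, and overflow handling is left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Parse a decimal integer from the span [str, end). Returns a pointer to the
 * first character not consumed, or NULL if no integer starts at str.
 * The span does not need to be zero-terminated. */
static const char *parse_int_span(const char *str, const char *end, long *out)
{
    const char *p = str;
    long v = 0;
    int neg = 0;

    assert(str && end && str <= end && out);
    if (p < end && (*p == '+' || *p == '-'))
        neg = (*p++ == '-');
    if (p == end || *p < '0' || *p > '9')
        return NULL;
    while (p < end && *p >= '0' && *p <= '9')
        v = v * 10 + (*p++ - '0');   /* overflow not handled in this sketch */
    *out = neg ? -v : v;
    return p;
}

/* A surrounding parser for "WxH": it calls the integer parser twice and
 * resumes exactly where each call stopped. */
static const char *parse_size_span(const char *str, const char *end,
                                   long *w, long *h)
{
    const char *p = parse_int_span(str, end, w);

    if (!p || p == end || *p != 'x')
        return NULL;
    return parse_int_span(p + 1, end, h);
}
```

Given the text "size=640x480:rate=25" and a start pointer just after "size=", parse_size_span() returns a pointer to the ':' so the key=value parser above it can continue from there without copying or splitting the string.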
Self-delimiting syntax
The syntax must allow parsers to know when their span of text is
finished without relying on the next character(s) in the string. That
way, the next character can be the delimiter for the surrounding
syntax without requiring escaping.
For example, consider a list of subexpressions separated by '+', where
two of the subexpressions happen to be math expressions: how do we
know whether 1+2+3 means 1+2 and 3 or 1 and 2+3 or anything else?
The syntax can be anything, and can be fine-tuned for the particular
type at hand. For key=value lists, it can be a double delimiter at the
end, or surrounding braces. Actually, surrounding markers should
probably be preferred because of the next point.
Note that we probably do not need a self-delimiting syntax against
alphanumeric delimiters: nobody will have the stupid idea of making a
'4'-separated list of numbers, and if they do, they deserve a taste of
escaping hell. Therefore, numbers and symbol names are already
self-delimiting. For times, since we use commas as delimiters all over
the place, we should allow "5h42m22" (with or without a final "s") on
top of "5:42:22".
Also, the delimiters do not have to be mandatory. For example, braces
around a key=value list can be completely optional. And for times,
5:42:22 is still accepted.
TL;DR: individual syntaxes must be tuned to avoid ambiguity.
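The time example can be sketched in C as follows. The names scan_uint and parse_hms_span are hypothetical, and this sketch handles only the "5h42m22" form with an optional final "s"; a complete parser would also keep accepting "5:42:22".

```c
#include <assert.h>
#include <stddef.h>

/* Read one or more decimal digits from [p, end); NULL if there are none. */
static const char *scan_uint(const char *p, const char *end, long *out)
{
    const char *q = p;
    long v = 0;

    while (q < end && *q >= '0' && *q <= '9')
        v = v * 10 + (*q++ - '0');   /* overflow not handled in this sketch */
    if (q == p)
        return NULL;
    *out = v;
    return q;
}

/* Parse "<h>h<m>m<s>" with an optional trailing 's' from [str, end).
 * Returns a pointer to the first character not consumed, or NULL.
 * The syntax contains no ':' or ',', so those characters stay available
 * as delimiters for the surrounding syntax without any escaping. */
static const char *parse_hms_span(const char *str, const char *end,
                                  long *seconds)
{
    long h, m, s;
    const char *p;

    assert(str && end && str <= end && seconds);
    p = scan_uint(str, end, &h);
    if (!p || p == end || *p != 'h')
        return NULL;
    p = scan_uint(p + 1, end, &m);
    if (!p || p == end || *p != 'm')
        return NULL;
    p = scan_uint(p + 1, end, &s);
    if (!p)
        return NULL;
    if (p < end && *p == 's')
        p++;
    *seconds = h * 3600 + m * 60 + s;
    return p;
}
```

On "5h42m22,next" the parser stops at the comma, and on "5h42m22s:x=1" it stops at the colon; in both cases the result is 5*3600 + 42*60 + 22 seconds.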
List of forbidden characters
Parsers for AVType accept a list of forbidden characters, typically
delimiters for the surrounding syntax. If they encounter one of these
characters, they should stop parsing, just as if they encountered the
end of the string.
Except if they have good reason not to.
The obvious good reason is that the character is prefixed with a
backslash. That is escaping, escaping is really unavoidable. But that
is not escaping hell, since there is only one level.
Another good reason is that the character appears inside balanced
delimiters: parentheses, braces. This is valid because the parsing
would otherwise fail. Consider the example I used before, slightly
extended: "w=8+max(iw,1024),...". If the parser stops at the inner
comma, then it returns the successful parsing of the expression "8",
the surrounding parsers will see a '+' instead of their expected
delimiter and stop, all the way to the top.
Note that it is up to parsers to decide what constitutes a good reason,
and in particular balanced delimiters. An XML parser (unfortunately, we
will need one at some point for some web formats) shall consider <...>
as balanced delimiters, and thus require no escaping for ':' in
namespaced attributes, but not parentheses. And conversely, the
expression parser shall consider balanced parentheses, but certainly
not comparison operators (hopefully, at some point we will be able to
write "x<42?40:50" instead of "if(lt(x,42),40,50)").
TL;DR: parsers take an argument telling them what the delimiters are
for the surrounding syntax.
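For an expression-like type, the rule could be sketched in C as below. The function name expr_span_end is hypothetical, and the two "good reasons" it implements are the ones listed above: a backslash escape and balanced parentheses.

```c
#include <assert.h>
#include <string.h>

/* Find where an expression starting at str ends, given the characters that
 * the surrounding syntax uses as delimiters. A forbidden character stops the
 * scan unless it is escaped with a backslash or sits inside balanced
 * parentheses. Returns a pointer to the stopping character, or end. */
static const char *expr_span_end(const char *str, const char *end,
                                 const char *forbidden)
{
    const char *p = str;
    int depth = 0;

    assert(str && end && str <= end && forbidden);
    while (p < end) {
        if (*p == '\\' && p + 1 < end) {
            p += 2;                 /* escaped character: never a delimiter */
            continue;
        }
        if (*p == '(')
            depth++;
        else if (*p == ')' && depth > 0)
            depth--;
        else if (depth == 0 && strchr(forbidden, *p))
            break;                  /* delimiter of the surrounding syntax */
        p++;
    }
    return p;
}
```

With the forbidden set ",:" and the text "8+max(iw,1024),h=max(ih,768)", the scan skips the comma inside max(...) and stops at the comma before "h=", so the whole of "8+max(iw,1024)" is handed to the expression parser with no escaping.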
Backward compatibility with AVOption
If a context is designed to use the new system, it will appear to have
no AVOption of its own. This already happens; it is not an API break. We
should be careful about removing existing AVOption arrays, though. On the
other hand, av_opt_set() will work, just setting the string (possibly
with a parser in back-to-escaping-hell mode, for maximum compatibility)
through the new system; the other av_opt_set_xxx() functions will work if
the field implements the corresponding trait.
Extra features
This system allows a few new interesting features. Some of them just
thanks to no longer worrying about sizeof(AVOption).
Special syntaxes
Any component can define its own syntax. It should not be abused,
since consistency is also good, but it will be useful sometimes.
Polymorphism
A field can accept different types, both at API level and for the
user. For example, a video size can be set either as a whole, accepting
size names ("hd720"), or as individual numbers w and h.
Namespaced sub-structures
If the field "f" is itself a structure made of fields, including "a"
and "b", then several syntaxes can be allowed to set it:
"f.a=5:f.b=3", "f={a=5:b=3}", and optionally just "a=5:b=3" if there
are no fields "a" and "b" in the parent structure.
Hooks
Fields can have a type that wraps their real type to perform extra
actions. For example, the wrapper can set another field to indicate
whether the option was set by the user or left to default.
Varying options
New options can appear or disappear according to previously set
options, like the number of inputs for a filter. For example, a codec
context could accept "codec=libx264:crf=20" (but not
"crf=20:codec=libx264").
Embedded documentation
Types and fields can contain documentation, more than the simple
string currently in AVOption. An API should be available to build a
single documentation page for a given set of elements, pulling the
necessary dependencies (description for the syntax of fields) only
once, and at various detail levels: short summary for a tooltip or
full text with examples for the web page.
Syntax validation and autocompletion
Parsers should have a dry-run mode where they read the string but do
not set values, to allow applications to check fields early. They
could even return suggested completions or corrections. (This is
somewhat incompatible with varying options, we can live with that.)
Conclusion
This has been a very lengthy exposition. Actually, I believe
implementation would not be that long. Well, longer than text, of course,
but not as gigantic as the explanation suggests. And a lot of steps can be
made incrementally.
IMHO, the result would be both a better design and an enhanced user
experience.
Personal note: if you skimmed through the whole thing and did not find it
completely uninteresting, I would appreciate even short quick feedback,
even "looks interesting, will read more carefully later".
Regards,
--
Nicolas George