[FFmpeg-devel] [RFC] Built-in documentation API

Mon Aug 24 06:38:42 EEST 2020

On 2020-08-23 08:21, Nicolas George wrote:
> Since the idea of documentation built in the libraries seems popular, I
> have tried to outline an API to access it.…
>
> See the attached file […`documentation.c` omitted…].
>
> The idea would be to have the build system convert the documentation
> into a C file with initialization for one or several AVDocNode
> structures.
>
> Note that since all this must be in .rodata, we must get it right on the
> first try, because of inter-libraries compatibility issues.
> …The most important question IMHO is which format we adopt for the doc in
> the library.…

Text is superficially simple, but in a multicultural world, text is in 
reality very complex.

All text strings should have a character encoding defined. I suggest 
that all the text fields be specified by the format as UTF-8 encoded. No 
need to offer other options.
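
If the format does mandate UTF-8, the document compiler could enforce it at build time. Here is a minimal sketch of such a check; the function name `doc_is_valid_utf8` is hypothetical and not part of the proposed API. It assumes NUL-terminated strings and rejects overlong encodings, UTF-16 surrogates, and values above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the proposed API.
 * Returns 1 if the NUL-terminated string s is well-formed UTF-8, else 0. */
static int doc_is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned long cp;
        int extra;

        if (*p < 0x80) {                    /* 1-byte sequence (ASCII) */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {   /* lead byte of a 2-byte sequence */
            cp = *p & 0x1F;
            extra = 1;
        } else if ((*p & 0xF0) == 0xE0) {   /* lead byte of a 3-byte sequence */
            cp = *p & 0x0F;
            extra = 2;
        } else if ((*p & 0xF8) == 0xF0) {   /* lead byte of a 4-byte sequence */
            cp = *p & 0x07;
            extra = 3;
        } else {
            return 0;                       /* stray continuation or invalid lead byte */
        }
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return 0;                   /* also catches truncation at the NUL */
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* Reject overlong forms: each length has a minimum code point. */
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000))
            return 0;
        /* Reject surrogates and values beyond the Unicode range. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}
```

Running a check like this in the build step would mean that consumers of the library never have to validate the strings themselves.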

All human-readable strings should have their human language described. 
Either define in the format that the string is written in the English 
language (and decide if you want to require US or UK spelling), or add 
language attributes to each text string identifying the human language 
in which it is written (suggest using BCP 47[1] tags), or add a single 
language attribute for the whole AVDocNode and require that all text 
strings in that node be written in the same human language.

Assuming UTF-8 encoding, is `char *` the right data type?  Does your 
profile of the C language offer something more precisely targeted? 
Something analogous to `std::string` of C++, perhaps?

Does this format allow documentation in multiple languages at the same 
time? Might you ever want to ship an FFmpeg binary which has 
documentation in, say, both English and Chinese?
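
To make the multi-language question concrete, here is a rough sketch of what per-string language tags with a fallback lookup might look like. Everything in it is an assumption for illustration: `DocText`, `doc_text_lookup` and the table layout do not come from the RFC. The fallback rule is a simplified form of the "lookup" matching in RFC 4647 (part of BCP 47): try the full tag, then drop trailing subtags one at a time, and finally fall back to the first table entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: one translation of a documentation string. */
typedef struct DocText {
    const char *lang;   /* BCP 47 tag, e.g. "en" or "zh-Hant" */
    const char *text;   /* UTF-8 encoded text */
} DocText;

/* ASCII case-insensitive test: do the first n bytes of a equal the
 * whole tag b? (Language tags are case-insensitive.) */
static int tag_equal(const char *a, size_t n, const char *b)
{
    if (strlen(b) != n)
        return 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char x = (unsigned char)a[i];
        unsigned char y = (unsigned char)b[i];
        if (x >= 'A' && x <= 'Z') x += 'a' - 'A';
        if (y >= 'A' && y <= 'Z') y += 'a' - 'A';
        if (x != y)
            return 0;
    }
    return 1;
}

/* Returns the text whose tag best matches `wanted`, or the first entry
 * as a default, or NULL if the table is empty. */
static const char *doc_text_lookup(const DocText *table, size_t count,
                                   const char *wanted)
{
    size_t n = strlen(wanted);

    while (n > 0) {
        for (size_t i = 0; i < count; i++)
            if (tag_equal(wanted, n, table[i].lang))
                return table[i].text;
        /* No match: drop the last subtag, e.g. "zh-Hant-TW" -> "zh-Hant". */
        while (n > 0 && wanted[n - 1] != '-')
            n--;
        if (n > 0)
            n--;        /* drop the '-' separator as well */
    }
    return count ? table[0].text : NULL;
}
```

With a table holding "en", "zh-Hant" and "zh" entries, a request for "zh-Hant-TW" would resolve to the "zh-Hant" text, and a request for "fr" would fall back to the first entry. Whether the default should be English, and whether every string needs its own table, are exactly the design decisions the format would have to fix up front.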

Consider if you want to limit some text fields to a subset of UTF-8. For 
instance, are the strings in the "Name" field limited to the ASCII 
subset of UTF-8?  Are emoji permitted?
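
If a field such as "Name" were restricted to ASCII, the restriction would be cheap to enforce. The sketch below is one possible rule, chosen only for illustration: printable ASCII, with control characters rejected. The function name `doc_name_is_ascii` is hypothetical.

```c
/* Hypothetical helper: returns 1 if every byte of s is printable ASCII
 * (0x20..0x7E), else 0. Any non-ASCII UTF-8 sequence, emoji included,
 * contains bytes >= 0x80 and is therefore rejected. */
static int doc_name_is_ascii(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if (*p < 0x20 || *p > 0x7E)
            return 0;
    return 1;
}
```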

What is the line wrapping model of these text objects?  Are line endings
encoded with '\n', '\r', '\r\n', or any of these?  What effect does '\t'
have?  What about formfeed, or page eject?
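
One way to sidestep the line-ending question is to define a single canonical form (say, '\n' only) and have the document compiler normalize everything else before the text is baked in. Since the final data lives in .rodata, this has to happen at build time on a writable buffer. A minimal sketch follows; the name `doc_normalize_newlines` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical build-time helper: rewrites buf in place so that "\r\n"
 * and a lone '\r' both become '\n'. Returns the new string length.
 * The result is never longer than the input, so in-place is safe. */
static size_t doc_normalize_newlines(char *buf)
{
    char *src = buf, *dst = buf;

    while (*src) {
        if (*src == '\r') {
            *dst++ = '\n';
            src++;
            if (*src == '\n')   /* CRLF pair: skip the LF */
                src++;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - buf);
}
```

This leaves '\t' and formfeed untouched; the format would still need to say whether they are allowed and what they mean.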

Does this architecture permit markup which defines tables?  How does it 
display such markup?

This structure only stores marked-up text. Does that mean it is 
impossible to store diagrams and pictures in the documentation? Are you 
comfortable giving up that expressive power?

Will the overall documentation system be limited to the expressive power
of this mechanism?  If not, then when you define the document compiler
which generates this format, you will need to define what gets done with
the parts of the documentation which this format cannot represent. Are
they thrown out? Simplified somehow?

Does this structure permit markup with font choices?  If the markup 
calls for heading style, or italic, or preformatted style, how will the 
display system invoke the correct fonts?

Font choices are also part of correctly displaying character style for 
the language. The Unicode standard encodes Traditional Chinese, 
Simplified Chinese, Japanese, and parts of Korean and Vietnamese with 
unified Han codepoints. The text display uses a font choice to get the 
correct character style for the language. Do you want to permit 
documentation to appear in these languages with the correct character 
style?  How will that happen?

How will this API display text?  Will it emit plain text with no
markup?  Will it emit the internal markup language used by this data
structure (e.g. "FFMTHML") and not attempt to format it?

One risk of this architecture is that it forces a choice among three
kinds of mechanism: one which is well-defined but limited (e.g. to
English and ASCII); one which is well-defined but terribly complex to
specify and to implement; or one which is simple to design and
implement, but poorly defined outside of a core usage pattern. What is
the value you are trying to unlock with this architecture?  How will you
ensure this architecture gives a positive return (value) on investment
(design, implementation, and content authoring)?

[1] https://tools.ietf.org/html/bcp47