[FFmpeg-devel] [RFC][PATCH] ffmpeg: add option to transform metadata using iconv
Nicolas George
george at nsup.org
Thu Sep 24 20:37:54 CEST 2015
On tridi 3 Vendémiaire, year CCXXIV, James Darnley wrote:
> As far as I understand the iconv API, it doesn't appear to do this for
> you. So adding this feature would require writing code to handle more
> errors returned from the iconv() function. That means a more
> complicated argument handling structure is needed.
>
> I don't mind trying to write this but it would be better to do it behind
> the API you propose.
Of course. Actually, it is already there in the API, although I am not quite
satisfied with it because it cannot be set as an option.
> I will help you with it as best I can because I
> seem to have involuntarily volunteered myself.
I need some feedback to know whether this kind of API is useful in FFmpeg
(other people are welcome to give advice too!), and whether the actual API I
propose is suitable for various needs. As for writing the code, I expect it
to be quite straightforward.
The question on which I most need feedback is this: should I make an API
that can convert from any encoding to any encoding, or an API that can
convert from any encoding to UTF-8 and from UTF-8 to any encoding?
There are pros and cons to each approach. UTF-8 to/from anything is enough
for the needs of any sane program, and it makes handling the replacement
character easier (because it can be specified directly in UTF-8). OTOH,
any-to-any is more generic.
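To make the replacement-character argument concrete, here is a minimal sketch (not FFmpeg code) of an any-to-UTF-8 conversion on top of POSIX iconv(). The names `to_utf8_with_replacement` and `grow_output` are hypothetical. Because the target is always UTF-8, U+FFFD can be written directly as its three UTF-8 bytes whenever iconv() reports an invalid (EILSEQ) or truncated (EINVAL) sequence; with an any-to-any API, the replacement would itself have to be expressed in the output encoding first. It also shows the extra iconv() error handling mentioned in the quoted text.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Double the output buffer, keeping the write cursor and remaining space
 * consistent. One byte is always reserved for the terminating NUL. */
static int grow_output(char **out, size_t *cap, char **dst, size_t *out_left)
{
    size_t used = (size_t)(*dst - *out);
    char *tmp = realloc(*out, *cap * 2);
    if (!tmp)
        return -1;
    *out = tmp;
    *cap *= 2;
    *dst = tmp + used;
    *out_left = *cap - 1 - used;
    return 0;
}

/* Hypothetical helper: convert the NUL-terminated string `in` from
 * `from_enc` to UTF-8, replacing each undecodable byte with U+FFFD.
 * Returns a malloc()ed string, or NULL on failure. */
static char *to_utf8_with_replacement(const char *in, const char *from_enc)
{
    static const char repl[] = "\xEF\xBF\xBD"; /* U+FFFD encoded in UTF-8 */
    iconv_t cd = iconv_open("UTF-8", from_enc);
    if (cd == (iconv_t)-1)
        return NULL; /* unknown encoding */

    char *src = (char *)in; /* iconv() takes char **, but does not write */
    size_t in_left = strlen(in);
    size_t cap = in_left * 4 + 16;
    char *out = malloc(cap);
    char *dst = out;
    size_t out_left = cap - 1;

    while (out && in_left > 0) {
        if (iconv(cd, &src, &in_left, &dst, &out_left) != (size_t)-1)
            break; /* all input converted; UTF-8 output needs no flush */
        if (errno == E2BIG) {
            if (grow_output(&out, &cap, &dst, &out_left) < 0)
                goto fail;
        } else if (errno == EILSEQ || errno == EINVAL) {
            /* Invalid or truncated sequence: emit U+FFFD, skip one byte. */
            while (out_left < sizeof(repl) - 1)
                if (grow_output(&out, &cap, &dst, &out_left) < 0)
                    goto fail;
            memcpy(dst, repl, sizeof(repl) - 1);
            dst += sizeof(repl) - 1;
            out_left -= sizeof(repl) - 1;
            src++;
            in_left--;
        } else {
            goto fail;
        }
    }
    if (out)
        *dst = '\0';
    iconv_close(cd);
    return out;

fail:
    free(out);
    iconv_close(cd);
    return NULL;
}
```

Skipping a single byte per error is only one possible policy; a real implementation might prefer to skip a whole invalid sequence.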
> I don't know what to say here. I know the encodings needed for iconv
> because I arrived at them by brute force. I wrote a short Lua script to
> iterate over a list of encodings supported by my iconv and arrived at
> this answer. The command line tool called iconv is too clever for this
> because it returns an error when it can't convert. As for ending in
> GBK, it is what the script told me.
Could you share the script and enough input to run it and reproduce the
results?
> This feature would not work if there was a misinterpretation in the
> middle. As you say that would need A->B and C->D where B != C. Perhaps
> this is why my solution isn't perfect, because there should be an
> assumption in the middle.
>
> I could rework my code to allow for assumptions in the middle. My case
> would then use "CP1252,UTF-8,UTF-8,GBK" as an argument.
I must say, I do not like your approach very much because it manipulates
text encoding in the middle of the program. All strings inside the program
should be in UTF-8.
I can propose this: add an option "metadata_text_encoding" to
AVFormatContext. If it is set on a demuxer, the demuxing framework uses it
to convert metadata from that encoding to UTF-8; similarly, if it is set on
a muxer, the muxing framework converts metadata from UTF-8 to that encoding.
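The proposed option could behave roughly as sketched below. This is an illustration under assumptions only: `struct meta_entry`, `convert_strict` and `apply_metadata_encoding` are hypothetical names, FFmpeg really keeps metadata in an AVDictionary, and the option does not exist yet. What it shows is the symmetry: the same encoding name is the source on the demuxing side and the target on the muxing side, so every string in between stays UTF-8.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one metadata tag. */
struct meta_entry {
    const char *key;
    char *value; /* malloc()ed; replaced in place when converted */
};

/* Convert `in` from `from` to `to`, failing on any invalid input
 * sequence. Returns a malloc()ed string, or NULL on failure. */
static char *convert_strict(const char *in, const char *from, const char *to)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    char *src = (char *)in;
    size_t in_left = strlen(in);
    size_t cap = in_left * 4 + 16;
    char *out = malloc(cap);
    char *dst = out;
    size_t out_left = cap - 1; /* keep one byte for the terminating NUL */
    int flushing = 0;

    while (out) {
        /* Once the input is consumed, a call with NULL input writes any
         * shift sequence a stateful target encoding needs at the end. */
        size_t r = flushing ? iconv(cd, NULL, NULL, &dst, &out_left)
                            : iconv(cd, &src, &in_left, &dst, &out_left);
        if (r != (size_t)-1) {
            if (flushing)
                break;
            flushing = 1;
            continue;
        }
        if (errno != E2BIG) { /* EILSEQ or EINVAL: reject the value */
            free(out);
            out = NULL;
            break;
        }
        size_t used = (size_t)(dst - out);
        char *tmp = realloc(out, cap * 2);
        if (!tmp) {
            free(out);
            out = NULL;
            break;
        }
        out = tmp;
        cap *= 2;
        dst = out + used;
        out_left = cap - 1 - used;
    }
    if (out)
        *dst = '\0';
    iconv_close(cd);
    return out;
}

/* Proposed semantics: a demuxer converts from `enc` to UTF-8, a muxer from
 * UTF-8 to `enc`. Returns 0, or -1 at the first value that fails (values
 * before it have already been converted). */
static int apply_metadata_encoding(struct meta_entry *m, size_t n,
                                   const char *enc, int is_muxer)
{
    const char *from = is_muxer ? "UTF-8" : enc;
    const char *to = is_muxer ? enc : "UTF-8";
    for (size_t i = 0; i < n; i++) {
        char *conv = convert_strict(m[i].value, from, to);
        if (!conv)
            return -1;
        free(m[i].value);
        m[i].value = conv;
    }
    return 0;
}
```

With ISO-8859-1 as the encoding, demuxing turns the byte 0xE9 into the UTF-8 sequence C3 A9, and muxing turns it back.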
Then we could have a special syntax to specify bogus conversions, possibly:
-metadata_text_encoding "[CP1252>UTF-8]GBK" to specify that the text must
first be converted from CP1252 to UTF-8 and then considered to be GBK (and
converted to UTF-8). (Well, I consider the feature evil, so I will probably
not volunteer to implement it, but I will not oppose it as long as it
cannot be triggered too easily by an unsuspecting user.)
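The suggested syntax could be parsed into an ordered list of conversion steps, each of which would then be one iconv conversion. A sketch, assuming the "[A>B]C" form from the example and allowing zero or more bracketed groups; `parse_encoding_spec`, `struct conv_step`, `copy_name` and both size limits are hypothetical:

```c
#include <string.h>

#define MAX_STEPS 8 /* hypothetical limit on chained conversions */
#define ENC_LEN 64  /* hypothetical limit on an encoding name */

/* One conversion applied to the raw bytes: `from` -> `to`. */
struct conv_step {
    char from[ENC_LEN];
    char to[ENC_LEN];
};

/* Copy `len` bytes of `s` into `dst` as a NUL-terminated encoding name;
 * rejects empty names, overlong names and stray syntax characters. */
static int copy_name(char *dst, const char *s, size_t len)
{
    if (len == 0 || len >= ENC_LEN)
        return -1;
    memcpy(dst, s, len);
    dst[len] = '\0';
    return strpbrk(dst, "[]>") ? -1 : 0;
}

/* Parse "[A>B]...C": each bracketed group is an explicit step A -> B, and
 * the trailing C is the encoding the bytes are then assumed to be in, which
 * becomes the final step C -> UTF-8. Fills `steps` (room for MAX_STEPS)
 * and returns the number of steps, or -1 on a syntax error. */
static int parse_encoding_spec(const char *spec, struct conv_step *steps)
{
    int n = 0;
    while (*spec == '[') {
        const char *gt = strchr(spec, '>');
        const char *rb = strchr(spec, ']');
        if (!gt || !rb || gt > rb || n + 1 >= MAX_STEPS)
            return -1;
        if (copy_name(steps[n].from, spec + 1, (size_t)(gt - spec - 1)) < 0 ||
            copy_name(steps[n].to, gt + 1, (size_t)(rb - gt - 1)) < 0)
            return -1;
        n++;
        spec = rb + 1;
    }
    if (copy_name(steps[n].from, spec, strlen(spec)) < 0)
        return -1;
    strcpy(steps[n].to, "UTF-8");
    return n + 1;
}
```

With this reading, "[CP1252>UTF-8]GBK" yields the steps CP1252 to UTF-8, then GBK to UTF-8, while a plain "GBK" degenerates to the ordinary single conversion, so the bogus chain only appears when the user writes brackets explicitly.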
What do you think of it?
Regards,
--
Nicolas George