[FFmpeg-devel] Suggestion for a centralized language-tag facility in libavformat
Michael Niedermayer
michaelni
Fri Apr 17 01:21:55 CEST 2009
On Thu, Apr 16, 2009 at 12:15:31PM +0200, cyril comparon wrote:
> Hi
> Hope these patches are ok now.
>
> > also its not true that there are no T code for some, rather they are
> > identical
>
> Can you point me the spot in the standard where this is stated? Thanks.
http://www.loc.gov/standards/iso639-2/faq.html
"In the ISO 639-2 standard, two code sets are provided in which the language codes are the same except for 22 of the 450+ languages that have alternative codes. One set is for bibliographic applications, often referred to as ISO 639-2/B, and the other for terminology applications, referred to as ISO 639-2/T. The choice of the set used must be made clear by exchanging partners prior to information interchange."
> However, the solution I suggest does not make the bibliographic vs
> terminologic distinction public (see the updated doxies)
>
> Regards
> Cyril
[...]
> ===================================================================
> --- libavformat/avlanguage.c (revision 0)
> +++ libavformat/avlanguage.c (revision 0)
> @@ -0,0 +1,544 @@
> +/*
> + * Cyril Comparon, Larbi Joubala, Resonate-MP4 2009
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "avlanguage.h"
> +#include <string.h>
> +#include <assert.h>
> +
> +typedef struct TableEntry {
> + const char iso6392bibl[4]; /* 3-char bibliographic language code as per ISO-IEC 639-2 (always exists) */
> + const char iso6392term[4]; /* 3-char terminologic language code as per ISO-IEC 639-2 (may be empty) */
> + const char iso6391[3]; /* 2-char code of language as per ISO/IEC 639-1, (may be empty) */
> +} TableEntry;
> +
> +static TableEntry languageTable[] = {
> + { "aar", "" , "aa" },
> + { "abk", "" , "ab" },
> + { "ace", "" , "" },
> + { "ach", "" , "" },
> + { "ada", "" , "" },
> + { "ady", "" , "" },
> + { "afa", "" , "" },
> + { "afh", "" , "" },
> + { "afr", "" , "af" },
> + { "ain", "" , "" },
> + { "aka", "" , "ak" },
> + { "akk", "" , "" },
> + { "alb", "sqi", "sq" },
> + { "ale", "" , "" },
> + { "alg", "" , "" },
> + { "alt", "" , "" },
> + { "amh", "" , "am" },
> + { "ang", "" , "" },
> + { "anp", "" , "" },
> + { "apa", "" , "" },
[...]
> +/**
> + * Returns the 2-char ISO639-1 code associated with a given bibliographic or
> + * terminologic 3-char ISO639-2 code, or NULL if the latter is null or invalid,
> + * or has no ISO639-1 representation.
> + * ISO639-1 and ISO639-2 codes are lower case.
> + */
> +const char *av_langISO6392toISO6391(const char *lang);
> +
> +/**
> + * Returns the bibliographic 3-char ISO639-2 code associated with a given 2-char
> + * ISO639-1 code, or NULL if the latter is null or invalid.
> + * ISO639-1 and ISO639-2 codes are lower case.
> + */
> +const char *av_langISO6391toISO6392(const char *lang);
this API is a little impractical.
First, it is not extensible: for x language code sets you need x*(x-1)
conversion functions (it would really need 6 already if you honored the
distinction between B and T codes).
Second, it is rather rigid, as one needs to know what kind of code the
source has, and that is probably not always known exactly; broken files
using codes from the wrong table very likely exist.
Third, the system is slow; linear search just hurts my eyes when it can
trivially be avoided.
So let me suggest a better alternative that you can easily change things to
with a little bit of copy & paste, sorting, and search & replace:
1.
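#include <stdint.h>

struct lang_code {
    char string[4];   /* NUL-terminated 2- or 3-letter code */
    uint16_t next;    /* index of the next entry for the same language */
};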
2.
a table split in 3 parts:
  1. one with all 3-letter B codes
  2. one with the 3-letter T codes that are different from the B codes
  3. one with the 2-letter codes
Each of the 3 parts would be sorted alphabetically within itself.
The next field would point to the next entry representing the same
language, with the last one pointing back to the first, so that
    i= table[i].next
could be used to find all language codes that represent the same language.
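To make points 1 and 2 concrete, here is a small compilable sketch. It assumes a hypothetical four-language excerpt of the table (Albanian, English, French, German) with the B part at indices 0-3, the differing T codes at 4-6 and the 2-letter codes at 7-10; the helper same_language_codes() is illustrative only and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Entry layout from point 1. */
struct lang_code {
    char string[4];   /* NUL-terminated 2- or 3-letter code */
    uint16_t next;    /* index of the next entry for the same language */
};

/* Hypothetical excerpt, split into the three sorted parts of point 2.
 * The next fields form one cycle per language. */
static const struct lang_code table[] = {
    /* 0-3: all 3-letter B codes */
    { "alb",  6 }, { "eng",  8 }, { "fre",  5 }, { "ger",  4 },
    /* 4-6: 3-letter T codes that differ from the B code */
    { "deu",  7 }, { "fra",  9 }, { "sqi", 10 },
    /* 7-10: 2-letter codes */
    { "de",   3 }, { "en",   1 }, { "fr",   2 }, { "sq",   0 },
};

#define TABLE_SIZE ((int)(sizeof(table) / sizeof(table[0])))

/* Store up to max codes of the language that table[i] belongs to,
 * following i = table[i].next until the walk returns to the start.
 * Returns how many codes were stored in out[]. */
static int same_language_codes(int i, const char *out[], int max)
{
    int n = 0, j = i;

    assert(i >= 0 && i < TABLE_SIZE);
    do {
        if (n < max)
            out[n++] = table[j].string;
        j = table[j].next;
    } while (j != i);
    return n;
}
```

Starting the walk at "fre" (index 2) should visit fre, fra and fr; starting at "en" (index 8) visits only en and eng, since English has no separate T code.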
3.
a static function that finds a string in one sorted subtable using libc
bsearch() and returns its index, or -1 if it is not there
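Point 3 could look roughly like this with the C library's bsearch(). This is a sketch under assumptions: the four-language table excerpt is hypothetical, and the callback lang_cmp() as well as the exact find_lang() signature (code first, then subtable pointer and length, returning a subtable-relative index or -1) are illustrative choices, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct lang_code {
    char string[4];
    uint16_t next;
};

/* Hypothetical excerpt: B codes at 0-3, differing T codes at 4-6,
 * 2-letter codes at 7-10, each part sorted in strcmp() order. */
static const struct lang_code table[] = {
    { "alb",  6 }, { "eng",  8 }, { "fre",  5 }, { "ger",  4 },
    { "deu",  7 }, { "fra",  9 }, { "sqi", 10 },
    { "de",   3 }, { "en",   1 }, { "fr",   2 }, { "sq",   0 },
};

/* bsearch() callback: key is the code looked up, elem an entry of the subtable. */
static int lang_cmp(const void *key, const void *elem)
{
    const struct lang_code *entry = elem;
    return strcmp(key, entry->string);
}

/* Binary search for lang among the count sorted entries starting at sub.
 * Returns the index relative to sub, or -1 if lang is NULL or not found. */
static int find_lang(const char *lang, const struct lang_code *sub, int count)
{
    const struct lang_code *hit;

    assert(count >= 0);
    if (!lang)
        return -1;
    hit = bsearch(lang, sub, (size_t)count, sizeof(*sub), lang_cmp);
    return hit ? (int)(hit - sub) : -1;
}
```

Because each part is sorted on its own, the same comparison callback works for all three; only the base pointer and length change per call.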
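4.
enum lang_code_set {
    AV_LANG_ISO639_2B,   /* 3-letter bibliographic codes */
    AV_LANG_ISO639_2T,   /* 3-letter terminology codes   */
    AV_LANG_ISO639_1,    /* 2-letter codes               */
    AV_LANG_NB           /* number of code sets          */
};

/* table[] is the table from 2; start[] and count[] hold the offset and size
 * of each of its 3 parts; find_lang() is the bsearch() helper from 3 and
 * returns an index relative to the part it searched, or -1 */
const char *av_lang_to(const char *lang, enum lang_code_set code_set){
    int i, idx = -1, idx2;

    /* try every part, the caller need not know which kind of code lang is */
    for(i = 0; i < AV_LANG_NB && idx < 0; i++){
        idx= find_lang(lang, table + start[i], count[i]);
        if(idx >= 0) idx += start[i];
    }
    if(idx<0) return NULL;

    /* walk the cycle of entries for this language until one from code_set shows up */
    idx2=idx;
    do{
        if(idx2 >= start[code_set] && idx2 < start[code_set] + count[code_set])
            return table[idx2].string;
        idx2= table[idx2].next;
    }while(idx2 != idx);

    /* no separate T code means the T code is the same as the B code */
    if(code_set == AV_LANG_ISO639_2T)
        return av_lang_to(lang, AV_LANG_ISO639_2B);
    return NULL;
}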
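Putting points 1 to 4 together, here is a self-contained sketch of av_lang_to() run against a hypothetical four-language excerpt. find_lang() is assumed to take the code as its first argument and return an index relative to the subtable it searched, so av_lang_to() adds start[] back; the names lang_code_set, AV_LANG_NB, lang_cmp(), start[] and count[] are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum lang_code_set {
    AV_LANG_ISO639_2B,   /* 3-letter bibliographic codes */
    AV_LANG_ISO639_2T,   /* 3-letter terminology codes */
    AV_LANG_ISO639_1,    /* 2-letter codes */
    AV_LANG_NB           /* number of code sets (hypothetical) */
};

struct lang_code {
    char string[4];      /* NUL-terminated code */
    uint16_t next;       /* next entry for the same language, cyclic */
};

/* Hypothetical excerpt: B at 0-3, differing T at 4-6, 2-letter at 7-10. */
static const struct lang_code table[] = {
    { "alb",  6 }, { "eng",  8 }, { "fre",  5 }, { "ger",  4 },
    { "deu",  7 }, { "fra",  9 }, { "sqi", 10 },
    { "de",   3 }, { "en",   1 }, { "fr",   2 }, { "sq",   0 },
};

/* Offset and size of each sorted part, indexed by enum lang_code_set. */
static const int start[AV_LANG_NB] = { 0, 4, 7 };
static const int count[AV_LANG_NB] = { 4, 3, 4 };

static int lang_cmp(const void *key, const void *elem)
{
    const struct lang_code *entry = elem;
    return strcmp(key, entry->string);
}

/* Returns the index relative to sub, or -1 if not found. */
static int find_lang(const char *lang, const struct lang_code *sub, int n)
{
    const struct lang_code *hit;

    if (!lang)
        return -1;
    hit = bsearch(lang, sub, (size_t)n, sizeof(*sub), lang_cmp);
    return hit ? (int)(hit - sub) : -1;
}

const char *av_lang_to(const char *lang, enum lang_code_set code_set)
{
    int i, idx = -1, idx2;

    assert((unsigned)code_set < AV_LANG_NB);
    /* search every part, so the caller need not know the kind of code */
    for (i = 0; i < AV_LANG_NB && idx < 0; i++) {
        idx = find_lang(lang, table + start[i], count[i]);
        if (idx >= 0)
            idx += start[i];          /* make the index absolute */
    }
    if (idx < 0)
        return NULL;

    /* walk the cycle until an entry from the requested part shows up */
    idx2 = idx;
    do {
        if (idx2 >= start[code_set] && idx2 < start[code_set] + count[code_set])
            return table[idx2].string;
        idx2 = table[idx2].next;
    } while (idx2 != idx);

    /* no separate T code: the T code equals the B code */
    if (code_set == AV_LANG_ISO639_2T)
        return av_lang_to(lang, AV_LANG_ISO639_2B);
    return NULL;
}
```

With this excerpt, av_lang_to("fr", AV_LANG_ISO639_2B) returns "fre", and av_lang_to("en", AV_LANG_ISO639_2T) falls back to "eng" because English has no separate T code. Each lookup costs at most three binary searches plus a walk over at most three entries, instead of a linear scan over the whole table.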
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
It's not that you shouldn't use gotos but rather that you should write
readable code and code with gotos often but not always is less readable