[MPlayer-dev-eng] [PATCH] ftp charset selection support
Andrew Savchenko
Bircoph at list.ru
Sat Apr 12 20:29:14 CEST 2008
Hi,
On Saturday 12 April 2008 14:30, Alban Bedel wrote:
> Andrew Savchenko <Bircoph at list.ru> wrote:
> > Hello,
> >
> > Currently there are a lot of ftp servers with different
> > charsets and it is a problem to work with these hosts if local
> > encoding differs, only for Russian there are over 10 different
> > encoding and 3 of them are widely used. So I made a patch
> > which allows to select remote server's encoding and recode URL
> > into it.
>
> Just out of interest, how is this problem handled in HTML? Is it
> needed to write the document in the encoding used by the FTP
> server one link to?
Yes, it is required to be written in ftp server's encoding. I have
an ffsearch server on my host and encountered this problem long
time ago...
Originally ftp supported only ASCII charset (rfc959) and there was
no problems, but afterwards file names in national encodings began
to occur, and all this mess begins. Well, UTF-8 will be a fix
after all, but this wouldn't be soon...
> Generally I'm OK with the idea of this patch, but we might want
> to also take a look at the URL/filename encoding problem in
> general. The problem have 2 side, source encoding and
> destination encoding. As source we currently have:
>
> * command line: the terminal encoding will do
> * slave command: terminal encoding or fs encoding?
IMHO if mplayer reads slave commands from stdin, it should use
terminal encoding. If it reads them from file it should use FS
encoding. Maybe it will be better to introduce some input_charset
internal variable to make input-encoding-dependent functions to be
independent of type of input: where we use -slave from file or
normal stdin, or from command line. Perhaps it is worthy to make
this variable configurable by the user.
> * playlist: some format/transport protocol might specify it,
> fallback on local fs encoding?
Perhaps in case of transport protocol it will be reasonable to use
server's encoding as fallback, and for local playlist use local
encoding.
Also, ENCA seems to be a good choice as fallback for this and
previous issue. Currently it is used in subtitle routines and
saves me a lot of time. Only Russian subtitles I own are present
in 5 different encodings and it would be a pain to deal with them
if not ENCA. Of course this approach in not ideal and may fail,
but this is better than nothing and seems to work perfectly in the
most of cases.
> On the destination side only the stream itself can tell what is
> needed. Using a common format, like UTF-8, would be tempting;
Are you talking about network streams or about an internal streams
inside mplayer? Most likely that network streams are the subject.
I dunno exactly, but I'm not sure that all type of streams support
encoding information and all network servers set it properly.
> but AFAIK back and forth conversion is not reliable with all
> encoding.
At least it seems to be reliable for cyrillic and CJK charsets, I
can't tell about the rest.
> The best would then probably be to simply pass the
> encoding along with the URL/filename when opening the stream.
> Suggestions welcome :)
And why not introduce new command line option? Inventing new URL
schemes seems not good from my standpoint, but this may be
implemented optionally. And do not forget about ENCA. We can use
the same approach as in subcp:
enca[:fallback language[:fallbach charset]].
> > +#ifdef USE_ICONV
> > +#include <iconv.h>
> > +#endif /* USE_ICONV */
> > +
> > +/* maximum message length of mp_msg */
> > +#define MSGSIZE_MAX 3072
> > +
> > +const char* filename_recode_common(iconv_t inv_msgiconv,
> > + const char* filename,const char *tocode, const char
> > *fromcode); +
>
> This is going to break if iconv is unavailable.
Will fix it.
> The function
> name could also be better. This function is not specific to
> filenames and _common say very little. mp_recode_to()?
Yes, it is a good choice. And what is more this function becomes a
general i/o routine rather than mp_msg specific stuff. I think it
is better to move this function somewhere else, but where exactly?
I'm in doubt in finding appropriate file to place this function.
> I would suggest making this function static and adding a wrapper
> that hide iconv. That would allow mp_msg to keep using a static
> iconv context and make things simpler for other uses.
Isn't filename_recode() is an appropriate wrapper to mp_recode_to()
(using new proposed function name)? It handles iconv context for
mp_recode_to() currently. I have no objections against making
these functions static, so it will be done. Also I want to extract
storage buffer for recoded_name from mp_recode_to() in order to
make this function thread safe.
> > -/* maximum message length of mp_msg */
> > -#define MSGSIZE_MAX 3072
> > -
>
> Why do you move this define out of mp_msg.c?
It is a remnant from earlier code changes. Will be removed.
> > +/* defined in stream_ftp.c */
> > +#ifdef USE_ICONV
> > +extern char *ftp_charset;
> > +#endif /* USE_ICONV */
>
> No need to put extern declarations under #ifdef.
Ok, but I'm surprised by this language feature ;-).
> > +#ifdef USE_ICONV
> > +#include <iconv.h>
> > +#include <string.h>
>
> The need for iconv should be abstracted away.
>
> > +#ifdef USE_ICONV
>
> Get rid of the #ifdef.
ok
> > + //recode filename if needed
> > + if (ftp_charset) {
> > + mp_msg(MSGT_OPEN, MSGL_V, "[ftp] recoding filename %s
> > from %s to %s\n", + p->filename,
> > get_term_charset(), ftp_charset); +
> > + iconv_t iconv_filename = (iconv_t)(-1);
> > + /* There is no guarantee that filename_recode_common
> > wouldn't + * be called during playback. Thus buffer must
> > be copied to + * p->filename with appropriate memory
> > reallocation. */ + char *buf =
> > filename_recode_common(iconv_filename, +
> > p->filename, ftp_charset, get_term_charset());
>
> Don't use declarations in middle of code blocks.
Will be enough to place declaration just after "{" clause after
"if" statement?
> > + size_t len = strlen(buf) + 1;
> > + p->filename = realloc(p->filename, len);
> > + memcpy(p->filename, buf, len);
> > + }
> > +#endif /* USE_ICONV */
>
> This is leaking memory.
Why?! This is realloc(), not malloc(). It will reuse (or free() and
malloc()) memory of p->filename. At the end, close_f() will free
this memory inside m_struct_free() call in the same way as it will
be freed in the case of original p->filename pointer. From the
point of m_option_free() it should not be any difference between
handling the old and the new pointer.
> > Currently user must specify encoding on his/her own, but in
> > several cases it is possible to autodetect encoding:
> > 1) Negotiation with server using LANG feature (rfc2640) if
> > server supports it.
> > 2) Alternatively try to server's guess encoding by
> > descendingly parsing directories in URL (starting from /) in
> > order to identify encoding using ENCA.
> >
> > This will be implemented later, I hope.
>
> Patch welcome :)
I'll work on patches later. Today I feel myself too tired...
It would be great to hear from you some words about (possible)
memory leak and mp_recode_to() issues before the new patch will be
made.
Well, I have look over some RFC's concerning ftp, and some real
servers... I have no idea what ftp server supports LANG extension.
Even my favourite vsftpd doesn't do this. Anyway I can use
reference code from lftp.
Also I have found another interesting feature which will be
usefull: ff server supports FEAT and OPTS UTF8 ON, the last
command can be used to established utf8 filename transmission for
sure; vsftpd do this by default.
With best regards,
Andrew
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/attachments/20080412/d4702ac7/attachment.pgp>
More information about the MPlayer-dev-eng
mailing list