[NUT-devel] [nut]: r613 - docs/nutissues.txt

Måns Rullgård mans at mansr.com
Tue Feb 12 21:24:01 CET 2008


Michael Niedermayer <michaelni at gmx.at> writes:

> On Tue, Feb 12, 2008 at 07:17:07PM +0000, Måns Rullgård wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>> 
>> > On Tue, Feb 12, 2008 at 07:37:53PM +0100, Alban Bedel wrote:
>> >> On Tue, 12 Feb 2008 17:57:03 +0100
>> >> Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> 
>> >> > On Tue, Feb 12, 2008 at 05:47:13PM +0100, Alban Bedel wrote:
>> >> > > On Tue, 12 Feb 2008 16:00:10 +0100 (CET)
>> >> > > michael <subversion at mplayerhq.hu> wrote:
>> >> > > 
>> >> > > > Modified: docs/nutissues.txt
>> >> > > > ==============================================================================
>> >> > > > --- docs/nutissues.txt	(original)
>> >> > > > +++ docs/nutissues.txt	Tue Feb 12 16:00:09 2008
>> >> > > > @@ -162,3 +162,8 @@ How do we identify the interleaving
>> >> > > >  A. fourcc
>> >> > > >  B. extradata
>> >> > > 
>> >> > > I would vote for this with a single fourcc for PCM and a single
>> >> > > fourcc for raw video. Having info about the data format packed in
>> >> > > the fourcc is ugly and useless. That just leads to inflexible lookup
>> >> > > tables and the like.
>> >> > 
>> >> > > Instead we should just define the format in a way similar to what
>> >> > > mp_image provides for video (colorspace, packed or not, shift used
>> >> > > for the subsampled planes, etc). That would allow implementations
>> >> > > to simply support all definable formats, instead of a selection of
>> >> > > what happened to be commonly used formats at the time the
>> >> > > implementation was written.
>> >> > 
>> >> > The key points here are that
>> >> > * colorspace/shift for subsampled planes, etc. is not specific to RAW,
>> >> > it's more like sample_rate or width/height
>> >> 
>> >> Sure, but when a "real" codec is used, it's the decoder's business to
>> >> tell the app what output format it will use. NUT can provide info about
>> >> the internal format used by the codec, 
>> >
>> > Only very few codecs have headers which store information about
>> > things like shift for subsampled planes. Thus if this information
>> > is desired it has to come from the container more often than
>> > not. If it's not desired then we also don't need it for raw IMHO.
>> 
>> With compressed video, the decoder informs the caller of the pixel
>> format.  With raw video, this information must come from the
>> container, one way or another.
>
> Yes, I agree for pixel format.
> But the decoder often does not know the fine details, like the
> aforementioned "shift for subsampled planes" or the precise definition
> of YUV or whether the full luma range is used or not. MPEG stores
> these, yes, but for example huffyuv does not. So it would make some
> sense if this information could be stored for non-raw streams as well.

Point taken, and I agree that being able to transmit this information
could be useful.  Using extradata is obviously out of the question,
which leaves either stream headers or info packets.
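
Just to make it concrete, whichever packet type ends up carrying it,
the information discussed above amounts to something like this (all
field names are hypothetical, nothing here is in the spec yet):

    /* Hypothetical colourspace side data; the fields mirror what was
     * mentioned in this thread, the names are made up. */
    typedef struct {
        int chroma_shift_x; /* log2 horizontal subsampling of chroma planes */
        int chroma_shift_y; /* log2 vertical subsampling of chroma planes */
        int yuv_matrix;     /* which YUV definition, e.g. BT.601 vs BT.709 */
        int full_range;     /* 1 if luma uses the full range, 0 if limited */
    } colorspace_info;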

>> >> > > On a related subject, it might also be useful to define the channel
>> >> > > disposition when there is more than one. Mono and stereo can get by
>> >> > > with the classical default, but as soon as there are more channels
>> >> > > it is really unclear. And IMHO such info could still be useful
>> >> > > with 1 or 2 channels. Something like the position of each channel
>> >> > > in polar coordinates (2D or 3D?) should be enough.
>> >> > 
>> >> > I agree
>> >> > What about that LFE channel thing?
>> >> 
>> >> I was thinking about simply setting the distance to 0; however, a flag
>> >> for "non-directional" channels might be better.
>> >
>> > This is wrong; LFE is not about direction but about the type of speaker.
>> > LFE stands for "low-frequency effects".
>> > If I moved some other random speaker to distance 0, moved the LFE one
>> > out and switched the channels, it wouldn't sound correct ...
>> >
>> >> 
>> >> > And where do we put this info? The stream header seems the logical
>> >> > place if you ask me ...
>> >> 
>> >> I agree, this is essential information for proper presentation; it
>> >> definitely belongs there.
>> >
>> > Good, now we just need to agree on some half-sane way to store it.
>> > for(i=0; i<num_channels; i++){
>> >     x_position                  s
>> >     y_position                  s
>> >     z_position                  s
>> >     channel_flags               v
>> > }
>> >
>> > CHANNEL_FLAG_LFE             1
>> >
>> > seems ok?
>> 
>> I'm not convinced this is the right way to go.  Consider a recording
>> made with several directional microphones in the same location.  Using
>> spherical coordinates could be a solution.
>
> The above was intended to specify the location of the speakers, not
> the microphones.

I'm having a hard time imagining a player moving my speakers around
depending on the file being played.
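
For concreteness, here is a sketch of how the per-channel record
proposed above could be read. The readers follow NUT's v/s codings;
bitstream and the function names are placeholders, not actual libnut
API:

    #include <stdint.h>

    #define CHANNEL_FLAG_LFE 1

    /* A trivial byte-buffer reader standing in for a real one. */
    typedef struct { const uint8_t *p, *end; } bitstream;

    /* "v": unsigned vlc, 7 data bits per byte, MSB set on all but
     * the last byte. */
    static uint64_t get_v(bitstream *bs)
    {
        uint64_t val = 0;
        uint8_t  tmp;
        do {
            tmp = bs->p < bs->end ? *bs->p++ : 0;
            val = (val << 7) | (tmp & 0x7F);
        } while (tmp & 0x80);
        return val;
    }

    /* "s": the signed mapping on top of "v" (0, 1, -1, 2, -2, ...). */
    static int64_t get_s(bitstream *bs)
    {
        uint64_t tmp = get_v(bs) + 1;
        return tmp & 1 ? -(int64_t)(tmp >> 1) : (int64_t)(tmp >> 1);
    }

    typedef struct {
        int64_t  x, y, z; /* speaker position */
        uint64_t flags;   /* CHANNEL_FLAG_LFE, ... */
    } channel_info;

    static void read_channel_layout(bitstream *bs, channel_info *ch,
                                    unsigned num_channels)
    {
        for (unsigned i = 0; i < num_channels; i++) {
            ch[i].x     = get_s(bs);
            ch[i].y     = get_s(bs);
            ch[i].z     = get_s(bs);
            ch[i].flags = get_v(bs);
        }
    }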

> And spherical coordinates would just drop the distance; that's the same
> as setting the distance to 1 and storing that as xyz.

Spherical coordinates without a radius need only two fields.

> Actually the main reason why I didn't use spherical is that with integers
> there's a precision to decide on, or you end up with rationals. And this
> somehow starts looking messy ...

I don't see any fundamental difference.  If restricted to integer
coordinates, an arbitrary point can be described only with a certain
precision, regardless of coordinate system.
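
To illustrate: either way a fixed scale has to be picked. A toy
conversion, assuming angles stored as integer centidegrees and
cartesian coordinates scaled by an arbitrary fixed factor (both
constants are made up for the example):

    #include <math.h>
    #include <stdint.h>

    #define SCALE 1000 /* hypothetical fixed-point scale for xyz */

    /* Direction stored as integer centidegrees (azimuth from forward
     * towards right, elevation up) converted to scaled integer xyz in
     * the right/forward/up convention below. The angular resolution
     * of the input and the spatial resolution of the output are both
     * consequences of the chosen scale. */
    static void sph_to_xyz(int32_t azi_cdeg, int32_t ele_cdeg,
                           int32_t *x, int32_t *y, int32_t *z)
    {
        double a = azi_cdeg * (M_PI / 18000.0);
        double e = ele_cdeg * (M_PI / 18000.0);
        *x = (int32_t)lrint(SCALE * cos(e) * sin(a));
        *y = (int32_t)lrint(SCALE * cos(e) * cos(a));
        *z = (int32_t)lrint(SCALE * sin(e));
    }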

>> Whatever the coordinate system, the location and orientation of the
>> listener must be specified, even if there is only one logical choice.
>
> of course
> right_position               s
> forward_position             s
> up_position                  s
>
> And
> "the listener is at (0,0,0), (1,0,0) is right, (0,1,0) is forward,
> (0,0,1) is up"

You're forgetting the measurement unit, e.g. metres, feet, etc.
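
As a concrete illustration of the convention quoted above (and leaving
the unit question open), a conventional ITU-style 5.1 layout would come
out roughly like this, with positions scaled by an arbitrary factor of
1000 and the usual placement of front L/R at +-30 degrees and surrounds
at +-110 degrees:

    #define CHANNEL_FLAG_LFE 1

    /* x = right, y = forward, z = up, listener at (0,0,0). */
    static const struct { int x, y, z; unsigned flags; } surround_5_1[6] = {
        { -500,  866, 0, 0                }, /* front left,     -30 deg */
        {  500,  866, 0, 0                }, /* front right,    +30 deg */
        {    0, 1000, 0, 0                }, /* centre,           0 deg */
        {    0,    0, 0, CHANNEL_FLAG_LFE }, /* LFE, position moot      */
        { -940, -342, 0, 0                }, /* surround left, -110 deg */
        {  940, -342, 0, 0                }, /* surround right,+110 deg */
    };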

-- 
Måns Rullgård
mans at mansr.com


