[NUT-devel] Info packets in NUT stream (spec bugs?)

Tue Nov 21 22:35:57 CET 2006

Hi

On Tue, Nov 21, 2006 at 02:32:14PM -0500, Rich Felker wrote:
> On Tue, Nov 21, 2006 at 06:32:03PM +0100, Michael Niedermayer wrote:
> > Hi
> > 
> > On Mon, Nov 20, 2006 at 08:57:36PM -0500, Rich Felker wrote:
> > [...]
> > > > > I've never actually tested it, but AFAIK libnut is completely safe and 
> > > > > non-breaking on this issue.
> > > > 
> > > > there's at least one issue with random start timestamps:
> > > > try 1e9999 as the start timestamp and tell me if that worked :)
> > > > while the file format of course has no problem with arbitrary integers,
> > > > implementations will ...
> > > > making it clear that 0 should be used as the start where possible reduces
> > > > the issue but doesn't solve it
> > > 
> > > I think it's clear that if you use idiotic time values you'll have
> > > problems with implementation support. IMO it's fine to say just that
> > > implementations SHOULD NOT go out of their way to support excessively
> > > large values for any field in a NUT file.
> > 
> > what is excessively large? what's idiotic? that's not a good way to specify
> > the valid range of a value
> > anything over 32 bits is idiotic for many people, I'm pretty sure; still, that's
> > not enough if your input data is in nanosecond precision ...
> > 
> > and it's not reasonable either to assume that everyone has to spend an hour
> > per field guessing what range of values would have to be supported to handle
> > all non-idiotic cases
> 
> My idea is that what's idiotic changes with time. That's why we use
> vlc rather than fixed-size fields. Unlike other potential areas of
> abuse in the spec, I don't see any realistic issue with people
> intentionally choosing initial timestamps that will cause trouble with
> some implementations. Generally the only things people would choose
> for starting timestamps would be 0, the end timestamp of another file,
> or the current unix time (seconds since the epoch). All of these will
> fit ok in 64bit as long as a sane timebase is used.

for RTP the start timestamp is recommended to be random() IIRC, and for
transcoding people might choose to keep the original start timestamp. also, when
taking some seconds-since-x value and converting that to an "insane" timebase,
problems will happen ...
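
to make the timebase point concrete, here is a rough sketch. the helper name and the three timebases are only picked for illustration, and the unix time is roughly the moment this mail was written. it checks whether a seconds-since-epoch start value still fits a signed 64-bit integer once it is rescaled:

```python
INT64_MAX = 2**63 - 1

def fits_int64(seconds, ticks_per_second):
    """True if `seconds`, rescaled to the given timebase, fits a signed 64-bit int.

    Python integers are arbitrary precision, so the product itself never
    overflows here; we only compare it against the 64-bit limit.
    """
    return seconds * ticks_per_second <= INT64_MAX

# illustrative start value: roughly the unix time of this mail (assumption)
unix_now = 1164144957

for ticks in (90_000, 10**9, 10**10):  # 90 kHz, nanoseconds, an "insane" 0.1 ns
    print(ticks, fits_int64(unix_now, ticks))
```

under these assumptions the 90 kHz and nanosecond timebases still fit, while the 0.1 ns one does not, so "fits in 64bit" depends on the timebase as much as on the start value.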

> 
> > > It is always possible via linear search. If the demuxer SHOULD NOT
> > > search for them then we should not go out of our way to make it easy
> > > to search... Just my 2¢...
> > 
> > well there really are 2 cases IMHO
> > A. midstream info packets are not allowed in normal nut files
> > B. midstream info packets are allowed in normal nut files
> > 
> > for A I agree that the pointers and repeating shouldn't be required; there may
> > be other reasons, though, why repeating the info makes sense ...
> > 
> > for B I don't agree, simply because if info is there, then there are cases
> > where the user will want to have that info. think of some capture of Oded's
> > radio stream: it's not unlikely that the user would want to seek to
> > a specific song (she knows the song title but not the time to seek to)
> 
> Arrg, this is what I was saying way back about info streams and I got
> flamed to death. Anyway such file has no index already, so it's not
> meant to be searchable by index, and searching by chapter _name_ does
> not work with binary search so linear search is the natural
> requirement anyway.

it is possible to extend info packets so that an O(log n) search for
names can be done. I'll explain how (and no, I'm not saying I actually propose
doing that, it's just a random thought)
1. add a hash table to every info packet which contains pointers to info
   packets for all names in the X previous info packets
2. X is the largest power of 2 which divides n, where n is the number of the
   current info packet (n=5 -> X=1, n=6 -> X=2, n=8 -> X=8)
3. add a pointer to the Xth previous info packet

the space requirement for this is O(n log n) for n info packets.
now to search for your favorite name, start with the last info packet and
search its hash table. if there's a match you have your name; if not, follow
the pointer from 3. and retry (X must be at least twice as large after each
retry, so this is guaranteed to terminate after at most log n steps)

another random thought:
100 music videos with midstream info back pointers need 100 seeks to read all
the info; with 10ms per seek that's 1 second. if we assume 3min playtime per
music video the whole would be 300min, and at a realistic bitrate that will take
much longer to search without the pointers
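
the same arithmetic as a small sketch. the 100 packets, 10ms per seek and 300min come from the numbers above; the bitrate (1 Mbit/s) and the sequential read speed (20 MB/s) are pure assumptions chosen only to get an order of magnitude:

```python
def seek_cost_s(n_packets, seek_ms):
    """Time to visit every info packet by following back pointers."""
    return n_packets * seek_ms / 1000

def linear_scan_cost_s(minutes, bitrate_bps, read_bytes_per_s):
    """Time to read the whole file sequentially to find the info packets."""
    file_bytes = minutes * 60 * bitrate_bps / 8
    return file_bytes / read_bytes_per_s

with_pointers = seek_cost_s(100, 10)
# assumed values: 1 Mbit/s stream, 20 MB/s sequential read
without_pointers = linear_scan_cost_s(300, 1_000_000, 20_000_000)
print(with_pointers, without_pointers)
```

with other bitrates or disks the second number moves a lot, but it stays well above one second for anything resembling video.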

and yet another random thought:
if we now repeat the last X different info packets with each info packet,
similarly to the hash table mess, then we could read all with log n seeks
(and n log n space instead of n for n info packets) while the complexity
would be very low on the demuxer side, just linear search + follow the
pointer
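
a quick sketch of that last idea, to count seeks and stored copies. it assumes packet n carries copies of info n-X+1 .. n (X being the largest power of 2 dividing n) plus a pointer to packet n-X; the function names are made up and no actual NUT parsing is involved:

```python
def read_all_info(n_last):
    """Walk back from the last info packet; returns (seeks, info numbers seen)."""
    seeks, seen = 0, []
    n = n_last
    while n > 0:
        seeks += 1
        x = n & -n                        # largest power of 2 dividing n
        seen.extend(range(n, n - x, -1))  # copies carried by packet n, newest first
        n -= x                            # follow the back pointer
    return seeks, seen

def total_copies(n_last):
    """Total number of info copies stored across packets 1 .. n_last."""
    return sum(n & -n for n in range(1, n_last + 1))

seeks, seen = read_all_info(100)
print(seeks, len(seen), total_copies(100))
```

for 100 info packets this needs a handful of seeks instead of 100, paid for with several hundred stored copies instead of 100.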

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is