[FFmpeg-devel] [PATCH] RealAudio 14.4K encoder

Sun May 23 20:52:30 CEST 2010

On Sun, 2010-05-23 at 00:51 +0200, Michael Niedermayer wrote:
> On Sat, May 22, 2010 at 07:33:13PM +0200, Francesco Lavra wrote:
> > > > Floating point, with orthogonalization, with gain quantization done the
> > > > fast way
> > > > stddev:  818.14 PSNR: 38.07 bytes:   200320/   200334
> > > > stddev:  986.48 PSNR: 36.45 bytes:   144000/   144014
> > > > stddev:  811.68 PSNR: 38.14 bytes:   745280/   745294
> > > > stddev: 3762.86 PSNR: 24.82 bytes:  5370880/  5370880
> > > > stddev: 2635.10 PSNR: 27.91 bytes:   814400/   814400
> > > > stddev: 3647.02 PSNR: 25.09 bytes:   432640/   432640
> > > > stddev: 2862.79 PSNR: 27.19 bytes:  1741440/  1741440
> > > 
> > > some files lose quality by enabling orthogonalization, that's odd but
> > > possible.
> > > assuming there is no bug in the orthogonalization then you could try to
> > > run the quantization with both codebooks found with and without
> > > orthogonalization, this should always be better. And/or avoid codebook
> > > choices that would need quantization factors that are far away from
> > > available values
> > 
> > The first 3 files are uncompressed recordings, while the last 4 files
> > are RealAudio decoded samples, so statistics for the latter probably are
> > not that meaningful.
> > If you are wondering why PSNR values are so low for the last 4 files
> > (ideally, they should approach infinity), the problem is that I couldn't
> > come up with an exact method of calculating the frame energy (assuming
> > one exists, because from the current decoder output I'm not sure we can
> > reconstruct the encoded stream exactly as it was), so having an energy
> > value different from what it ought to be negatively influences the
> > codebook searches.
> 
> how far away is the correct value from what you choose?
> (if it's just +-1 maybe brute force search might be an option)

I chose the formula to calculate the energy such that in most cases it
is either the correct value or +-1. But a brute force approach on the
energy value would be extremely slow: you have to re-encode the whole
frame as many times as the number of energy values you want to try.
Also, there are the LPC coefficients, whose values don't correspond
exactly to those of the original encoded stream, so I don't know how
much improvement a brute force approach on the energy value could bring.
Last but not least, yesterday I made some mistakes getting the PSNR
values, messing up the shift and skip arguments to tiny_psnr: now
the results are far better :) see below.
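
For readers comparing the stddev and PSNR columns in the results: a
minimal sketch of how the two appear to be related, assuming 16-bit
samples and a peak value of 65535. This relation is inferred from the
reported figures, not taken from the tiny_psnr source, and the function
name is made up for illustration.

```python
import math

def psnr_from_stddev(stddev, peak=65535.0):
    """PSNR in dB for a given RMS error (stddev).

    Assumes 16-bit audio with a peak of 65535; this is an assumption
    that happens to reproduce the figures reported in this thread.
    """
    return 20.0 * math.log10(peak / stddev)

# First row of the fixed point table: stddev 424.27, reported PSNR 43.78
print(round(psnr_from_stddev(424.27), 2))
```

Since PSNR here depends only on stddev, halving the stddev gains about
6 dB, which is roughly what orthogonalization achieves on the first
file.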

> orthogonalization is a win and should be done of course.
> the 5 entry quantization needs work, there should be no quality
> loss. What about 10 or 20 entries?

Below are the correct results (a bug in the floating point code has been
fixed too, and PSNR has benefited from that). As you can see, the fast
gain quantization is as good as the brute force one, so there is no need
to worry about a mixed approach.
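
To make the three gain quantization variants concrete, here is a toy
sketch of one plausible reading of them, for a single codebook and a
scalar gain table. It is not the code from the patch: the function
names, the codebook and the gain table are all invented for
illustration, and the real encoder may quantize gains differently.
With n_best=1 only the vector with the lowest unquantized error gets
its gain quantized (the fast way); with n_best=5 the rounding error of
the 5 best entries is taken into account; with n_best equal to the
codebook size every entry is tried (brute force).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_error(target, vec, gain):
    """Squared error between target and the gain-scaled vector."""
    return sum((t - gain * v) ** 2 for t, v in zip(target, vec))

def nearest_gain(gain, gain_table):
    """Quantize a gain to the closest entry of the (hypothetical) table."""
    return min(gain_table, key=lambda q: abs(q - gain))

def search(target, codebook, gain_table, n_best):
    """Return (error, vector index, quantized gain), or None."""
    scored = []
    for i, vec in enumerate(codebook):
        energy = dot(vec, vec)
        if energy == 0:
            continue  # a zero vector cannot match anything
        corr = dot(target, vec)
        # Error obtained with the ideal, unquantized gain corr / energy
        ideal_err = dot(target, target) - corr * corr / energy
        scored.append((ideal_err, i, corr / energy))
    scored.sort()
    best = None
    # Only the n_best candidates get their gain quantized and re-scored
    for _, i, gain in scored[:n_best]:
        q = nearest_gain(gain, gain_table)
        cand = (sq_error(target, codebook[i], q), i, q)
        if best is None or cand < best:
            best = cand
    return best

target = [1.0, 2.0, 3.0, 4.0]
codebook = [[1, 1, 1, 1], [0, 1, 2, 3], [1, 2, 3, 5], [4, 3, 2, 1]]
gain_table = [0.25, 0.5, 1.0, 2.0]

fast = search(target, codebook, gain_table, 1)
best5 = search(target, codebook, gain_table, 5)
brute = search(target, codebook, gain_table, len(codebook))
print(fast, best5, brute)
```

In this sketch a larger n_best can never give a higher error for a
single search, because it examines a superset of the candidates. The
cost grows with n_best, which is why the fast variant is attractive
when the quality difference is negligible.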

Fixed point, without orthogonalization, with brute force gain
quantization
stddev:  424.27 PSNR: 43.78 bytes:   200000/   200320
stddev:  263.80 PSNR: 47.90 bytes:   143680/   144000
stddev:  380.05 PSNR: 44.73 bytes:   744960/   745280
stddev:  854.26 PSNR: 37.70 bytes:  5370560/  5370880
stddev:  472.50 PSNR: 42.84 bytes:   814080/   814400
stddev:  548.55 PSNR: 41.54 bytes:   432320/   432640
stddev:  428.05 PSNR: 43.70 bytes:  1741120/  1741440

Floating point, without orthogonalization, with brute force gain
quantization
stddev:  422.45 PSNR: 43.81 bytes:   200000/   200320
stddev:  268.66 PSNR: 47.75 bytes:   143680/   144000
stddev:  381.76 PSNR: 44.69 bytes:   744960/   745280
stddev:  851.79 PSNR: 37.72 bytes:  5370560/  5370880
stddev:  486.95 PSNR: 42.58 bytes:   814080/   814400
stddev:  568.53 PSNR: 41.23 bytes:   432320/   432640
stddev:  436.89 PSNR: 43.52 bytes:  1741120/  1741440

Floating point, with orthogonalization, with brute force gain
quantization
stddev:  210.49 PSNR: 49.86 bytes:   200000/   200320
stddev:  201.69 PSNR: 50.24 bytes:   143680/   144000
stddev:  200.49 PSNR: 50.29 bytes:   744960/   745280
stddev:  784.77 PSNR: 38.43 bytes:  5370560/  5370880
stddev:  422.10 PSNR: 43.82 bytes:   814080/   814400
stddev:  484.69 PSNR: 42.62 bytes:   432320/   432640
stddev:  392.32 PSNR: 44.46 bytes:  1741120/  1741440

Floating point, with orthogonalization, with gain quantization done the
fast way
stddev:  210.14 PSNR: 49.88 bytes:   200000/   200320
stddev:  202.50 PSNR: 50.20 bytes:   143680/   144000
stddev:  196.30 PSNR: 50.47 bytes:   744960/   745280
stddev:  786.06 PSNR: 38.42 bytes:  5370560/  5370880
stddev:  422.29 PSNR: 43.82 bytes:   814080/   814400
stddev:  495.53 PSNR: 42.43 bytes:   432320/   432640
stddev:  396.24 PSNR: 44.37 bytes:  1741120/  1741440

Floating point, with orthogonalization, with gain quantization done
taking into account the rounding error of the 5 best entries
stddev:  210.49 PSNR: 49.86 bytes:   200000/   200320
stddev:  201.69 PSNR: 50.24 bytes:   143680/   144000
stddev:  200.05 PSNR: 50.31 bytes:   744960/   745280
stddev:  786.22 PSNR: 38.42 bytes:  5370560/  5370880
stddev:  419.41 PSNR: 43.88 bytes:   814080/   814400
stddev:  497.65 PSNR: 42.39 bytes:   432320/   432640
stddev:  395.23 PSNR: 44.39 bytes:  1741120/  1741440
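
As a quick sanity check on the floating point tables above, the PSNR
columns can be compared file by file. This is only a sketch: the
numbers are copied from the tables, and the variable and helper names
are made up for illustration.

```python
# PSNR columns copied from the floating point tables above
no_orth_brute = [43.81, 47.75, 44.69, 37.72, 42.58, 41.23, 43.52]
orth_brute    = [49.86, 50.24, 50.29, 38.43, 43.82, 42.62, 44.46]
orth_fast     = [49.88, 50.20, 50.47, 38.42, 43.82, 42.43, 44.37]
orth_best5    = [49.86, 50.24, 50.31, 38.42, 43.88, 42.39, 44.39]

def mean(values):
    return sum(values) / len(values)

def deltas(a, b):
    """Per-file PSNR difference a - b, rounded to 2 decimals."""
    return [round(x - y, 2) for x, y in zip(a, b)]

print("orthogonalization gain:", deltas(orth_brute, no_orth_brute))
print("fast vs brute force:   ", deltas(orth_fast, orth_brute))
print("mean fast - mean brute:", round(mean(orth_fast) - mean(orth_brute), 3))
```

On these seven files orthogonalization improves PSNR in every case,
while the fast gain quantization stays within 0.2 dB of brute force on
each file.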

I'd say we should go for the fast gain quantization, and attached is
a cleaned-up patch for it, with code duplication removed.
I still have to try the iterative method; I will do that in a few days,
I think.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 05_ra144enc.patch
Type: text/x-patch
Size: 22162 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100523/102631f8/attachment.bin>