[FFmpeg-devel] [PATCH] WMA Voice postfilter
Vitor Sessak
vitor1001
Mon Apr 12 01:48:07 CEST 2010
Ronald S. Bultje wrote:
> Hi,
>
> On Thu, Apr 1, 2010 at 11:52 AM, Reimar D?ffinger
> <Reimar.Doeffinger at gmx.de> wrote:
>> On Thu, Apr 01, 2010 at 11:25:25AM -0400, Ronald S. Bultje wrote:
>>> I'm probably asking silly questions here, but can I allocate aligned
>>> memory on the stack? Or do they have to be in the struct?
>> You better avoid it on the stack, it won't work reliably on all
>> supported configurations.
>
> Thanks for the advice, this patch is tested with yasm and does not
> crash, so I think alignment is OK now. By putting synth buffer in the
> context, I was also able to remove one memcpy(). (One or two more of
> those are possible in the original code, previously already suggested
> by Vitor, so I might go ahead and do that in a future patch also.)
Some comments:
> +static int kalman_smoothen(WMAVoiceContext *s, int pitch,
> + const float *in, float *out, int size)
> +{
> + int n;
> + float optimal_gain = 0, dot;
> + const float *ptr = &in[-FFMAX(s->min_pitch_val, pitch - 3)],
> + *end = &in[-FFMIN(s->max_pitch_val, pitch + 3)],
> + *best_hist_ptr;
> +
> + /* find best fitting point in history */
> + do {
> + dot = ff_dot_productf(in, ptr, size);
> + if (dot > optimal_gain) {
> + optimal_gain = dot;
> + best_hist_ptr = ptr;
> + }
> + } while (--ptr >= end);
> +
> + if (optimal_gain <= 0)
> + return -1;
> + dot = ff_dot_productf(best_hist_ptr, best_hist_ptr, size);
> + if (dot <= 0) // would be 1.0
> + return -1;
> +
> + if (optimal_gain <= dot) {
> + dot = 0.5 / (0.5 + 0.3 * optimal_gain / dot); // 0.0625-1.0000
This can be done with one less division:
dot = (0.5 * dot) / (0.5 * dot + 0.3 * optimal_gain); // 0.0625-1.0000
> +/**
> + * Derive denoise filter coefficients from the LPCs.
> + */
> +static void calc_coeffs(WMAVoiceContext *s, float *lpcs,
> + int fcb_type, float *coeffs, int remainder)
I think this would be better named "calc_input_response()"
> + /* Create frequency power spectrum of speech input (i.e. RDFT of LPCs);
> + * we shift each value to an offset +1 so we don't have to create temp
> + * values. */
> + ff_rdft_calc(&s->rdft, lpcs);
> +#define log_range(x, assign) do { \
> + float tmp = log10f(assign); lpcs[x] = tmp; \
> + max = FFMAX(max, tmp); min = FFMIN(min, tmp); \
> + } while (0)
> + for (n = 1; n < 0x40; n++)
> + log_range(n + 1, lpcs[n * 2] * lpcs[n * 2] +
> + lpcs[n * 2 + 1] * lpcs[n * 2 + 1]);
> + log_range(0x41, lpcs[1] * lpcs[1]);
> + log_range(1, lpcs[0] * lpcs[0]);
> +#undef log_range
I think that readability could be improved using a single temp value
(float last_coef = lpcs[1]) and avoiding the shifts.
> + /* Now, use this spectrum to pick out these frequencies with higher
> + * (relative) power/energy (which we then take to be "not noise"),
> + * and set up a table (still in lpc[]) of (relative) gains per frequency.
> + * These frequencies will be maintained, while others ("noise") will be
> + * decreased in the filter output. */
> + irange = 64.0 / range; // so irange*(max-value) is in the range [0, 63]
> + gain_mul = range * (fcb_type == FCB_TYPE_HARDCODED ? (5.0 / 13.0) :
> + (5.0 / 14.7));
> + angle_mul = gain_mul * (8.0 * M_LN10 / M_PI);
> + for (n = 1; n <= 0x41; n++) {
> + float pow;
> +
> + idx = FFMAX(0, lrint((max - lpcs[n]) * irange) - 1);
> + pow = wmavoice_denoise_power_table[s->denoise_strength][idx];
> + /* note how we shift it back to offset +0 here */
> + lpcs[n - 1] = angle_mul * pow;
> +
> + idx = (pow * gain_mul - 0.0295) * 70.570526123;
> + if (idx > 0x7F) {
> + coeffs[n] = wmavoice_energy_table[127] *
> + powf(1.0331663, idx - 127);
> + } else
> + coeffs[n] = wmavoice_energy_table[FFMAX(0, idx)];
I think a comment saying that 70.570526123 == 1./log10(1.0331663) would
be useful (and the if() is just a failback for some value that didn't
fit in the table).
> + /* calculate the Hilbert transform of the gains, which we do (since this
> + * is a sinus input) by doing a phase shift (in theory, H(sin())=cos()). */
> + ff_dct_calc(&s->dct, lpcs);
> + ff_dct_calc(&s->dst, lpcs);
I'd say the point of doing the Hilbert transform (that should be said in
some comment) is that
Hilbert_Transform(RDFT(F)) == Laplace_Transform(F)
which is what you need to do a Wiener filter (unless I'm missing something).
> + /* Project values as coefficients - note how this shifts them back
> + * from offset=1 to offset=0. */
More importantly, before you calculated the norm of the coefficient
indexes (which discards all phase data). Now, here, you are putting back
a (different) phase...
> + idx = 255 + av_clip(lpcs[64], -255, 255);
> + coeffs[0] = coeffs[1] * s->cos[idx];
> + idx = 255 + av_clip(lpcs[64] - 2 * lpcs[63], -255, 255);
> + coeffs[1] = coeffs[0x41] * s->cos[idx];
> + for (n = 0x3F;; n--) {
> + idx = 255 + av_clip(-lpcs[64] - 2 * lpcs[n - 1], -255, 255);
> + coeffs[n * 2 + 1] = coeffs[n + 1] * s->sin[idx];
> + coeffs[n * 2] = coeffs[n + 1] * s->cos[idx];
> +
> + if (!--n) break;
> +
> + idx = 255 + av_clip( lpcs[64] - 2 * lpcs[n - 1], -255, 255);
> + coeffs[n * 2 + 1] = coeffs[n + 1] * s->sin[idx];
> + coeffs[n * 2] = coeffs[n + 1] * s->cos[idx];
> + }
> +
> + /* fix scale and tilt correction */
> + ff_rdft_calc(&s->irdft, coeffs);
... and doing the inverse RDFT.
> + memset(&coeffs[remainder], 0, sizeof(coeffs[0]) * (0x80 - remainder));
> + if (s->denoise_tilt_corr) {
> + float tilt_mem = 0;
> +
> + coeffs[remainder - 1] = 0;
> + ff_tilt_compensation(&tilt_mem,
> + -1.8 * tilt_factor(coeffs, remainder - 1),
> + coeffs, remainder);
> + }
> + sq = (1.0 / 64.0) * sqrtf(1 / ff_dot_productf(coeffs, coeffs, remainder));
> + for (n = 0; n < remainder; n++)
> + coeffs[n] *= sq;
> + ff_rdft_calc(&s->rdft, coeffs);
> +}
This last RDFT does not really belong to this function. It really would
make much more sense in the caller, since it will read like
ff_rdft_calc(&s->rdft, synth_pf);
ff_rdft_calc(&s->rdft, coeffs);
for(i = 0; i < N ; i++)
[complex multiplication synth_pf[i] *= coeffs[i]];
ff_rdft_calc(&s->irdft, synth_pf);
Which makes it clear that we are convolving synth_pf with coeffs.
> + double i_lsps[MAX_LSPS];
i_?
Besides these comments, patch ok.
-Vitor
More information about the ffmpeg-devel
mailing list