[FFmpeg-devel] a64 encoder 7th round
Bitbreaker/METALVOTZE
bitbreaker
Thu Jan 29 13:47:52 CET 2009
> Surely the addition of P frames does not help in every case, one always can
> just encode unrelated frames but where it does help the smaller P frames
> allow the freed up space and bitrate to be used by something else to improve
> quality ...
In multicolor mode I can only improve quality by either having a bigger
charset (impossible, 256 chars is already the maximum the C64 can
handle) or by sending a charset more often. A lifetime of 4 frames,
however, is already a good tradeoff between quality and framerate.
Sending two screens and thus making chars 8x4 pixels big would also be
an option. But all of that would make a single frame much bigger, so I
can only take advantage of it if I can load/decompress/whatever them
faster _even_ in the worst case. A delta/RLE/whatever scheme must
therefore always, under all circumstances, be faster, with no crossover
point with plain loading. Otherwise the framerate will drop, or plain
loading will be enough for the same quality and speed.
> If this picture was the input then the use of the specific colors is optimal
> because that is how it is supposed to look.
> This really very strongly points towards something being wrong with how you
> choose the colors.
Sure, I can interlace #000000 and #ffffff to get a nice gray tone, but
try that at a 50Hz refresh rate: it flickers like hell and really hurts
your eyes :-) It even flickers on my TFT when I watch things in an
emulator. There is another reason for transforming the color space as
well: I get smoother gradients that way, with colors that mix well. If
I stuck to the colors that represent the original picture best, I would
end up in a big flickering hell again, and gradients would need more
dither steps. Take green, for example: there is just a normal green and
a light green available. The light green is okay to mix with white, but
flickers quite a bit when mixed with the normal green, and mixing any
of the colors with black results in *heavy* flicker. None of this shows
up on the example JPEGs I have posted, of course, but on a real machine
it is a big thing to take care of. Sure, in multicolor mode I don't do
any interlacing, so there things don't hurt as much, but dithering with
pixels of high luminance difference still doesn't look too nice either
(the pixels are just too big, so single pixels become very prominent,
as do any vertical or horizontal lines they happen to form). Here, of
course, I could also just choose a normal gray gradient, but for better
comparison I took that brown/pink/yellow gradient, so don't get
confused by it; the resulting multicolor video can of course be
displayed in plain gray tones. Luckily, black, dark gray, middle gray,
light gray and white are all available among the 16 colors.
For color modes like the ecmh mode, that is exactly why I do the
transform via HSV and then a lookup in color lines. Sure, the colors no
longer match the original colors, but it improves the viewing
experience. Also, none of the screens are calibrated (nor can they be),
and each C64 displays the colors a bit differently. What you have seen
so far used excellent palettes from an emulator.
> You simply have to simulate how something will look and compare that with a
> good comparission function against the input picture.
That is exactly what I am doing, except for the good compare function;
that is where I see potential. But it is not as easy as one might
think, since the attributes of the cells need to be respected, and some
attributes apply to smaller cells while others apply to bigger cells
that in turn contain several of the smaller ones. Thinking all of that
through really causes headaches at times.
> iam not sure what these links are supposed to proof, they arent compareing
> error diffusion against ordered dither
> rather you could look at:
The links show how dithering is usually done on a C64. It is state of
the art on that platform to do it like this; that is simply how things
have developed over the past decades, and it has turned out that this
looks best on a C64. I just can't help it. I have no doubt that your
examples look good as well, but on a PC.
Also, I don't see any need for proof. The encoder makes something
possible that was not possible so far, the results are way better than
I ever expected, and I have still been making improvements. If someone
is able to do it even better, I am open to that, but I don't see the
necessity of proving that everything I have done is the best thing
ever. It is just the best I could achieve so far, until someone finds a
better solution. But finding a better solution is more than just
throwing a bunch of concepts into the discussion, stating that they
will perform better, and thereby pushing me into a defensive position.
This costs me a big amount of valuable time (yes, there is work to do,
I have a family to share time with, and if there is time left I prefer
to spend it on programming one of my projects rather than on such
discussions) spent replying with long mails and feeling the urge to
prove things. People can feel free to contribute codewise.
> no, not at all, first you could use loops like:
> (i hope my wild guesses for the ASM are understandable)
>
> ldx #<dest
> stx a1+1 ;set highbyte of dest in code
> stx a2+1
> stx a3+1
> stx a4+1
> ldx $de01
> loop
> lda $de00
> a1 sta $0000,x
> a2 sta $0004,x
> a3 sta $0008,x
> a4 sta $000C,x
> ldx $de01
> bne loop
>
> to write 4 equal bytes, and a similar loop to write 4 different ones
I corrected your example a bit:
ldx #>dest
stx a1+2 ;set highbyte of dest in code
stx a2+2
stx a3+2
stx a4+2
;the two operand bytes of an absolute sta are stored lowbyte first
;(little endian), so opcode+1 is the lowbyte and opcode+2 the highbyte.
;This only sets the highbytes; if you want to change the lowbytes
;(0,4,8,c) additional code is needed here, as we can only set 8 bits of
;a 16 bit address at once.
ldx $de00
stx count
lda $de01
ldx #$00
loop
a1 sta $0000,x
a2 sta $0004,x
a3 sta $0008,x
a4 sta $000c,x
inx
cpx count
bne loop
However, this would only make sense for count < 4. Also note that $de00
and $de01 need to be read alternately. Example:
lda $de00 ;read byte 1
...
lda $de01 ;read byte 2 (after this the internal latch increments and the
          ;next two bytes are offered on $de00/01; there is no way to
          ;read byte 1 or 2 again)
...
lda $de00 ;read byte 3
...
lda $de01 ;read byte 4 (latch increments)
Also, you need to advance the lowbytes of the 4 sta instructions after
each loop, and incrementing is expensive. Indirect Y indexed addressing
may help here, but sta is even more expensive then, and so is changing
the pointer. That is why I chose this self-modifying code; it is
usually faster, though more complex.
Indirect Y indexed addressing looks like this:
lda #<destaddr ;lowbyte of destaddr
sta $fb
lda #>destaddr ;highbyte of destaddr
sta $fc
ldy #offset
lda value      ;the byte to store (A was clobbered by the pointer setup)
sta ($fb),y    ;stores at destaddr+offset
Advancing the pointer then looks like this:
inc $fb ;inc lowbyte
bne *+4 ;wrapped to 0? if not, skip the highbyte inc
inc $fc ;inc highbyte
This does a destaddr++. Usually you increment the offset in Y instead,
until it wraps around at $ff, and only then increment $fc. Still, it is
expensive: each inc costs 5 cycles, and each indirect sta 6 cycles.
You see, things may appear easier and less complex than they are. I'd
also wish there were random access to the network chip's buffer (for
sending, and for computing the checksum or size information, for
example), but it is not available. There is just always a bunch of
restrictions we have to live with and work around. That is exactly what
makes things so fun and challenging on that platform :-)
So I'd be happy if you trusted my skills in this matter, because
explaining the whole C64/6502 world would be endless, and I guess we
would just lose focus in such discussions, leading nowhere but into
more confusion and a big waste of time. Endless was already the time I
invested in making these modes work on the C64 at all and in producing
encoded material that suits those modes.
So let's rather focus on how to speed up encoding, or on matching
pictures even better in the encoder, than on how to display and best
load them on a C64. The ELBG thing, for example, was a good one, as it
speeds things up a lot for a similar result. But as for the format of a
frame and the restrictions imposed by the C64 hardware, there is
nothing we can change. Besides, I have a good sense for what looks good
on the real machine, as I can try it there, and I know what aspects to
take care of.
So, finally: is there anything codewise that still needs to be fixed to
get, first of all, the multi modes submitted? Is there real interest in
getting this included in FFmpeg? Because after that I'd focus on the
muxer and on making the next mode suit the FFmpeg requirements.
Kindest regards,
Toby