The KimKlone: Bride of Son of Cheap Video
A “Smart” Register for 6502
6502 programmers are well aware of how to use a pair of zero-page bytes as a 16-bit indirect pointer. Memory-indirect addressing modes are central pillars of the 65C02's power! But incrementing that memory-resident two-byte pointer is a bit slow, and the alternative (i.e., Indirect-Y post-indexing) is fast but limited: the 8-bit Y index reaches only 256 bytes past the pointer. Wouldn't it be great if the pointer itself could just auto-increment?
The KimKlone has a pointer that can auto-increment. Although four '163 counters, a pair of '244 buffers and some glue logic would've done the job, I saved board space (and amused myself immoderately) by handing the job over to an unused 16-bit counter/timer in one of the KimKlone's VIAs. (VIA, or versatile interface adapter, is Marketing-Speak for a 65C22 multi-function peripheral chip.) The VIA was mapped into zero page anyway, and there was nothing to prevent it, an I/O device, from playing the role of a couple of bytes of RAM. (But somehow I doubt that the VIA's designers ever dreamed that their counter/timer might find use as an indirect pointer for addressing memory!)
To do this, a program first stores the initial address into the VIA's T2Low and T2High counter registers, exactly as quickly and easily as it might do using a couple of bytes of ordinary zero-page memory. The difference is that after each memory-indirect access (pointing via the VIA, so to speak), all that's needed to single-increment or double-increment the zero-page pair (the counter) is a speedy KimKlone SINC or DINC instruction. The microcode for these merely tickles VIA pin 16, configured as the counter input. That's all that's required to advance the pointer to the next byte or word to be accessed.
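The pointer's behavior, as described above, can be sketched in a few lines. This is a behavioral model under stated assumptions: the two VIA counter bytes are treated as a plain little-endian 16-bit value that advances by one per counter pulse, ignoring how the VIA's T2 circuitry actually counts internally. The class and method names (`AutoIncPointer`, `sinc`, `dinc`, `read_indirect`) are hypothetical.

```python
# Behavioral sketch of the KimKlone auto-increment pointer.
# Assumption: the pointer is a plain 16-bit value advanced by one per counter
# pulse; the VIA's internal counting details are not modeled.
# All names here are hypothetical.


class AutoIncPointer:
    def __init__(self) -> None:
        self.t2_low = 0   # low byte, stored like the first zero-page byte
        self.t2_high = 0  # high byte, stored like the second zero-page byte

    def store(self, address: int) -> None:
        """Load the pointer, as a program would with two byte stores."""
        self.t2_low = address & 0xFF
        self.t2_high = (address >> 8) & 0xFF

    @property
    def value(self) -> int:
        """The 16-bit address formed from the two bytes (little-endian)."""
        return (self.t2_high << 8) | self.t2_low

    def _pulse(self) -> None:
        """One pulse on the counter input advances the pointer by one, wrapping at 64K."""
        new = (self.value + 1) & 0xFFFF
        self.t2_low = new & 0xFF
        self.t2_high = new >> 8

    def sinc(self) -> None:
        """SINC: single increment."""
        self._pulse()

    def dinc(self) -> None:
        """DINC: double increment, e.g. to step over a 16-bit word."""
        self._pulse()
        self._pulse()


def read_indirect(memory: bytearray, ptr: AutoIncPointer) -> int:
    """A memory-indirect byte read through the pointer, like LDA (zp) on a 65C02."""
    return memory[ptr.value]


# Usage: walk four bytes that straddle a page boundary.
memory = bytearray(0x10000)
memory[0x12FE:0x1302] = b"ABCD"
ptr = AutoIncPointer()
ptr.store(0x12FE)
copied = []
for _ in range(4):
    copied.append(read_indirect(memory, ptr))
    ptr.sinc()
```

The point of the usage loop is that the carry from 0x12FF to 0x1300 costs the program nothing extra; in the hardware, that bookkeeping belongs to the counter.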
SINC and DINC are dramatically faster than conventional code that does the same job. SINC takes 2 cycles, whereas the equivalent would consume at least 8. DINC, also 2 cycles, replaces code that would take at least 13 — not too shabby a boost, for a pointer-increment operation that gets worked to death in the run-time hot-spots of many common algorithms! But there's an even stronger reason why I wanted an auto-increment register, and why I gave it even more capability.
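The "at least 8" and "at least 13" figures can be checked with a little cycle arithmetic. The sequences below are assumed typical 65C02 code for a zero-page pointer (not code from this article), timed with standard 65C02 cycle counts, and assuming no taken branch crosses a page (that would cost one more cycle).

```python
# Cycle arithmetic for conventional pointer increments on a 65C02.
# Assumptions: pointer in zero page at PTR/PTR+1; the sequences are typical
# choices, not the article's code; taken branches do not cross a page.

CYCLES = {
    "INC zp": 5,
    "BNE taken": 3,
    "BNE not taken": 2,
    "CLC": 2,
    "LDA zp": 3,
    "ADC #imm": 2,
    "STA zp": 3,
    "BCC taken": 3,
    "BCC not taken": 2,
}


def total(sequence: list[str]) -> int:
    """Sum the cycle counts of one straight-line path through the code."""
    return sum(CYCLES[op] for op in sequence)


# Single increment: INC PTR / BNE done / INC PTR+1
SINC_USUAL = ["INC zp", "BNE taken"]                # low byte did not wrap
SINC_CARRY = ["INC zp", "BNE not taken", "INC zp"]  # low byte wrapped to 00

# Double increment: CLC / LDA PTR / ADC #2 / STA PTR / BCC done / INC PTR+1
DINC_USUAL = ["CLC", "LDA zp", "ADC #imm", "STA zp", "BCC taken"]
DINC_CARRY = ["CLC", "LDA zp", "ADC #imm", "STA zp", "BCC not taken", "INC zp"]

KK_INC_CYCLES = 2  # SINC and DINC each take 2 cycles, per the text

for name, path in [("SINC", SINC_USUAL), ("SINC w/ carry", SINC_CARRY),
                   ("DINC", DINC_USUAL), ("DINC w/ carry", DINC_CARRY)]:
    print(f"{name:14} {total(path):2} cycles vs {KK_INC_CYCLES}")
```

The common paths come to 8 and 13 cycles; the rarer carry paths cost 12 and 17, while SINC and DINC stay at 2 either way.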
Forth and Hardware-Accelerated NEXT
The KimKlone's ultimate gnarliness, and the most deceitful of all the pranks played on its long-suffering CPU, is the operation called NEXT. A brief digression is in order here.
NEXT is what a computer running the Forth programming language does in order to update its program counter and fetch its next instruction. (In Forth parlance the program counter is known as the Interpretive Pointer, or IP.) CPUs that "speak" Forth as their native code do exist, but a viable alternative is to use a Virtual Machine, an emulation of a Forth computer.
It doesn't take long to figure out that NEXT — an operation that needs to execute prior to every Forth instruction — could be a critical bottleneck when it comes to emulation. So, in order to better support a Forth Virtual Machine, the KK features hardware acceleration for NEXT. The KK's auto-increment register, mentioned above, has the ability to act as the IP for a Fig-Forth Virtual Machine. Incrementing is only part of the process.
The Fig-Forth VM (unlike certain modern Forth variants) uses a threaded interpreter, and the threading is of the kind known as double-indirect. That means that when NEXT fetches a pseudo-instruction pointed to by the IP, what's fetched is a pointer to a pointer to executable host-CPU machine-code. Altogether it amounts to triple indirection, since the process begins by using IP to fetch the pseudo-instruction itself. That may sound horrendous, but a standard 65C02 can emulate NEXT using about a dozen instructions that consume roughly 40 cycles.
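The triple indirection can be sketched as a tiny double-indirect threaded interpreter. This is an illustration under assumptions: the addresses, the four-word vocabulary, and the helper names are all hypothetical, Python functions stand in for host machine code, and `next_` returns the machine-code address rather than actually jumping to it.

```python
# Minimal sketch of a double-indirect threaded inner interpreter.
# Assumptions: addresses, word names and helpers are hypothetical;
# Python functions stand in for host machine-code routines.
from dataclasses import dataclass, field
from typing import Callable

MEM = bytearray(0x10000)  # 64 KB address space; 16-bit words are little-endian


def read16(addr: int) -> int:
    return MEM[addr] | (MEM[(addr + 1) & 0xFFFF] << 8)


def write16(addr: int, value: int) -> None:
    MEM[addr] = value & 0xFF
    MEM[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF


@dataclass
class VM:
    ip: int                                  # Interpretive Pointer
    w: int = 0                               # CFA of the word now executing
    stack: list[int] = field(default_factory=list)
    running: bool = True


def next_(vm: VM) -> int:
    """Double-indirect NEXT; returns the machine-code address to jump to."""
    cfa = read16(vm.ip)           # fetch via IP: the instruction, i.e. a CFA
    vm.ip = (vm.ip + 2) & 0xFFFF  # advance IP past that instruction
    vm.w = cfa                    # keep the CFA for later header access
    return read16(cfa)            # fetch via CFA: the Code Field (jump target)


def op_one(vm: VM) -> None:
    vm.stack.append(1)


def op_dup(vm: VM) -> None:
    vm.stack.append(vm.stack[-1])


def op_add(vm: VM) -> None:
    b = vm.stack.pop()
    a = vm.stack.pop()
    vm.stack.append((a + b) & 0xFFFF)


def op_halt(vm: VM) -> None:
    vm.running = False


# Machine-code addresses -> routines (hypothetical addresses).
CODE: dict[int, Callable[[VM], None]] = {
    0xE000: op_one, 0xE010: op_dup, 0xE020: op_add, 0xE030: op_halt,
}

# Word headers: each CFA holds its Code Field (hypothetical addresses).
CFA = {"ONE": 0x0400, "DUP": 0x0410, "+": 0x0420, "HALT": 0x0430}
write16(CFA["ONE"], 0xE000)
write16(CFA["DUP"], 0xE010)
write16(CFA["+"], 0xE020)
write16(CFA["HALT"], 0xE030)

# Threaded code at 0x0300: ONE DUP + DUP + HALT
for i, name in enumerate(["ONE", "DUP", "+", "DUP", "+", "HALT"]):
    write16(0x0300 + 2 * i, CFA[name])


def run(vm: VM) -> VM:
    while vm.running:
        CODE[next_(vm)](vm)  # every routine "ends with NEXT": control loops back here
    return vm


vm = run(VM(ip=0x0300))
```

After the run, the stack holds 4 (1, doubled twice), IP sits just past the HALT instruction, and W still holds HALT's CFA.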
KimKlone has a one-byte instruction that executes double-indirect NEXT in just 9 cycles. KK NEXT expands into two Jump instructions plus the pointer increment mentioned above. The actual play-by-play is spelled out below for anyone who's nerdy enough to wonder; the rest of you may choose to skip ahead to the next page.
KimKlone Accelerated NEXT: just a little shell game
To be clear, here's how the stage is set (all addresses are 16-bit):
• The Forth Program Counter, IP, holds the address of the next Forth "instruction" to execute. A fetch via the IP will return the "instruction."
• The "instruction" is just a Code Field Address (CFA) pointing to part of the header of the Forth word definition. A fetch via the instruction/CFA will return the so-called Code Field (CF).
• The Code Field is what ends up in the 65xx PC register. It is the address of the machine-code routine that emulates the desired Forth operation.
Emulation routines conclude with NEXT; this is how the CPU jumps from one routine to the next. In the case of the KimKlone a one-byte instruction triggers it all. The KK op-code for NEXT is 3Bh, one of the xxxxx011 codes and therefore subject to substitution. When 3Bh is fetched, the alias fed to the CPU is 4Ch, the op-code for a JMP Absolute. (See row 1 of the Table, lower left.)
The CPU continues fetching, expecting the 4C op-code to be followed by a two-byte operand indicating the destination of the jump. Microcode intervenes, and in the next two cycles what gets jammed onto the CPU bus is the value in the IP. (Microcode has hooks into the VIA chip-select logic that can override the usual address decoding and cause T2Low or T2High to be coughed out onto the bus at any time.)
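Cycles 1 through 3 can be sketched as the byte stream the CPU receives. Assumptions: only the 3Bh to 4Ch alias is modeled, since the full substitution table isn't given here; the IP bytes arrive low byte first, per the usual 65xx operand order; the function names are hypothetical.

```python
# Sketch of the CPU-bus bytes in cycles 1-3 of KK NEXT.
# Assumptions: only the 3Bh -> 4Ch alias is modeled; IP bytes are supplied
# low byte first (65xx operand order). Names are hypothetical.

NEXT_OPCODE = 0x3B
JMP_ABS = 0x4C
ALIASES = {NEXT_OPCODE: JMP_ABS}  # op-codes substituted on the way to the CPU


def is_substitutable(opcode: int) -> bool:
    """True for the xxxxx011 op-codes the KK may replace."""
    return (opcode & 0b111) == 0b011


def next_cycles_1_to_3(ip: int) -> list[int]:
    """CPU-bus bytes in cycles 1-3: the alias, then IP low, then IP high."""
    if not is_substitutable(NEXT_OPCODE):
        raise ValueError("NEXT op-code is not in the substitutable group")
    return [ALIASES[NEXT_OPCODE], ip & 0xFF, (ip >> 8) & 0xFF]


cpu_bus = next_cycles_1_to_3(0x0302)
jump_target = cpu_bus[1] | (cpu_bus[2] << 8)  # what JMP Absolute loads into PC
```

From the CPU's point of view it has simply executed an ordinary JMP to whatever address IP held.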
Three cycles have elapsed, and the op-code 3Bh got spoofed into a JMP IP@. But there's no 65xx machine code at IP@, just a Forth instruction/CFA. Now comes the other half of the operation:
In cycle 4 the CPU tries to execute the CFA, but the disconnect between the data buses still prevails. The low-byte of the CFA is copied from the memory bus to one of a pair of 74HC574's that form the KK register known as W (see the diagram, left). Simultaneously in cycle 4 another circuit (not shown) drives the CPU bus with 6Ch — the op-code of the JMP Absolute Indirect instruction.
The CPU continues fetching, expecting the 6C to be followed by a two-byte operand. And what it receives in cycles 5 and 6 are the CFA bytes that were fetched onto the memory bus in cycles 4 and 5! KK uses the bytes of the W register to simulate a FIFO buffer, delaying the CFA bytes so the 6C op-code can be inserted ahead of them in the stream reaching the CPU. Preceded by 6C, the CFA makes perfect sense!
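The W register's role as a one-cycle delay can be sketched as two byte latches. Assumptions: W is modeled as a pair of latches, the memory-bus byte in cycle 6 is irrelevant here and shown as `None`, and `cycles_4_to_6` is a hypothetical name.

```python
# Sketch of W acting as a FIFO that delays the CFA bytes by one cycle,
# so the injected 6Ch op-code can go first. Names are hypothetical.

JMP_IND = 0x6C


def cycles_4_to_6(cfa: int) -> tuple[list[int], list[int]]:
    """Return (CPU-bus bytes for cycles 4-6, final contents of W as [low, high])."""
    memory_bus = [cfa & 0xFF, (cfa >> 8) & 0xFF, None]  # cycles 4, 5, 6
    w = [0, 0]                                          # W low, W high
    cpu_bus = []
    for i in range(3):
        # The CPU gets the injected 6Ch first, then the byte W latched a cycle ago.
        cpu_bus.append(JMP_IND if i == 0 else w[i - 1])
        # Meanwhile W captures this cycle's byte from the memory bus.
        if memory_bus[i] is not None:
            w[i] = memory_bus[i]
    return cpu_bus, w


cpu_bus, w = cycles_4_to_6(0x0410)
```

The CPU ends up seeing 6C 10 04, i.e. JMP (0410h), and W is left holding the CFA.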
All the rest is routine.
In cycles 7, 8 and 9 the CPU, free of meddlesome interference at last, uses the CFA to fetch the two bytes of the CF into its PC, thereby effecting a jump to the emulation routine. (The 65C02 wastes one cycle during this process.) Cycle 10 will be the first op-code fetch of the emulation code. By this time microcode has finished double-incrementing IP, and W conveniently retains the CFA, from which other fields in the word header can be indexed.
Compared with a software-only approach, KimKlone more than quadruples the speed of NEXT, cutting roughly 40 cycles down to 9. The scheme relies largely on microcode circuitry already included for the 16 MByte memory addressing.
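The speed comparison can be tallied instruction by instruction. The listing below is an assumption modeled on the fig-FORTH 6502 style: IP and W in zero page, with a 6Ch byte stored at W-1 so that "JMP W-1" executes JMP (W). Timings are 65C02 values for the common case, with no page crossing on the indexed reads and no carry into IP+1.

```python
# Cycle budget of a conventional software NEXT vs. KK NEXT.
# Assumptions: fig-FORTH-style 6502 listing; 65C02 timings; common case
# (no page crossing on (IP),Y, no carry into IP+1, so INC IP+1 is skipped).

SOFTWARE_NEXT = [
    ("LDY #1", 2),
    ("LDA (IP),Y", 5),  # high byte of the CFA
    ("STA W+1", 3),
    ("DEY", 2),
    ("LDA (IP),Y", 5),  # low byte of the CFA
    ("STA W", 3),
    ("CLC", 2),         # IP += 2
    ("LDA IP", 3),
    ("ADC #2", 2),
    ("STA IP", 3),
    ("BCC skip", 3),    # taken in the common case, skipping INC IP+1
    ("JMP W-1", 3),     # lands on the 6Ch byte stored at W-1
    ("JMP (W)", 6),     # CFA -> Code Field -> emulation routine
]
KK_NEXT_CYCLES = 9  # per the text

software_cycles = sum(cycles for _, cycles in SOFTWARE_NEXT)
speedup = software_cycles / KK_NEXT_CYCLES
print(f"{len(SOFTWARE_NEXT)} instructions, {software_cycles} cycles "
      f"vs {KK_NEXT_CYCLES}: {speedup:.1f}x")
```

Under these assumptions the software path is 13 instructions and 42 cycles, consistent with "about a dozen instructions" and "roughly 40 cycles", for a ratio of about 4.7.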