LAUGHTON ELECTRONICS

Cheap Video is a means of outputting video without the need for DMA hardware or a Video Controller chip. Instead what's used is programmed I/O: the video is generated as the output of an actual program running on the computer. This would ordinarily be impossible (due to the very high data rate required), but Cheap Video has a trick up its sleeve, something devious done in hardware to fool the CPU.

But first let's look at the software. The video program takes the form of a loop within a loop. The
innermost loop does the actual video output. Each iteration outputs one
row of pixels, with timing to match one horizontal sweep of the CRT
monitor. The CPU begins by doing a Jump To Subroutine (JSR) to the
first address in a portion of memory defined as the video buffer.
Rather than the video data,
what it "sees" there is a subroutine composed almost entirely of NOPs.
The CPU executes all the NOPs and then a Return From Subroutine (RTS);
all this occurs simultaneously with one horizontal sweep of the CRT
monitor.
Following the RTS the CPU advances the buffer address (for the next
JSR) and repeats the loop; there's no exit until there have been enough
scans (horizontal lines) to refresh the entire screen from top to
bottom. (This description applies to bit-mapped displays,
the simplest case. If a Character Generator ROM is used then an intermediate
level of looping is required for the multiple pixel rows that form each
character.)

The devious hardware hoax mentioned earlier is what causes the CPU to see memory as containing NOPs rather than what's really there (the video data). Here's how it's done, and why.

The usual premise of computer operation is that when the CPU sends out an address, memory will faithfully reply with the byte stored at that address. But with Cheap Video a major connection, that between the data buses, gets temporarily severed. This lets Cheap Video "lie" about what's in memory. (See the diagrams on the left, Business as Usual and Cheap Video.) During each scan, the bytes fetched onto the memory data bus do not get relayed back to the CPU's data bus. Instead, the bytes of memory data (i.e., the characters we needed to fetch) get merrily shipped off to the video display. Meanwhile, some Cheap Video flimflam logic feeds the CPU bus a brazen fabrication, a persistent NOP (and eventual RTS) which appear to reside at the addresses actually containing data!

The rationale is as follows. Generating a video display requires that the microcomputer's memory be read byte-by-byte in a rapid sequence. Reading 32 bytes in a row is how you generate a video display which is 32 characters wide, for example, and the high speed is necessary to match the horizontal scan of the monitor's CRT. Don Lancaster realized that a microprocessor is easily capable of reading 32 or more bytes in a row. Conventional processing of memory variables can only proceed sporadically and in much smaller chunks, but prolonged sequences of memory reads do occur as the chip fetches the bytes of its program. In fact, broadly speaking we can say that, if there are no branches in a program and no accesses to memory variables, sequential reads for instruction fetching will continue indefinitely. Therefore NOPs yield the desired "scan" behavior: an extended sequence of back-to-back reads of ascending memory locations.
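
The scheme described so far can be modelled in a few lines of Python. This is only a rough sketch, not Don Lancaster's circuit or code: the names `scan_line` and `refresh_screen` are made up for illustration, real 6502 cycle timing is ignored, and the rows are assumed to sit contiguously in the buffer with the final RTS fetch not being displayed. What it shows is the split between what memory actually delivers (sent to the display) and what the CPU is told it fetched (NOPs, then an RTS).

```python
NOP = 0xEA  # 6502 NOP opcode
RTS = 0x60  # 6502 RTS opcode


def scan_line(memory, start, width):
    """Model one horizontal scan of `width` bytes beginning at `start`.

    The link between the two data buses is severed: every byte that memory
    really delivers is diverted to the video output, while the CPU is fed
    a NOP for each of the first `width` fetches and then an RTS.
    """
    video_out = []  # real memory bytes, shipped off to the display
    cpu_saw = []    # the fabricated opcodes the CPU believes it fetched
    for pc in range(start, start + width):
        video_out.append(memory[pc])  # memory answers truthfully...
        cpu_saw.append(NOP)           # ...but the CPU is handed a NOP
    cpu_saw.append(RTS)               # the eventual RTS ends the scan
    return video_out, cpu_saw


def refresh_screen(memory, base, width, rows):
    """Outer loop: one JSR per scan line, advancing the buffer address."""
    frame = []
    addr = base
    for _ in range(rows):
        line, _ = scan_line(memory, addr, width)  # the "JSR" into the buffer
        frame.append(line)
        addr += width  # assumption: rows are stored back to back
    return frame
```

For instance, with `memory = bytes(range(256))`, calling `refresh_screen(memory, 0x10, 4, 2)` yields two rows holding the bytes at 0x10..0x13 and 0x14..0x17, even though the modelled CPU never sees anything but NOPs and an RTS.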
The CPU unwittingly mimics a 16-bit counter or a DMA controller, with its address bus outputting an ascending 16-bit count.

I am indebted to Mr Lancaster for the lesson I learned from Cheap Video, namely that a microprocessor can readily be manipulated by hardware tricks in order to produce unusual behaviors that are useful. The KimKlone, of course, relies very heavily on this principle.

KIM (the original) and the KimKlone

My preliminary Cheap Video exploit was on my KIM-1, a classic, 1-MHz 6502 board from MOS Technology. That was in 1980 or so, and I ventured to add an extra wrinkle that used undefined opcodes to let the CPU access 128K of memory. As with the KimKlone, the banks were a full 64K in size and selected precisely as needed according to each bus cycle.

As I recall, the deal in that case was that each xxxxx011 op-code would load a certain bit pattern via a PROM into an 8-bit shift register. Each xxxxx011 op-code was a Prefix instruction, and the shift register would trot out the corresponding pattern, one bit per cycle, as the following instruction executed; this would be a normal 65xx memory reference instruction. The shift register output bit served as A16, the most-significant address line. The patterns were such that A16 would flip from one 64K bank to the other for a single cycle only, exactly as the memory reference instruction performed its fetch or store. (There were other capabilities as well: for instance you could JMP to the alternate bank and stay there. But the doc, and my memory of the exact details, are both vague and imperfect.)

Unlike those of the KimKlone scheme, the KIM-1's prefixes were "dumb" about instruction timing (which of course varies according to address mode). For KIM-1 that increased the number of prefixes that had to be defined, and when programming you had to be careful to choose the prefix that'd yield the timing that matched the address mode you were gonna use!
It seemed a shame to use all those undefined op-codes so inefficiently, but with the KIM-1 it didn't really matter because there was nothing else going on. Later I proved that the envelope can be pushed a great deal further indeed, as the KimKlone, a clean-sheet-of-paper design, demonstrates.
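
To make the KIM-1 prefix idea concrete, here is a small Python sketch of a shift register driving A16. It is illustrative only: the PROM contents, the two prefix opcodes chosen (0x03 and 0x0B) and the assumption that shifting begins on the first cycle of the following instruction are all hypothetical, since the exact details of the original are not recorded here. The only firm ingredients are the xxxxx011 opcode form, the one-bit-per-cycle pattern, and the standard 6502 timing in which an absolute-mode load reads its data on its fourth cycle and a zero-page load on its third.

```python
def is_prefix(opcode):
    """True for op-codes of the form xxxxx011 (low three bits are 011)."""
    return (opcode & 0b111) == 0b011


# HYPOTHETICAL PROM contents: prefix op-code -> 8-bit pattern, listed in the
# order the bits are shifted out (one per bus cycle of the next instruction).
# A 1 puts that single cycle in the alternate 64K bank.
PATTERN_PROM = {
    0x03: (0, 0, 0, 1, 0, 0, 0, 0),  # suits absolute mode: data on cycle 4
    0x0B: (0, 0, 1, 0, 0, 0, 0, 0),  # suits zero-page mode: data on cycle 3
}


def bus_addresses(prefix, cpu_addresses):
    """Merge the CPU's 16-bit addresses with the shifted-out A16 bit.

    `cpu_addresses` lists the address the CPU emits on each cycle of the
    instruction that follows the prefix. The result is the 17-bit address
    actually presented to memory on each of those cycles.
    """
    if not is_prefix(prefix):
        raise ValueError("not an xxxxx011 prefix op-code")
    pattern = PATTERN_PROM[prefix]
    return [(a16 << 16) | addr for a16, addr in zip(pattern, cpu_addresses)]
```

Take an `LDA $1234` located at $0200, whose four cycles address $0200, $0201, $0202 and then $1234. Under the hypothetical 0x03 prefix the first three cycles stay in the current bank and only the data read lands at $11234. Pairing the same instruction with the 0x0B pattern would flip A16 one cycle too early, which is the "dumb about timing" hazard described above.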