VGA DRIVER

VGA signal

Video processors are not typically embedded in microcontrollers, so gaming consoles usually rely on an external video display unit. As this is a minimalistic project, the VGA video signal is generated in software by an interrupt-driven kernel.

The routine which generates the VGA signal is part of the T2 (Timer 2 module) interrupt service routine. This routine also services the vertical sync pulse, the markers for monitor auto adjustment and the bottom line text routine. In this version, no other interrupts are active, but the user can add his own interrupt sources, as long as they have a lower priority level.

Timing details

VGA timings for 800x600 resolution at 56 Hz refresh rate are represented on the drawing. Here is the detailed timing data:

Horizontal timing:

Pixel clock: 36 MHz (27.78 ns)
Horizontal frequency/period: 35.16 kHz (28.44 us)
Visible area: 800 pixels (22.22 us)
Front porch: 24 pixels (0.67 us)
Sync pulse: 72 pixels (2 us)
Back porch: 128 pixels (3.56 us)

Vertical timing:

Vertical frequency/period: 56.25 Hz (17.78 ms)
Visible area: 600 lines (17.067 ms)
Front porch: 1 line (28.44 us)
Sync pulse: 2 lines (56.89 us)
Back porch: 22 lines (625.78 us)
Whole frame: 625 lines (17.78 ms)

The dot clock for 800x600 resolution at 56 Hz vertical frequency is exactly 36 MHz, and the maximum execution speed of the PIC24E family is 70 MIPS. Since the instruction rate has to be an exact multiple of the pixel clock, the MCU is slightly overclocked to 72 MIPS. This overclocking is only about 2.9%, which is negligible and will not noticeably affect operational safety or thermal dissipation.

As noted, each pixel of video memory covers a 2x2 pixel area on the screen, so the effective dot clock is not 36 but 18 MHz. That gives the processor enough time to execute four instructions per displayed pixel. In addition, every scan line is displayed twice, so there is even more time for buffer setup during horizontal sync and porches.
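The figures in the timing tables can be rechecked with a few lines of host-side Python. This is only a sketch of the arithmetic; the constant and function names are made up for illustration and do not appear in the firmware.

```python
# Timing constants taken from the tables above (800x600 @ 56 Hz).
PIXEL_CLOCK_HZ = 36_000_000
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 800, 24, 72, 128
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 600, 1, 2, 22
CPU_HZ = 72_000_000   # instruction rate after overclocking
PIXEL_SCALE = 2       # each stored pixel is two VGA pixels wide


def line_clocks():
    """Pixel clocks in one complete scan line."""
    return H_VISIBLE + H_FRONT + H_SYNC + H_BACK


def frame_lines():
    """Scan lines in one complete frame."""
    return V_VISIBLE + V_FRONT + V_SYNC + V_BACK


def line_period_us():
    return line_clocks() / PIXEL_CLOCK_HZ * 1e6


def refresh_hz():
    return PIXEL_CLOCK_HZ / (line_clocks() * frame_lines())


def instructions_per_stored_pixel():
    """Instruction cycles available for one pixel of video memory."""
    return CPU_HZ * PIXEL_SCALE // PIXEL_CLOCK_HZ


if __name__ == "__main__":
    print(line_clocks(), frame_lines(), refresh_hz(),
          instructions_per_stored_pixel())
```

The four-instruction budget per stored pixel is what the output loop described later has to fit into.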

RAM organization

Video memory is located in the internal 48 KB RAM, where it occupies 45,600 bytes. All video signal timings match the VGA standard for 800x600 mode, but, due to RAM limitations, the actual displayed resolution is only 380x240, shown on a 760x480 pixel area of the original screen.

To use the whole 800x600 display area in 8-bit pixel mode, we would need 800x600=480,000 bytes of memory, but in the best case, all that PIC microcontrollers offer at this time is 48K (49,152 bytes), which is far short of what we need. There are some 16-bit PICs with 96K RAM, but they are too slow for video signal generation, and some 52K PICs, but they come in SMD 64-pin packages with 0.5 mm pitch, which is quite inconvenient for DIY projects. Although it is possible to add external RAM, it is of no practical use, as access to the external RAM would be too slow. So we have to manage with 48K RAM MCUs somehow.

To do that, we have to make some compromises:

1. The colour of each pixel is defined by four bits only, so it works in 16-colour mode. In fact, only 15 colours are used, as one of them (binary 0000) does not mean "black" but "transparent", which is used in sprite handling. More about that later.

2. Each pixel from the video memory is displayed on 4-pixel (2x2) area of the VGA screen.

3. The actual displayed resolution is 380x240, which occupies 760x480 pixels on the screen. The 20-pixel-wide margins on the top, left and right sides of the monitor are not used and are left black. At the bottom there is one line (39 characters) of text. It needs no frame buffer, as the routine interprets text directly from the text buffer in RAM.

This organization gives 380x240=91,200 graphical pixels, but as each pixel is encoded in 4 bits, the video memory needs only 91,200/2=45,600 bytes. The bottom line text needs no video buffer and occupies only 78 bytes (39 for text and 39 for colour attributes). So there are 49,152-45,678=3,474 free bytes, which is quite enough for housekeeping (internal buffers and general purpose registers).
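The memory budget above is simple enough to recompute, which is handy if you want to try a different resolution or colour depth. The sketch below only restates the arithmetic from the text; all names in it are illustrative.

```python
RAM_BYTES = 48 * 1024        # 49,152 bytes of internal RAM
WIDTH, HEIGHT = 380, 240     # displayed resolution
BITS_PER_PIXEL = 4           # 16-colour mode
TEXT_COLUMNS = 39            # bottom text line


def frame_buffer_bytes(width=WIDTH, height=HEIGHT, bpp=BITS_PER_PIXEL):
    """Bytes of video memory needed for the given mode."""
    return width * height * bpp // 8


def text_line_bytes(columns=TEXT_COLUMNS):
    """One character byte plus one colour attribute byte per column."""
    return columns * 2


def free_bytes():
    """RAM left after the frame buffer and the text line."""
    return RAM_BYTES - frame_buffer_bytes() - text_line_bytes()


if __name__ == "__main__":
    print(frame_buffer_bytes(), text_line_bytes(), free_bytes())
```

Calling `frame_buffer_bytes(800, 600, 8)` gives the 480,000 bytes quoted for the full 8-bit mode.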

Sprites

With 72 MIPS of processing power, it would be easy to generate the video signal in software if the only requirement were to show the contents of video memory. As there is no video processing unit here and the MCU has to handle one pixel at a time, such a concept would be useful for static images or very small movable blocks of pixels, but not for a game, which requires real-time processing of large memory blocks. To make things worse, the CPU is busy generating the video signal more than 2/3 of the time, which leaves less than 1/3 for housekeeping and active memory handling.

The solution to this problem is to use sprites, which are 2D images located outside the video RAM and superimposed on the main scene. Video units in some of the first personal computers could handle sprites in hardware, but in this project it is done in software. The sprites are stored in the internal program memory of the MCU and are combined with the video RAM contents to generate the full video signal. That means there is no way to manipulate the contents of a sprite; it can only be displayed at the desired location on the screen.

As most of the characters in this game are animated, there is a large number of sprite images (slides), and each of them represents one frame of that character's animation. Here is the example of Jack's jump. Note that the absolute X and Y position on the screen is continuously adjusted during the jump, as is the order of slides in the jump sequence (which is listed in the script table in the firmware), so it gives much more freedom in creating the mise en scène for the game - this jump is, in reality, much higher and lasts longer than it may look while just watching those slides. There is also no need to draw identical slides again, as each of them can be called repeatedly in the script table. In this example, the last five slides are repeated only because of the hair splash; otherwise slides 11, 12, 13, 14 and 15 could be omitted and listed as 9, 8, 6, 4 and 3 in the script. The same slides are used for the jump up and for the jump down to the lower floor, but with different script tables.

All the software has to do while servicing the video scenario is to preset the special sprite registers, which determine the X and Y positions (relative to the left and upper border of the active portion of the screen), the width and height of one slide image, and the address of the current slide in program memory. The video firmware, located in the interrupt routine, will superimpose that sprite on the contents of the background video memory during RGB signal generation.

One more thing to note is that the orange colour in sprites means "transparent" (there is no orange colour in the game palette, only in the palette of the PC drawing program used during sprite design). Each orange pixel of the sprite will be displayed from the video memory, which will typically hold the background image. Yet, there is one drawback of this principle. If two or more sprites overlap, the transparent (orange) pixels of the first of them (the one with the highest priority, that is, the one located higher in the special sprite table in RAM) will partly cover the lower sprite, displaying the background instead of the lower sprite's active pixels. The first (simulated) screenshot shows that example.

There is a way to solve this problem, but only for a limited number of sprites. Some sprites can be treated as "special", and they do not have that drawback (see the second screenshot). The only problem with those sprites is that the video routine needs 18 times longer to process them, so the programmer has to take care not to use this option if it is not necessary, as it could result in losing scan lines on the screen.

How do we tell the video routine which sprite is special and which is not? The sprite list (located in RAM and named SPRITELIST) holds pointers to the active sprites. The video routine can place (or erase) any sprite in that list at any time, at any table position which is not currently occupied. This table can hold a maximum of 20 sprites at the same time. Only four sprites (numbers 17, 18, 19 and 20) are treated as "special" ones - they are processed much more slowly, but they do not cause the described problem when overlapping, or at least it is minimized so that it is not noticeable. In this game, only one sprite (Jack himself) has that privilege, as the game scenario is such that the other sprites never overlap.

Theory of operation

The most significant part of the video routine uses SPRITEBUFFER, which is 190 bytes long (equal to one scan line in video memory), and in which the video routine prepares the sprite contents for the current line before it merges them with the background image and outputs that scan line. So the video memory has two layers: the lower layer is the large video memory itself, which mainly contains the background picture, and the upper layer, which is only one scan line long and contains the sprite pixels for that line. Those pixels are copied from the sprite tables located in program memory before the video routine starts outputting data. So this layer is dynamically changed for each scan line (more specifically, for each pair of identical scan lines) during the horizontal sync and the back and front horizontal porches.

Here is how the video routine outputs the RGB video signal to the port pins B8, B10, B12 and B14 (Red, Green, Blue and Intensity, respectively). Four instructions (55.56 ns in total) are used for a single pixel, and this part of the program (repeated 190 times) outputs 380 pixels. Odd pixels (1, 3, 5...) are generated when bits #0, #2, #4 and #6 of the corresponding byte of SPRITEBUFFER are copied to port pins B8, B10, B12 and B14 (the first four instructions inside the repeat block), and even pixels (2, 4, 6...) are generated the same way, except that they are shifted, so that bits #1, #3, #5 and #7 are copied to the same pins (the last four instructions). The w13 register already points to the high byte of the LATB register (not shown in the listing), the w3 register points to the start of SPRITEBUFFER minus 1, and the w12 register contains the offset from SPRITEBUFFER to the main background (video memory) buffer (it has to be correctly calculated before each scan line is executed). The w7 and w14 registers are simple masks used to separate odd and even pixels.

If you have to redesign the hardware of this project, you must know that the remaining bits of the high byte of LATB (#9, #11, #13 and #15) cannot be used as simple outputs, as they will be corrupted by this routine (this does not apply to remappable pin functions, as they are not altered by writing to LATB). As you can see, each 4-instruction part (for both odd and even pixels) first fetches a single byte from SPRITEBUFFER and tests it for zero at the same time. If it is zero (that is, if the sprite pixel contains the "transparent colour"), it fetches the pixel content from the video memory. Finally, the pixel (whether it comes from the sprite or the background) is output to the port. Here follows the vital part of the video routine:

    mov #0b10101010,w7   ; mask bits 1,3,5,7 to isolate even pixels
    mov #0b01010101,w14  ; mask bits 0,2,4,6 to isolate odd pixels
.rept 190
    and.b w14,[++w3],w0  ; get next byte from SPRITEBUFFER, test bits 0,2,4,6
    btsc SR,#Z          ; if bits (0|2|4|6)≠0, then skip next instruction
    mov.b [w3+w12],w0    ; ... else get background pixel from video mem
    mov.b w0,[w13]       ; *** ODD pixel out
    and.b w7,[w3],w0     ; get same byte from SPRITEBUFFER, test bits 1,3,5,7
    btsc SR,#Z          ; if bits (1|3|5|7)≠0, then skip next instruction
    mov.b [w3+w12],w0    ; ... else get background pixel from video mem
    lsr.b w0,[w13]       ; *** EVEN pixel out
.endr
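To see what the eight instructions inside the repeat block do to the data, here is a small Python model of one scan line. It is only a behavioural sketch, not firmware: the function names are invented, and the mapping of port bits 0, 2, 4, 6 to a colour index assumes that index bit 0 is Red, bit 1 Green, bit 2 Blue and bit 3 Intensity.

```python
ODD_MASK = 0b01010101    # w14 in the listing: bits 0,2,4,6
EVEN_MASK = 0b10101010   # w7 in the listing: bits 1,3,5,7


def scan_line_port_bytes(sprite_buffer, background_line):
    """Return the byte written to the high half of LATB for each pixel."""
    out = []
    for sprite_byte, background_byte in zip(sprite_buffer, background_line):
        odd = sprite_byte & ODD_MASK
        if odd == 0:                    # transparent: use the background
            odd = background_byte       # whole byte, unmasked, as in the listing
        out.append(odd)
        even = sprite_byte & EVEN_MASK
        if even == 0:
            even = background_byte
        out.append(even >> 1)           # lsr.b moves bits 1,3,5,7 to 0,2,4,6
    return out


def colour_on_pins(port_byte):
    """Colour index seen on pins B8, B10, B12, B14 (port bits 0, 2, 4, 6)."""
    return sum(((port_byte >> (2 * k)) & 1) << k for k in range(4))


if __name__ == "__main__":
    line = scan_line_port_bytes(bytes([0x40, 0x00]), bytes([0xCF, 0xCF]))
    print([colour_on_pins(b) for b in line])
```

Because the background byte is written unmasked, the odd-numbered port bits carry leftovers of the neighbouring pixel, which is why pins B9, B11, B13 and B15 cannot serve as ordinary outputs.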

Of course, SPRITEBUFFER must be properly loaded with sprite pixels before the current scan line starts. This can be done only during the horizontal sync and the back and front horizontal porches, which leaves 6.22 us (about 448 instructions) for SPRITEBUFFER preparation. In reality, some of those instructions will be spent on register presets and w12 (offset) calculation, horizontal sync synchronization and SPRITEBUFFER clearing at the beginning, so in the best case we can count on about 300 instructions. This is surely not enough time to test 20 possible sprites, check whether they exist in the current scan line, calculate the position inside the sprite lookup table and move their contents from program memory to the SPRITEBUFFER. Most of the time will be spent on the last item, reading program memory and moving its contents to the SPRITEBUFFER. To make things worse, reading from program memory takes 5 instruction cycles for each word, but, luckily, if you use PSV (Program Space Visibility) mode, only the first word transfer takes 5 instruction cycles, and the others only one. This is, of course, used in this project; otherwise it would not be possible.

Unfortunately, this is valid only if you move 16-bit words in PSV mode (e.g. mov [w3++],[w4++]), but if you use the same technique in byte mode (e.g. mov.b [w3++],[w4++]) you still need 5 instruction cycles for every byte (this is not documented in Microchip's manuals, so I had to learn it the hard way). The consequence of this PIC24E drawback is that it is not practical to move a single byte (2 pixels) of video content, only whole words, which hold 4 pixels. So the X pointer for each sprite should point to 0, 4, 8, 12, 16, 20... and not to locations which are not divisible by 4. This gives the programmer more of a headache, even during slide design for sprite animation.
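A rough cycle estimate shows why word mode matters. The sketch below uses only the figures quoted in the text (5 cycles for the first program memory access, 1 cycle for each following word in PSV word mode, 5 cycles for every byte in byte mode) and ignores loop overhead and address calculation, so treat the results as lower bounds. The function names are illustrative.

```python
FIRST_ACCESS_CYCLES = 5            # program-memory read cost quoted in the text
NEXT_WORD_CYCLES = 1               # each following word in a PSV word-mode run
BLANKING_PIXEL_CLOCKS = 24 + 72 + 128   # front porch + sync + back porch
INSTRUCTIONS_PER_PIXEL_CLOCK = 2   # 72 MIPS / 36 MHz


def word_copy_cycles(pixels):
    """Cycles to copy one sprite row in word mode (4 pixels per word)."""
    words = -(-pixels // 4)        # ceiling division
    if words == 0:
        return 0
    return FIRST_ACCESS_CYCLES + (words - 1) * NEXT_WORD_CYCLES


def byte_copy_cycles(pixels):
    """Cycles to copy one sprite row in byte mode (2 pixels per byte)."""
    return -(-pixels // 2) * FIRST_ACCESS_CYCLES


def blanking_budget():
    """Instruction cycles available during horizontal blanking."""
    return BLANKING_PIXEL_CLOCKS * INSTRUCTIONS_PER_PIXEL_CLOCK


if __name__ == "__main__":
    print(word_copy_cycles(32), byte_copy_cycles(32), blanking_budget())
```

Even with this optimistic model, a handful of byte-mode rows would use up the roughly 300 instructions that are really available per scan line.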

What is so special about the last four sprites in the table, so that they can correctly cover another, lower priority sprite? They do not use the fast (and "blind") PSV mode, but a slow byte-by-byte comparison and transfer. It takes 18 times longer to handle one sprite this way, so it should be used with special attention, and only for sprites which are not too wide (height does not matter). There is still one possible pixel of "error" in overlapping sprites, when the area between overlapped sprites could contain a single transparent pixel, but this is unnoticeable on the screen.
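The difference between the two copy methods can be illustrated on a few bytes of a line buffer. This is a sketch under assumptions: the text does not show the firmware code, so the byte-wise test against zero in `masked_copy` is inferred from the "one possible pixel of error" remark, and both function names are hypothetical.

```python
def blind_copy(line_buffer, sprite_row, x_byte):
    """Fast path: overwrite the buffer, transparent bytes included."""
    line_buffer[x_byte:x_byte + len(sprite_row)] = sprite_row


def masked_copy(line_buffer, sprite_row, x_byte):
    """Slow path: copy only bytes that are not fully transparent (0x00)."""
    for i, value in enumerate(sprite_row):
        if value != 0:
            line_buffer[x_byte + i] = value


if __name__ == "__main__":
    lower_sprite = b"\xCF\xCF\xCF\xCF"   # already in the buffer
    upper_row = b"\x00\x40"              # transparent byte, then one black pixel

    fast = bytearray(lower_sprite)
    blind_copy(fast, upper_row, 2)       # the 0x00 byte wipes out the lower sprite

    slow = bytearray(lower_sprite)
    masked_copy(slow, upper_row, 2)      # the 0x00 byte is skipped

    print(fast.hex(), slow.hex())
```

In the slow path the byte 0x40 still replaces both pixels of the lower sprite, although only its first pixel is opaque. That is the single-pixel error mentioned above.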

As noted, there is not enough time to handle all sprites before each scan line. Luckily, there are two identical scan lines for every video line, so if we use both of them, we have twice as much time. The only problem is that there is no way to start preparing the SPRITEBUFFER before it has been completely displayed in the second scan line. That is why, instead of SPRITEBUFFER, there are two independent sprite buffers - SPRITEBUF1 and SPRITEBUF2. While the video routine displays the contents of the first one, the second one is being prepared, and vice versa. That small pipeline is not as confusing as it seems, and it was the last trick which made the project possible.

So there are four basic steps, each of them executed before a scan line is output to the port:

1. Test for every sprite in SPRITELIST and calculate pointers for the sprites which are present in scan line N+2 (and N+3), then load COPYLIST table with those pointers... then generate scan line N, using SPRITEBUF1

2. Use the COPYLIST to transfer pixel data from program memory to SPRITEBUF2... then generate the equal scan line N+1, using SPRITEBUF1

3. Test the sprites in SPRITELIST and calculate pointers if sprites are present in scan line N+4 (and N+5), then load COPYLIST table with those pointers... then generate the new scan line N+2, using SPRITEBUF2

4. Use the COPYLIST to transfer pixel data from program memory to SPRITEBUF1... then generate the equal scan line N+3, using SPRITEBUF2
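The four steps above repeat with a period of four scan lines, which can be written down as a small scheduling function. This is a sketch for illustration only: it assumes scan lines are counted from 0 and that line N in the list is a multiple of 4; the function name and the returned strings are not taken from the firmware.

```python
def pipeline_step(scan_line):
    """Describe the work done before the given scan line is output.

    Returns (task, buffer_being_prepared, first_line_it_is_for, buffer_shown).
    """
    pair = scan_line // 2                   # two identical scan lines per pair
    if pair % 2 == 0:
        shown, prepared = "SPRITEBUF1", "SPRITEBUF2"
    else:
        shown, prepared = "SPRITEBUF2", "SPRITEBUF1"
    # First line of a pair: scan SPRITELIST. Second line: move the pixels.
    task = "build COPYLIST" if scan_line % 2 == 0 else "copy sprite data"
    target_line = (pair + 1) * 2            # first line of the next pair
    return task, prepared, target_line, shown


if __name__ == "__main__":
    for line in range(4):
        print(line, pipeline_step(line))
```

Each buffer is therefore shown for two scan lines while the other one is filled in two halves: list building first, pixel transfer second.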

By the way, SPRITEBUF1 and SPRITEBUF2 are separated and surrounded by three areas named DUMMUSPACE1, DUMMUSPACE2 and DUMMUSPACE3, each of them 86 bytes wide. They are used for nothing except storing dummy pixels for sprites which are close to the borders of the screen or even outside the screen. So X pointers can go as low as -172 on the left or as high as (380+172-sprite width) on the right, and the sprites will be correctly hidden if they are outside the screen. Y pointers can be stretched without limit, with no special care.
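A game program may want to check a sprite's X position before writing it to the sprite registers. The sketch below combines the range given above with the 4-pixel alignment required by the fast word-mode copy. It assumes both limits are inclusive, and the names are illustrative rather than taken from the firmware.

```python
SCREEN_WIDTH = 380       # displayed pixels per line
DUMMY_PIXELS = 86 * 2    # 86 dummy bytes, two pixels per byte
X_ALIGNMENT = 4          # word-mode copies move 4 pixels at a time


def x_range(sprite_width):
    """Lowest and highest X position that stays inside the dummy areas."""
    return -DUMMY_PIXELS, SCREEN_WIDTH + DUMMY_PIXELS - sprite_width


def is_valid_x(x, sprite_width):
    """True if x is within range and on a 4-pixel boundary."""
    lowest, highest = x_range(sprite_width)
    return lowest <= x <= highest and x % X_ALIGNMENT == 0


if __name__ == "__main__":
    print(x_range(32), is_valid_x(-172, 32), is_valid_x(2, 32))
```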

How to draw your own sprites and convert them to data tables

Both in video memory and in sprite tables, pixels are organized in the same way: bits #0,#2,#4,#6 are for the first pixel, bits #1,#3,#5,#7 of the same byte for the next one, and so on. That is how they have to be arranged when the sprite is created and the pixel data table is created. It can be .byte or .word data list, so the video routine can access it. Bits 16...23 of program memory are not used by video routine. Sprite tables can be located at any page of program memory.
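The interleaved layout is easy to get wrong by hand, so here is a sketch of packing and unpacking one byte. It assumes that bit k of the 4-bit colour index goes to bit 2k of the byte for the first pixel and to bit 2k+1 for the second one, which agrees with the pin order (Red, Green, Blue, Intensity) and with the palette numbering. The function names are made up for this example.

```python
def pack_pixel_pair(first, second):
    """Interleave two 4-bit colour indices into one video memory byte."""
    byte = 0
    for k in range(4):
        byte |= ((first >> k) & 1) << (2 * k)        # bits 0,2,4,6
        byte |= ((second >> k) & 1) << (2 * k + 1)   # bits 1,3,5,7
    return byte


def unpack_pixel_pair(byte):
    """Recover the two colour indices from one video memory byte."""
    first = sum(((byte >> (2 * k)) & 1) << k for k in range(4))
    second = sum(((byte >> (2 * k + 1)) & 1) << k for k in range(4))
    return first, second


if __name__ == "__main__":
    print(hex(pack_pixel_pair(11, 11)), unpack_pixel_pair(0x40))
```

A byte of 0x00 holds two transparent pixels, which is what the zero test in the output loop relies on.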

There are many ways to create image or sprite data tables. One possible way is to use a drawing program (e.g. Photoshop) to create a 16-colour palette, with the colours arranged in this way:

0 Orange	4 Dark blue	8 Black	12 Light blue
1 Dark red	5 Dark violet	9 Light red	13 Light violet
2 Dark green	6 Dark cyan	10 Light green	14 Light cyan
3 Dark yellow	7 Gray	11 Light yellow	15 White

Now draw the sprite or the slides for the animation in Indexed Color mode (with all transparent areas painted orange), and save it in .RAW format. If you look at the .RAW file in a hex editor, you will see that the colour of every pixel is represented by a single byte. Now you have to create a simple program which converts the file to an ASCII data table, respecting the bit order represented on the drawing.

That program should create ASCII directive .WORD or .BYTE, numeric constant prefixes 0x (if bytes are converted to hex), commas as table separators and line feeds, so the output should possibly look like this:

.word 0x0000,0x0000,0x8000,0xC0C0,0x0040,0x0000,0x0000,0x0000,0xC000,0xC0C0,0x0040,0x0000
.word 0x0040,0x0000,0x0000,0x0000,0xCF80,0xC5CF,0x0040,0x0000,0x0000,0x0000,0xC580,0xCFCF
.word 0xCF80,0xCACF,0x0040,0x0000,0x0000,0x0000,0xCF80,0xCFCF,0x0040,0x0000,0x0000,0x0000
.word 0x0000,0x0000,0xCF00,0x45CF,0x0000,0x0000,0x0000,0x0000,0xC300,0x00CB,0x0000,0x0000
...

Or like this, depending on the mode used:

.byte 0x00,0x00,0x04,0x08,0x12,0x18,0x1d,0x21,0x26,0x28,0x2a,0x28,0x28,0x22,0x23,0x20
.byte 0x20,0x20,0x21,0x25,0x27,0x2f,0x2f,0x2d,0x2a,0x27,0x20,0x1c,0x15,0x0e,0x00,0x02
.byte 0x0b,0x15,0x1f,0x28,0x2d,0x31,0x35,0x37,0x36,0x2b,0x20,0x1b,0x1a,0x19,0x1a,0x1b
.byte 0x1e,0x23,0x30,0x33,0x35,0x31,0x2f,0x2a,0x20,0x18,0x12,0x0a,0x00,0x24,0x37,0x38
...

You can copy this table as text into the source file of your application.
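A converter of the kind described above could look like the following Python sketch. It is not the author's tool. It assumes the .RAW file has no header and stores one palette index (0...15) per byte in row order, that the row width is a multiple of 4 pixels, and that the first two pixels of each group go into the low byte of the word (little-endian, as on PIC24). All function names are hypothetical.

```python
def pack_pixel_pair(first, second):
    """Interleave two 4-bit colour indices: bits 0,2,4,6 and bits 1,3,5,7."""
    byte = 0
    for k in range(4):
        byte |= ((first >> k) & 1) << (2 * k)
        byte |= ((second >> k) & 1) << (2 * k + 1)
    return byte


def raw_to_word_table(raw, words_per_line=12):
    """Convert one-byte-per-pixel data to .word assembler directives."""
    if len(raw) % 4:
        raise ValueError("pixel count must be a multiple of 4")
    words = []
    for i in range(0, len(raw), 4):
        low = pack_pixel_pair(raw[i] & 0x0F, raw[i + 1] & 0x0F)
        high = pack_pixel_pair(raw[i + 2] & 0x0F, raw[i + 3] & 0x0F)
        words.append(f"0x{(high << 8) | low:04X}")
    lines = []
    for i in range(0, len(words), words_per_line):
        lines.append(".word " + ",".join(words[i:i + words_per_line]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Three transparent pixels, six black ones (index 8), three transparent.
    sample = bytes([0, 0, 0, 8, 8, 8, 8, 8, 8, 0, 0, 0])
    print(raw_to_word_table(sample))
```

To convert a real file, read it with `open(path, "rb").read()` and pass the bytes to `raw_to_word_table`.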
