Everything You Have Always Wanted to Know about the Playstation
But Were Afraid to Ask.










































Version 1.1
Compiled \ Edited By
Joshua Walker

Table of Contents

1.Introduction
2.History
3.The R3000A
Overview
The R3000A instruction set
R3000A opcode encoding
4.Memory
Memory Map
Virtual Memory
The System Control Coprocessor (Cop0)
Exception Handling
Breakpoint management
DMA
5.Video
Overview
The Graphics Processing Unit (GPU)
The Graphics Transformation Engine (GTE)
The Motion Decoder (MDEC) 
6.Sound
The Sound Processing Unit (SPU)
7.CD-ROM
8.Root Counters
9.Controllers
10.Memory Cards
11.Serial port I/O
12.Parallel port I/O


Appendices
A.Number Systems
B.BIOS functions
C.GPU command listing
D.Glossary of terms
E.Works cited - Bibliography 


Introduction

	This project to document the Playstation stated about a year ago. It started with the utter disgust I had for Sony of America after suing Bleem  over the PSX emulation technology. I saw the ugliness of a huge multinational company try to destroy two guys who had a good idea and even tried to share it with them. It made me sick. I wanted to do something to help, but alas I had no money, (I still don't) but I did buy a Bleem CD to support them. 
	I decided to start this little project. Partially to prove to Sony, but mostly to prove to myself, that coming up with the data to create you own emulator was not that hard.  I also wanted to show that behind that gray box that so many people hold dear. It's just a computer with no keyboard, that plugs into your TV. It's one thing to think that you were spending $250 on a new PSX, but it's another to realize that the CPU costs $5.99 from LSI.
	Kind of puts thing into perspective, doesn't it.
	I'm not a programmer. I've never worked for sony, and I never signed a Non-Disclosure Agreement with them. I just took my PSX apart, found out what made it tick, and put it back together. I also scoured the web looking for material that I could find. I never looked at any of Sony's official documentation and never took any thing you would have to have a license to get. Such as PSY-Q. I mostly poked at emulators to see how they worked. Bleem was only 512k at the time and was pretty easy to see how it functioned without even running it through a dissembler. PSEmu had an awesome debugger so I can see how a PSX ran even without caelta. 
 	I want this documentation to be freely available. Anyone can use it. From the seasoned PSX programmer to the lurking programmer read to make the next big emulator. If there is a discrepancy in my doc, please fix it. Tear out parts that are wrong and correct it so it's better that what I have now. I wanted to shoot for a 75% accuracy rating. I think I got it, but I don't know. Most of the stuff in here is hearsay and logical deductions. Much of it is merely a guess.
	Of course there is the standard disclaimer, all trademarks are of the appropriate owners and that this documentation is not endorsed by Sony or Bleem in any way.  You are, once again, free to give this away, trade it, or do what you will. It's not mine anymore. It's everybody's. Do with it what you please. Oh, and if your PSX blows up or melts down due to this documentation, sorry. I can't assure the validity of *any* info other that I didn't get it from Sony's official documentation. I'm not responsible to what you do to your machine.
	In closing I wish to apologize for the way this introduction was written as it's 2:00 in the morning. I have a wedding to get to at 10:30 and I've been up for the last three days finishing the darn thing. I wish to thank everyone who supported me. Janice, for believing in me and My girlfriend Kim who put it with the long nights in front of the computer writing and the long days in fornt of the Playstation claming I was "doing research" while playing FF8. I can't think of anything more to say. Have fun with this

-Joshua Walker
 4/29/00
2:34am







84905



History

Prologue B.P. (Before PlayStation) 

	Before the release of the PlayStation, Sony had never held a large portion of the videogames market. It had made a few forays into the computer side of things, most notably in its involvement with the failed MSX chip in the early 80's, but it wasn't until the advent of CD-ROM technology that Sony could claim any market share. A joint venture with the Dutch company Philips had yielded the CD-ROM/XA, an extension of the CD-ROM format that combined compressed audio, and visual and computer data and allowed both to be accessed simultaneously with the aid of extra hardware. By the late 80's, CD-ROM technology was being assimilated, albeit slowly, into the home computer market, and Sony was right there along side it. But they wanted a bigger piece of the pie. 

1988 Sony Enters The Arena 

	By 1988, the gaming world was firmly gripped in Nintendo's 8-bit fist. Sega had yet to make a proper showing, and Sony, although hungry for some action, hadn't made any moves of its own. 
	Yet. 
	Sony's first foray into the gaming market came in 1988, when it embarked on a deal with Nintendo to develop a CD-ROM drive for the Super NES, not scheduled to be released for another 18 months. This was Sony's chance to finally get involved in the home videogame market. What better way to enter that arena than on the coat-tails of the world's biggest gaming company? 
	Using the same Super Disc technology as the proposed SNES drive, Sony began development on what was to eventually become the PlayStaion. Initially called the Super Disc, it was supposed to be able to play both SNES cartridges and CD-ROMs, of which Sony was to be the "sole worldwide licenser," as stated in the contract. Nintendo was now to be at the mercy of Sony, who could manufacture their own CDs, play SNES carts, and play Sony CDs. Needless to say, Nintendo began to get worried. 

1991 Multimedia Comes Home 

	1991 saw the commercial release of the multimedia machine in the form of Philips' CD-I, which had initially been developed jointly by both Philips and Sony until mounting conflicts resulted in a parting of ways. Multimedia, with the current rise of the CD-ROM, was seen as the next big thing. And although the CD-I was too expensive for the mass market, its arrival cemented the CD-ROM as a medium for entertainment beyond the computer. 

June 1991 Treachery At The 11th Hour 

I	n June of 1991, at the Chicago CES (Consumer Electronics Show), Sony officially announced the Play Station (space intentional). The Play Station would have a port to play Super Nintendo cartridges, as well as a CD-ROM drive that would play Sony Super Discs. The machine would be able to play videogames as well as other forms of interactive entertainment, as was considered important at the time. 
	Sony intended to draw on its family of companies, including Sony Music and Columbia Pictures, to develop software. Olaf Olafsson, then chief of Sony Electronic Publishing, was seen on the set of Hook, Steven Spielberg's new Peter Pan movie, presumably deciding how the movie could be worked into a game for the fledgling Play Station. In Fortune magazine, Olafsson was quoted as saying "The video-game business...will be much more interesting (than when it was cartridge based). By owning a studio, we can get involved right from the beginning, during the writing of the movie." 
	By this point, Nintendo had had just about all it could take. On top of the deal signed in 1988, Sony had also contributed the main audio chip to the cartridge-based Super NES. 
	The Ken Kutaragi-designed chip was a key element to the system, but was designed in such a way as to make effective development possible only with Sony's expensive development tools. Sony had also retained all rights to the chip, which further exaserbated Nintendo. 
	The day after Sony announced its plans to begin work on the Play Station, Nintendo made an announcement of its own. Instead of confirming its alliance with Sony, as everyone expected, Nintendo announced it was working with Philips, Sony's longtime rivals, on the SNES CD-ROM drive. Sony was understandably furious. 
	Because of their contract-breaking actions, Nintendo not only faced legal repercussions from Sony, but could also experience a serious backlash from the Japanese business community. Nintendo had broken the unwritten law that a company shouldn't turn against a reigning Japanese company in favor of a foreign one. 
	However, Nintendo managed to escape without a penalty. Because of their mutual involvement, it would be in the best interests of both companies to maintain friendly relations. Sony, after all, was planning a port for SNES carts, and Nintendo was still using the Sony audio chip. 

1992 The Smoke Clears 

	By the end of 1992, most of the storm had blown over. Despite a deal penned between Sega, one of Nintendo's biggest competitors, and Sony, whereby Sony would produce software for the proposed Sega Multimedia Entertainment System, negotiations were reached with Nintendo. In October of 1992, it was announced that the two companies' CD-ROM players would be compatible. The software licensing for the proposed 32-bit machines was awarded to Nintendo, with Sony receiving only minimal licensing royalties. Nintendo had won this battle, but hadn't won the war. Not by a long shot.
	The first Play Station never made it out of the factories. Apparently, about 200 were produced, and some software even made it to development. For whatever reason, whether it was because of the tough licensing deal with Nintendo, or the predicted passing of masked ROM (cartridge-based) technology, Sony scrapped its prototype. Steve Race, Sony Computer Entertainment Of America's (SCEA) then CEO, stated, "Since the deal with Nintendo didn't come to fruition we decided to put games on a back burner and wait for the next category. Generally, the gaming industry has a seven-year product life-cycle, so we bided our time until we could get in on the next cycle."




1993 The Next Cycle 

	After returning to the drawing boards, Sony revealed the PS-X, or PlayStation-X. Gone was the original cartridge port, as were the plans for multimedia. Apparently, Sony had visited 3DO when Trip Hawkins was selling his hardware and had come away unimpressed, saying it was "nothing new." The PS-X was to be a dedicated game-machine, pure and simple. Steve Race said in Next Generation magazine, "We designed the PlayStation to be the best game player we could possibly make. Games really are multimedia, no matter what we want to call it. The conclusion is that the PlayStation is a multimedia machine that is positioned as the ultimate game player."
	Key to Sony's battle plan was the implementation of 3D into its graphics capabilities, a move that many felt was critical to any kind of future success. At the core of the PlayStation's 3D prowess was the R3000 processor, operating at 33 Mhz and 30 MIPS (millions of instructions per second). While this may seem fairly average for a RISC CPU, it's the PlayStation's supplementary custom hardware, designed by Ken Kutaragi (who had previously designed the key audio chip for the SNES), that provides the real power. The CPU relies heavily on Kutaragi's VLSI (very large scale integration) chip to provide the speed necessary to process complex 3D graphics quickly. 
	The CPU is backed up by the GPU (Graphics Processing Unit), which takes care of all the data from the CPU and passes the results to the 1024K of dual-ported VRAM, which stores the current frame buffer and allows the picture to be displayed on-screen. Part of this picture involves adding special effects such as transparency and fog, something that the PlayStation excels at. This was to be the most impressive display of hardware the home gaming world had ever seen

1994 Third Party Round Up 

	There was no doubt that Sony could deliver the hardware. After all, Sony was one of the world's largest manufacturers of electronics. There was no denying though, that Sony was extremely green when it came to videogames. And no one knew it better than Sony. 
	Not wanting to end up like Atari or 3DO, Sony set about rounding up third party developers, assembling an impressive 250 developing parties in Japan alone. Sony also knew it had to gain the support of the millions of arcade-going gamers if it was to succeed. The involvement of Namco, Konami, and Williams helped ensure Sony would be able to compete with the arcade-savvy Sega on its own turf. Namco's Ridge Racer was the natural choice to be the flagship launch game, and Williams' Mortal Kombat III, previously promised to Nintendo for their Ultra 64, could be tested in the arcades using the new PS-X board.
	Perhaps Sony's most controversial acquisition was the purchase of Psygnosis, a relatively unknown European developer, for $48 million. Sony needed a strong in-house development team, and Psygnosis' Lemmings seemed to point at good things. While the purchase confused many at the time, prompting vocal outcries from naysayers and competitors alike, Psygnosis has since proven them all wrong. Sony Interactive Entertainment, as Psygnosis was renamed, has been responsible for some of the PlayStation's best games, including WipeOut and Destruction Derby. 
	Sony's acquisition of Psygnosis yielded another fruit as well: the development system. SN Systems, co-owned by Andy Beveridge and Martin Day, had been publishing its development software through Psygnosis under the PSY-Q moniker. Sony originally had been planning on using expensive, Japanese MIPS R4000-based machines that would be connected to the prototype PS-X box. Having become accustomed to developing on the PC, Psygnosis gave Beveridge and Day first crack at creating a PlayStation development system that would work with a standard PC. 
	The two men worked through Christmas and New Year's, around the clock, eventually completing the GNU-C compiler and the source-level debugger. Psygnosis quickly arranged a meeting for SN and Sony at the Winter CES in Las Vegas, 1994. Fortunately, Sony liked the PSY-Q alternative and decided to work with SN Systems on cendensing the software onto two PC-compatible cards. Thus, an afordable and, more importantly, universally compatible PlayStation development station was born. 

December 3, 1994 We Have Lift Off 

	On December 3, 1994, the PlayStation was finally released in Japan, one week after the Sega Saturn. The initial retail cost was 37,000 yen, or about $387. Software available at launch included King's Field, Crime Crackers, and Namco's Ridge Racer, the PlayStation's first certifiable killer app. It was met with long lines across Japan, and was hailed by Sony as their most important product since the WalkMan in the late 1970's. 
	Also available at launch were a host of peripherals, including: a memory card to save high scores and games; a link cable, whereby you could connect two PlayStations and two TVs and play against a friend; a mouse with pad for PC ports; an RFU Adaptor; an S-Video Adaptor; and a Multitap Unit. Third party peripherals were also available, including Namco's Negcon. 
	The look of the PlayStation was dramatically different than the Saturn, which was beige (in Japan), bulky, and somewhat clumsy looking. In contrast, the PlayStation was slim, sleek, and gray, with a revolutionary controller that was years ahead of the Saturn's SNES-like pad. The new PSX joypad provided unheardof control by adding two more buttons on the shoulder, making a total of eight buttons. The two extended grips also added a new element of control. Ken Kutaragi realized the importance of control when dealing with 3 Dimensional game worlds. "We probably spent as much time on the joypad's development as the body of the machine. Sony's boss showed special interest in achieving the final version so it has his seal of approval." To Sony's delight, the PlayStation sold more than 300,000 units in the first 30 days. The Saturn claimed to have sold 400,000, but research has shown that number to be misleading. The PSX sold through (to customers) 97% of its stock, while many Saturns were still sitting on the shelves. These misleading numbers were to be quoted by Sega on many occasions, and continued even after the US launch. 

1995 Setting Up House 

	By mid-1995, Sony had set its sights firmly on the United States. Sony Computer Entertainment of America was created and housed in Foster City, California, in the heart of Silicon Valley. Steve Race, formerly of Atari, was appointed as president and CEO of the new branch of Sony. The accumulation of third party developers continued apace, with over 100 licenses in the US and 270 licenses in Japan secured. Steve Race said, "We've allowed people to come in and to play on the PlayStation - and at a much more reasonable cost than has been done in the old days with Nintendo and Sega." Sony's development strategy had paid off, with over 700 development units having been shipped out worldwide. 

May 11, 1995 Victory At E3 

	The Electronic Entertainment Expo (E3) was held in Los Angeles from May 11 to 13, 1995, and was the United State's first real look at the PlayStation. Sony made a huge impression at the show with their (rumored) $4 million booth and surprise appearance by Michael Jackson. The PSX was definitely the highlight of the show, besting Sega's Saturn and Nintendo's laughable Virtual Boy. 
	The launch software was also displayed, with WipeOut and Namco's Tekken and Ridge Racer drawing the most praise. Sony also announced the unit would not be bundled with Ridge Racer, as was previously assumed. 
	Overall, Sony made a very formidable showing at E3. They had already proven themselves in Japan and were close on Sega's heels. Over the course of the next year they would overtake Sega and conquer Japan as their own. Now they were poised to do the same in America. 

September 9, 1995 You Are Not Ready 

	The PlayStation launched in the United States on September 9, 1995 to instant success. Although it retailed for $299, that was still $100 less than the Sega Saturn. Over 100,000 units were already presold at launch, and 17 games were available. Stores reported sell-outs across the country, and sold out of many games and peripherals as well, including second controllers and memory cards. 
	Sony's initial marketing strategy seemed to be aimed at an older audience than the traditional 8-16 year old demographic of the past. With the tag line "U R Not E" (the "E" being red) and a rumored $40 million to spend on launch marketing, Sony swiftly positioned itself as the market leader. To further cement their audience demographic, Sony sponsored the 1995 MTV Music Awards. 

Epilogue What A Year 

By the US launch, Sony had sold over one million PlayStations in Japan alone. Since the US launch, as of late 1996, the PlayStation has sold over 7 million units worldwide, with close to two million of those being in the US alone. In May of 1996, Sony dropped the price of the PlayStation to $199, making it even more attractive to buy. 
	Like Japan, America and Europe embraced the PlayStation as their next-gen console of choice. The demographic of PlayStation owners has fallen in years steadily from twenty-somethings to the younger age bracket so coveted by Nintendo. In fact, many former Nintendo loyalists, tired of waiting for the Nintendo 64 to be released, bought PlayStations and are now happier for it. With close to 200 games available by Christmas 1996, it's easy to see why. This really is the ultimate gaming console! 







The R3000A

Overview
The heart of the PSX is a slightly modified R3000A CPU from MIPS and LSI. This is a 32 bit Reduced Instruction Set Controller (RISC) processor that clocks at 33.8688 MHz. It has an operating performance of 30 million instructions per second. In addition, it has an Internal instruction cache of 4 KB, a data cache of  1 KB and has a bus transfer rate of 132 MB/sec. It has internally one Arithmetic/Logic unit (ALU), One shifter, and totally lacks an FPU or floating point unit.  The R3000A is configured for litle-endian  byte order and defines a word as 32-bits, a half-word, as 16-bits, and a byte as 8-bits. 
	The PSX has two coprocessors, cop0, the System Control coprocessor, and cop2, the GPU or Graphics Processing Unit. These are covered later on in this document. 

Instruction cache
	The PSXs R3000A contains 4 KB of instruction cache. The instruction cache is organized with a line size of 16 bytes. This should achieve hit rate of around 80%. The cache is implemented using physical address and tags, as opposed to virtual ones.

Data cache
	The PSXs R3000A incorporates an on-chip data cache of  1KB, organized as a line size of 4 bytes (one word). This also should achieve hit rates of 80% in most applications. This also is a directly mapped physical address cache. The data cache is implemented as a write through cache, to maintain that the main memory is the same as the internal cache. In order to minimize processor stalls due to data write operations, the bus interface unit uses a 4-deep write buffer which captures address and data at the processor execution rate, allowing it to be retired to main memory at a much slower rate without impacting system performance. 

32 bit architecture
	The R3000A uses thirty-two 32-bit registers, a 32 bit program counter, and two 32 bit registers for multiply/divide functions.  The following table lists the registers by register number, name, and usage.

General Purpose Registers

Register number	Name	Usage	
R0	ZR	Constant Zero	
R1	AT	Reserved for the assembler	
R2-R3	V0-V1	Values for results and expression evaluation	
R4-R7	A0-A3	Arguments	
R8-R15	T0-T7	Temporaries (not preserved across call)	
R16-R23	S0-S7	Saved (preserved across call)	
R24-R25	T8-T9	More temporaries (not preserved across call)	
R26-R27	K0-K1	Reserved for OS Kernel	
R28	GP	Global Pointer	
R29	SP	Stack Pointer	
R30	FP	Frame Pointer	
R31 	RA	Return address (set by function call)	





Multiply/Divide result Registers and Program counter
Name	Description	
HI	Multiplication 64 bit high result or division  remainder	
LO	Multiplication 64 bit low result or division quotient	
PC	Program Counter	

	Even though all general purpose registers have different names, they are all treated the same except for two. The R0 (ZR) register is hardwired as zero. The Second exception is R31 (RA) which is used at a link register when link or jump routines are called. These instructions are used in subroutine calls, and the subroutine return address is placed in register R31. This register can be written to or read as a normal register in other operations.

R3000A Instruction set
	The instruction encoding is based on the MIPS architecture. The means that there are three types of instruction encoding.

I-Type (Immediate)
op	rs	rt	immediate	

J-Type (Jump)
 op	target	

R-Type (Register)
op	rs	rt	rd	shamt	funct	

where:

op	is a 6-bit operation code	
rs	is a five bit source register specifier	
rt	is a 5-bit target register or branch condition	
immediate	is a 16-bit immediate, or branch or address displacement	
target	is a 26-bit jump target address	
rd	is a 5-bit destination register specifier	
shamt	is a 5-bit shift amount	
funct	is a 6-bit function field	


The R3000A instruction set can be divided into the following basic groups:
Load/Store instructions move data between memory and the general registers. They are all encoded as I-Type instructions, and the only addressing mode implemented is base register plus signed, immediate offset. This directly enables the use of three distinct addressing modes: register plus offset; register direct; and immediate.
Computational instructions perform arithmetic, logical, and shift operations on values in registers. They are encoded as either R-Type instructions, when both source operands as well as the result are general registers, and I-Type, when one of the source operands is a 16-bit immediate value. Computational instructions use a three address format, so that operations dont needlessly interfere with the contents of source registers.
Jump and Branch instructions change the control flow of a program. A Jump instruction can be encoded as a J-Type instruction, in which case the Jump target address is a paged absolute address formed by combining the 26-bit immediate value with four bits of the Program Counter. This form is used for subroutine calls. Alternately, Jumps can be encoded using the R-Type format, in which case the target address is a 32-bit value contained in one of the general registers. This form is typically used for returns and dispatches. Branch operations are encoded as I-Type instructions. The target address is formed from a 16-bit displacement relative to the Program Counter. The Jump and Link instructions save a return address in Register r31. These are typically used as subroutine calls, where the subroutine return address is stored into r31 during the call operation.
Co-Processor instructions perform operations on the co-processor set. Co-Processor Loads and Stores are always encoded as I-Type instructions; co-processor operational instructions have co-processor dependent formats. In the R3000A, the System Control Co-Processor (cop0) contains registers which are used in memory management and exception handling. 
Special instructions perform a variety of tasks, including movement of data between special and general registers, system calls, and breakpoint operations. They are always encoded as R-Type instructions. 

INSTRUCTION SET SUMMARY
The following table describes The assembly instructions for the R3000A. Please refer to the appendix for more detail about opcode encoding

Load and Store Instructions

Instruction	Format and Description	
Load Byte	LB rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Sign-extend contents of addressed byte and load into rt.	
Load Byte Unsigned	LBU rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Zero-extend contents of addressed byte and load into rt.	
Load Halfword	LH rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Sign-extend contents of addressed byte and load into rt.	
Load Halfword Unsigned	LHU rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Zero-extend contents of addressed byte and load into rt.	
Load Word	LW rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Load contents of addressed word into register rt.	
Load Word Left	LWL rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Shift addressed word left so that addressed byte is leftmost byte of a word. Merge bytes from memory with contents of register rt and load result into register rt.	
Load Word Right	LWR rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Shift addressed word right so that addressed byte is rightmost byte of a word. Merge bytes from memory with contents of register rt and load result into register rt.	
Store Byte	SB rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Store least significant byte of register rt at addressed location.	
Store Halfword	SH rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Store least significant halfword of register rt at addressed location.	
Store Word	SW rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Store least significant word of register rt at addressed location.	
Store Word Left	SWL rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Shift contents of register rt right so that leftmost byte of the word is in position of addressed byte. Store bytes containing original data into corresponding bytes at addressed byte.	
Store Word Right	SWR rt, offset (base) Sign-extend 16-bit offset and add to contents of register base to form address. Shift contents of register rt left so that rightmost byte of the word is in position of addressed byte. Store bytes containing original data into corresponding bytes at addressed byte.	

Computational Instructions

ALU Immediate Operations
Instruction	Format and Description	
ADD Immediate	ADDI rt, rs, immediate Add 16-bit sign-extended immediate to register rs and place 32-bit result in register rt . Trap on twos complement overflow.	
ADD Immediate Unsigned	ADDIU rt, rs, immediate Add 16-bit sign-extended immediate to register rs and place 32-bit result in register rt . Do not trap on overflow.	
Set on Less Than Immediate	SLTI rt, rs, immediate Compare 16-bit sign-extended immediate with register rs as signed 32-bit integers. Result = 1 if rs is less than immediate; otherwise result = 0. Place result in register rt.	
Set on Less Than Unsigned Immediate	SLTIU rt, rs, immediate Compare 16-bit sign-extended immediate with register rs as unsigned 32-bit integers. Result = 1 if rs is less than immediate; otherwise result = 0. Place result in register rt. Do not trap on overflow.	
AND Immediate	ANDI rt, rs, immediate Zero-extend 16-bit immediate, AND with contents of register rs and place result in register rt.	
OR Immediate	ORI rt, rs, immediate Zero-extend 16-bit immediate, OR with contents of register rs and place result in register rt.	
Exclusive OR Immediate	XORI rt, rs, immediate Zero-extend 16-bit immediate, exclusive OR with contents of register rs and place result in register rt.	
Load Upper Immediate	LUI rt, immediate Shift 16-bit immediate left 16 bits. Set least significant 16 bits of word to zeroes. Store result in register rt.	


Three Operand Register-Type Operations
Instruction	Format and Description	
Add	ADD rd, rs, rt Add contents of registers rs and rt and place 32-bit result in register rd. Trap on twos complement overflow.	
ADD Unsigned	ADDU rd, rs, rt Add contents of registers rs and rt and place 32-bit result in register rd. Do not trap on overflow.	
Subtract	SUB rd, rs, rt Subtract contents of registers rt and rs and place 32-bit result in register rd. Trap on twos complement overflow.	
Subtract Unsigned	SUBU rd, rs, rt Subtract contents of registers rt and rs and place 32-bit result in register rd. Do not trap on overflow.	
Set on Less Than	SLT rd, rs, rt Compare contents of register rt to register rs (as signed 32-bit integers). If register rs is less than rt, result = 1; otherwise, result = 0.	
Set on Less Than Unsigned	SLTU rd, rs, rt Compare contents of register rt to register rs (as unsigned 32-bit integers). If register rs is less than rt, result = 1; otherwise, result = 0.	
AND	AND rd, rs, rt Bit-wise AND contents of registers rs and rt and place result in register rd.	
OR	OR rd, rs, rt Bit-wise OR contents of registers rs and rt and place result in register rd.	
Exclusive OR	XOR rd, rs, rt Bit-wise Exclusive OR contents of registers rs and rt and place result in register rd.	
NOR	NOR rd, rs, rt Bit-wise NOR contents of registers rs and rt and place result in register rd.	

Shift Operations
Instruction	Format and Description	
Shift Left Logical	SLL rd, rt, shamt Shift contents of register rt left by shamt bits, inserting zeroes into low order bits. Place 32-bit result in register rd.	
Shift Right Logical	SRL rd, rt, shamt Shift contents of register rt right by shamt bits, inserting zeroes into high order bits. Place 32-bit result in register rd.	
Shift Right Arithmetic	SRA rd, rt, shamt Shift contents of register rt right by shamt bits, sign-extending the high order bits. Place 32-bit result in register rd.	
Shift Left Logical Variable	SLLV rd, rt, rs Shift contents of register rt left. Low-order 5 bits of register rs specify number of bits to shift. Insert zeroes into low order bits of rt and place 32-bit result in register rd.	
Shift Right Logical Variable	SRLV rd, rt, rs Shift contents of register rt right. Low-order 5 bits of register rs specify number of bits to shift. Insert zeroes into high order bits of rt and place 32-bit result in register rd.	
Shift Right Arithmetic Variable	SRAV rd, rt, rs Shift contents of register rt right. Low-order 5 bits of register rs specify number of bits to shift. Sign-extend the high order bits of rt and place 32-bit result in register rd.	

Multiply and Divide Operations
Instruction	Format and Description	
Multiply	MULT rs, rt Multiply contents of registers rs and rt as twos complement values. Place 64-bit result in special registers HI/LO	
Multiply Unsigned	MULTU rs, rt Multiply contents of registers rs and rt as unsigned values. Place 64-bit result in special registers HI/LO	
Divide	DIV rs, rt Divide contents of register rs by rt treating operands as twos complements values. Place 32-bit quotient in special register LO, and 32-bit remainder in HI.	
Divide Unsigned	DIVU rs, rt Divide contents of register rs by rt treating operands as unsigned values. Place 32-bit quotient in special register LO, and 32-bit remainder in HI.	
Move From HI	MFHI rd Move contents of special register HI to register rd.	
Move From LO	MFLO rd Move contents of special register LO to register rd.	
Move To HI	MTHI rd Move contents of special register rd to special register HI.	
Move To LO	MTLO rd Move contents of register rd to special register LO.	

Jump and Branch Instructions

Jump Instructions
Instruction	Format and Description	
Jump	J target Shift 26-bit target address left two bits, combine with high-order4 bits of PC and jump to address with a one instruction delay.	
Jump and Link	JAL target Shift 26-bit target address left two bits, combine with high-order 4 bits of PC and jump to address with a one instruction delay. Place address of instruction following delay slot in r31 (link register).	
Jump Register	JR rs Jump to address contained in register rs with a one instruction delay.	
Jump and Link Register	JALR rs, rd Jump to address contained in register rs with a one instruction delay. Place address of instruction following delay slot in rd.	

Branch Instructions
Instruction	Format and Description	
	Branch Target: All Branch instruction target addresses are computed as follows: Add address of instruction in delay slot and the 16-bit offset (shifted left two bits and sign-extended to 32 bits). All branches occur with a delay of one instruction.	
Branch on Equal	BEQ rs, rt, offset Branch to target address if register rs equal to rt	
Branch on Not Equal	BNE rs, rt, offset Branch to target address if register rs not equal to rt.	
Branch on Less than or Equal Zero	BLEZ rs, offset Branch to target address if register rs less than or equal to 0.	
Branch on Greater Than Zero	BGTZ rs, offset Branch to target address if register rs greater than 0.	
Branch on Less Than Zero	BLTZ rs, offset Branch to target address if register rs less than 0.	
Branch on Greater than or Equal Zero	BGEZ rs, offset Branch to target address if register rs greater than or equal to 0.	
Branch on Less Than Zero And Link	BLTZAL rs, offset Place address of instruction following delay slot in register r31 (link register). Branch to target address if register rs less than 0.	
Branch on greater than or Equal Zero And Link	BGEZAL rs, offset Place address of instruction following delay slot in register r31 (link register). Branch to target address if register rs is greater than or equal to 0.	


Special Instructions
Instruction	Format and Description	
System Call	SYSCALL Initiates system call trap, immediately transferring control to exception handler. More information on the PSX SYSCALL routines are covered later on.	
Breakpoint	BREAK Initiates breakpoint trap, immediately transferring control to exception handler.	

	More information on the PSX SYSCALL routines are covered later on.


Co-processor Instructions
Instruction	Format and Description	
Load Word to Co-processor	LWCz rt, offset (base) Sign-extend 16-bit offset and add to base to form address. Load contents of addressed word into co-processor register rt of co-processor unit z.	
Store Word from Co-processor	SWCz rt, offset (base) Sign-extend 16-bit offset and add to base to form address. Store contents of co-processor register rt from co-processor unit z at addressed memory word.	
Move To Co-processor	MTCz rt, rd Move contents of CPU register rt into co-processor register rd of co-processor unit z.	
Move from Co-processor	MFCz rt,rd Move contents of co-processor register rd from co-processor unit z to CPU register rt.	
Move Control To Co-processor	CTCz rt,rd Move contents of CPU register rt into co-processor control register rd of co-processor unit z.	
Move Control From Co-processor	CFCz rt,rd Move contents of control register rd of co-processor unit z into CPU register rt.	
Move Control To Co-processor	COPz cofun Co-processor z performs an operation. The state of the R3000A is not modified by a co-processor operation.	



System Control Co-processor (COP0) Instructions
Instruction	Format and Description	
Move To CP0	MTC0 rt, rd Store contents of CPU register rt into register rd of CP0. This follows the convention of store operations.	
Move From CP0	MFC0 rt, rd Load CPU register rt with contents of CP0 register rd.	
Read Indexed TLB Entry	TLBR Load EntryHi and EntryLo registers with TLB entry pointed at by Index register.	
Write Indexed TLB Entry	TLBWI Load TLB entry pointed at by Index register with contents of EntryHi and EntryLo registers.	
Write Random TLB Entry	TLBWR Load TLB entry pointed at by Random register with contents of EntryHi and EntryLo registers.	
Probe TLB for Matching Entry	TLBP Entry Load Index register with address of TLB entry whose contents match EntryHi and EntryLo. If no TLB entry matches, set high-order bit of Index register.	
Restore From Exception	RFE Restore previous interrupt mask and mode bits of status register into current status bits. Restore old status bits into previous status bits.	



R3000A OPCODE ENCODING
The following shows the opcode encoding for the MIPS architecture.


OPCODE
Bits	28...26								
3829	0	1	2	3	4	5	6	7	
0	SPECIAL	BCOND	J	JAL	BEQ	BNE	BLEZ	BGTZ	
1	ADDI	ADDIU	SLTI	SLTIU	ANDI	ORI	XORI	LUI	
2	COP0	COP1	COP2	COP3					
3									
4	LB	LH	LWL	LW	LBU	LHU	LWR		
5	SB 	SH	SWL	SW			SWR		
6	LWC0	LWC1	LWC2	LWC3					
7	SWC0	SWC1	SWC2	SWC3					

SPECIAL
Bits	20								
53	0	1	2	3	4	5	6	7	
0	SLL		SRL	SRA	SLLV		SRLV	SRAV	
1	JR	JALR			SYSCALL	BREAK			
2	MFHI	MTHI	MFLO	MTLO					
3	MULT	MULTU	DIV	DIVU					
4	ADD	ADDU	SUB	SUBU	AND	OR	XOR	NOR	
5			SLT	SLTU					
6									
7									

BCOND
Bits	816								
2019	0	1	2	3	4	5	6	7	
0	BLTZ	BGEZ							
1									
2	BLTZAL	BGEZAL							
COPz
Bits	2321								
2524	0	1	2	3	4	5	6	7	
0	MF		CF		MT		CT		
1	BC								
Co-Processor Specific Operations


COP0
Bits	20								
43	0	1	2	3	4	5	6	7	
0		TLBR	TLBWI				TLBWR		
1	TLBP								
2	RFE								
3									



Memory

Overview
	The PSXs memory consists of four 512k 60ns SRAM chips creating 2 megabytes of system memory. The RAM is arranged so that the addresses at 0x00xxxxxx, 0xA0xxxxxx, 0x80xxxxxx all point to the same physical memory.  The PSX has a special coprocessor called cop0 that handles almost every aspect of memory management. Let us first examine how the memory looks and then how it is managed.

The PSX Memory Map

0x0000_0000-0x0000_ffff	Kernel (64K)	
0x0001_0000  0x001f_ffff	 User Memory (1.9 Meg)	
		
0x1f00_0000-0x1f00_ffff	Parallel Port (64K)	
		
0x1f80_0000-0x1f80_03ff	Scratch Pad (1024 bytes)	
		
0x1f80_1000-0x1f80_2fff	Hardware  Registers (8K)	
		
0x8000_0000  0x801f_ffff	Kernel and User Memory Mirror (2 Meg) Cached	
		
0xa000_0000  0xa01f_ffff	Kernel and User Memory Mirror (2 Meg) Uncached	
		
0xbfc0_0000-0xbfc7_ffff	BIOS (512K)	

	All blank areas represent the absence of memory.  The mirrors are used mostly for caching and exception handling  purposes  The Kernel is also mirrored in all three user memory spaces. 

Virtual Memory
	The PSX uses a memory architecture known as Virtual Memory to help with general system memory and cache management. In a nutshell what the PSX does is mirror the two meg of addressable space into 3 segments at three different virtual addresses. The names of these segments are Kuseg, Kseg0, and Kseg1. 
	Kuseg spans from 0x0000_0000 to 0x001f_ffff. This is what you might call real memory. This facilitates the kernel having direct access to user memory regions.
Kseg0 begins at virtual address 0x8000_0000 and goes to 0x801f_ffff. This segment is always translated to a linear 2MB region of the physical address space starting at physical address 0. All references through this segment are cacheable. When the most significant three bits of the virtual address are 100, the virtual address resides in kseg0. The physical address is constructed by replacing these three bits of the virtual address with the value 000.
Kseg1 is also a linear 2MB region from 0xa000_0000 to 0xa01f_ffff pointing to the same address at address 0. When the most significant three bits of the virtual address are 101, the virtual address resides in kseg1. The physical address is constructed by replacing these three bits of the virtual address with the value 000. Unlike kseg0, references through kseg1 are not cacheable. 



Looking a little deeper into how virtual memory works,  the following shows the anatomy of an R3000A virtual address. The most significant 20 bits of the 32-bit virtual address are called the virtual page number, or VPN. Only the three highest bits (segment number) are involved in the virtual to physical address translation.
31				0	
			VPN	Offset	
31	30	29	20	12	

bits 31-29
0xx	kuseg
100	kseg0
101	kseg1

The three most significant bits of the virtual address identify which virtual address segment the processor is currently referencing; these segments have associated with them the mapping algorithm to be employed, and whether virtual addresses in that segment may reside in the cache. Pages are mapped by substituting a 20-bit physical frame number (PFN) for the 20-bit virtual page number field of the virtual address. This substitution is performed through the use of the on-chip Translation Lookaside Buffer (TLB). The TLB is a fully associative memory that holds 64 entries to provide a mapping of 64 4kB pages. When a virtual reference to kuseg each TLB entry is probed to see if it maps the corresponding VPN. 

Virtual to physical memory translation 
	The following table is a quick look at how virtual memory gets translated via the Translation Lookaside Buffer. This whole subsystem of memory management is handled by Cop0.




Cop0, The System Control Coprocessor
	This Unit is actually part of the R3000A. This particular cop0 has been modified from the original R3000A cop0 architecture with the addition of a few registers and functions. Cop0 contains 16 32-bit control registers that control the various aspects of memory management, system interrupt (exception) management, and breakpoints. Much of it is compatible with the normal R3000A cop0.  The following is an overview of the Cop0 registers.

Cop0 Registers
Number	Mnemonic	Name	Read/Write	Usage	
0	INDX	Index	r/w	Index to an entry in the 64-entry TLB file	
1	RAND	Random	r	Provides software with a suggested random  TLB entry to be written with the correct translation	
2	TLBL	TBL low	r/w	Provides the data path for operations which read, write, or probe the TLB file (first 32 bits)	
3	BPC	Breakpoint PC	r/w	Sets the breakpoint address to break on execute	
4	CTXT	Context	r	Duplicates information in the BADV register, but provides this information in a form that may be more useful for a software TLB exception handler.	
5	BDA	Breakpoint data	r/w	Sets the breakpoint address for load/store operations	
6	PIDMASK	PID Mask	r/w	Process ID mask	
7	DCIC	Data/Counter interrupt control	r/w	Breakpoint control	
8	BADV	Bad Virtual Address	r	Contains the address whose reference caused an exception.	
9	BDAM	Break data mask	r/w	Data fetch address is ANDed with this value and then compared to the value in BDA	
10	TLBH	TBL high	r/w	Provides the data path for operations which read, write, or probe the TLB file (second 32 bits)	
11	BPCM	Break point counter mask	r/w	Program counter is ANDed with this value and then compared to the value in BPC	
12	SR	System status register	r/w	Contains all the major status bits	
13	CAUSE	Cause	r	Describes the most recently recognized exception	
14	EPC	Exception Program Counter 	r	Contains the return address after an exception	
15	PRID	Processor ID	r	Cop0 type and revision level	
16	ERREG	???	?	????	

	Note that some of these registers will be explained later in the part on exception handling.  But for now we will return to how the Cop0 is used in memory management. 

Returning to the TLB
	As stated before the TLB is a fully associative memory that holds 64 entries to provide a mapping of 64 4kB pages. Each TLB entry is 64 bits wide.  This is referenced by the Index, Random, TBL high, and TBL low. It is used to virtual to physical address mapping. 

The Index Register
The Index register is a 32-bit, read-write register, which has a 6-bit field used to index to a specific entry in the 64-entry TLB file. The high-order bit of the register is a status bit which reflects the success or failure of a TLB Probe (tlbp) instruction.. The Index register also specifies the TLB entry that will be affected by the TLB Read (tlbr) and TLB Write Index (tlbwi) instructions. the following shows the format of the Index register.

31	30	14	13	8	7	0	
P	0	Index	0	
1	17	6	8	


P 	Probe failure. Set to 1 when the last TLBProbe (tlbp) instruction was unsuccessful.
Index 	Index to the TLB entry that will be affected by the TLBRead and TLBWrite instructions.
0 	Reserved. Must be written as zero, returns zero when read.

The Random Register
The Random register is a 32-bit read-only register. The format of the Random register is below. The six-bit Random field indexes a Random entry in the TLB. It is basically a counter which decrements on every clock cycle, but which is constrained to count in the range of 63 to 8. That is, software is guaranteed that the Random register will never index into the first 8 TLB entries. These entries can be locked by software into the TLB file, guaranteeing that no TLB miss exceptions will occur in operations which use those virtual address. This is useful for particularly critical areas of the operating system.

			
0	Random	0	
18	6	8	

Random 	A random index (with a value from 8 to 63) to a TLB entry.
0 		Reserved. Returns zero when read.

The Random register is typically used in the processing of a TLB miss exception. The Random register provides software with a suggested TLB entry to be written with the correct translation; although slightly less efficient than a Least Recently Used (LRU) algorithm, Random replacement offers substantially similar performance while allowing dramatically simpler hardware and software management. To perform a TLB replacement, the TLB Write Random (tlbwr) instruction is used to write the TLB entry indexed by this register. At reset, this counter is preset to the value 63. Thus, it is possible for two processors to operate in lock-step, even when using the Random TLB replacement algorithm. Also, software may directly read this register, although this feature probably has little utility outside of device testing and diagnostics.


TBL High and TBL Low Registers
These two registers provide the data path for operations which read, write, or probe the TLB file. The format of these registers is the same as the format of a TLB entry. 

TBL High	TBL Low		
VPN	PID	0	FPN	N	D	V	G	0		
20	6	6	20	1	1	1	1	8		

VPN	Virtual Page Number. Bits 31..12 of virtual address.
PID 	Process ID field. A 6-bit field which lets multiple processes share the TLB while each process has a distinct mapping of otherwise identical virtual page numbers.
PFN 	Page Frame Number. Bits 31..12 of the physical address. 
N 	Non-cacheable. If this bit is set, the page is marked as non-cacheable 
D 	Dirty. If this bit is set, the page is marked as "dirty" and therefore writable. This bit is actually a "write-protect" bit that software can use to prevent alteration of data
V 	Valid. If this bit is set, it indicates that the TLB entry is valid; otherwise, a TLBL or TLBS Miss occurs.
G	Global. If this bit is set, the R3000A ignores the PID match requirement for valid translation. In kseg2, the Global bit lets the kernel access all mapped data without requiring it to save or restore PID (Process ID) values.
0Reserved. Must be written as '0', returns '0' when read.


Exception Handling
	There are times when in is necessary to suspend a program in order to process a hardware or software function. The exception processing capability of the R3000A is provided to assure an orderly transfer of control from an executing program to the kernel. Exceptions may be broadly divided into two categories: they can be caused by an instruction or instruction sequence, including an unusual condition arising during its execution; or can be caused by external events such as interrupts. When an R3000A detects an exception, the normal sequence of instruction flow is suspended; the processor is forced to kernel mode where it can respond to the abnormal or asynchronous event. The table below lists the exceptions recognized by the R3000A.

Exception	Mnemonic	Cause	
Reset	Reset	Assertion of the Reset signal causes an exception that transfers control to the special vector at virtual address 0xbfc0_0000 (The start of the BIOS)	
Bus Error	IBE  DBE (Data)	Assertion of the Bus Error input during a read operation, due to such external events as bus timeout, backplane memory errors, invalid physical address, or invalid access types.	
Address Error	AdEL (Load)  AdES (Store)	Attempt to load, fetch, or store an unaligned word; that is, a word or halfword at an address not evenly divisible by four or two, respectively. Also caused by reference to a virtual address with most significant bit set while in User Mode.	
Overflow	Ovf	Twos complement overflow during add or subtract.	
System Call	Sys	Execution of the SYSCALL Trap Instruction	
Breakpoint	Bp	Execution of the break instruction	
Reserved Instruction	RI	Execution of an instruction with an undefined or reserved major operation code (bits 31:26), or a special instruction whose minor opcode (bits 5:0) is undefined.	
Co-processor Unusable	CpU	Execution of a co-processor instruction when the CU (Co-processor  usable) bit is not set for the target co-processor.	
TLB Miss	TLBL (Load) TLBS (Store)	A referenced TLB entrys Valid bit isnt set	
TLB Modified	Mod	During a store instruction, the Valid bit is set but the dirty bit is not set in a matching TLB entry. 	
Interrupt	Int	Assertion of one of the six hardware interrupt inputs or setting of one of the two software interrupt bits in the Cause register.	

Returning to the Cop0
	The Cop0 controls the exception handling with the use of the Cause register, the EPC register, the Status register, the BADV register, and the Context register. A brief description of each follows, after which the rest of the Cop0 registers for breakpoint management will be described for the sake of completeness.

The Cause Register
The contents of the Cause register describe the last exception. A 5-bit exception code indicates the cause of the current exception; the remaining fields contain detailed information specific to certain exceptions. All bits in this register, with the exception of the SW bits, are read-only. 

31			0	
BD	0	CE	0	IP	SW	0	EXECODE	0	
1	1	2	12	6	2	1	5	2	


BD 		Branch Delay.  The Branch Delay bit is set (1) if the last exception was taken while the
processor was executing in the branch delay slot. If so, then the EPC will be rolled back to point to the branch instruction, so that it can be re-executed and the branch direction re-determined..
CE 		Coprocessor Error, Contains the coprocessor number if the exception occurred because of a coprocessor instruction for a coprocessor which wasn't enabled in SR.
IP 		Interrupts Pending. It indicates which interrupts are pending. Regardless of which interrupts are masked, the IP field can be used to determine which interrupts are pending.
SW 		Software Interrupts.  The SW bits can be written to set or reset software interrupts. As long as any of the bits are set within the SW field they will cause an interrupt if the corresponding bit is set in SR under the interrupt mask field.
0  		Reserved,  Must Be Written as 0. Returns 0 when Read
EXECODE 	Exception Code Field. Describes the type of exception that occurred. The following table lists the type of exception that it was.

Number	Mnemonic	Description	
0	INT	External Interrupt	
1	MOD	TLB Modification Exception	
2	TLBL	TLB miss Exception (Load or instruction fetch)	
3	TLBS	TLB miss exception (Store)	
4	ADEL	Address Error Exception (Load or instruction fetch)	
5	ADES	Address Error Exception (Store)	
6	IBE	Bus Error Exception (for Instruction Fetch)	
7	DBE	Bus Error Exception (for data Load or Store)	
8	SYS	SYSCALL Exception	
9	BP	Breakpoint Exception	
10	RI	Reserved Instruction Exception	
11	CPU	Co-Processor Unusable Exception	
12	OVF	Arithmetic Overflow Exception	
13-31	-	Reserved	


The EPC (Exception Program Counter) Register
The 32-bit EPC register contains the virtual address of the instruction which took the exception, from which point processing resumes after the exception has been serviced. When the virtual address of the instruction resides in a branch delay slot, the EPC contains the virtual address of the instruction immediately preceding the exception (that is, the EPC points to the Branch or Jump instruction).

BADV Register
The BADV register saves the entire bad virtual address for any addressing exception.

Context Register
The Context register duplicates some of the information in the BADV register, but provides this information in a form that may be more useful for a software TLB exception handler. The following illustrates the layout of the Context register. The Context register is used to allow software to quickly determine the main memory address of the page table entry corresponding to the bad virtual address, and allows the TLB to be updated by software very quickly (using a nine-instruction code sequence).


PTE Base	BADV	0	
11	19	2	

0		Reserved, read as 0 and must be written as 0
BADV		Failing virtual page number (set by hardware read only derived from BADV register
PTE Base	Base address of page table entry, set by the kernel

The Status Register
The Status register contains all the major status bits; any exception puts the system in Kernel mode. All bits in the status register, with the exception of the TS (TLB Shutdown) bit, are readable and writable; the TS bit is read-only. Figure 5.4 shows the functionality of the various bits in the status register. The status register contains a three level stack (current, previous, and old) of the kernel/user mode bit (KU) and the interrupt enable (IE) bit. The stack is pushed when each exception is taken, and popped by the Restore From Exception instruction. These bits may also be directly read or written. At reset, the SWc, KUc, and IEc bits are set to zero; BEV is set to one; and the value of the TS bit is set to 0 (TS = 0) The rest of the bit fields are undefined after reset. 

31		0	
CU	0	RE	0	BEV	TS	PE	CM	PZ	SwC	IsC	IntMask	0	KUo	IEo	KUp	IEp	KUc	IEc	
4	2	1	2	1	1	1	1	1	1	1	8	2	1	1	1	1	1	1	


The various bits of the status register are defined as follows:

CU 	Co-processor Usability. These bits individually control user level access to co-processor operations, including the polling of the BrCond input port and the manipulation of the System Control Co-processor (CP0). CU2 is for the GTE, CU1 is for the FPA, which is not available in the PSX.

RE 	Reverse Endianness. The R3000A allows the system to determine the byte ordering convention for the Kernel mode, and the default setting for user mode, at reset time. If this bit is cleared, the endianness
defined at reset is used for the current user task. If this bit is set, then the user task will operate with the opposite byte ordering convention from that determined at reset. This bit has no effect on kernel mode.

BEV 	Bootstrap Exception Vector. The value of this bit determines the locations of the exception vectors of the processor. If BEV = 1, then the processor is in Bootstrap mode, and the exception vectors reside
in the BIOS ROM. If BEV = 0, then the processor is in normal mode, and the exception vectors reside in RAM.

TS 	TLB Shutdown. This bit reflects whether the TLB is functioning.

PE 	Parity Error. This field should be written with a "1" at boot time. Once initialized, this field will always be read as "0'.

CM 	Cache Miss. This bit is set if a cache miss occurred while the cache was isolated. It is useful in determining the size and operation of the internal cache subsystem.

PZ 	Parity Zero. This field should always be written with a "0".

SwC 	Swap Caches. Setting this bit causes the execution core to use the on-chip instruction cache as a data cache and vice-versa. Resetting the bit to zero unswaps the caches. This is useful for certain operations
such as instruction cache flushing. This feature is not intended for normal operation with the caches swapped.

IsC 	Isolate Cache. If this bit is set, the data cache is isolated from main memory; that is, store operations modify the data cache but do not cause a main memory write to occur, and load operations return the data value from the cache whether or not a cache hit occurred. This bit is also useful in various operations such as flushing.

IM 	Interrupt Mask. This 8-bit field can be used to mask the hardware and software interrupts to the execution engine (that is, not allow them to cause an exception). IM(1:0) are used to mask the software interrupts, and IM (7:2) mask the 6 external interrupts. A value of 0 disables a particular interrupt, and a 1 enables it. Note that the IE bit is a global interrupt enable; that is, if the IE is used to disable interrupts, the value of particular mask bits is irrelevant; if IE enables interrupts, then a particular interrupt is selectively masked by this field.

KUo 	Kernel/User old. This is the privilege state two exceptions previously. A 0 indicates kernel mode.

IEo 	Interrupt Enable old. This is the global interrupt enable state two exceptions previously. A 1 indicates that interrupts were enabled, subject to the IM mask.

KUp 	Kernel/User previous. This is the privilege state prior to the current exception A 0 indicates kernel mode.

IEp 	Interrupt Enable previous. This is the global interrupt enable state prior to the current exception. A 1 indicates that interrupts were enabled, subject to the IM mask.

KUc 	Kernel/User current. This is the current privilege state. A 0 indicates kernel mode.

IEc 	Interrupt Enable current. This is the current global interrupt enable state. A 1 indicates that interrupts are enabled, subject to the IM mask.
0	Fields indicated as 0 are reserved; they must be written as 0, and will
return 0 when read.

PRID Register
This register is useful to software in determining which revision of the processor is executing the code. The format of this register is illustrated below.

0	Imp	Rev	
16	8	8	

Imp      	3        	CoP0 type R3000A
         	7       	IDT unique (3041) use REV to determine correct configuration.
Rev               	Revision level.

EXCEPTION VECTOR LOCATIONS
The R3000A separates exceptions into three vector spaces. The value of each vector depends on the BEV (Boot Exception Vector) bit of the status register, which allows two alternate sets of vectors (and thus two different pieces of code) to be used. Typically, this is used to allow diagnostic tests to occur before the functionality of the cache is validated; processor reset forces the value of the BEV bit to a 1. 

Exception	Virtual Address	Physical Address	
Reset	0xbfc0_0000	0x1fc0_0000	
UTLB Miss	0x8000_0000	0x0000_0000	
General	0x8000_0080	0x0000_0080	
Exception Vectors When BEV = 0

Exception	Virtual Address	Physical Address	
Reset	0xbfc0_0000	0x1fc0_0000	
UTLB Miss	0xbfc0_0100	0x1fc0_0100	
General	0xbfc0_0180	0x1fc0_0180	
Exception Vectors When BEV =1


Exception Priority
	The following is a priority list of exceptions:

Reset 	At any time (highest)
AdEL 	Memory (Load instruction)
AdES 	Memory (Store instruction)
DBE 	Memory (Load or store)
MOD 	ALU (Data TLB)
TLBL 	ALU (DTLB Miss)
TLBS 	ALU (DTLB Miss)
Ovf 	ALU
Int 	ALU
Sys 	RD (Instruction Decode)
Bp 	RD (Instruction Decode)
RI 	RD (Instruction Decode)
CpU	RD (Instruction Decode)
TLBL 	I-Fetch (ITLB Miss)
AdEL 	IVA (Instruction Virtual Address)
IBE 	RD (end of I-Fetch, lowest)
	
Breakpoint Management
	The following is a listing of the registers in Cop0 that are used for breakpoint management. These registers are very useful for low-level debugging. 

BPC
Breakpoint on execute. Sets the breakpoint address to break on execute.

BDA       
Breakpoint on data access. Sets the breakpoint address for load/store operations

DCIC
Breakpoint control. To use the Execution breakpoint, set PC. To use the Data access
breakpoint set DA and either R, W or both. Both breakpoints can be used simultaneously. When a breakpoint occurs the PSX jumps to 0x0000_0040.


1	1	1	0	W	R	DA	PC	1	0	
1	1	1	1	1	1	1	1	1	23	

W	0
1	Break on Write
R	0
1	Break on Read
DA	0	Data access breakpoint disabled
1	Data access breakpoint enabled
PC	0	Execution breakpoint disabled
1	Execution breakpoint enabled



BDAM 
Data Access breakpoint mask. Data fetch address is ANDed with this value and then compared
to the value in BDA

BPCM
Execute breakpoint mask. Program counter is ANDed with this value and then compared to
the value in BPC.


DMA
	From time to time the PSX will need to take the CPU off the main bus in order to give a device access directly to Memory.  The devices able to take control of the bus are the CD-ROM, MDEC, GPU, SPU, and the Parallel port. There are 7 DMA channels in all (The GPU and MDEC use two)  The DMA registers reside between 0x1f80_1080 and 0x1f80_10f4. The DMA channel registers are located starting at 0x1f80_1080. The base address for each channel is as follows

Base Address	Channel Number	Device	
0x1f80_1080	DMA channel 0	MDECin	
0x1f80_1090	DMA channel 1	MDECout	
0x1f80_10a0	DMA channel 2	GPU (lists + image data)	
0x1f80_10b0	DMA channel 3	CD-ROM	
0x1f80_10c0	DMA channel 4	SPU	
0x1f80_10d0	DMA channel 5	PIO	
0x1f80_10e0	DMA channel 6	GPU OTC (reverse clear the Ordering Table)	

	 Each channel has three 32-bit control registers at a offset of the base address for that particular channel. These registers are the DMA Memory Address Register (D_MADR) at the base address, DMA Block Control Register (D_BCR)at base+4, and the DMA Channel Control Register (D_CHCR) at base+8.
In order to use DMA the appropriate channel must be enabled. This is done using the DMA Primary Control Register (DPCR) located at 0x1f80_10f0. 

DMA Primary Control Register  (DPCR)		0x1f80_10f0

	DMA6	DMA5	DMA4	DMA3	DMA2	DMA1	DMA0	
4	4	4	4	4	4	4	4	

Each register has a 4 bit control block allocated in this register.
Bit 3	1= DMA Enabled
      2	Unknown
      1	Unknown
      0	Unknown

Bit 3 must be set for a channel to operate.

	As stated above, each  device has three 32-bit control registers within its own DMA address space. The following describes their functions. The n represents 8,9,a,b,c,d,e for DMA channels 0,1,2,3,4,5,6 respectively. 
 


DMA Memory Address Register (D_MADR)	0x1f80_10n0

31	0	
MADR

MADR	Pointer to the virtual address the DMA will start reading from/writing to.



DMA Block Control Register (D_BCR)		0x1f80_10n4

31	0	
BA	BS	
16	16	

BA	Amount of blocks
BS	Blocksize (words)

The channel will transfer BA blocks of BS words. Take care not to set the size larger than the buffer of the corresponding unit can hold. (GPU & SPU both have a $10 word buffer). A larger blocksize, means a faster transfer.


DMA Channel Control Register (D_CHCR) 	0x1f80_10n8

31								0	
0	TR	0	LI	CO	0	DR	
7	1	13	1	1	8	1	


TR	0	No DMA transfer busy.
1	Start DMA transfer/DMA transfer busy.
LR	1	Transfer linked list. (GPU only)
CO	1	Transfer continuous stream of data.
DR	1	Direction from memory
0	Direction from memory



The last register is used to control DMA interrupts.  The usage is currently unknown.

DMA Interrupt Control Register (DICR)		0x1f80_10f4


Video

Overview
The GPU is the unit responsible for the graphical output of the PSX. It handles display and drawing of all graphics. It has the control over an 1MB frame buffer, which at 16 bits per pixel gives you a maximum surface of 1024x512 resolution. It also contains a 2Kb texture cache for increased speed.  The display can be set for 15-bit color or 24-bit color.
	Because the PSX also totally lacks an FPU. A second coprocessor has been added called the Geometry Transformation Engine or GTE. The GTE is the heart of all 3d calculations on the PSX. The GTE can perform vector and matrix operations, perspective transformation, color equations and the like. It is much faster than the CPU on these operations. It is mounted as the second coprocessor (Cop2) and as such takes up no physical address space in the PSX. The GTE is covered later in the document.

The Graphics Processing Unit (GPU)
	As stated before the GPU is responsible for graphical output. It has at its disposal a 1 MB frame buffer and registers to access it. The frame buffer it totally inaccessible to the CPU, meaning that it doesnt reside in addressable memory. The only way to access it is through the GPU. The GPU is able to take commands from the CPU, or via DMA to place objects on the frame buffer to be displayed. Communication is handled through a command and data port. It has a 64 byte command FIFO buffer, which can hold up to 3 commands and is connected to a DMA channel for transfer of image data and linked command lists (channel 2) and a DMA channel for reverse clearing an Ordering Table (channel 6). 

Communication and Ordering Tables (OT).
All data regarding drawing and drawing environment are sent as packets to the GPU. Each packet tells the GPU how and where to draw one primitive, or it sets one of the drawing environment parameters. The display environment is set up through single word commands using the control port of the GPU.
Packets can be forwarded word by word through the data port of the GPU, or more efficiently for large numbers of packets through DMA. A special DMA mode was created for this so large numbers of packets can be sent and managed easily. In this mode a list of packets is sent, where each entry in the list contains a header which is one word containing the address of the next entry and the size of the packet and the packet itself. A result of this is that the packets do not need to be stored sequentially. This makes it possible to easily control the order in which packets get processed. The GPU processes the packets it gets in the order they are offered. So the first entry in the list also gets drawn first. To insert a packet into the middle of the list simply find the packet after which needs it to be processed, replace the address in that packet with the address of the new packet, and let that point to the address that was replaced.
To aid in finding a location in the list, the Ordering Table was invented. At first this is basically a linked list with entries of packet size 0, so it's a list of only list entry headers, where each entry points to to the next entry. Then as primitives are generated by your program you can then add them to the table at a certain index. Just read the address in the table entry and replace it with the address of the new packet and store the address from the table in the packet. When all packets are generated drawing will just require passing the address of the first list entry to the DMA and the packets will get drawn in the order you entered the packets to the table. Packets entered at a higher table index will get drawn after those entered at a lower table index. Packets entered at the same index will get drawn in the order they were entered, the last one first.
In 3d drawing it's most common that you want the primitives with the highest Z value to be drawn first, so it would be nice if the table would be drawn the other way around, so the Z value can be used as index. This is a simple thing, just make a table of which each entry points to the previous entry, and start the DMA with the address of the last table entry. To assist you in making such a table, a special DMA channel is available which creates it for you.

The Frame Buffer
The frame buffer is the memory which stores all graphic data which the GPU can access and manipulate, while drawing and displaying an image . The memory is under the GPU and cannot be accessed by the CPU directly. It is operated solely by the GPU. The frame buffer has a size of 1 MB and is  treated as a space of 1024 pixels wide and 512 pixels high. Each "pixel" has the size of one word (16 bit). It is not treated linearly like usual memory, but is accessed through coordinates, with an upper left corner of (0,0) and a lower right corner of (1023,511).
When data is displayed from the frame buffer, a rectangular area is read from the specified coordinate within this memory. The size of this area can be chosen from several hardware defined types. Note that these hardware sizes are only valid when the X and Y stop/start registers are at their default values. This display area can be displayed in two color formats, being 15bit direct and 24bit direct. The data format of one pixel is as follows.

15-bit direct display

Pixel
M	Blue	Green	Red	
15	14	10	9	5	4	0	

This means each color has a value of 0-31. The MSB of a pixel (M) is used to mask the pixel.


24-bit direct display
The GPU can also be set to 24bit mode, in which case 3 bytes form one pixel, 1 byte for each color. Data in this mode is arranged as follows:

Pixel 0	Pixel 1	Pixel 2	
G0	R0	R1	B0	B1	G1	
15	8	7	0	15	8	7	0	15	8	7	0	

Thus 2 display pixels are encoded in 3 frame buffer pixels. They are displayed as follows: [R0,G0,B0] [R1,G1,B1].

Primitives.
A basic figure which the GPU can draw is called a primitive, and it can draw the following:

Polygon
 	The GPU can draw 3 point and 4 point polygons. Each point of the polygon specifies a point in the frame buffer. The polygon can be also be gourad shaded.  The correct order of vertices for 4 point polygons is as follows

1		2	
			
3		4	

A 4 point polygon is processed internally as two 3 point polygons. also note when drawing a polygon the GPU will not draw the right most and bottom edge. So a (0,0)-(32,32) rectangle will actually be drawn as (0,0)-(31,31). Make sure adjoining polygons have the same coordinates if you want them to touch each other!. 

Polygon with texture
A primitive of this type is the same as above, except that a texture is applied. Each vertex of the polygon maps to a point on a texture page in the frame buffer. The polygon can be gourad shaded.
 Because a 4 point polygon is processed internally as two 3 point polygons, texture mapping is also done independently for both halves. This has some annoying consequences.

Rectangle
A rectangle is defined by the location of the top left corner and its width and height. Width and height can be either free, 8*8 or 16*16. It's drawn much faster than a polygon, but gourad shading is not possible.
 



Sprite
A sprite is a textured rectangle, defined as a rectangle with coordinates on a texture page. Like the rectangle is drawn much faster than the polygon equivalent. No gourad shading possible. Even though the primitive is called a sprite, it has nothing in common with the traditional sprite, other than that it's a rectangular piece of graphics. Unlike the PSX sprite, the traditional sprite is NOT drawn to the bitmap, but gets sent to the screen instead of the actual graphics data at that location at display time.

Line
A line is a straight line between 2 specified points. The line can be gourad shaded. A special form is the polyline, for which an arbitrary number of points can be specified.

Dot
The dot primitive draws one pixel at the specified coordinate and in the specified color. It is actually a special form of rectangle, with a size of 1x1.

Textures
A texture is an image put on a polygon or sprite. It is necessary to prepare the data beforehand in the frame buffer. This image is called a texture pattern. The texture pattern is located on a texture page which has a standard size and is located somewhere in the frame buffer, see below. The data of a texture can be stored in 3 different modes

15-bit direct mode

I0
S	Blue	Green	Red	
15	14	10	9	5	4	0	

This means each color has a value of 0-31. The MSB of a pixel (S) is used to specify it the pixel is semi transparent or not. More on that later.


8bit CLUT mode,
 	Each pixel is defined by 8bits and the value of the pixel is converted to a 15-bit color using the CLUT(color lookup table) much like standard VGA pictures. So in effect you have 256 colors which are in 15bit precision.

I1	I0	
15	8	7	0	

 I0 is the index to the CLUT for the left pixel, I1 for the right.

4-bit CLUT mode,
 	Same as above except that only 16 colors can be used. Data is arranged as follows:

I3	I2	I1	I0	
15	12	11	8	7	4	3	0	
 
I0 is first drawn to the left to I3 to the right.

 
Texture Pages
Texture pages have a unit size of 256*256 pixels, regardless of color mode. This means that in the frame buffer they will be 64 pixels wide for 4bit CLUT, 128 pixels wide for 8bit CLUT and 256 pixels wide for 15-bit direct. The pixels are addressed with coordinates relative to the location of the texture page, not the frame buffer. So the top left texture coordinate on a texture page is (0,0) and the bottom right one is (255,255). The pages can be located in the frame buffer on X multiples of 64 and Y multiples of 256. More than one texture page can be set up, but each primitive can only contain texture from one page.

Texture Windows
The area within a texture window is repeated throughout the texture page. The data is not actually stored all over the texture page but the GPU reads the repeated patterns as if they were there. The X and Y
and H and W must be multiples of 8.

CLUT (Color Lookup Table)
The CLUT is a the table where the colors are stored for the image data in the CLUT modes. The pixels of those images are used as indexes to this table. The CLUT is arranged in the frame buffer as a 256x1 image for the 8bit CLUT mode, and a 16x1 image for the 4bit CLUT mode. Each pixel as a 16 bit value, the first 15 used of a 15 bit color, and the 16th used for semi-transparency. The CLUT data can be arranged in the frame buffer at X multiples of 16 (X=0,16,32,48,etc) and anywhere in the Y range of 0-511. More than one CLUT can be prepared but only one can be used for each
primitive.

Texture Caching
If polygons with texture are displayed, the GPU needs to read these from the frame buffer. This slows down the drawing process, and as a result the number of polygons that can be drawn in a given time span. To speed up this process the GPU is equipped with a texture cache, so a given piece of texture needs not to be read multiple times in succession. The texture cache size depends on the color mode used for the textures. In 4-bit CLUT mode it has a size of 64x64, in 8-bit CLUT it's 32x64 and in 15-bit direct is 32x32. A general speed up can be achieved by setting up textures according to these sizes. For further speed gain a more precise knowledge of how the cache works is necessary.

Cache blocks
The texture page is divided into non-overlapping cache blocks, each of a unit size according to color mode. These cache blocks are tiled within the texture page.

Cache Block 0	  1	  2	
			

- Cache entries
	Each cache block is divided into 256 cache entries, which are numbered sequentially, and are 8 bytes wide. So a cache entry holds 16 4-bit CLUT pixels 8 8-bit CULT pixels, or 4 15bitdirect pixels.

4-bit and 8-bit CLUT
0	1	2	3	
4	5	6	7	
8	9			
c				
				

15-bit direct
0	1	2	3	4	5	6	7	
8	9	a	b	c	d	e	f	
10	11							
18								
								

The cache can hold only one cache entry by the same number, so if for example, a
piece of texture spans multiple cache blocks and it has data on entry 9 of block 1, but also on entry 9 of block 2, these cannot be in the cache at once.


Rendering options
There are 3 modes which affect the way the GPU renders the primitives to the frame buffer.

Semi Transparency
When semi transparency is set for a pixel, the GPU first reads the pixel it wants to write to, and then calculates the color it will write from the 2 pixels according to the semi-transparency mode selected. Processing speed is lower in this mode because additional reading and calculating are necessary. There are 4 semi-transparency modes in the GPU.

B=  the pixel read from the image in the frame buffer, F = the half transparent pixel

1.0 x B + 0.5 x F
1.0 x B + 1.0 x F
1.0 x B - 1.0 x F
1.0 x B + 0.25 x F

A new semi transparency mode can be set for each primitive. For primitives without texture semi- transparency can be selected. For primitives with texture semi transparency is stored in the MSB of each pixel, so some pixels can be set to STP others can be drawn opaque. For the CLUT modes the STP bit is obtained from the CLUT. So if a color index points to a color in the CLUT with the MSB set, it will be drawn semi transparent.

When the color is black(BGR=0), STP is processed different from when it's not black (BGR<>0). The table below shows the differences:


Transparency Processing (bit 1 of command packet)
BGR	STP	off	on	
0,0,0	0	Transparent	Transparent	
0,0,0	1	Non-transparent	Non-transparent	
x,x,x	0	Non-transparent	Non-transparent	
x,x,x	1	Non-transparent	Transparent	

Shading
The GPU has a shading function, which will scale the color of a primitive to a specified brightness. There are 2 shading modes: Flat shading, and gourad shading. Flat shading is the mode in which one brightness value is specified for the entire primitive. In gourad shading mode, a different brightness value can be given for each vertex of a primitive, and the brightness between these points is automatically interpolated.

Mask
The mask function will prevent to GPU to write to specific pixels when drawing in the frame buffer. This means that when the GPU is drawing a primitive to a masked area, it will first read the pixel at the coordinate it wants to write to, check if it's masking bit is set, and if so refrain from writing to that particular pixel. The masking bit is the MSB of the pixel, just like the STP bit. To set this masking bit, the GPU provides a mask out mode, which will set the MSB of any pixel it writes. If both mask out and mask evaluation are on, the GPU will not draw to pixels with set MSB's, and will draw pixels with set MSB's to the others, these in turn becoming masked pixels.

Drawing Environment
The drawing environment specifies all global parameters the GPU needs for drawing primitives.

Drawing offset.
 	This locates the top left corner of the drawing area. Coordinates of  primitives originate to this point. So if the drawing offset is (0,240) and a vertex of a polygon is located at (16,20) it will be drawn to the  frame buffer at (0+16,240+20).
 
Drawing clip area
 	This specifies the maximum range the GPU draws primitives to. So in effect  it specifies the top left and bottom right corner of the drawing area.
 
Dither enable
 	When dither is enabled the GPU will dither areas during shading. It will  process internally in 24 bit and dither the colors when converting back to 15-bit. When it is off, the lower 3 bits of each color simply get discarded.
 
Draw to display enable.
 This will enable/disable any drawing to the area that is currently displayed.
 
Mask enable
 When turned on any pixel drawn to the frame buffer by the GPU will have a set masking bit. (= set MSB)
 
Mask judgement enable
 Specifies if the mask data from the frame buffer is evaluated at the time of drawing.

Display Environment.
This contains all information about the display, and the area displayed.
	
Display area in frame buffer
 	This specifies the resolution of the display. The size can be set as follows:
Width: 256,320,384,512 or 640 pixels
Height: 240 or 480 pixels

 	These sizes are only an indication on how many pixels will be displayed using a default start end. These settings only specify the resolution of the display.

Display start/end.
 	Specifies where the display area is positioned on the screen, and how much data gets sent to the screen. The screen sizes of the display area are valid only if the horizontal/vertical start/end values are default. By changing these you can get bigger/smaller display screens. On most TV's there is some black around the edge, which can be utilized by setting the start of the screen earlier and the end later. The size of the pixels is NOT changed with these settings, the GPU simply sends more data to the screen. Some monitors/TVs have a smaller display area and the extended size might not be visible on those sets.(Mine is capable of about 330 pixels horizontal, and 272 vertical in 320*240 mode)




Interlace enable
 When enabled the GPU will display the even and odd lines of the display area alternately. It is necessary to set this when using 480 lines as the number of scan lines on a TV screen are not sufficient to display 480 lines.

15bit/24bit direct display
 Switches between 15bit/24bit display mode.

Video mode
 	Selects which video mode to use, which are either PAL or NTSC.


GPU operation
GPU control registers.
There are 2 32 bit IO ports for the GPU, which are at 0x1f80_1810 for GPU Data and 0x1f80_1814 for GPU control/Status. The data register is used to exchange data with the GPU and  the control/status register gives the status of the GPU when read, and sets the control bits when written to.

Control/Status Register 			0x1f80_1814

Status (Read) High
31	16	
lcf	dma	com	img	busy	?	?	den	isinter	isrgb24	Video	Height	Width0	Width1	
1	2	1	1	1	1	1	1	1	1	1	1	2	1	

W0 	W1
Width	00	0   	256 pixels
01 	0   	320
10	0   	512
11	0   	640
00	1	384
Height	0  		240 pixels
1		480

Video     0  	NTSC
1  	PAL
isrgb24	0  	15-bit direct mode
1	24-bit direct mode
isinter	0	Interlace off
1	Interlace on
den	0	Display enabled
1	Display disabled
busy	0	GPU is Busy  (i.e. drawing primitives)
1	GPU is Idle
img	0	Not Ready to send image (packet $c0)
1	Ready
com	0	Not Ready to receive commands
1	Ready
dma	00	DMA off, communication through GP0
01	Unknown
10 	DMA CPU -> GPU
11DMA GPU -> CPU
lcf 	0	Drawing even lines in interlace mode
1	Drawing uneven lines in interlace mode

Status (Read) Low
15	0	
?	?	?	me	md	dfe	dtd	tp	abr	ty	tx	
1	1	1	1	1	1	1	2	2	1	4	

tx	0	0		Texture page X = tx*64
1     	64
2    	128
3    	196
4	 ...
ty	0	0		Texture page Y
1	256
abr	00	0.5xB+0.5 x F	Semi transparent state
01	1.0xB+1.0 x F
10	1.0xB-1.0 x F
11	1.0xB+0.25 x F
tp	00	4-bit CLUT	Texture page color mode
 	01	8-bit CLUT
10	15-bit
dtd	0	Dither off
 	1	Dither on
dfe	0	off		Draw to display area prohibited
1	on		Draw to display area allowed
md	0	off		Do not apply mask bit to drawn pixels
1	on		Apply mask bit to drawn pixels
me	0	off		Draw over pixel with mask set
1	on 		No drawing to pixels with set mask bit.



Control (Write)
A control command is composed of one word as follows:

command	parameter	
31	16	15	0	

The composition of the parameter is different for each command.

Reset GPU
command	0x00
parameter	0x000000
Description 	Resets the GPU. Also turns off the screen. (sets status to $14802000)

Reset Command Buffer
command	0x01
parameter	0x000000
Description	Resets the command buffer.

Reset IRQ
command	0x02
parameter	0x000000
Description	Resets the IRQ.

Display Enable
command	0x03
parameter	0x000000	Display disable
0x000001	Display enable
description	Turns on/off display. Note that a turned off screen still gives the flicker of NTSC on a pal screen if NTSC mode is selected..

DMA setup.
command	0x04
parameter	0x000000 	DMA disabled
0x000001 	Unknown DMA function
0x000002	DMA CPU to GPU
0x000003	DMA GPU to CPU
description	Sets DMA direction. 

Start of display area
command	0x05
parameter	bit  0x00-0x09	X (0-1023)
             		bit 0x0a-0x12	Y (0-512) = Y<<10 + X
description	Locates the top left corner of the display area.

Horizontal Display range
command	0x06
parameter	bit  0x00-0x0b	X1 (0x1f4-0xCDA)
bit  0x0c-0x17	X2  = X1+X2<<12
description 	Specifies the horizontal range within which the display area is displayed. The display is relative to the display start, so X coordinate 0 will be at the value in X1. The display end is not relative to the display start. The number of pixels that get sent to the screen in 320 mode are (X2-X1)/8. How many actually are visible depends on your TV/monitor. (normally $260-$c56)

Vertical Display range
command	0x07
parameter	bit  0x00-0x09	Y1
bit  0x0a-0x14	Y2  = Y1+Y2<<10
description 	Specifies the vertical range within which the display area is displayed. The display is relative to the display start, so Y coordinate 0 will be at the value in Y1. The display end is not relative to the display start. The number of pixels that get sent to the display are Y2-Y1, in 240 mode. (Not sure about the default values, should be something like NTSC $010-$100, PAL  $023-$123)

Display mode
command	0x08
parameter	bit  0x00-0x01	Width 0
bit  0x02		Height
bit  0x03		Video mode:  See above
bit  0x04		Isrgb24
bit  0x05		Isinter
bit  0x06		Width1
bit  0x07		Reverse flag
description	Sets the display mode.

Unknown
command	0x09
parameter	0x000001  ??
description	Used with value $000001

GPU Info
command	0x10
parameter	0x000000
0x000001
 		0x000002
 		0x000003	Draw area top left
 		0x000004	Draw area bottom right
0x000005	Draw offset
0x000006
0x000007	GPU Type, should return 2 for a standard GPU description. Returns requested info. Read result from GP0. 0,1 seem to return draw area top left also 6 seems to return draw offset too.

?????
command	0x20
parameter	???????
description	Used with value $000504


Command Packets, Data Register
Primitive command packets use an 8 bit command value which is present in all packets. They contain a 3 bit type block and a 5 bit option block of which the meaning of the bits depend on the type. layout is as follows:

Type
000	GPU command
001	Polygon primitive
010	Line primitive
011	Sprite primitive
100	Transfer command
111	Environment command

Configuration of the option blocks for the primitives is as follows:


Polygon
Type	Option	
0	0	1	IIP	VTX	TME	ABE	TGE	
7	6	5	4	3	2	1	0	


Line
Type	Option	
0	1	0	IIP	PLL	0	ABE	0	
7	6	5	4	3	2	1	0	


Sprite
Type	Option	
1	0	0	Size	TME	ABE	0	
7	6	5	4	3	2	1	0	


IIP	0	Flat Shading
1	Gourad Shading
VTX	0	3 vertex polygon
1	4 vertex polygon
TME	0	Texture mapping off
1	Texture mapping on
ABE	0	Semi transparency off
	1	Semi transparency on
TGE	0	Brightness calculation at time of texture mapping on
1	off. (draw texture as is)
Size	00	Free size (Specified by W/H)
01	1 x  1
10	8 x  8
11	16 x 16
PLL	0	Single line (2 vertices)
1	Polyline (n vertices)


Color information
Color information is forwarded as 24-bit data. It is parsed to 15-bit by the GPU.

Layout as follows:

Blue	Green	Red	
23	16	15	8	7	0	

Shading information.
For textured primitive shading data is forwarded by this packet. Layout is the same as for color data, the RGB values controlling the brightness of the individual colors ($00-$7f). A value of $80 in a color will take the former value as data.

Blue	Green	Red	
23	16	15	8	7	0	

*Texture Page information
The Data is 16 bit wide, layout is as follows:

0	TP	ABR	TY	TX	
15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0	


TX	0-0xf	X*64  t		texture page x coordinate 
TY	0	0     		texture page y coordinate
1	256
ABR	0	0.5xB+0.5 x F  	Semi transparency mode
1	1.0xB+1.0 x F
2	1.0xB-1.0 x F
3	1.0xB+0.25 x F
TP	0	4-bit CLUT
1	8-bit CLUT
2	15-bit direct

CLUT-ID
Specifies the location of the CLUT data. Data is 16-bits.

Y coordinate 0-511	X coordinate X/16	
15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0	



Abbreviations in packet list

BGR	Color/Shading info see above.
xn,yn	16 bit values of X and Y in frame buffer.
un,vn	8 bit values of X and Y in texture page
tpage	texture page information packet, see above
clut 	CULT ID, see above.

Packet list
The packets sent to the GPU are processed as a group of data, each one word wide. The data must be written to the GPU data register ($1f801810) sequentially. Once all data has been received, the GPU
starts operation.

Overview of packet commands:
Primitive drawing packets
 0x20     monochrome 3 point polygon
 0x24     textured 3 point polygon
 0x28     monochrome 4 point polygon
 0x2c     textured 4 point polygon
 0x30     gradated 3 point polygon
 0x34     gradated textured 3 point polygon
 0x38     gradated 4 point polygon
 0x3c     gradated textured 4 point polygon
 0x40     monochrome line
 0x48     monochrome polyline
 0x50     gradated line
 0x58     gradated line polyline
 0x60     rectangle
 0x64     sprite
 0x68     dot
 0x70     8*8 rectangle
 0x74     8*8 sprite
 0x78     16*16 rectangle
 0x7c     16*16 sprite
GPU command & Transfer packets
 0x01     clear cache
 0x02     frame buffer rectangle draw
 0x80     move image in frame buffer
 0xa0     send image to frame buffer
 0xc0     copy image from frame buffer
Draw mode/environment setting packets
 0xe1     draw mode setting
 0xe2     texture window setting
 0xe3     set drawing area top left
 0xe4     set drawing area bottom right
 0xe5     drawing offset
 0xe6     mask setting






















Packet Descriptions
Primitive Packets

0x20 monochrome 3 point polygon		
Order	31	24	23	16	15	8	7	0		
1	0x20	BGR	Command + Color	
2	y0	x0	Vertex 0	
3	y1	x1	Vertex 1	
4	y2	x2	Vertex 2	


0x24 textured 3 point polygon		
Order	31	24	23	16	15	8	7	0		
1	0x24	BGR	Command + Color	
2	y0	x0	Vertex 0	
3	CLUT	v0	u0	CULT ID + texture coordinates vertex 0	
4	y1	x1	Vertex 1	
5	tpage	v1	u1	Texture page + texture coordinates vertex 1	
6	y2	x2	Vertex 1	
7		v2	u2	Texture coordinates vertex 2	


0x28 monochrome 4 point polygon		
Order	31	24	23	16	15	8	7	0		
1	0x28	BGR	Command + Color	
2	y0	x0	Vertex 0	
3	y1	x1	Vertex 1	
4	y2	x2	Vertex 2	
5	y3	y3	Vertex 3	


0x2c textured 3 point polygon		
Order	31	24	23	16	15	8	7	0		
1	0x2c	BGR	Command + Color Vertex 0	
2	y0	x0	Vertex 0	
3	CLUT	v0	u0	CULT ID + texture coordinates vertex 0	
4	y1	x1	Vertex 1	
5	tpage	v1	u1	Texture page + texture coordinates vertex 1	
6	y2	x2	Vertex 2	
7		v2	u2	Texture coordinates vertex 2	
8	y3	x3	Vertex 3	
9		v3	v3	Texture coordinates vertex 3	

0x30 gradated 3 point polygon		
Order	31	24	23	16	15	8	7	0		
1	0x30	BGR0	Command + Color Vertex 0	
2	y0	x0	Vertex 0	
3		BGR1	Color Vertex 1	
4	y1	x1	Vertex 1	
5		BGR2	Color Vertex 2	
6	y2	x2	Vertex 2	


0x34 shaded textured 3 point polygon		
Order	31	24	23	16	15	8	7	0		
1	0x34	BGR0	Command + Color Vertex 0	
2	y0	x0	Vertex 0	
3	CLUT	v0	u0	CULT ID + texture coordinates vertex 0	
4		BGR1	Color Vertex 1	
5	y1	x1	Vertex 1	
6	tpage	v1	u1	Texture page + texture coordinates vertex 1	
7		BGR2	Color vertex 2	
8	y2	x2	Vertex 2	
9		v2	u2	CULT ID + texture coordinates vertex 2	


0x38 gradated 4 point polygon		
Order	31	24	23	16	15	8	7	0		
1	0x38	BGR0	Command + Color Vertex 0	
2	y0	x0	Vertex 0	
3		BGR1	Color Vertex 1	
4	y1	x1	Vertex 1	
5		BGR2	Color Vertex 2	
6	y2	x2	Vertex 2	
7		BGR3	Color Vertex 3	
8	y3	x3	Vertex 3	


0x3c shaded textured 4 point polygon		
Order	31	24	23	16	15	8	7	0		
1	0x3c	BGR0	Command + Color Vertex 0	
2	y0	x0	Vertex 0	
3	CLUT	v0	u0	CULT ID + texture coordinates vertex 0	
4		BGR1	Color Vertex 1	
5	y1	x1	Vertex 1	
6	tpage	v1	u1	Texture page + texture coordinates vertex 1	
7		BGR2	Color vertex 2	
8	y2	x2	Vertex 2	
9		v2	u2	CULT ID + texture coordinates vertex 2	
10		BGR3	Color vertex 3	
11	y3	x3	Vertex 3	
12		v3	32	CULT ID + texture coordinates vertex 3	



0x40 monochrome line		
Order	31	24	23	16	15	8	7	0		
1	0x40	BGR	Command + Color	
2	y0	x0	Vertex 0	
3	y1	x1	Vertex 1	



0x48 single color polyline		
Order	31	24	23	16	15	8	7	0		
1	0x48	BGR	Command + Color	
2	y0	x0	Vertex 0	
3	y1	x1	Vertex 1	
4	y2	x2	Vertex 2	
	yn	xn	Vertex n	
	0x55555555	Termination code	
Any number of points can be entered, end with termination code.

0x50 gradated line		
Order	31	24	23	16	15	8	7	0		
1	0x50	BGR0	Command + Color Vertex 0	
2	y0	x0	Vertex 0	
3		BGR1	Color Vertex 1	
4	y1	x1	Vertex 1	


0x58 gradated polyline		
Order	31	24	23	16	15	8	7	0		
1	0x58	BGR0	Command + Color Vertex 0	
2	y0	x0	Vertex 0	
3		BGR1	Color Vertex 1	
4	y1	x1	Vertex 1	
5		BGR2	Color Vertex 2	
6	y2	x2	Vertex 2	
		BGRn	Color Vertex n	
	yn	xn	Vertex n	
	0x55555555	Termination code	
Any number of points can be entered, end with termination code.

0x60 Rectangle		
Order	31	24	23	16	15	8	7	0		
1	0x60	BGR	Command + Color 	
2	y	x	upper left corner location	
3	h	w	height and width 	

0x64 Sprite		
Order	31	24	23	16	15	8	7	0		
1	0x64	BGR	Command + Color 	
2	y	x	upper left corner location	
3	CLUT	v	u	CULT ID + texture coordinates page y,x	
4	h	w	height and width	


0x68 Dot		
Order	31	24	23	16	15	8	7	0		
1	0x68	BGR	Command + Color 	
2	y	x	location	




0x70 8x8 Rectangle		
Order	31	24	23	16	15	8	7	0		
1	0x70	BGR	Command + Color 	
2	y	x	location	


0x74 8x8 Sprite		
Order	31	24	23	16	15	8	7	0		
1	0x74	BGR	Command + Color 	
2	y	x	location	
3	CLUT	v	u	CULT ID + texture coordinates page y,x	


0x78 16x16 Rectangle		
Order	31	24	23	16	15	8	7	0		
1	0x78	BGR	Command + Color 	
2	y	x	location	


0x7c 16x16 Sprite		
Order	31	24	23	16	15	8	7	0		
1	0x74	BGR	Command + Color 	
2	y	x	location	
3	CLUT	v	u	CULT ID + texture coordinates page y,x	


GPU command & Transfer packets


0x01Clear cache		
Order	31	24	23	16	15	8	7	0		
1	0x01	0	clear cache	


0x02 frame buffer rectangle draw		
Order	31	24	23	16	15	8	7	0		
1	0x02	BGR	Command + Color 	
2	y	x	upper left corner location	
3	h	w	height and width 	
Fills the area in the frame buffer with the value in RGB. This command will draw without regard to drawing environment settings. Coordinates are absolute frame buffer coordinates. Max width is 0x3ff, max height is 0x1ff.

0x80 Rectangle		
Order	31	24	23	16	15	8	7	0		
1	0x80	BGR	Command + Color 	
2	sy	sx	Source coordinate.	
4	dy	dx	Destination coordinate	
5	h	w	height and width of transfer	
Copies data within frame buffer


0x01 0xa0 send image to frame buffer		
Order	31	24	23	16	15	8	7	0		
1	0x01		Reset command buffer (write to GP1 or GP0)	
2	0xa0	BGR	Command + Color 	
3	y	x	Destination coordinate	
4	h	w	height and width of transfer	
5	pix1	pix0	image data	
6..				
	pixn	pixn-1		
Transfers data from main memory to frame buffer If the number of pixels to be sent is odd, an extra should be sent. (32 bits per packet)

0x01 0xc0 send image to frame buffer		
Order	31	24	23	16	15	8	7	0		
1	0x01		Reset command buffer (write to GP1 or GP0)	
2	0xc0	BGR	Command + Color 	
3	y	x	Destination coordinate	
4	h	w	height and width of transfer	
5	pix1	pix0	image data	
6..				
	pixn	pixn-1		
Transfers data from frame buffer to main memory. Wait for bit 27 of the status register to be set before reading the image data. When the number of pixels is odd, an extra pixel is read at the end.(because on packet is 32 bits)


Draw mode/environment setting packets
Some of these packets can also be by primitive packets, in any case it is the last packet of either that the GPU received that is active. so if a primitive sets tpage info, it will over write the existing data, even if it was sent by an 0xe? packet.

0xe1 draw mode setting
31	24	23	11	10	9	8	7	6	5	4	3	0	
0xe1		dfe	dtd	tp	abr	ty	tx	
See above for explanations

It seems that bits 11-13 of the status register can also be passed with this command on some GPU's other than type 2. (i.e. Command 0x10000007 doesn't return 2)

0xe2 texture window setting
31	24	23	20	19	15	14	10	9	5	4	0	
0xe2		twx	twy	tww	twh	
twx	Texture window X, (twx*8)
twy	Texture window Y, (twy*8)
tww	Texture window width, 256-(tww*8)
twh	Texture window height, 256-(twh*8)

0xe3 set drawing area top left
31	24	23	16	15	8	7	0	
0xe3		Y	X	

Sets the drawing area top left corner. X &Y are absolute frame buffer coordinates.


0xe4 set drawing area bottom right
31	24	23	16	15	8	7	0	
0xe4		Y	X	

Sets the drawing area bottom right corner. X &Y are absolute frame buffer coordinates.


0xe5 drawing offset
31	24	23	20	13	11	10	0	
0xe5		OffsY	OffsX	

Offset Y = y << 11
Sets the drawing area offset within the drawing area. X&Y are offsets in the frame buffer.

0xe6 drawing offset
31	24	23	2	1	0	
0xe6		Mask2	Mask1	

Mask1	Set mask bit while drawing. 1 = on
Mask2	Do not draw to mask areas. 1= on
While mask1 is on, the GPU will set the MSB of all pixels it draws. While mask2 is on, the GPU will not write to pixels with set MSB's


DMA and the GPU
The GPU has two DMA channels allocated to it. DMA channel 2 is used to send linked packet lists to the GPU and to transfer image data to and from the frame buffer. DMA channel 6 is sets up an empty linked list, of which each entry points to the previous (i.e. reverse clear an OT.)


DMA Second Memory Address Register (D2_MADR)	0x1f80_10a0

31	0	
MADR

MADR	Pointer to the virtual address the DMA will start reading from/writing to.



DMA Second Block Control Register (D2_BCR)		0x1f80_10a4

31	0	
BA	BS	
16	16	

BA	Amount of blocks
BS	Block size (words)

Sets up the DMA blocks. Once started the DMA will send BA blocks of BS
words. Don't set a block size larger then $10 words, as the command buffer
of the GPU is 64 bytes.


DMA Second Channel Control Register (D2_CHCR) 	0x1f80_10a8

31								0	
0	TR	0	LI	CO	0	DR	
7	1	13	1	1	8	1	


TR	0	No DMA transfer busy.
1	Start DMA transfer/DMA transfer busy.
LR	1	Transfer linked list. (GPU only)
CO	1	Transfer continuous stream of data.
DR	1	Direction from memory
0	Direction from memory

This configures the DMA channel. The DMA starts when bit 18 is set. DMA is finished as soon as bit 18 is cleared again. To send or receive data to/from VRAM send the appropriate GPU packets first (0xa0/0xc0)


DMA Sixth Memory Address Register (D6_MADR)	0x1f80_10e0

31	0	
MADR

MADR	Pointer to the virtual address if the last entry.


DMA Sixth Block Control Register (D6_BCR)		0x1f80_10e4

31	0	
BC

BC	Number of list entries.

DMA Sixth Channel Control Register (D6_CHCR) 	0x1f80_10e8

31								0	
0	TR	0	LI	CO	0	DR	
7	1	13	1	1	8	1	


TR	0	No DMA transfer busy.
1	Start DMA transfer/DMA transfer busy.
LR	1	Transfer linked list. (GPU only)
CO	1	Transfer continuous stream of data.
DR	1	Direction from memory
0	Direction from memory

This configures the DMA channel. The DMA starts when bit 18 is set. DMA is finished as soon as bit 18 is cleared again. To send or receive data to/from VRAM send the appropriate GPU packets first (0xa0/0xc0) When this register is set to $11000002, the DMA channel will create an empty linked list of D6_BCR entries ending at the address in D6_MADR. Each entry has a size of 0, and points to the previous. The first entry is So if D6_MADR = $80100010, D6_BCR=$00000004, and the DMA is kicked this mwill result in a list looking like this:
0x8010_0000  0x00ff_ffff
0x8010_0004  0x0010_0000
0x8010_0008  0x0010_0004
0x8010_000c  0x0010_0008
0x8010_0010  0x0010_000c

DMA Primary Control Register  (DPCR)		0x1f80_10f0

	DMA6	DMA5	DMA4	DMA3	DMA2	DMA1	DMA0	
4	4	4	4	4	4	4	4	

Each register has a 4 bit control block allocated in this register.
Bit 3	1= DMA Enabled
      2	Unknown
      1	Unknown
      0	Unknown

Bit 3 must be set for a channel to operate.


Common GPU functions, step by step.

Initializing the GPU.

First thing to do when using the GPU is to initialize it. To do that take the following steps:

1 - Reset the GPU (GP1 command $00). This turns off the display as well.
2 - Set horizontal and vertical start/end. (GP1 command $06, $07)
3 - Set display mode. (GP1 command $08)
4 - Set display offset. (GP1 command $05)
5 - Set draw mode. (GP0 command $e1)
6 - Set draw area. (GP0 command $e3, $e4)
7 - Set draw offset. (GP0 command $e5)
8 - Enable display.

Sending a linked list.

The normal way to send large numbers of primitives is by using a linked list DMA transfer. This list is built up of entries of which each points to the next. One entry looks like this:

dw $nnYYYYYY 	; nn = the number of words in the list entry
; YYYYYY = address of next list entry & 0x00ff_ffff

1	dw ..			; here goes the primitive.
2	dw			;
	dw ..			;
nn-1	dw ..			;
nn	dw ..			;

The last entry in the list should have 0xffffff as pointer, which is the terminator. As soon as this value is found DMA is ended. If the entry size is set to 0, no data will be transferred to the GPU and the next entry is processed.

To send the list do this:
1 - Wait for the GPU to be ready to receive commands. (bit $1c == 1)
2 - Enable DMA channel 2
3 - Set GPU to DMA CPU->GPU mode. ($04000002)
3 - Set D2_MADR to the start of the list
4 - Set D2_BCR to zero.
5 - Set D2_CHCR to link mode, memory->GPU and DMA enable. ($01000401)

Uploading Image data through DMA.

To upload an image to VRAM take the following steps:

1 - Wait for the GPU to be idle and DMA to finish. Enable DMA channel 2 if necessary.
2 - Send the 'Send image to VRAM' primitive. (You can send this through DMA if you want. Use the linked list method described above)
3 - Set DMA to CPU->GPU ($04000002) (if you didn't do so already in the previous step)
4 - Set D2_MADR to the start of the list
5 - Set D2_BCR with:	bits 31-16  = Number of words to send (H*W /2)
bits 15- 0  = Block size of 1 word. ($01)
if H*W is odd, add 1. (Pixels are 2 bytes, send an extra blank pixel in case of an odd amount)
6 - Set D2_CHCR to continuous mode, memory -> GPU and DMA enable. ($01000201)

Note that H, W, X and Y are always in frame buffer pixels, even if you send image data in other formats.
You can use bigger block sizes if you need more speed. If the number of words to be sent is not a multiple of the block size, you'll have to send the remainder separately, because the GPU only accepts an extra halfword if the number of pixels is odd. (i.e. of the last word sent, only the low half word is used.) Also take care not to use block sizes bigger than 0x10, as the buffer of the GPU is only 64 bytes (=0x10 words).



Waiting to send commands

You can send new commands as soon as DMA has ceased and the GPU is ready.
1 - Wait for bit $18 to become 0 in D2_CHCR
2 - Wait for bit $1c to become 1 in GP1.


The Geometry Transformation Engine (GTE)
The Geometry Transformation Engine (GTE) is the heart of all 3D calculations on the PSX. The GTE can perform vector and matrix operations, perspective transformation, color equations and the like. It is much faster than the CPU on these operations. It is mounted as the second coprocessor and as such is no physical address in the memory of the PSX. All control is done through special instructions.

Basic mathematics
	The GTE is basicly an engine for vector mathematics. The basic representation of a point(vertex) in 3d space is through a vector of the sort [X,Y,Z]. In GTE operation there's basicly two kinds of these, vectors of variable length and vectors of a unit length of 1.0, called normal vectors. The first is used to decribe a locations and translations in 3d space, the second to describe a direction.
	Rotation of vertices is performed by multiplying the vector of the vertex with a rotation matrix. The rotation matrix is a 3x3 matrix consisting of 3 normal vectors which are orthogonal to each other. (It's actually the matrix which describes the coordinate system in which the vertex is located in relation to the unit coordinate system. See a math book for more details.) This matrix is derived from rotation angles as follows:

sn = sin(n), cn = cos(n)

Rotation angle A about X axis:  
|  1   0   0|      
|  0  cA -sA|
|  0  sA  cA|   

Rotation angle B about Y axis:
 | cB   0  sB|
|  0   1   0|   
|-sB   0  cB|  

Rotation angle C about Z axis:
| cC -sC   0|
| sC  cC   0|
|  0   0   1|

	Rotation about multiple axis can be done by multiplying these matrices with eachother. Note that the order in which this multiplication is done *IS* important. The GTE has no sine or cosine functions, so the calculation of these must be done by the CPU.
	Translation is the simple addition of two vectors, relocating the vertex within its current coordinate system. Needless to say the order in which translation and rotation occur for a vector is important.


Brief Function descriptions

RTPS/RTPT
Rotate, translate and perpective transformation.
	These two functions perform the final 3d calculations on one or three vertices at once. The points are first multiplied with a rotation matrix(R), and after that translated(TR). Finally a perspective transformation is applied, which results in 2d screen coordinates. It also returns an interpolation value to be used with the various depth cueing instructions.

MVMVA
Matrix & Vector multiplication and addition.
	Multiplies a vector with either the rotation matrix, the light matrix or the color matrix and then adds the translation vector or background color vector.



DCPL
Depth cue light color
	First calculates a color from a light vector(normal vector of a plane multiplied with the light matrix and zero limited) and a provided RGB value. Then performs depth cueing by interpolating between the far color vector and the newfound color.

DPCS/DPCT
Depth cue single/triple
	Performs depth cueing by interpolating between a color and the far color vector on one or three colors.

INTPL
Interpolation
	Interpolates between a vector and the far color vector.

SQR
Square
	Calculates the square of a vector.

NCS/NCT
Normal Color
	Calculates a color from the normal of a point or plane and the light sources and colors. The basic color of the plane or point the normal refers to is assumed to be white.

NCDS/NCDT
Normal Color Depth Cue.
	Same as NCS/NCT but also performs depth cueing (like DPCS/DPCT)

NCCS/NCCT
	Same NCS/NCT, but the base color of the plane or point is taken into account.

CDP
	A color is calculated from a light vector (base color is assumed to be white) and depth cueing is performed (like DPCS).

CC
	A color is calculated from a light vector and a base color.

NCLIP
Calculates the outer product of three 2d points.(ie. 3 vertices which define a plane after projection.)

The 3 vertices should be stored clockwise according to the visual point:

          Z+
         /
       /____ X+
       |
       |
        Y+

If this is so, the result of this function will be negative if we are
facing the backside of the plane.

AVSZ3/AVSZ4
	Adds 3 or 4 z values together and multplies them by a fixed point value. This value is normally chosen so that this function returns the average of the z values (usually further divided by 2 or 4 for easy adding to the OT)


OP
	Calculates the outer product of 2 vectors.

GPF
	Multiplies 2 vectors. Also returns the result as 24bit rgb value.

GPL
	Multiplies a vector with a scalar and adds the result to another vector. Also returns the result as 24bit rgb value.

Instructions
The CPU has six special load and store instructions for the GTE registers, and an instruction to issue commands to the coprocessor.

rt		CPU register 0-31
gd		GTE data register 0-31
gc		GTE control register 0-31
imm		16 bit immediate value
base		CPU register 0-31
imm(base)	address pointed to by base + imm.
b25		25 bit wide data field.

LWC2  gd, imm(base)	stores value at imm(base) in GTE data register gd.
SWC2  gd, imm(base)	stores GTE data register at imm(base).
MTC2  rt, gd		stores register rt in GTE data register gd.
MFC2  rt, gd		stores GTE data register gd in register rt.
CTC2  rt, gc		stores register rt in GTE control register gc.
CFC2  rt, gc		stores GTE control register in register rt.

COP2  b25		 Issues a GTE command.

GTE load and store instructions have a delay of 2 instructions, for any GTE commands or operations accessing that register.

Programming the GTE.
Before use the GTE must be turned on. The GTE has bit 30 allocated to it in the status register of the system control coprocessor (cop0). Before any GTE instruction is used, this bit must be set.

GTE instructions and functions should not be used in
- Delay slots of jumps and branches
- Event handlers or interrupts.

If an instruction that reads a GTE register or a GTE command is executed before the current GTE command is finished, the CPU will hold until the instruction has finished. The number of cycles each GTE instruction takes is in the command list.

Registers.
The GTE has 32 data registers, and 32 control registers, each 32 bits wide. The control registers are commonly called Cop2C, while the data registers are called Cop2D. The following list describes their common use. The format is explained later on.

Control Registers (Cop2C)
Number	Name	Description	
0	R11R12	Rotation matrix elements 1 to 1, 1 to 2	
1	R13R21	Rotation matrix elements 1 to 3, 2 to 1	
2	R22R23	Rotation matrix elements 2 to 2, 2 to 3	
3	R31R32	Rotation matrix elements 3 to 1, 3 to 2	
4	R33	Rotation matrix elements 3 to 3	
5	TRX	Translation vector X	
6	TRY	Translation vector Y	
7	TRZ	Translation vector Z	
8	L11L12	Light source matrix elements 1 to 1, 1 to 2	
9	L13L21	Light source matrix elements 1 to 3, 2 to 1	
10	L22L23	Light source matrix elements 2 to 2, 2 to 3	
11	L31L32	Light source matrix elements 3 to 1, 3 to 2	
12	L33	Light source matrix elements 3 to 3	
13	RBK	Background color red component	
14	BBK	Background color blue component	
15	GBK	Background color green component	
16	LR1LR2	Light color matrix source 1&2 red component	
17	LR3LG1	Light color matrix source 3 red, 1 green component	
18	LG2LG3	Light color matrix source 2&3 green component	
19	LB1LB2	Light color matrix source 1&2 blue comp	
20	LB3	Light color matrix source 3 blue component	
21	RFC	Far color red component	
22	GFC	Far color green component	
23	BFC	Far color blue component	
24	OFX	Screen offset X	
25	OFY	Screen offset y	
26	H	Projection plane distance	
27	DQA	depth queuing parameter A.(coefficient.)	
28	DQB	Depth queuing parameter B.(offset.)	
29	ZSF3	Z3 average scale factor (normally 1/3)	
30	ZSF4	Z4 average scale factor (normally 1/4)	
31	FLAG	Returns any calculation errors.(See below)	

Control Register format
	The GTE uses signed, fixed point registers for mathematics. The following is a bit-wise description of the registers.


R11R12
31	0	
R11	R12	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	


R13R21
31	0	
R13	R21	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	


R22R23
31	0	
R22	R23	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	


R31R32
31	0	
R31	R32	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	

R33
31
0
0
R33
Sign
integral part
fractional part
1
3
12


TRX
31	0	
Sign	integral part	
1	31	

TRY
31	0	
Sign	integral part	
1	31	

TRZ
31	0	
Sign	integral part	
1	31	

L11L12
31	0	
L11	L12	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	

L13L21
31	0	
L13	L21	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	

L22L23
31	0	
L22	L23	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	

L31L32
31	0	
L31	L32	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	

L33
31
0
0
L33
Sign
integral part
fractional part
1
3
12

RBK
31	0	
Sign	integral part	fractional part	
1	19	12	


GBK
31	0	
Sign	integral part	fractional part	
1	19	12	


BBK
31	0	
Sign	integral part	fractional part	
1	19	12	
    
LR1LR2
31	0	
LR1	LR2	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	

LR3LLG1
31	0	
LR3	LG1	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	

LG2LG3
31	0	
LG2	LG3	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	

LB1LB2
31	0	
LB1	LB2	
Sign	integral part	fractional part	Sign	integral part	fractional part	
1	3	12	1	3	12	


LB3
31
0
0
LB3
Sign
integral part
fractional part
1
3
12


RFC
31	0	
Sign	integral part	fractional part	
1	27	4	

GFC
31	0	
Sign	integral part	fractional part	
1	27	4	

BFC
31	0	
Sign	integral part	fractional part	
1	27	4	

OFX
31	0	
Sign	integral part	fractional part	
1	15	16	

0FY
31	0	
Sign	integral part	fractional part	
1	15	16	


H
31
0
0
H
integral part
16


DQA
31
0
0
DQA
Sign
integral part
fractional part
1
7
8


DQB
31
0
0
DQB
Sign
integral part
fractional part
1
7
8


ZF3
31
0
0
ZF3
Sign
integral part
fractional part
1
3
12


DZF4
31
0
0
ZF4
Sign
integral part
fractional part
1
3
12

FLAGS
31	0	

Flags  bit description.
31	Logical sum of bits 30-23 and bits 18-13
30	Calculation test result #1 overflow (2^43 or more)
29  	Calculation test result #2 overflow (2^43 or more)
28  	Calculation test result #3 overflow (2^43 or more)
27  	Calculation test result #1 underflow (less than -2^43)
26  	Calculation test result #2 underflow (less than -2^43)
25  	Calculation test result #3 underflow (less than -2^43)
24  	Limiter A1 out of range (less than 0, or less than -2^15, or 2^15 or more)
23  	Limiter A2 out of range (less than 0, or less than -2^15, or 2^15 or more)
22  	Limiter A3 out of range (less than 0, or less than -2^15, or 2^15 or more)
21  	Limiter B1 out of range (less than 0, or 2^8 or more)
20  	Limiter B2 out of range (less than 0, or 2^8 or more)
19  	Limiter B3 out of range (less than 0, or 2^8 or more)
18  	Limiter C out of range (less than 0, or 2^16 or more)
17  	Divide overflow generated (quotient of 2.0 or more)
16  	Calculation test result #4 overflow (2^31 or more)
15  	Calculation test result #4 underflow (less than -2^31)
14  	Limiter D1 out of range (less than 2^10, or 2^10 or more)
13  	Limiter D2 out of range (less than 2^10, or 2^10 or more)
12  	Limiter E out of range (less than 0, or 2^12 or more)

Data Registers
	Data registers consist of the other half of the GTE. Note in some functions format are different from the one that's given here. The numbers in the format fields are the signed, integer and fractional parts of the field. So 1,3,12 means signed(1 bit), 3 bits integral part, 12 bits fractional part.

Data Registers (Cop2D)
Number	Name	r/w	31	16	15	0	Format	Description	
0.	VXY0	r/w	VY0	VX0	1,3,12 or 1,15,0	Vector 0 X and Y 	
1	VZ0	r/w	0	VZ0	1,3,12 or 1,15,0	Vector 0 Z 	
2	VXY1	r/w	VY1	VX1	1,3,12 or 1,15,0	Vector 1 X and Y 	
3	VZ1	r/w	0	VZ1	1,3,12 or 1,15,0	Vector 1 Z 	
4	VXY2	r/w	VY2	VX2	1,3,12 or 1,15,0	Vector 2 X and Y 	
5	VZ2	r/w	0	VZ2	1,3,12 or 1,15,0	Vector 2 Z 	
6	RGB	r/w	Code, R	G,B	8 bits for each	RGB value. Code is passed, but not used in calculation	
7	OTZ	r	0	OTZ	0,15,0	Z Average value.	
8	IR0	r/w	Sign	IR0	1, 3,12	Intermediate value 0. Format may differ	
9	IR1	r/w	Sign	IR1	1, 3,12	Intermediate value 1. Format may differ	
10	IR2	r/w	Sign	IR2	1, 3,12	Intermediate value 2. Format may differ	
11	IR3	r/w	Sign	IR3	1, 3,12	Intermediate value 3. Format may differ	
12	SXY0	r/w	SX0	SY0	1,15, 0	Screen XY coordinate FIFO (Note 1)	
13	SXY1	r/w	SX1	SY1	1,15, 0	Screen XY coordinate FIFO 	
14	SXY2	r/w	SX2	SY2	1,15, 0	Screen XY coordinate FIFO 	
15	SXYP	r/w	SXP	SYP	1,15, 0	Screen XY coordinate FIFO 	
16	SZ0	r/w	0	SZ0	0,16, 0	Screen Z FIFO (Note 1)	
17	SZ1	r/w	0	SZ1	0,16, 0	Screen Z FIFO 	
18	SZ2	r/w	0	SZ2	0,16, 0	Screen Z FIFO	
19	SZ3	r/w	0	SZ3	0,16, 0	Screen Z FIFO 	
20	RGB0	r/w	CD0,B0	G0,R0	8 bits each	Characteristic color FIFO(Note 1)	
21	RGB1	r/w	CD1,B1	G1,R1	8 bits each	Characteristic color FIFO	
22	RGB2	r/w	CD2,B2	G0,R2	8 bits each	CD2 is the bit pattern of currently executed function	
23	RES1	-	-	-	-	Prohibited	
24	MAC0	r/w	MAC0	1,31,0	Sum of products value 1	
25	MAC1	r/w	MAC1	1,31,0	Sum of products value 1	
26	MAC2	r/w	MAC2	1,31,0	Sum of products value 1	
27	MAC3	r/w	MAC3	1,31,0	Sum of products value 1	
28	IRGB	w	0	IB,IG,IR	Note 2	Note 2	
29	ORGB	r	0	0B,0G,OR	Note 3	Note 3	
30	LZCS	w	LZCS	1,31,0	Leading zero count source data (Note 4)	
31	LZCR	r	LZCR	6,6,0	Leading zero count result (Note 4)	


Note 1 
The SXYx, SZx and RGBx are first in first out registers (FIFO). The last calculation result is stored in the last register, and previous results are stored in previous registers. So for example when a new SXY value is obtained the following happens:
    SXY0 = SXY1
    SXY1 = SXY2
    SXY2 = SXYP
    SXYP = result.

Note 2 
IRGB
0	R	G	B	
31	15	14	10	9	5	4	0	
    
When writing a value to IRGB the following happens:
IR1 = IR format converted to (1,11,4)
IR2 = IG format converted to (1,11,4)
IR3 = IB format converted to (1,11,4)

Note 3
0RGB
0	R	G	B	
31	15	14	10	9	5	4	0	

When writing a value to IRGB the following happens:
IR = (IR1>>7) &0x1f
IG = (IR2>>7) &0x1f
IB = (IR3>>7) &0x1f

Note 4
Reading LZCR returns the leading 0 count of LZCS if LZCS is positive and the leading 1 count of LZCS if LZCS is negative.

GTE Commands.
	This part describes the actual calculations performed by the various GTE functions. The first line  contains the name of the function, the number of cycles it takes and a brief description. The second part contains any fields that may be set in the opcode and in the third line is the actual opcode. See the end of the list for the fields and their descriptions. Then follows a list of all registers which are needed in the calculation under the 'in', and a list of registers which modified under the 'out' with a brief description and the format of the data. Next follows the calculation which is performed after initiating the function. The format field left is the size in which the data is stored, the format field on the right contains the format in which the calculation is performed. At certain points in the calculation checks and limitations are done and their results stored in the flag register, see the table below. They are identified with the code from the second column of the table directly followed by square brackets enclosing the part of the calculation on which the check is performed. The additional Lm_ identifier means the value is limited to the bottom or ceiling of the check if it exceeds the boundary.

bit      description
31       Checksum.
30  A1   Result larger than 43 bits and positive
29  A2   Result larger than 43 bits and positive
28  A3   Result larger than 43 bits and positive
27  A1   Result larger than 43 bits and negative
26  A2   Result larger than 43 bits and negative
25  A3   Result larger than 43 bits and negative
24  B1   Value negative(lm=1) or larger than 15 bits(lm=0)
23  B2   Value negative(lm=1) or larger than 15 bits(lm=0)
22  B3   Value negative(lm=1) or larger than 15 bits(lm=0)
21  C1   Value negative or larger than 8 bits.
20  C2   Value negative or larger than 8 bits.
19  C3   Value negative or larger than 8 bits.
18  D    Value negative or larger than 16 bits.
17  E    Divide overflow. (quotient > 2.0)
16  F    Result larger than 31 bits and positive.
15  F    Result larger than 31 bits and negative.
14  G1   Value larger than 10 bits.
13  G2   Value larger than 10 bits.
12  H    Value negative or larger than 12 bits.


Name	Cycles	Command	Description	
RTPS	15	cop2 0x0180001	Perspective transform	
Fields:           None
In:      V0       Vector to transform.                         [1,15,0]
         R        Rotation matrix                              [1,3,12]
         TR       Translation vector                           [1,31,0]
         H        View plane distance                          [0,16,0]
         DQA      Depth que interpolation values.              [1,7,8]
         DQB                                                   [1,7,8]
         OFX      Screen offset values.                        [1,15,16]
         OFY                                                   [1,15,16]
Out:     SXY fifo Screen XY coordinates.(short)                [1,15,0]
         SZ fifo  Screen Z coordinate.(short)                  [0,16,0]
         IR0      Interpolation value for depth queing.        [1,3,12]
         IR1      Screen X (short)                             [1,15,0]
         IR2      Screen Y (short)                             [1,15,0]
         IR3      Screen Z (short)                             [1,15,0]
         MAC1     Screen X (long)                              [1,31,0]
         MAC2     Screen Y (long)                              [1,31,0]
         MAC3     Screen Z (long)                              [1,31,0]

Calculation:
[1,31,0] MAC1=A1[TRX + R11*VX0 + R12*VY0 + R13*VZ0]            [1,31,12]
[1,31,0] MAC2=A2[TRY + R21*VX0 + R22*VY0 + R23*VZ0]            [1,31,12]
[1,31,0] MAC3=A3[TRZ + R31*VX0 + R32*VY0 + R33*VZ0]            [1,31,12]
[1,15,0] IR1= Lm_B1[MAC1]                                      [1,31,0]
[1,15,0] IR2= Lm_B2[MAC2]                                      [1,31,0]
[1,15,0] IR3= Lm_B3[MAC3]                                      [1,31,0]
         SZ0<-SZ1<-SZ2<-SZ3
[0,16,0] SZ3= Lm_D(MAC3)                                       [1,31,0]
         SX0<-SX1<-SX2, SY0<-SY1<-SY2
[1,15,0] SX2= Lm_G1[F[OFX + IR1*(H/SZ)]]                       [1,27,16]
[1,15,0] SY2= Lm_G2[F[OFY + IR2*(H/SZ)]]                       [1,27,16]
[1,31,0] MAC0= F[DQB + DQA * (H/SZ)]                           [1,19,24]
[1,15,0] IR0= Lm_H[MAC0]                                       [1,31,0]

Notes:
Z values are limited downwards at 0.5 * H. For smaller z values you'll have
write your own routine.




Name	Cycles	Command	Description	
RTPT	23	cop2 0x0280030	Perspective transform on 3 points	
Fields:  None
in:      V0       Vector to transform.                         [1,15,0]
         V1                                                    [1,15,0]
         V2                                                    [1,15,0]
         R        Rotation matrix                              [1,3,12]
         TR       Translation vector                           [1,31,0]
         H        View plane distance                          [0,16,0]
         DQA      Depth que interpolation values.              [1,7,8]
         DQB                                                   [1,7,8]
         OFX      Screen offset values.                        [1,15,16]
         OFY                                                   [1,15,16]
out:     SXY fifo Screen XY coordinates.(short)                [1,15,0]
         SZ fifo  Screen Z coordinate.(short)                  [0,16,0]
         IR0      Interpolation value for depth queing.        [1,3,12]
         IR1      Screen X (short)                             [1,15,0]
         IR2      Screen Y (short)                             [1,15,0]
         IR3      Screen Z (short)                             [1,15,0]
         MAC1     Screen X (long)                              [1,31,0]
         MAC2     Screen Y (long)                              [1,31,0]
         MAC3     Screen Z (long)                              [1,31,0]

Calculation: Same as RTPS, but repeats for V1 and V2.

Name	Cycles	Command	Description	
MVMVA	8	cop2 0x0400012	Multiply vector by matrix and vector addition.	
Fields:  sf, mx, v, cv, lm

in:      V0/V1/V2/IR       Vector v0, v1, v2 or [IR1,IR2,IR3]
         R/LLM/LCM         Rotation, light or color matrix.    [1,3,12]
         TR/BK             Translation or background color vector.
out:     [IR1,IR2,IR3]     Short vector
         [MAC1,MAC2,MAC3]  Long vector

Calculation:
MX = matrix specified by mx
V = vector specified by v
CV = vector specified by cv

         MAC1=A1[CV1 + MX11*V1 + MX12*V2 + MX13*V3]
         MAC2=A2[CV2 + MX21*V1 + MX22*V2 + MX23*V3]
         MAC3=A3[CV3 + MX31*V1 + MX32*V2 + MX33*V3]
         IR1=Lm_B1[MAC1]
         IR2=Lm_B2[MAC2]
         IR3=Lm_B3[MAC3]

Notes:
The cv field allows selection of the far color vector, but this vector
is not added correctly by the GTE.


Name	Cycles	Command	Description	
DPCL	8	cop2 0x0680029	Depth Cue Color light	
Fields:  
In:      RGB               Primary color.         R,G,B,CODE   [0,8,0]
         IR0               interpolation value.                [1,3,12]
         [IR1,IR2,IR3]     Local color vector.                 [1,3,12]
         CODE              Code value from RGB.           CODE [0,8,0]
         FC                Far color.                          [1,27,4]
Out:     RGBn              RGB fifo               Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculation:
[1,27,4]  MAC1=A1[R*IR1 + IR0*(Lm_B1[RFC - R * IR1])]          [1,27,16]
[1,27,4]  MAC2=A2[G*IR2 + IR0*(Lm_B1[GFC - G * IR2])]          [1,27,16]
[1,27,4]  MAC3=A3[B*IR3 + IR0*(Lm_B1[BFC - B * IR3])]          [1,27,16]
[1,11,4]  IR1=Lm_B1[MAC1]                                      [1,27,4]
[1,11,4]  IR2=Lm_B2[MAC2]                                      [1,27,4]
[1,11,4]  IR3=Lm_B3[MAC3]                                      [1,27,4]
[0,8,0]   Cd0<-Cd1<-Cd2<- CODE
[0,8,0]   R0<-R1<-R2<- Lm_C1[MAC1]                             [1,27,4]
[0,8,0]   G0<-G1<-G2<- Lm_C2[MAC2]                             [1,27,4]
[0,8,0]   B0<-B1<-B2<- Lm_C3[MAC3]                             [1,27,4]


Name	Cycles	Command	Description	
DPCS	8	cop2 0x0780010	Depth Cueing	
Fields:
In:      IR0               Interpolation value                 [1,3,12]
         RGB               Color                  R,G,B,CODE   [0,8,0]
         FC                Far color              RFC,GFC,BFC  [1,27,4]
Out:     RGBn              RGB fifo               Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculations:
[1,27,4]  MAC1=A1[(R + IR0*(Lm_B1[RFC - R])]                   [1,27,16][lm=0]
[1,27,4]  MAC2=A2[(G + IR0*(Lm_B1[GFC - G])]                   [1,27,16][lm=0]
[1,27,4]  MAC3=A3[(B + IR0*(Lm_B1[BFC - B])]                   [1,27,16][lm=0]
[1,11,4]  IR1=Lm_B1[MAC1]                                      [1,27,4][lm=0]
[1,11,4]  IR2=Lm_B2[MAC2]                                      [1,27,4][lm=0]
[1,11,4]  IR3=Lm_B3[MAC3]                                      [1,27,4][lm=0]
[0,8,0]   Cd0<-Cd1<-Cd2<- CODE
[0,8,0]   R0<-R1<-R2<- Lm_C1[MAC1]                             [1,27,4]
[0,8,0]   G0<-G1<-G2<- Lm_C2[MAC2]                             [1,27,4]
[0,8,0]   B0<-B1<-B2<- Lm_C3[MAC3]                             [1,27,4]






Name	Cycles	Command	Description	
DPCT	17	cop2 0x0F8002A	Depth cue color RGB0,RGB1,RGB2	
Fields:

In:      IR0               Interpolation value                 [1,3,12]
         RGB0,RGB1,RGB2    Colors in RGB fifo.    Rn,Gn,Bn,CDn [0,8,0]
         FC                Far color              RFC,GFC,BFC  [1,27,4]
Out:     RGBn              RGB fifo               Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculations:
[1,27,4]  MAC1=A1[R0+ IR0*(Lm_B1[RFC - R0])]                   [1,27,16][lm=0]
[1,27,4]  MAC2=A2[G0+ IR0*(Lm_B1[GFC - G0])]                   [1,27,16][lm=0]
[1,27,4]  MAC3=A3[B0+ IR0*(Lm_B1[BFC - B0])]                   [1,27,16][lm=0]
[1,11,4]  IR1=Lm_B1[MAC1]                                      [1,27,4][lm=0]
[1,11,4]  IR2=Lm_B2[MAC2]                                      [1,27,4][lm=0]
[1,11,4]  IR3=Lm_B3[MAC3]                                      [1,27,4][lm=0]
[0,8,0]   Cd0<-Cd1<-Cd2<- CODE
[0,8,0]   R0<-R1<-R2<- Lm_C1[MAC1]                             [1,27,4]
[0,8,0]   G0<-G1<-G2<- Lm_C2[MAC2]                             [1,27,4]
[0,8,0]   B0<-B1<-B2<- Lm_C3[MAC3]                             [1,27,4]

Performs this calculation 3 times, so all three RGB values have been
replaced by the depth cued RGB values.

Name	Cycles	Command	Description	
INTPL	8	cop2 0x0980011	Interpolation of vector and far color	
Fields:
In:      [IR1,IR2,IR3]     Vector                              [1,3,12]
         IR0               Interpolation value                 [1,3,12]
         CODE              Code value from RGB.           CODE [0,8,0]
         FC                Far color              RFC,GFC,BFC  [1,27,4]
Out:     RGBn              RGB fifo               Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculations:
[1,27,4]  MAC1=A1[IR1 + IR0*(Lm_B1[RFC - IR1])]                [1,27,16]
[1,27,4]  MAC2=A2[IR2 + IR0*(Lm_B1[GFC - IR2])]                [1,27,16]
[1,27,4]  MAC3=A3[IR3 + IR0*(Lm_B1[BFC - IR3])]                [1,27,16]
[1,11,4]  IR1=Lm_B1[MAC1]                                      [1,27,4]
[1,11,4]  IR2=Lm_B2[MAC2]                                      [1,27,4]
[1,11,4]  IR3=Lm_B3[MAC3]                                      [1,27,4]
[0,8,0]   Cd0<-Cd1<-Cd2<- CODE
[0,8,0]   R0<-R1<-R2<- Lm_C1[MAC1]                             [1,27,4]
[0,8,0]   G0<-G1<-G2<- Lm_C2[MAC2]                             [1,27,4]
[0,8,0]   B0<-B1<-B2<- Lm_C3[MAC3]                             [1,27,4]




Name	Cycles	Command	Description	
SQR	5	cop2 0x0A00428	Square of vector	
Fields: sf
in:      [IR1,IR2,IR3]     vector                      [1,15,0][1,3,12]
out:     [IR1,IR2,IR3]     vector^2                    [1,15,0][1,3,12]
         [MAC1,MAC2,MAC3]  vector^2                    [1,31,0][1,19,12]

Calculation: (left format sf=0, right format sf=1)

[1,31,0][1,19,12] MAC1=A1[IR1*IR1]                     [1,43,0][1,31,12]
[1,31,0][1,19,12] MAC2=A2[IR2*IR2]                     [1,43,0][1,31,12]
[1,31,0][1,19,12] MAC3=A3[IR3*IR3]                     [1,43,0][1,31,12]
[1,15,0][1,3,12]  IR1=Lm_B1[MAC1]                      [1,31,0][1,19,12][lm=1]
[1,15,0][1,3,12]  IR2=Lm_B2[MAC2]                      [1,31,0][1,19,12][lm=1]
[1,15,0][1,3,12]  IR3=Lm_B3[MAC3]                      [1,31,0][1,19,12][lm=1]

Name	Cycles	Command	Description	
NCS	14	cop2 0x0C8041E	Normal color v0	
Fields:
In:      V0                Normal vector                       [1,3,12]
         BK                Background color       RBK,GBK,BBK  [1,19,12]
         CODE              Code value from RGB.           CODE [0,8,0]
         LCM               Color matrix                        [1,3,12]
         LLM               Light matrix                        [1,3,12]
Out:     RGBn              RGB fifo.              Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

[1,19,12] MAC1=A1[L11*VX0 + L12*VY0 + L13*VZ0]                 [1,19,24]
[1,19,12] MAC2=A2[L21*VX0 + L22*VY0 + L23*VZ0]                 [1,19,24]
[1,19,12] MAC3=A3[L31*VX0 + L32*VY0 + L33*VZ0]                 [1,19,24]
[1,3,12]  IR1= Lm_B1[MAC1]                                     [1,19,12][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                     [1,19,12][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                     [1,19,12][lm=1]
[1,19,12] MAC1=A1[RBK + LR1*IR1 + LR2*IR2 + LR3*IR3]           [1,19,24]
[1,19,12] MAC2=A2[GBK + LG1*IR1 + LG2*IR2 + LG3*IR3]           [1,19,24]
[1,19,12] MAC3=A3[BBK + LB1*IR1 + LB2*IR2 + LB3*IR3]           [1,19,24]
[1,3,12]  IR1= Lm_B1[MAC1]                                     [1,19,12][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                     [1,19,12][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                     [1,19,12][lm=1]
[0,8,0]   Cd0<-Cd1<-Cd2<- CODE
[0,8,0]   R0<-R1<-R2<- Lm_C1[MAC1]                             [1,27,4]
[0,8,0]   G0<-G1<-G2<- Lm_C2[MAC2]                             [1,27,4]
[0,8,0]   B0<-B1<-B2<- Lm_C3[MAC3]                             [1,27,4]








Name	Cycles	Command	Description	
NCT	30	cop2 0x0D80420	Normal color v0, v1, v2	
Fields:
In:      V0,V1,V2          Normal vector                       [1,3,12]
         BK                Background color       RBK,GBK,BBK  [1,19,12]
         CODE              Code value from RGB.           CODE [0,8,0]
         LCM               Color matrix                        [1,3,12]
         LLM               Light matrix                        [1,3,12]
Out:     RGBn              RGB fifo.              Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculation: Same as NCS, but repeated for V1 and V2.

Name	Cycles	Command	Description	
NCDS	19	cop2 0x0E80413	Normal color depth cuev0	
Fields:
In:      V0                Normal vector                       [1,3,12]
         BK                Background color       RBK,GBK,BBK  [1,19,12]
         RGB               Primary color          R,G,B,CODE   [0,8,0]
         LLM               Light matrix                        [1,3,12]
         LCM               Color matrix                        [1,3,12]
         IR0               Interpolation value                 [1,3,12]
Out:     RGBn              RGB fifo.              Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculation:
[1,19,12] MAC1=A1[L11*VX0 + L12*VY0 + L13*VZ0]                 [1,19,24]
[1,19,12] MAC2=A1[L21*VX0 + L22*VY0 + L23*VZ0]                 [1,19,24]
[1,19,12] MAC3=A1[L31*VX0 + L32*VY0 + L33*VZ0]                 [1,19,24]
[1,3,12]  IR1= Lm_B1[MAC1]                                     [1,19,12][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                     [1,19,12][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                     [1,19,12][lm=1]
[1,19,12] MAC1=A1[RBK + LR1*IR1 + LR2*IR2 + LR3*IR3]           [1,19,24]
[1,19,12] MAC2=A1[GBK + LG1*IR1 + LG2*IR2 + LG3*IR3]           [1,19,24]
[1,19,12] MAC3=A1[BBK + LB1*IR1 + LB2*IR2 + LB3*IR3]           [1,19,24]
[1,3,12]  IR1= Lm_B1[MAC1]                                     [1,19,12][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                     [1,19,12][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                     [1,19,12][lm=1]
[1,27,4]  MAC1=A1[R*IR1 + IR0*(Lm_B1[RFC-R*IR1])]              [1,27,16][lm=0]
[1,27,4]  MAC2=A1[G*IR2 + IR0*(Lm_B2[GFC-G*IR2])]              [1,27,16][lm=0]
[1,27,4]  MAC3=A1[B*IR3 + IR0*(Lm_B3[BFC-B*IR3])]              [1,27,16][lm=0]
[1,3,12]  IR1= Lm_B1[MAC1]                                     [1,27,4][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                     [1,27,4][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                     [1,27,4][lm=1]
[0,8,0]   Cd0<-Cd1<-Cd2<- CODE
[0,8,0]   R0<-R1<-R2<- Lm_C1[MAC1]                             [1,27,4]
[0,8,0]   G0<-G1<-G2<- Lm_C2[MAC2]                             [1,27,4]
[0,8,0]   B0<-B1<-B2<- Lm_C3[MAC3]                             [1,27,4]


Name	Cycles	Command	Description	
NCDT	44	cop2 0x0F80416	Normal color depth cue v0, v1, v2	
Fields:
In:      V0                Normal vector                       [1,3,12]
         V1                Normal vector                       [1,3,12]
         V2                Normal vector                       [1,3,12]
         BK                Background color       RBK,GBK,BBK  [1,19,12]
         FC                Far color              RFC,GFC,BFC  [1,27,4]
         RGB               Primary color          R,G,B,CODE   [0,8,0]
         LLM               Light matrix                        [1,3,12]
         LCM               Color matrix                        [1,3,12]
         IR0               Interpolation value                 [1,3,12]
Out:     RGBn              RGB fifo.              Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculation:
Same as NCDS but repeats for v1 and v2.


Name	Cycles	Command	Description	
NCCS	17	cop2 0x108041B	Normal color col. v0	
Fields:
In:      V0                Normal vector                       [1,3,12]
         BK                Background color       RBK,GBK,BBK  [1,19,12]
         RGB               Primary color          R,G,B,CODE   [0,8,0]
         LLM               Light matrix                        [1,3,12]
         LCM               Color matrix                        [1,3,12]
Out:     RGBn              RGB fifo.              Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculation:
[1,19,12] MAC1=A1[L11*VX0 + L12*VY0 + L13*VZ0]                  [1,19,24]
[1,19,12] MAC2=A2[L21*VX0 + L22*VY0 + L23*VZ0]                  [1,19,24]
[1,19,12] MAC3=A3[L31*VX0 + L32*VY0 + L33*VZ0]                  [1,19,24]
[1,3,12]  IR1= Lm_B1[MAC1]                                      [1,19,12][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                      [1,19,12][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                      [1,19,12][lm=1]
[1,19,12] MAC1=A1[RBK + LR1*IR1 + LR2*IR2 + LR3*IR3]            [1,19,24]
[1,19,12] MAC2=A2[GBK + LG1*IR1 + LG2*IR2 + LG3*IR3]            [1,19,24]
[1,19,12] MAC3=A3[BBK + LB1*IR1 + LB2*IR2 + LB3*IR3]            [1,19,24]
[1,3,12]  IR1= Lm_B1[MAC1]                                      [1,19,12][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                      [1,19,12][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                      [1,19,12][lm=1]
[1,27,4]  MAC1=A1[R*IR1]                                        [1,27,16]
[1,27,4]  MAC2=A2[G*IR2]                                        [1,27,16]
[1,27,4]  MAC3=A3[B*IR3]                                        [1,27,16]
[1,3,12]  IR1= Lm_B1[MAC1]                                      [1,27,4][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                      [1,27,4][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                      [1,27,4][lm=1]
[0,8,0]   Cd0<-Cd1<-Cd2<- CODE
[0,8,0]   R0<-R1<-R2<- Lm_C1[MAC1]                              [1,27,4]
[0,8,0]   G0<-G1<-G2<- Lm_C2[MAC2]                              [1,27,4]
[0,8,0]   B0<-B1<-B2<- Lm_C3[MAC3]                              [1,27,4]

Name	Cycles	Command	Description	
NCCT	39	cop2 0x118043F	Normal color col.v0, v1, v2	
Fields:
In:      V0                Normal vector 1                     [1,3,12]
         V1                Normal vector 2                     [1,3,12]
         V2                Normal vector 3                     [1,3,12]
         BK                Background color       RBK,GBK,BBK  [1,19,12]
         RGB               Primary color          R,G,B,CODE   [0,8,0]
         LLM               Light matrix                        [1,3,12]
         LCM               Color matrix                        [1,3,12]
Out:     RGBn              RGB fifo.              Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculation:
Same as NCCS but repeats for v1 and v2.

Name	Cycles	Command	Description	
CDP	13	cop2 0x1280414	Color Depth Queue	
Fields:
In:      [IR1,IR2,IR3]     Vector                              [1,3,12]
         RGB               Primary color          R,G,B,CODE   [0,8,0]
         IR0               Interpolation value                 [1,3,12]
         BK                Background color       RBK,GBK,BBK  [1,19,12]
         LCM               Color matrix                        [1,3,12]
         FC                Far color              RFC,GFC,BFC  [1,27,4]
Out:     RGBn              RGB fifo               Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculation:
[1,19,12] MAC1=A1[RBK + LR1*IR1 + LR2*IR2 + LR3*IR3]           [1,19,24]
[1,19,12] MAC2=A2[GBK + LG1*IR1 + LG2*IR2 + LG3*IR3]           [1,19,24]
[1,19,12] MAC3=A3[BBK + LB1*IR1 + LB2*IR2 + LB3*IR3]           [1,19,24]
[1,3,12]  IR1= Lm_B1[MAC1]                                     [1,19,12][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                     [1,19,12][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                     [1,19,12][lm=1]
[1,27,4]  MAC1=A1[R*IR1 + IR0*(Lm_B1[RFC-R*IR1])]              [1,27,16][lm=0]
[1,27,4]  MAC2=A2[G*IR2 + IR0*(Lm_B2[GFC-G*IR2])]              [1,27,16][lm=0]
[1,27,4]  MAC3=A3[B*IR3 + IR0*(Lm_B3[BFC-B*IR3])]              [1,27,16][lm=0]
[1,3,12]  IR1= Lm_B1[MAC1]                                     [1,27,4][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                     [1,27,4][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                     [1,27,4][lm=1]
[0,8,0]   Cd0<-Cd1<-Cd2<- CODE
[0,8,0]   R0<-R1<-R2<- Lm_C1[MAC1]                             [1,27,4]
[0,8,0]   G0<-G1<-G2<- Lm_C2[MAC2]                             [1,27,4]
[0,8,0]   B0<-B1<-B2<- Lm_C3[MAC3]                             [1,27,4]


Name	Cycles	Command	Description	
CC	11	cop2 0x138041C	Color Col.	
Fields:
In:      [IR1,IR2,IR3]     Vector                              [1,3,12]
         BK                Background color       RBK,GBK,BBK  [1,19,12]
         RGB               Primary color          R,G,B,CODE   [0,8,0]
         LCM               Color matrix                        [1,3,12]
Out:     RGBn              RGB fifo.              Rn,Gn,Bn,CDn [0,8,0]
         [IR1,IR2,IR3]     Color vector                        [1,11,4]
         [MAC1,MAC2,MAC3]  Color vector                        [1,27,4]

Calculations:
[1,19,12] MAC1=A1[RBK + LR1*IR1 + LR2*IR2 + LR3*IR3]           [1,19,24]
[1,19,12] MAC2=A2[GBK + LG1*IR1 + LG2*IR2 + LG3*IR3]           [1,19,24]
[1,19,12] MAC3=A3[BBK + LB1*IR1 + LB2*IR2 + LB3*IR3]           [1,19,24]
[1,3,12]  IR1= Lm_B1[MAC1]                                     [1,19,12][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                     [1,19,12][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                     [1,19,12][lm=1]
[1,27,4]  MAC1=A1[R*IR1]                                       [1,27,16]
[1,27,4]  MAC2=A2[G*IR2]                                       [1,27,16]
[1,27,4]  MAC3=A3[B*IR3]                                       [1,27,16]
[1,3,12]  IR1= Lm_B1[MAC1]                                     [1,27,4][lm=1]
[1,3,12]  IR2= Lm_B2[MAC2]                                     [1,27,4][lm=1]
[1,3,12]  IR3= Lm_B3[MAC3]                                     [1,27,4][lm=1]
[0,8,0]   Cd0<-Cd1<-Cd2<- CODE
[0,8,0]   R0<-R1<-R2<- Lm_C1[MAC1]                             [1,27,4]
[0,8,0]   G0<-G1<-G2<- Lm_C2[MAC2]                             [1,27,4]
[0,8,0]   B0<-B1<-B2<- Lm_C3[MAC3]                             [1,27,4]
- 
Name	Cycles	Command	Description	
NCLIP	8	cop2 0x1400006	Normal clipping	
Fields:
in:      SXY0,SXY1,SXY2    Screen coordinates                  [1,15,0]
out:     MAC0              Outerproduct of SXY1 and SXY2 with  [1,31,0]
                           SXY0 as origin.

Calculation:
[1,31,0] MAC0 = F[SX0*SY1+SX1*SY2+SX2*SY0-SX0*SY2-SX1*SY0-SX2*SY1] [1,43,0]

Name	Cycles	Command	Description	
AVSZ3	5	cop2 0x158002D	Average of three Z values	
Fields:
in:      SZ1, SZ2, SZ3     Z-Values                            [0,16,0]
         ZSF3              Divider                             [1,3,12]
out:     OTZ               Average.                            [0,16,0]
         MAC0              Average.                            [1,31,0]

Calculation:
[1,31,0] MAC0=F[ZSF3*SZ1 + ZSF3*SZ2 + ZSF3*SZ3]                [1,31,12]
[0,16,0] OTZ=Lm_D[MAC0]                                        [1,31,0]

Name	Cycles	Command	Description	
AVSZ4	6	cop2 0x168002E	Average of four Z values	
Fields:
in:      SZ1,SZ2,SZ3,SZ4   Z-Values                            [0,16,0]
         ZSF4              Divider                             [1,3,12]
out:     OTZ               Average.                            [0,16,0]
         MAC0              Average.                            [1,31,0]

Calculation:
[1,31,0] MAC0=F[ZSF4*SZ0 + ZSF4*SZ1 + ZSF4*SZ2 + ZSF4*SZ3]     [1,31,12]
[0,16,0] OTZ=Lm_D[MAC0]                                        [1,31,0]

Name	Cycles	Command	Description	
OP	6	cop2 0x170000C	Outer Product	
Fields: sf
in:      [R11R12,R22R23,R33] vector 1
         [IR1,IR2,IR3]      vector 2
out:     [IR1,IR2,IR3]      outer product
         [MAC1,MAC2,MAC3]   outer product

Calculation: (D1=R11R12,D2=R22R23,D3=R33)

         MAC1=A1[D2*IR3 - D3*IR2]
         MAC2=A2[D3*IR1 - D1*IR3]
         MAC3=A3[D1*IR2 - D2*IR1]
         IR1=Lm_B1[MAC0]
         IR2=Lm_B2[MAC1]
         IR3=Lm_B3[MAC2]

Name	Cycles	Command	Description	
GPF	6	cop2 0x190003D	General purpose interpolation	
Fields: sf

in:      IR0               scaling factor
         CODE              code field of RGB
         [IR1,IR2,IR3]     vector
out:     [IR1,IR2,IR3]     vector
         [MAC1,MAC2,MAC3]  vector
         RGB2              RGB fifo.

Calculation:

         MAC1=A1[IR0 * IR1]
         MAC2=A2[IR0 * IR2]
         MAC3=A3[IR0 * IR3]
         IR1=Lm_B1[MAC1]
         IR2=Lm_B2[MAC2]
         IR3=Lm_B3[MAC3]
[0,8,0]   Cd0<-Cd1<-Cd2<- CODE
[0,8,0]   R0<-R1<-R2<- Lm_C1[MAC1]
[0,8,0]   G0<-G1<-G2<- Lm_C2[MAC2]
[0,8,0]   B0<-B1<-B2<- Lm_C3[MAC3]

Name	Cycles	Command	Description	
GPL	5	cop2 0x1A0003E	general purpose interpolation	
Fields: sf
in:      IR0               scaling factor
         CODE              code field of RGB
         [IR1,IR2,IR3]     vector
         [MAC1,MAC2,MAC3]  vector
out:     [IR1,IR2,IR3]     vector
         [MAC1,MAC2,MAC3]  vector
         RGB2              RGB fifo.

Calculation:

         MAC1=A1[MAC1 + IR0 * IR1]
         MAC2=A2[MAC2 + IR0 * IR2]
         MAC3=A3[MAC3 + IR0 * IR3]
         IR1=Lm_B1[MAC1]
         IR2=Lm_B2[MAC2]
         IR3=Lm_B3[MAC3]
[0,8,0]  Cd0<-Cd1<-Cd2<- CODE
[0,8,0]  R0<-R1<-R2<- Lm_C1[MAC1]
[0,8,0]  G0<-G1<-G2<- Lm_C2[MAC2]
[0,8,0]  B0<-B1<-B2<- Lm_C3[MAC3]

Field descriptions.

24	20	19	18	17	16	15	14	13	12	11	10	9	0	
	sf	mx	v	cv		lm		


sf	0	vector format (1,31, 0)
1	vector format (1,19,12)

mx	0	Multiply with rotation matrix
1 	Multiply with light matrix
2	Multiply with color matrix
3	Unknown

v	0	V0 source vector (short)
1	V1 source vector (short)
2	V2 source vector (short)
3	IR source vector (long)

cv	0	Add translation vector
1  	Add back color vector
2  	Unknown
3	Add no vector

lm	0	No negative limit.
1	Limit negative results to 0.


A list of common MVMVA instructions:


Name	Cycles	Command	Description	
rtv0	-	cop2 0x0486012	v0 * rotmatrix	
rtv1	-	cop2 0x048E012	v1 * rotmatrix	
rtv2	-	cop2 0x0496012	v2 * rotmatrix	
rtir12	-	cop2 0x049E012	ir * rotmatrix	
rtir0	-	cop2 0x041E012	ir * rotmatrix	
rtv0tr	-	cop2 0x0480012	v0 * rotmatrix + tr vector	
rtv1tr	-	cop2 0x0488012	v1 * rotmatrix + tr vector	
rtv2tr	-	cop2 0x0490012	v2 * rotmatrix + tr vector	
rtirtr	-	cop2 0x0498012	ir * rotmatrix + tr vector	
rtv0bk	-	cop2 0x0482012	v0 * rotmatrix + bk vector	
rtv1bk	-	cop2 0x048A012	v1 * rotmatrix + bk vector	
rtv2bk	-	cop2 0x0492012	v2 * rotmatrix + bk vector	
rtirbk	-	cop2 0x049A012	ir * rotmatrix + bk vector	
ll	-	cop2 0x04A6412	v0 * light matrix. Lower limit result to 0	
llv0	-	cop2 0x04A6012	v0 * light matrix	
llv1	-	cop2 0x04AE012	v1 * light matrix	
llv2	-	cop2 0x04B6012	v2 * light matrix	
llvir	-	cop2 0x04BE012	ir * light matrix	
llv0tr	-	cop2 0x04A0012	v0 * light matrix + tr vector	
llv1tr	-	cop2 0x04A8012	v1 * light matrix + tr vector	
llv2tr	-	cop2 0x04B0012	v2 * light matrix + tr vector	
llirtr	-	cop2 0x04B8012	ir * light matrix + tr vector	
llv0bk	-	cop2 0x04A2012	v0 * light matrix + bk vector	
llv1bk	-	cop2 0x04AA012	v1 * light matrix + bk vector	
llv2bk	-	cop2 0x04B2012	v2 * light matrix + bk vector	
llirbk	-	cop2 0x04BA012	ir * light matrix + bk vector	
lc	-	cop2 0x04DA412	v0 * color matrix, Lower limit clamped to 0	
lcv0	-	cop2 0x04C6012	v0 * color matrix	
lcv1	-	cop2 0x04CE012	v1 * color matrix	
lcv2	-	cop2 0x04D6012	v2 * color matrix	
lcvir	-	cop2 0x04DE012	ir * color matrix	
lcv0tr	-	cop2 0x04C0012	v0 * color matrix + tr vector	
lcv1tr	-	cop2 0x04C8012	v1 * color matrix + tr vector	
lcv2tr	-	cop2 0x04D0012	v2 * color matrix + tr vector	
lcirtr	-	cop2 0x04D8012	ir * color matrix + tr vector	
lev0bk	-	cop2 0x04C2012	v0 * color matrix + bk vector	
lev1bk	-	cop2 0x04CA012	v1 * color matrix + bk vector	
lev2bk	-	cop2 0x04D2012	v2 * color matrix + bk vector	
leirbk	-	cop2 0x04DA012	ir * color matrix + bk vector	

Other instructions:

Name	Cycles	Command	Description	Format	
sqr12	-	cop2 0x0A80428	square of ir	1,19,12	
sqr0	-	cop2 0x0A80428	square of ir	1,31, 0	
op12	-	cop2 0x178000C	outer product	1,19,12	
op0	-	cop2 0x170000C	outer product	1,31, 0	
gpf12	-	cop2 0x198003D	general purpose interpolation	1,19,12	
gpf0	-	cop2 0x190003D	general purpose interpolation	1,31, 0	
gpl12	-	cop2 0x1A8003E	general purpose interpolation	1,19,12	
gpl0	-	cop2 0x1A0003E	general purpose interpolation	1,31, 0	




The Motion Decoder (MDEC)
	The Motion Decoder  (MDEC) is a special controller chip that takes a compressed  JPEG-like images and decompresses them into  24-bit bitmapped images for display by the GPU. The MDEC can only decompress a 16x16 pixel 24-bit image at at time,called "Macroblocks" These Macrobock are encoded block that uses the YUV (YCbCr) color scheme with  Discrete Cosine Transformation (DCT) and Run Length Encoding (RLE) applied The MDEC also performs 24 to 16 bit color conversion to prepare it for whatever color depth the GPU is in. Due to the extremely high speed that the decompression is done, the decompressed RGB bitmaps can be combined to from larger pictures and then ,if displayed in sequential order, to produce movies. The maximum speed is about 9,000 macroblocks per second, thereby making a movie that is 320x240 able to be played at about 30 frames per second. MDEC data can only be sent/received via DMA channels 0 and 1. DMA channel 0 is for uncompressed data going in and channel 1 is for retrieval of the uncompressed macroblocks. The MDEC gets controlled via the MDEC control register at location $1f80_1820. The current status of the MDEC can be checked using the MDEC status register at $1f80_1824. The following is a layout of the registers.

$1f80_1820 (mdec0) 
write:

31	28	27	26	25	24	0	
u	 RGB24 	u	 STP 	u	

Note: The first word of every data segment in a str-file is a control word written to this register.

u		Unknown
RGB24		should be set to 0 for 24-bit color and to 1 for 16-bit. In 16-bit mode
STP		toggles whether to set bit 15 of the decompressed data (semi-transparency)



$1f80_1824 (mdec1) 
read:

31	30	29	28	27	26	25	24	23	22	0	
FIFO	InSync	DREQ	u	RGB24	OutSync	STP	u	

u		Unknown
FIFO		First-In-First-Out buffer state
InSync		MDEC is busy decompressing data
OutSync	MDEC is trasnferring data to man memory
DREQ		Data Request
RGB24		0 for 24-bit color and to 1 for 16-bit. In 16-bit mode
STP		toggles whether to set bit 15 of the decompressed data (semi-transparency)
	
write:

31	30	0	
reset	u	

u		Unknown
reset		reset MDEC





MDEC Data Fomat
	The MDEC uses a 'lossy' picture format simalar to that of  the JPEG file format. A typical picture, before being put into the MDEC via DMA, is of the following format;

header
macroblock
...
macroblock
footer


The header is a 32 byte word.
31	16	15	0	
0x3800	sixe	
 0x3800		Data ID
size		size if data after the header


The Macrobocks are further broken up as follows
Cb block
Cr block
Y0 block
Y1 block
Y2 block
Y3 block
Cb,Cr		The color difference blocks
Y0,Y1,Y2,Y3	The Luminescence blocks

Within each block the DCT informaton and RLE compressed is is stored.
15	0	
DCT
RLE
...
RLE
EOD

DCT		DCT data, it has the quantization factor and the Direct Current (DC) reference 
15	10	9	0	
Q	DC	
Q	 Quantization factor (6 bits, unsigned)
DC	 Direct Current reference (10 bits, signed)





RLE		Run length data 
15	10	9	0	
LENGTH	DATA	
LENGTH	The number if zeros between data (6 bits, unsigned)
DATA		The data (10 bits, signed)

EOD		End Of Data(Footer)
15	0	
0xfe00
Lets the MDEC know a block is done. The footer is also the same thing.



SOUND

SPU - Sound Processing Unit

Introduction.
	The SPU is the unit responsible for all aural capabilities of the psx. It handles 24 voices, has a 512kb sound buffer. It also has ADSR envelope filters for each voice and lots of other features.

The Sound Buffer
	The SPU has control over a 512kb sound buffer. Data is stored compressed into blocks of 16 bytes. Each block contains 14 packed sample bytes and two header bytes, one for the packing and one for sample end and looping information. One such block is decoded into 28 sample bytes (= 14 16bit samples).
	In the first 4 kb of the buffer the SPU stores the decoded data of CD audio after volume processing and the sound data of voice 1 and voice 3 after envelope processing. The decoded data is stored as 16 bit signed values,
one sample per clock (44.1 khz).
	Following this first 4kb are 8 bytes reserved by the system. The memory beyond that is free to store samples, up to the reverb work area if the effect processor is used. The size of this work area depends on which type of effect is being processed. More on that later.
Memory layout
0x00000-0x003ff 	CD audio left	
0x00400-0x007ff  	CD audio right	
0x00800-0x00bff  	Voice 1	
0x00c00-0x00fff  	Voice 3	
0x01000-0x0100f  	System area.	
0x01008-0xxxxxx  	Sound data area.	
0x0xxxx-0x7ffff  	Reverb work area.	

Voices
	The SPU has 24 hardware voices. These voices can be used to reproduce sample data, noise or can be used as frequency modulator on the next voice. Each voice has it's own programmable ADSR envelope filter. The main volume can be programmed independently for left and right output.
The ADSR envelope filter works as follows:
Ar	Attack rate, which specifies the speed at which the volume increases from zero to it's maximum value, as soon as the note on is given. The slope can be set to lineair or exponential.
Dr	Decay rate specifies the speed at which the volume decreases to the sustain level. Decay is always decreasing exponentially.
Sl	Sustain level, base level from which sustain starts.
Sr	Sustain rate is the rate at which the volume of the sustained note increases or decreases. This can be either lineair or exponential.
Rr	Release rate is the rate at which the volume of the note decreases as soon as the note off is given.
lvl	Volume level
t	Time

The overal volume can also be set to sweep up or down lineairly or exponentially from it's current value. This can be done seperately for left and right.

SPU Operation
	The SPU occupies the area 0x1f80_1c00-0x1f80_1dff. All registers are 16 bit wide.

0x1f80_1c00-0x1f80_1d7f 		Voice data area. For each voice there are 8 16 bit  registers structured like this:

0x1f80_1xx0-0x1f80_1xx2	Volume
(xx = 0xc0 + voice number)
0x1f80_1xx0	Volume Left	
0x1f80_1xx2	Volume Right	

Volume mode:
15	14	13	0	
0	S	VV	
VV 	0x0000-0x3fff   Voice volume.
S	0	Phase Normal
	1	Inverted

Sweep mode:
15	14	13	12	11	7	6	0	
1	Sl	Dr	Ph		VV	
VV	0x0000-0x007f   Voice volume.
Sl	0	Lineair slope
	1	Exponential slope
Dr	0	Increase
	1	Decrease
Ph	0	Normal phase
	1	Inverted phase
In sweep mode, the current volume increases to its maximum value, or decreases to its mimimum value, according to mode. Choose phase equal to the the phase of the current volume.

0x1f80_1xx4	Pitch
15	14	13	0	
	Pt	
Pt	0x0000-0x3fff 	Specifies pitch.
Any value can be set, table shows only octaves:
0x0200	-3 octaves
0x0400	-2
0x0800 	-1
0x1000	sample pitch
0x2000	+1
0x3fff	+2

0x1f80_1xx6	Start address of Sound
15	0	
Addr
Addr	Startaddress of sound in Sound buffer /8



0x1f80_1xx8	Attack/Decay/Sustain level
15	14	8	7	4	3	0	
Am	Ar	Dr	Sl	
Am	0	Attack mode Linear
	1	Exponential
Ar	0-7f	attack rate
Dr	0-f	decay rate
Sl	0-f	sustain level

0x1f80_1xxa	Sustain rate, Release Rate.
15	14	13	12	6	5	4	0	
Sm	Sd	0	Sr	Rm	Rr	
Sm	0	sustain rate mode linear
	1	exponential
Sd	0	sustain rate mode increase
	1	decrease
Sr	0-7f	Sustain Rate
Rm	0	Linear decrease
	1	Exponential decrease
Rr	0-1f	Release Rate
Note: decay mode is always Expontial decrease, and thus cannot be set.

0x1f80_1xxc	Current ADSR volume
15	0	
ASDRvol
ADSRvol	Returns the current envelope volume when read.

0x1f80_1xxe	Repeat address.
15	0	
Ra
Ra	0x0000-0xffff	Address sample loops to at end.
Note:	Setting this register only has effect after the voice has started (ie. KeyON), else the loop address gets reset by the sample.

SPU Global Registers

0x1f801d80	Main volume left
0x1f801d82	Main volume right
15	0	
MVol
Mvol	0x0000-0xffff	Main volume
Sets Main volume, these work the same as the channel volume registers. See those for details.

0x1f801d84	Reverberation depth left
0x1f801d86	Reverberation depth right
15	14	0	
P	Rvd	
Rvd	0x0000-0x7fff	Sets the wet volume for the effect.
P	0	Normal phase
	1	Inverted phase

Following registers have a common layout:

first register:
15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0	
c15	c14	c13	c12	c11	c10	c9	c8	c7	c6	c5	c4	c3	c2	c1	c0	
second register:
15	8	7	6	5	4	3	2	1	0	
0	c17	c16	c15	c14	c13	c12	c11	c10	
c0-c17	0	Mode for channel cxx off
	1	Mode for channel cxx on

0x1f80_1d88	Voice ON  (0-15)
0x1f80_1d8a	Voice ON  (16-23)
Sets the current voice to key on. (ie. start ads)

0x1f80_1d8c	Voice OFF (0-15)
0x1f80_1d8e	Voice OFF (16-23)
Sets the current voice to key off.(ie. release)

0x1f80_1d90	Channel FM (pitch lfo) mode (0-15)
0x1f80_1d92	Channel FM (pitch lfo) mode (16-23)
Sets the channel frequency modulation. Uses the previous channelas modulator.

0x1f80_1d94	Channel Noise mode (0-15)
0x1f80_1d96	Channel Noise mode (16-23)
Sets the channel to noise.

0x1f80_1d98	Channel Reverb mode (0-15)
0x1f80_1d9a	Channel Reverb mode (16-23)
Sets reverb for the channel. As soon as the sample ends, the reverb for that channel is turned off.

0x1f80_1d9c	Channel ON/OFF (0-15)                 
0x1f80_1d9e	Channel ON/OFF (16-23)
Returns wether the channel is mute or not.

0x1f80_1da2	Reverb work area start
15	0	
MVol
Revwa	0x0000-0xffff	Reverb work area start in sound buffer /8

0x1f80_1da4	Sound buffer IRQ address.
15	0	
IRQa
IRQa	0x0000-0xffff	IRQ address in sound buffer /8

0x1f80_1da6	Sound buffer IRQ address.
15	0	
Sba
Sba	0x0000-0xffff	Address in sound buffer divided by eight. Next transfer to this address.



0x1f80_1da8	SPU data
15	0	

Data forwarding reg, for non DMA transfer.

0x1f80_1daa	SPU control	sp0
15	14	13	8	7	6	5	4	3	2	1	0	
En	Mu	Noise	Rv	Irq	DMA	Er	Cr	Ee	Ce	
En	0	SPU off
	1	SPU on
Mu	0	Mute SPU
	1	Unmute SPU
Noise		Noise clock frequency
Rv	0	Reverb Disabled
	1	Reverb Enabled
Irq	0	Irq disabled
	1	Irq enabled
DMA	00
	01	Non DMA write  (transfer through data reg)
	10	DMA Write
	11	DMA Read
Er	0	Reverb for external off
	1	Reverb for external on
Cr	0	Reverb for CD off
	1	Reverb for CD on
Ee	0	External audio off
	1	External audio on
Ce	0	CD audio off
	1	CD audio on

0x1f80_1dac	SPU status
15	0	

In SPU init routines this register get loaded with 0x4.

0x1f80_1dae	SPU status
15	12	11	10	9	0	
	Dh	Rd		
Dh	0	Decoding in first half of buffer
	1	Decoding in second half of buffer
Rd	0	Spu ready to transfer
	1	Spu not ready
Some of bits 9-0 are also ready/not ready states. More on that later. Functions that wait for the SPU to be ready, wait for bits a-0 to become 0.

0x1f80_1db0	CD volume left
0x1f80_1db2	CD volume right
15	14	0	
P	CDvol	
CDvol	0x0000-0x7fff 	Set volume of CD input.
P	0	Normal phase.
	1	Inverted phase.
0x1f80_1db4	Extern volume left
0x1f80_1db6	Extern volume right
15	14	0	
P	Exvol	
Exvol	0x0000-0x7fff	Set volume of External input.
P	0	Normal phase.
	1	Inverted phase.

0x1dc0-&1dff         Reverb configuration area


0x1f80_1dc0         
0x1f80_1dc2
0x1f80_1dc4	Lowpass Filter Frequency. 7fff = max value= no filtering
0x1f80_1dc6	Effect volume 0 - 0x7fff, bit 15 = phase.
0x1f80_1dc8
0x1f80_1dca
0x1f80_1dcc
0x1f80_1dce	Feedback
0x1f80_1dd0
0x1f80_1dd2
0x1f80_1dd4	Delaytime(see below)
0x1f80_1dd6	Delaytime(see below)
0x1f80_1dd8	Delaytime(see below)
0x1f80_1dda
0x1f80_1ddc
0x1f80_1dde
0x1f80_1de0	Delaytime(see below)
0x1f80_1de2
0x1f80_1de4
0x1f80_1de6
0x1f80_1de8
0x1f80_1dea
0x1f80_1dec
0x1f80_1dee
0x1f80_1df0
0x1f80_1df2
0x1f80_1df4	Delaytime
0x1f80_1df6	Delaytime
0x1f80_1df8
0x1f80_1dfa
0x1f80_1dfc
0x1f80_1dfe

	
Reverb
	The SPU is equipped with an effect processor for reverb echo and delay type of effects. This effect processor can do one effect at a time, and for each voice you can specify wether it should have the effect applied or not.
	The effect is setup by initializing the registers 0x1dc0 to 0x1ffe to the desired effect. I do not exactly know how these work, but you can use the presets below.
	The effect processor needs a bit of sound buffer memory to perform it's calculations. The size of this depends on the effect type. For the presets the sizes are:
Reverb off	0x00000	Hall		0x0ade0
Room		0x026c0	Space echo	0x0f6c0
Studio small	0x01f40	Echo		0x18040
Studio medium	0x04840	Delay		0x18040
Studio large	0x06fe0	Half echo	03c00

	The location at which the work area is location is set in register 0x1da2 and it's value is the location in the sound buffer divided by eight. Common values are as follows:
Reverb off	0xFFFE	Hall		0xEA44
Room		0xFB28	Space echo	0xE128
Studio small	FC18	Echo		0xCFF8
Studio medium	0xF6F8	Delay		0xCFF8
Studio large	0xF204	Half echo	0xF880

	For the delay and echo effects (not space echo or half echo) you canspecify the delay time, and feedback. (range 0-127) Calculations are shownbelow.

When you setup up a new reverb effect, take the following steps:

-Turn off the reverb (bit 7 in sp0)
-Set Depth to 0
-First make delay & feedback calculations.
-Copy the preset to the effect registers
-Turn on the reverb
-Set Depth to desired value.

Also make sure there is the reverb work area is cleared, else you might get some unwanted noise.

To use the effect on a voice, simple turn on the corresponing bit in the channel reverb registers. Note that these get turned off autmatically when the sample for the channel ends.


Effect presets
 copy these in order to 0x1dc0-0x1dfe

Reverb off:
0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000

Room:
0x007D, 0x005B, 0x6D80, 0x54B8, 0xBED0, 0x0000, 0x0000, 0xBA80
0x5800, 0x5300, 0x04D6, 0x0333, 0x03F0, 0x0227, 0x0374, 0x01EF
0x0334, 0x01B5, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
0x0000, 0x0000, 0x01B4, 0x0136, 0x00B8, 0x005C, 0x8000, 0x8000

Studio Small:
0x0033, 0x0025  0x70F0  0x4FA8  0xBCE0  0x4410  0xC0F0  0x9C00
0x5280  0x4EC0  0x03E4  0x031B  0x03A4  0x02AF  0x0372  0x0266
0x031C  0x025D  0x025C  0x018E  0x022F  0x0135  0x01D2  0x00B7
0x018F  0x00B5  0x00B4  0x0080  0x004C  0x0026  0x8000  0x8000

Studio Medium:
0x00B1  0x007F  0x70F0  0x4FA8  0xBCE0  0x4510  0xBEF0  0xB4C0
0x5280  0x4EC0  0x0904  0x076B  0x0824  0x065F  0x07A2  0x0616
0x076C  0x05ED  0x05EC  0x042E  0x050F  0x0305  0x0462  0x02B7
0x042F  0x0265  0x0264  0x01B2  0x0100  0x0080  0x8000  0x8000

Studio Large:
0x00E3  0x00A9  0x6F60  0x4FA8  0xBCE0  0x4510  0xBEF0  0xA680
0x5680  0x52C0  0x0DFB  0x0B58  0x0D09  0x0A3C  0x0BD9  0x0973
0x0B59  0x08DA  0x08D9  0x05E9  0x07EC  0x04B0  0x06EF  0x03D2
0x05EA  0x031D  0x031C  0x0238  0x0154  0x00AA  0x8000  0x8000

Hall:
0x01A5  0x0139  0x6000  0x5000  0x4C00  0xB800  0xBC00  0xC000
0x6000  0x5C00  0x15BA  0x11BB  0x14C2  0x10BD  0x11BC  0x0DC1
0x11C0  0x0DC3  0x0DC0  0x09C1  0x0BC4  0x07C1  0x0A00  0x06CD
0x09C2  0x05C1  0x05C0  0x041A  0x0274  0x013A  0x8000  0x8000

Space Echo:
0x033D  0x0231  0x7E00  0x5000  0xB400  0xB000  0x4C00  0xB000
0x6000  0x5400  0x1ED6  0x1A31  0x1D14  0x183B  0x1BC2  0x16B2
0x1A32  0x15EF  0x15EE  0x1055  0x1334  0x0F2D  0x11F6  0x0C5D
0x1056  0x0AE1  0x0AE0  0x07A2  0x0464  0x0232  0x8000  0x8000

Echo:
0x0001  0x0001  0x7FFF  0x7FFF  0x0000  0x0000  0x0000  0x8100
0x0000  0x0000  0x1FFF  0x0FFF  0x1005  0x0005  0x0000  0x0000
0x1005  0x0005  0x0000  0x0000  0x0000  0x0000  0x0000  0x0000
0x0000  0x0000  0x1004  0x1002  0x0004  0x0002  0x8000  0x8000

Delay:
0x0001  0x0001  0x7FFF  0x7FFF  0x0000  0x0000  0x0000  0x0000
0x0000  0x0000  0x1FFF  0x0FFF  0x1005  0x0005  0x0000  0x0000
0x1005  0x0005  0x0000  0x0000  0x0000  0x0000  0x0000  0x0000
0x0000  0x0000  0x1004  0x1002  0x0004  0x0002  0x8000  0x8000

Half Echo:
0x0017  0x0013  0x70F0  0x4FA8  0xBCE0  0x4510  0xBEF0  0x8500
0x5F80  0x54C0  0x0371  0x02AF  0x02E5  0x01DF  0x02B0  0x01D7
0x0358  0x026A  0x01D6  0x011E  0x012D  0x00B1  0x011F  0x0059
0x01A0  0x00E3  0x0058  0x0040  0x0028  0x0014  0x8000  0x8000


Delay time calculation:
Choose delay time in range 0-0x7f. rXXXX means register 0x1f80_XXXX.

r1dd4 = dt*64.5 - r1dc0
r1dd6 = dt*32.5 - r1dc2

r1dd8 = r1dda + dt*32.5
r1de0 = r1de2 + dt*32.5
r1df4 = r1df8 + dt*32.5
r1df6 = r1dfa + dt*32.5


The CD-ROM

Overview
	The PSX uses a special two speed CD-ROM that can stream at 352K/sec.It uses the following registers to control it
CDREG0 = 0x1f80_1800
CDREG1 = 0x1f80_1801
CDREG2 = 0x1f80_1802
CDREG3 = 0x1f80_1803

REGISTER FORMAT
CDREG0		write:	0	to send a command
			1	to get the result
		read:		I/O status
				bit 0	0	REG1 command send
					1	REG1 data read
				bit 1	0	data transfer finished
					1	data transfer ready/in progress
				bit 7	1	command being processed.

CDREG1		write:	command
		read:	results

CDREG2		write:	send arguments
		write:	7 = flush arg buffer?

CDREG3		write:	7 = flush irq
		read:	hi nibble:	unknown
			low nibble:	interrupt status

MODES FOR SETMODE
Mode	bit	function	
M_Speed	bit 7	0: normal speed	1: double speed	
M_Strsnd	bit 6	0: ADPCM off 	1: ADPCM on	
M_Size	bit 5	0: 2048 byte	1: 2340 byte	
M_Size2	bit 4	0:-	1: 2328 byte	
M_SF	bit 3	bit 3 0: Channel off	1: Channel on	
M_Report	bit 2	0: Report off 	1: Report on	
M_AutoPause	bit 1	0: AutoPause off 1	1: AutoPause on	
M_CDDA	bit 0	0: CD-DA off	1: CD-DA on	











These modes can be set using the setmode command,
 Status bits:
Play	bit 7	playing CD-DA	
Seek	bit 6	seeking	
Read	bit 5	 reading data sectors	
ShellOpen	bit 4	once shell open	
SeekError	bit 3	seek error detected	
Standby	bit 2 	  spindle motor rotating	
Error	bit 1	command error detected	

These are the bit values for the status byte recieved from CD commands.
Interrupt values:
NoIntr	0x00	No interrupt	
DataReady	0x01	Data Ready	
Acknowledge	0x02	Command Complete	
Complete	0x03	Acknowledge	
DataEnd	0x04	End of Data Detected	
DiskError	0x05	Error Detected	

These are returned in the low nibble of CDREG3. First write a 1 to CDREG0 before reading CDREG3. When a command is completed it returns 3. To acknowledge an irq value after you've handled it, write a 1 to CDREG0 then a 7 to both CDREG2 and CDREG3. Another interrupt may be queued, so you should check CDREG3 again if 0 or if there's another interrupt to be handled.
Name	Command	Blocked	Paramater	Returns	
Sync	0x00		-	status	
Nop	0x01		-	status	
Setloc	0x02		min,sec,sector	status	
Play	0x03	B	-	status	
Forward	0x04	B	-	status	
Backward	0x05	B	-	status	
ReadN	0x06	B	-	status	
Standby	0x07	B	-	status	
Stop	0x08	B	-	status	
Pause	0x09	B	-	status	
Init	0x0a		-	status	
Mute	0x0b		-	status	
Demute	0x0c		-	status	
Setfilter	0x0d		file,channel	status	
Setmode	0x0e		mode	status	
Getparam	0x0f		-	status,mode,file?,chan?,?,?	
GetlocL	0x10		-	min,sec,sector,mode,file,channel	
GetlocP	0x11		-	track,index,min,sec,frame,amin, asec,aframe	
GetTN	0x13		- 	status,first,total (BCD)	
GetTD	0x14		rack(BCD)	status,min,sec (BCD)	
SeekL	0x15	B	* 	status	
SeekP	0x16	B	*	status	
Test	0x19	B	#	depends on parameter	
ID	0x1A	B	-	success,flag1,flag2,00 4 letters of ID (SCEx)	
ReadS	0x1B	B	-	status	
Reset	0x1C		-	status	
ReadTOC	0x1E	B	-	status	

* These commands' targets are set using Setloc.
# Command 19 is really a portal to another set of commands.

B means blocking.  These commands return an immediate result saying the command was started, but you need to wait for an IRQ in order to get real results.

Command descriptions:
Command Number	Command Name	Discription	
0x00	Sync	Command does not succeed until all other commands complete. This can be used for synchronization - hence the name.	
0x01	Nop	Does nothing; use this if you just want the status.	
0x02	Setloc	This command, with its parameters, sets the target for commands with a * for their parameter list.	
0x03	Play	Plays audio sectors from the last point seeked.  This is almost identical to CdlReadS, believe it or not.  The main difference is that this does not trigger a completed read IRQ.  CdlPlay may be used on data sectors However, all sectors from data tracks are treated as 00, so no sound is played.  As CdlPlay is reading, the audio data appears in the sector buffer, but is not reliable.  Game Shark "enhancement CDs" for the 2.x and 3.x versions used this to get around the PSX copy protection.	
0x04	Forward	Seek to next track ?	
0x05	Backward	Seek to beginning of current track, or previous track if early in current track (like a CD player's back button)	
0x06	ReadN	Read with retry.  Each sector causes an IRQ (type 1) if ModeRept is on (I think).  ReadN and ReadS cause errors if you're trying to read a non-PSX CD or audio CD without a  mod chip.	
0x07	Standby	CD-ROM aborts all reads and playing, but continues spinning.  CD-ROM does not attempt to keep its place.	
0x08	Stop	Stops motor.  Official way to restart is 0A, but almost any command will restart it.	
0x09	Pause	Like Standby, except the point is to maintain the current location within reasonable error.	
0x0A	Init	Multiple effects at once.  Setmode = 00, Standby, abort all commands.	
0x0B	Mute	Turn off CDDA stream to SPU.	
0x0C	Demute	Turn on CDDA streaming to SPU.	
0x0D	Setfilter	Automatic ADPCM (CD-ROM XA) filter ignores sectors except those which have the same channel and file (parameters) in their subheader area.  This is the mechanism used to select which of multiple songs in a single XA to play. Setfilter does not affect actual reading (sector reads still occur for all sectors).	
0x0E	Setmode	Sets parameters such as read mode and spin speed.  See chart above the command list.	
0x0F	Getparam	returns status, mode, file, channel, ?, ?	
0x10	GetlocL	Retrieves first 6 (8?) bytes of last read sector (header) This is used to know where the sector came from, but is generally pointless in 2340 byte read mode.  All results are in BCD ($12 is considered track twelve, not eighteen) Command may execute concurrently with a read or play (GetlocL returns results immediately).	
0x11	GetlocP	Retrieves 8 of 12 bytes of sub-Q data for the last-read sector.  Same purpose as GetlocL, but more powerful, and works while playing audio.  All results are in BCD. See note	
0x13	GetTN	Get first track number and number of tracks in the TOC.	
0x14	GetTD	Gets start of specified track (does it return sector??)	
0x15	SeekL	Seek to Setloc's location in data mode (can only seek to data sectors, but is accurate to the sector)	
0x16	SeekP	Seek to Setloc's location in audio mode (can seek to any sector, but is only accurate to the second)	
0x19	Test	This function has many subcommands that are completely different. See ending notes	
NOTES
the sub-Q fromat is as follows
                   track:  track number ($AA for lead-out area)
                   index:  index number (INDEX lines in CUE sheets)
                   min:    minute number within track
                   sec:    second number within track
                   frame:  sector number within "sec" (0 to 74)
                   amin:   minute number on entire disk
                   asec:   second number on entire disk
                   aframe: sector number within "asec" (0 to 74)

Test subcommands
1A	ID
Returns copy protection status.  StatError for invalid data CD, StatStandby for valid PSX CD or audio CD.  The following bits I'm unsure about, but I think the 3rd byte has $80 bit for "CD denied" and $10 bit for "import".  $80 = copy, $90 = denied import, $10 = accepted import (Yaroze only).  The 5th through 8th bytes are the SCEx ASCII string from the CD. 
1B	ReadS
Read without automatic retry.
1C	Reset
Same as opening and closing the drive door.
1E	ReadTOC
Reread the Table of Contents without reset. 


To send a command:

- First send any arguments by writing 0 to CDREG0, then all arguments sequentially to CDREG2

- Then write 0 to CDREG0, and the command to CDREG1.

To wait for a command to complete:

- Wait until a CDrom irq occurs (bit 3 of the interrupt regs) The cause of the cdrom irq is in the low nibble of CDREG3. This is usually 3  on a succesful comletion. Failure to complete the command will result  in a 5. If you don't wish to use irq's you can just check for the  low nibble of cdreg3 to become something other than 0, but make sure  it doesn't get cleared in any irq setup by the bios or some such.

To Get the results

- Write a 1 to CDREG0, then read CDREG0, If bit 5 is set, read a return value from CDREG1, then read CDREG0 again repeat until bit 5 goes low.

To Clear the irq

- After command completion the irq cause should be cleared, do this by  writing a 1 to CDREG0 then 7 to CDREG2 and CDREG3. My guess is that  the write to CDREG2 clears the arguments previously set from some  buffer.  Note that irq's are queued, and if you clear the current, another may  come up directly..

To init the CD:

-Flush all irq's
-CDREG0=0
-CDREG3=0
-Com_Delay=4901 ($1f801020)
-Send 2 NOP's
-Command $0a, no args.  
-Demute

To set up the cd for audio playback

CDREG0=2
CDREG2=$80
CDREG3=0
CDREG0=3
CDREG1=$80
CDREG2=0
CDREG3=$20

Also don't forget to init the SPU. (CDvol and CD enable especially)

You should not send some commands while the CD is seeking. (ie. status returns with bit 6 set.) Thing is that the status only gets updated after a new command. I haven't tested this for other command, but for the play command ($03) you can just keep repeating the command and checking the status returned by that, for bit 6 to go low(and bit 7 to go high in this case) If you don't and try to do a getloc directly after the play command reports it's done, the cd will stop. (I guess the cd can't get it's current location while it's seeking, so the logic stops the seek to get an exact fix, but never restarts..)



19 subcommands.

For one reason or another, there is a counter that counts the number of SCEx strings received by the CD-ROM controller.

Be aware that the results for these commands can exceed 8 bytes.

0x04	Read SCEx counter (returned in 1st byte?)	
0x05	Reset SCEx counter.  This also sets 1A's SCEx response to  00 00 00 00, but doesn't appear to force a protection failure.	
0x20	Returns an ASCII string specifying where the CD-ROM firmware is intended to be used ("for Japan", "for U/C").	
0x22	Returns a chip number inside the PSX in use.	
0x23	Returns another chip number.	
0x24	Returns yet another chip number.  Same as 22's on some PSXs	



Root Counters

Overview
Root counters are timers in the PSX. There are 4 root counters.

Counter	Base address	Synced to	
0	0x1f80_1100	pixelclock	
1	0x1f80_1110	horizontal retrace	
2	0x1f80_1120	1/8 system clock	
3		vertical retrace	

Each have three registers, one with the current value, one with the counter mode, and one with a target value.

0x11n0	Count	[read]
31	16	15	0	
Garbage 	Count   	

Count	Current count value, 0x0000-0xffff
Upper word seems to contain only garbage.

0x11n4	Mode	[read/write]

31	10	9	8	7	6	5	4	3	2	1	0	
Garbage	Div	Clc		Iq2		Iq1	Tar		En	

En	0	Counter running
	1	Counter stopped (only counter 2)
Tar	0	Count to $ffff
	1	Count to value in target register
Iq1		Set both for IRQ on target reached.
Iq2
Clc	0	System clock (it seems)
	1	Pixel clock (counter 0)
		Horizontal retrace (counter 1)
Div	0	System clock (it seems)
	1	1/8 * System clock (counter 2)

When Clc and Div of the counters are zero, they all run at the same speed. This speed seems to be about 8 times the normal speed of root counter 2, which is specified as 1/8 the system clock.
Bits 10 to 31 seem to contain only garbage.

0x11n8	Count	[read/write]
31	16	15	0	
Garbage 	Target	

Target	Target value, 0x0000-0xffff
Upper word seems to contain only garbage.



Quick step-by-step:

To set up an interrupt using these counters you can do the following:
1 - Reset the counter. (Mode = 0)
2 - Set its target value, set mode.
3 - Enable corresponding bit in the interrupt mask register ($1f801074)
    bit 3 = Counter 3 (Vblank)
    bit 4 = Counter 0 (System clock)
    bit 5 = Counter 1 (Hor retrace)
    bit 6 = Counter 2 (Pixel)
4 - Open an event. (Openevent bios call - $b0, $08)
    With following arguments:
 a0-Rootcounter event descriptor or'd with the counter number.
    ($f2000000 - counter 0, $f2000001 - counter 1,$f2000002 - counter 2, $f2000003 - counter 3)
 a1-Spec = $0002 - interrupt event.
 a2-Mode = Interrupt handling ($1000)
 a3-Pointer to your routine to be excuted.
    The return value in V0 is the event identifier.

5 - Enable the event, with the corresponding bioscall ($b0,$0c) with the identifier as argument.

6 - Make sure interrupts are enabled. (Bit 0 and bit 10 of the COP0 status register must be set.)

Your handler just has to restore the registers it uses, and it should terminate with a normal jr ra.

To turn off the interrupt, first call disable event ($b0, $0d) and then close it using the Close event call ($b0,$09) both with the event number as argument.


Controllers

Overview
	The PSX uses a 9 pin device connecter for use with the PSX controller.  The controller port is exactly the same electricly as the memory card port. The only difference is the device driver that uses it, and it's external port shape. The controllers are accessed via the InitPAD StartPAD, StopPAD, PAD_init, and PAD_dr BIOS commands. These are covered in detail within the BIOS section of this document.  The controller is a type of "smart device" that communicates data serially via the port. Port informaton is as follows.


Pin	signal	dir	active	description	
1	dat	in	pos	data from pad or memory-card	
2	cmd	out	pos	command data to pad or memory-card	
3	+7V	--	--	+7.6V power source for CD-ROM drive	
4	gnd	--	--		
5	+3V	--	--	+3.6V power source for system	
6	sel	out	neg	select pad or memory-card	
7	clk	out	--	data shift clock 	
8	--	--	--	N.A.	
9	ack	in	neg	acknowladge signal from pad or memory-card	

1) direction(in/out) is based from PSX
2) metal edge in pad connecter is connected pin 4 and sheald calbe.
3) signal SEL in PAD1, PAD2 is separated.

Comminucation timing chart
Timing is compatible in the PAD as well as the Memory-card.

Overview
      ____                                                              _____
 SEL-     |____________________________________________________________|     
      ______        ____        ____        ____        ____        _________
 CLK        ||||||||    ||||||||    ||||||||    ||||||||    ||||||||         
      _______________________________________________________________________
 CMD       X  01h   XXXX  42h   XXXX  00h   XXXX  00h   XXXX  00h   XXXX     
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~     
           _____________________________________________________________     
 DAT  -----XXXXXXXXXXXXX   ID   XXXX   5Ah  XXXX  key1  XXXX  key2  XXXX-----
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~     

 ACK- ---------------|_|---------|_|---------|_|---------|_|-----------------

Top command. First comminucation(device check)
      ____                                                                   
 SEL-     |__________________________________________________________________
      ______   _   _   _   _   _   _   _   __________________   _   _   _   _
 CLK        |_| |_| |_| |_| |_| |_| |_| |_|                  |_| |_| |_| |_| 
      __________                                                  ___        
 CMD            |________________________________________________|   |_______
                                                             ____            
 DAT  -----XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX    |___________
                                                                             
 ACK- ----------------------------------------------|___|--------------------

	X = none, - = Hi-Z

0x81 is memory-card, 0x01 is standard-pad at top command.
serial data transfer is LSB-First format.
data is down edged output, PSX is read at up edge in shift clock.
PSX expects No-connection if not returned Acknoledge less than 100 usec.
clock pluse is 250KHz.
no need Acknoledge at last data.
Acknoledge signal width is more than 2 usec.
time is 16msec between SEL from previous SEL.
SEL- for memory card in PAD access.

Communication format with the Pad
	After the command 0x01h is sent to the pad drom the system, the pad replies with a one-byte PAD ID(0x5A), then it will send a 2-byte key code and extended code.

Normal Pad	timing flow ->
	10000000	1000010	1011010	1234567	1234567	
CMD	01h	42h	00h	00h	00h	
	xxxxxxxx	10000010	10100101	1234567	1234567	
DAT	----	41h	5ah	SW.1	SW.2	




data contents of normal PAD.(push low)
byte	b7	b6	b5	b4	b3	b2	b1	b0		
0	---	N.A.	
1	0x41	'A'	
2	0x5a	'Z'	
3	LEFT	DOWN	RGHT	UP	STA	1	1	SEL		
4	Square	X	O	Triangle	R1	L1	R2	L2		





data contents NEGCON(NAMCO analog controler, push low)
byte	b7	b6	b5	b4	b3	b2	b1	b0		
0	-----	N.A.	
1	0x23		
2	0x5a	'Z'	
3	LEFT	DOWN	RGHT	UP	STA	1	1	1		
4	1	1	A	B	R	1	1	1		
5	handle data   right:0x00, center:0x80		
6	I button ADC data (7bit unsigned)	00h to 7Fh	
7	II button ADC data	00h to 7Fh	
8	L button ADC data	00h to 7Fh	
unknown data bit length in +6 to +8 ADC datas. (7 or 8 may be)


mouse data contents(push low)
byte	b7	b6	b5	b4	b3	b2	b1	b0		
0	-----	N.A.	
1	0x12		
2	0x5a	'Z'	
3	1	1	1	1	1	1	1	1		
4	1	1	1	1	LEFT	RGHT	0	0		
5	V moves 8bitSigned up:+,dwn:-,stay:00		
6	H moves 8bitSigned up:+,dwn:-,stay:00		



Memory cards

Memory Card Format

	The memory card for the PSX is 128 kilobytes of non-volatile RAM. This is split into 16 blocks each containing 8 kilobytes each. The very first block is is a header block used as a directory and file allocation table  leaving 15 blocks left over for data storage. 
	The data blocks contains the program data file name, block name, icon, and other critical information.  The PSX accesses the data via a "frame" method. Each block is split into 64 frames, each 128 bytes.  The first frame (frame 0) is the file name, frames 1 to 3 contain the icon, (each frame of animation taking up one frame) leaving the rest of the frames for save data.


Terms and Data Format

	This is the format of the various objects within the memory card.

File Name
	Country code(2 bytes)+Product number(10 bytes)+identifier(8 bytes) An example of a product number is SCPS-0000. The identifier is a variation on the name of the game, for example FF8 will be FF0800, FF0801.  The format if the product is 4 characters, a hyphen, and then 5 characters. The actula characters don't really matter. With a PocketStation program, the product ID is a monochrome icon, a hyphen and the later part containing a "P" 

Country Code
	In Japan the code is BI, Europe is BE, and America is BA. An American PSX and use memory saves with the BI country code.  

Title
	The title is in Shift-JIS format with a max if 32 characters. ASCII can be used as ASCII is a subset of Shift-JIS.


XOR Code
	This is a checksum. Each byte is XORed one by one and the result is stored. Complies with the checksum protocol.

Link
	This is a sequence of 3 bytes to link blocks togeather to form one continuous data block. 

Data Size
Total Memory	128KB = 13,1072 bytes= 0x20000bytes	
1 Block	8KB = 8192 bytes = 0x2000 bytes	
1 Frame	128 bytes = 0x80 bytes	

Header Frame
+0x00	'M' (0x4D)	
+0x01	'C' (0x43)	
+0x02 - 0x7E	Unused (0x00) 	
+0x7F	XOR code  (usually 0x0E)	

Directory Frame
+0x00 	Availible bocks      upper 4 bits           A - Availible           5 - partially  used           F - Unusable      Lowe 4 bits           0 - Unused           1 - There is no link, but one will be here later           2 - mid link block           3 - terminiting link block           F - unusable Example            A0 - Open block            51 - In use, there will be a link in the next block            52 - In use, this is in a link and will link to another            53 - In use, this is the last in the link            FF - Unusable	
+0x01 - 0x03	00 00 00 When it's reservered it's FF FF FF	
+0x04 - 0x07	Use byte           00 00 00 - Open block middle link block, or end link block           Block * 0x2000 - No link, but will be a link          (00 20 00 - one blocks will be used)          (00 40 00 - two blocks will be used)          (00 E0 01 - 15 blocks will be used )	
+0x08-0x09	Link order Block 0-14           If the bock isn't in a link or if it's the last link in the line the line, it's 0xffff 	
+0x0A-0x0B	Country Code (BI, BA, BE)	
+0x0C+0x15	Product Code (AAAA-00000)            Japan SLPS, SCPS (from SCEI)           America SLUS, SCUS (from SCEA)           Europe SLES, SCES (from SCEE)	
+0x16-0x1D	Identifier           This Number is created unique to the current game played. Meaning the first time a game is saved on the card, every subsequent save has the same identifier, but it a new game is started from the beginning, that will have a different idenitifier.	
+0x1E-0x7E	Unused	
0x7F	XOR Code	
THE FOLLOWING DATA REPEATS FOR THE NEXT 15 BLOCKS, THEN BLOCK 1 STARTS

Block Structure
Frame 0
Title Frame
0x00
'S' (0x53)
0x01
'C' (0x43)
+0x02
Icon Display Flag
00...No icon
11...Icon has 1 frame of animation (static)
12...Icon has 2 frames
13...Icon has 3 frames
+0x03
Block Number (1-15)
0x04 - 0x43
 
Title
This is  the title in Shift-JIS format, it allows for 32 characters to be written
0x44 - 0x5F
Reserved(00h)
This is used for the Pocketstation.
0x60 - 0x7F
Icon 16 Color Palette Data
Frame 1
Frame 3
Icon Frame
0x00 - 0x7F
Icon Bitmap
1 Frame of animaton == 1 Frame of data.
If there is no Icon for this bock, it's data instead.
Frame 4
Data Frame
+0x00 - 0x7F
Save Data

Link Block
Frame 1	+0x00 - 0x7F	Save Data	

Data Transmission
	Data is trasmitted with exactly the same protocol as the Pad data is trasmitted/revived.  The pin out are exactly the same as well, the houseing, however is a different shape. 



Serial I/O

	The PSX has a 8 pin serial adapter that uses a non-standars protocol for data transmission and receiving. The pin outs are pictured here.


	The pot speed is able to go up to a maximum of 256K bps. Normally it's used at 56K. On connecton problems the port will attempt a reconnect, but may not fall back on a slower speed.  The link cable is wired is such.

	1 <-> 4
	2  NC
	3 <-> 6
	4 <-> 1
	5 <-> 8
	6 <-> 3
	7 <-> 7
	8 <-> 5 

The pins are like this (looking into the link cable connector looking into the pins of the cable connector) and the connector facing up) :

					CABLE
                     _________________________________
                    //                               \\
                   //                                 \\
         ________________                       ________________
        /       UP       \                     /        UP      \
        ------------------                     ------------------- 
 LEFT	  |1 2 3 4 5 6 7 8 | RIGHT          LEFT | 1 2 3 4 5 6 7 8 | RIGHT
	  ------------------                     -------------------




Parallel I/O

Overview
	The Parallel prt is is a sort of a faux name. It's really an expantion port. Any device connected to this port will have access to everything on the local bus. The address that the PIO port resides on is from 0x1f00_0000-0x1f00_ffff The following is a pin diagram of the PIO. 



Appendix A

Number systems

The Hexadecimal system is as follows
Decimal	Hexadecimal	
1	1	
2	2	
3	3	
4	4	
5	5	
6	6	
7	7	
8	8	
9	9	
10	A	
11	B	
12	C	
13	D	
14	E	
15	F	
16	10	
17	11	
18	12	
19	13	
20	14	
21	15	
22	16	
23	17	
24	18	
25	19	
26	1A	
27	1B	
28	1C	
29	1D	
30	1E	
31	1F	
32	20	
33	21	
...	...	
252	FC	
253	FD	
254	FE	
255	FF	



Appendix B

TBA


Appendix C

GPU command listing

Overview of packet commands:

 0x01     clear cache
 0x02     frame buffer rectangle draw
 0x20     monochrome 3 point polygon
 0x24     textured 3 point polygon
 0x28     monochrome 4 point polygon
 0x2c     textured 4 point polygon
 0x30     gradated 3 point polygon
 0x34     gradated textured 3 point polygon
 0x38     gradated 4 point polygon
 0x3c     gradated textured 4 point polygon
 0x40     monochrome line
 0x48     monochrome polyline
 0x50     gradated line
 0x58     gradated line polyline
 0x60     rectangle
 0x64     sprite
 0x68     dot
 0x70     8*8 rectangle
 0x74     8*8 sprite
 0x78     16*16 rectangle
 0x7c     16*16 sprite
 0x80     move image in frame buffer
 0xa0     send image to frame buffer
 0xc0     copy image from frame buffer
 0xe1     draw mode setting
 0xe2     texture window setting
 0xe3     set drawing area top left
 0xe4     set drawing area bottom right
 0xe5     drawing offset
 0xe6     mask setting






Appendix D

Glossary of terms

PSX	Playstation
SCEI	Sony Computer Entertainment Incorporated (Sony of Japan)
SCEA	Sony Computer Entertainment America (Sony of America)
SCEE	Sony Computer Entertainment Europe (Sony of Europe)
GTE	Geometry Transformation Engine
GPU	Graphics Processing Unit
CPU	Central Processing Unit
MDEC	Motion DEcoding Chip
PIO	Parallel Input/Output port
SPU	Sound Processing Unit
BIOS	Basic Input/Output System



Appendix E

Works cited - Bibliography

History of the Sony PlayStation taken from http://www.psxpower.com

The IDTR3051  , R3052  RISController  Hardware User's Manual Revision 1.4 July 15, 1994
1992, 1994 Integrated Device Technology, Inc.

System.txt, cdinfo.txt, gpu.txt, spu.txt, gte.txt
doomed@c64.org  http://psx.rules.org

gte-lite.txt
http://www.in-brb.de/~creature/

MDEC data from
jlo@ludd.luth.se and various people at PSXDEV mailing list
http://www.geocities.co.jp/Playtown/2004/
bero@geocities.co.jp

Memcard/PAD Data
HFB03536@nifty-serve.or.jp

PIO
bitmaster@bigfoot.com

Syscall
sgf22@cam.ac.uk

Mem card format: E-nash
http://www.vbug.or.jp/users/e-nash/ 
e-nash@i.am

Plus the many more at PSXDEV mailing list that helped ^_^

Exitcode 84905

