3D: The Transformation Pipeline
Have you ever wondered how a 3D game is made? Or, more specifically, how graphics are displayed on the screen as 3D graphics? I have. In my previous post, I said that it is not easy to acquire this knowledge. However, I think it is worth spending a significant amount of time in this area, especially if you intend to develop a 3D game as a programmer – whether on the 3D engine, the physics or the artificial intelligence (AI). As I mentioned in my previous post, I will spend the rest of this year learning how to build a 3D rendering engine from scratch.
You are probably wondering why anyone would want to learn how to build a 3D rendering engine when there are so many commercial and open-source rendering engines out there. It is true that I will probably never be able to create the next Doom3 or Unreal engine. However, developing a Doom3-beater is not my initial goal; instead, my goal is to build a solid foundation of 3D knowledge in order to truly understand, utilize and modify other 3D rendering engines. I want to know what I am doing when I use a 3D engine's or graphics API's “vertex shader” or its simple “drawCube” functions.
Whether I plan to use a pre-built 3D engine or to create my own using a graphics API (Slade3D, haha) like OpenGL or DirectX Graphics, I would like to understand what the API is doing behind the scenes. I do not want to be in a situation where I have to spend ten years debugging some annoying unexpected result. That would be disastrous. Without understanding the underlying mathematics that drives the 3D world (which is, after all, a mathematical construct), I do not think I will go very far in the realm of 3D.
I spent the last few nights learning (and refreshing) the basics of 3D, particularly the transformation pipeline (the process every vertex of a 3D object must go through) and how vertex and matrix mathematics are used in an engine. I will try to give a brief summary of what I learnt.
- I started off by understanding the geometrical representation of objects in 2D and 3D on the Cartesian plane. Doesn’t this sound like math? Well, it is. I would advise anyone going into games programming to get familiar with and practice their math.
- Then I proceeded to understand how the CPU/GPU renders individual pixels on the screen. The information for the individual vertices that define a 3D object is stored in memory as simple data structures – vertices, polygons, meshes, and so on (see the data-structure sketch after this list).
- Next I moved on to the important Transformation Pipeline. Knowledge of the previous items is needed here. The transformation pipeline (this sounds really cool) is a long “pipeline” every 3D vertex has to pass through in order to determine which pixel on the screen it occupies.
- Each vertex in a polygon has to go through various mathematical transformations – typically a rotation followed by a translation – to bring the polygon into a new coordinate space known as world space. (I.e. the vertices get new coordinates relative to the world origin.)
- After the world space transformation, the polygon is transformed again into another coordinate space known as view space. This creates the ‘camera view’ through which the player sees the world. Properties like the camera position, viewing direction and field of view are specified for this view – usually stored in a camera data structure. (Both the world and view steps are sketched after this list.)
- Then we need to add some sort of perspective, some sense of depth. Without it, our 3D world would not be called a 3D world, right? So we need to transform again, this time into another coordinate space known as projection space. This is done by dividing each view space X and Y coordinate by the vertex’s view space Z (depth) coordinate. Many other particulars regarding the limitations of the 2D screen resolution must be dealt with here.
- The final step in the transformation pipeline is to find a pixel coordinate on the screen for the current projected vertex. This process is known as screen space mapping. There are some simple concepts regarding screen resolution to grasp here (the projection and screen-mapping steps are sketched after this list).
- Once all the vertices of a polygon are in screen space, we can draw lines between each pair of points to render a simple primitive wire-frame object on screen.
- When I began to understand how polygons are rendered, I realized why high-polygon models drawn in 3D modeling packages cause a significant reduction in application performance: it is due to the sheer number of calculations the CPU/GPU has to perform in order to get those vertices onto the screen. Hence the need for techniques that reduce the number of calculations. One such optimization technique is called back-face culling. It assumes that the player will never be allowed to see the back of a polygon, so the back does not have to be rendered. 3D rendering engines usually perform a fast and cheap test to see whether a polygon is facing the player. To perform that test, I have to know how to determine which way a polygon is facing. The answer lies in the order in which we specify the vertices of the polygon, known as the polygon winding order (see the culling sketch after this list).
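To make the data-structure point above a little more concrete, here is a minimal sketch of how vertices, polygons and meshes might be stored in memory. The struct names and layout are just my own assumptions for illustration, not what any particular engine or API does.

```cpp
#include <vector>

// A single point in 3D space, stored as plain floats.
struct Vertex {
    float x, y, z;
};

// A triangle defined by three indices into a shared vertex list.
// Indexing avoids storing the same vertex several times.
struct Polygon {
    int a, b, c;
};

// A mesh is just the vertex pool plus the polygons that connect them.
struct Mesh {
    std::vector<Vertex>  vertices;
    std::vector<Polygon> polygons;
};

int main() {
    // A unit quad built from two triangles sharing an edge.
    Mesh quad;
    quad.vertices = { {0,0,0}, {1,0,0}, {1,1,0}, {0,1,0} };
    quad.polygons = { {0,1,2}, {0,2,3} };
    return 0;
}
```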
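Here is a rough sketch of the local-to-world and world-to-view steps, written in plain C++ with trigonometry and using only a yaw (Y-axis) rotation for simplicity. The function names and the single-angle camera are my own simplifications; a real engine handles full 3D orientation.

```cpp
#include <cmath>
#include <cstdio>

struct Vertex { float x, y, z; };

// Local -> world: rotate the model about the Y axis, then translate it
// to its position in the world.
Vertex localToWorld(Vertex v, float yaw, Vertex worldPos) {
    float c = std::cos(yaw), s = std::sin(yaw);
    Vertex r;
    r.x = v.x * c + v.z * s;
    r.y = v.y;
    r.z = -v.x * s + v.z * c;
    r.x += worldPos.x;
    r.y += worldPos.y;
    r.z += worldPos.z;
    return r;
}

// World -> view: undo the camera's position and yaw so the camera ends
// up sitting at the origin looking down the Z axis.
Vertex worldToView(Vertex v, Vertex camPos, float camYaw) {
    v.x -= camPos.x;  v.y -= camPos.y;  v.z -= camPos.z;
    float c = std::cos(-camYaw), s = std::sin(-camYaw);
    Vertex r;
    r.x = v.x * c + v.z * s;
    r.y = v.y;
    r.z = -v.x * s + v.z * c;
    return r;
}

int main() {
    Vertex local = {1.0f, 0.0f, 0.0f};
    Vertex world = localToWorld(local, 0.5f, {10.0f, 0.0f, 5.0f});
    Vertex view  = worldToView(world, {0.0f, 0.0f, -5.0f}, 0.0f);
    std::printf("view space: %.2f %.2f %.2f\n", view.x, view.y, view.z);
    return 0;
}
```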
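And a sketch of the perspective divide and the screen-space mapping. The `dist` parameter standing in for the field of view, and the assumption that the projected range maps to -1..1, are simplifications of mine; real pipelines also clip vertices that sit behind or too close to the camera before dividing by Z.

```cpp
#include <cstdio>

struct Vertex { float x, y, z; };
struct Point2 { float x, y; };
struct Pixel  { int x, y; };

// View -> projection: the perspective divide. Dividing X and Y by Z
// makes distant points crowd toward the centre of the view, which is
// what gives the image its sense of depth.
Point2 project(const Vertex& v, float dist) {
    return { (v.x * dist) / v.z, (v.y * dist) / v.z };
}

// Projection -> screen: map the projected -1..1 range onto pixel
// coordinates, remembering that screen Y usually grows downward.
Pixel toScreen(const Point2& p, int width, int height) {
    Pixel s;
    s.x = (int)((p.x + 1.0f) * 0.5f * (float)width);
    s.y = (int)((1.0f - p.y) * 0.5f * (float)height);
    return s;
}

int main() {
    Vertex v = {1.0f, 0.5f, 4.0f};   // a point already in view space
    Point2 p = project(v, 1.0f);
    Pixel  s = toScreen(p, 640, 480);
    std::printf("pixel: (%d, %d)\n", s.x, s.y);
    return 0;
}
```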
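Finally, a sketch of the winding-order test behind back-face culling, performed on the projected 2D points. The counter-clockwise-equals-front convention is just the one I picked for the example; engines differ, and flipping the Y axis during screen mapping flips the apparent winding too.

```cpp
#include <cstdio>

struct Point2 { float x, y; };

// The sign of this 2D cross product tells us whether the three points
// appear counter-clockwise (positive) or clockwise (negative) on screen.
float signedArea(Point2 a, Point2 b, Point2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Under a counter-clockwise-front convention, a clockwise winding means
// we are looking at the back of the polygon and can skip drawing it.
bool isBackFacing(Point2 a, Point2 b, Point2 c) {
    return signedArea(a, b, c) < 0.0f;
}

int main() {
    Point2 a = {0, 0}, b = {1, 0}, c = {0, 1};
    std::printf("CCW triangle back-facing? %d\n", isBackFacing(a, b, c));
    std::printf("CW  triangle back-facing? %d\n", isBackFacing(a, c, b));
    return 0;
}
```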
I was somewhat shocked to realize how much math one needs to know just to comprehend how a basic 3D polygon is rendered on the screen. Furthermore, because of the huge amount of calculation the CPU/GPU needs to perform, no 3D API applies the trigonometric calculations we just discussed directly in its transformation pipeline; instead, another mathematical construct, the matrix (plural: matrices), is used.
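To see why matrices help, here is a minimal sketch of a 4x4 matrix type that folds rotation and translation into a single multiply. The row-major storage and the rotate-then-translate order are my own choices for the example; the point is that once the matrices are concatenated, every vertex costs the same single multiply no matter how many transforms are stacked up.

```cpp
#include <cmath>
#include <cstdio>

struct Vertex { float x, y, z; };

// A 4x4 matrix, stored row-major. The fourth row/column lets us fold
// translation into the same multiply as rotation and scale.
struct Mat4 {
    float m[4][4];
};

Mat4 identity() {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

Mat4 rotationY(float angle) {
    Mat4 r = identity();
    float c = std::cos(angle), s = std::sin(angle);
    r.m[0][0] = c;  r.m[0][2] = s;
    r.m[2][0] = -s; r.m[2][2] = c;
    return r;
}

Mat4 translation(float x, float y, float z) {
    Mat4 r = identity();
    r.m[0][3] = x; r.m[1][3] = y; r.m[2][3] = z;
    return r;
}

// Concatenate two transforms. Applying (a * b) to a vertex is the same
// as applying b first, then a -- the trig happens once, up front.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Treat the vertex as the column vector (x, y, z, 1) and multiply.
Vertex transform(const Mat4& m, Vertex v) {
    Vertex r;
    r.x = m.m[0][0]*v.x + m.m[0][1]*v.y + m.m[0][2]*v.z + m.m[0][3];
    r.y = m.m[1][0]*v.x + m.m[1][1]*v.y + m.m[1][2]*v.z + m.m[1][3];
    r.z = m.m[2][0]*v.x + m.m[2][1]*v.y + m.m[2][2]*v.z + m.m[2][3];
    return r;
}

int main() {
    // Build one world matrix (rotate, then translate) and push a vertex
    // through it -- the same result as the step-by-step trig version.
    Mat4 world = multiply(translation(10.0f, 0.0f, 5.0f), rotationY(0.5f));
    Vertex v = transform(world, {1.0f, 0.0f, 0.0f});
    std::printf("world space: %.2f %.2f %.2f\n", v.x, v.y, v.z);
    return 0;
}
```

This is why the world, view and projection steps can be collapsed into one combined matrix before any vertex is touched.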
Phew, that is a lot to learn. I wonder where John Carmack learnt all this before he programmed Doom. In addition to all these 3D concepts (in minute detail), I had to refresh a lot of math. I will continue with the math in the next post.