3D: The Transformation Pipeline
   Have you ever wondered how a 3D game is made? Or, more specifically, how graphics are displayed on the screen as 3D graphics? I have. In my previous post, I said that it is not easy to acquire this knowledge. However, I think it is worth spending a significant amount of time in this area, especially if you intend to work on a 3D game as a programmer, whether on the 3D engine, the physics or the Artificial Intelligence (AI). As I mentioned in my previous post, I will spend the rest of this year learning how to build a 3D rendering engine from scratch.
You are probably wondering why anyone would want to learn how to build a 3D rendering engine when there are so many commercial and open-source rendering engines out there. It is true that I will probably never be able to create the next Doom3 or Unreal engine. However, developing the Doom3-beater is not my initial goal; instead, my goal is to build a solid foundation of 3D knowledge in order to truly understand, utilize and modify other 3D rendering engines. I want to know what I am doing when I use a 3D engine or a graphics API’s “vertex shader” or its simple “drawCube” functions.
   Whether I plan to use a pre-built 3D engine or to create my own using a graphics API (Slade3D, haha) like OpenGL or DirectX Graphics, I would like to understand what the API is doing behind the scenes. I do not want to be in a situation where I have to spend ten years debugging some annoying, unexpected result. That would be disastrous. Without understanding the underlying mathematics that drives a mathematical construct (the 3D world), I do not think I will get very far in the realm of 3D.
   I spent the last few nights learning and refreshing the basics of 3D, particularly the transformation pipeline (which every vertex of a 3D object must go through) and how vertex and matrix mathematics are used in an engine. I will try to give a brief summary of what I learnt.
  - I started off by understanding the geometrical representation of objects in 2D and 3D on the Cartesian plane. Doesn’t this sound like math? Well, it is. I would advise anyone going into games programming to get familiar with the math and keep practicing it.
  
  - Then I proceeded to understand how the CPU/GPU renders individual pixels on the screen. The information about the individual vertices that define a 3D object is stored in memory as simple data structures, for example vertices, polygons and meshes (sketched in code after this list).
  
  - Next I moved on to the important transformation pipeline, which builds on the previous items. The transformation pipeline (this sounds really cool) is a long “pipeline” that every 3D vertex has to pass through in order to work out which pixel on the screen it occupies. I sketch each of the following steps in code after this list.
  
    - Each vertex in a polygon has to go through various mathematical transformations, typically rotation about the object’s own origin followed by translation, to bring the polygon into a new coordinate space known as the world space (i.e. the vertices get new coordinates relative to the world origin).
 
 
    - After the world space transformation, the polygon is transformed again into another coordinate space known as the view space. This creates the ‘camera view’ through which the player sees the world. Properties like the camera position, viewing direction and field of view are specified for this view, and are usually stored in a camera data structure.
 
 
    - Then we need to add some sort of perspective, some sense of depth. Without it, our 3D world would not really be a 3D world, right? So we transform again, this time into another coordinate space known as the projection space. This is done by dividing each view space X and Y coordinate by the view space Z (depth) coordinate. Many other particulars, such as the limitations of the 2D screen resolution, must also be dealt with here.
 
 
    - The final step in the transformation pipeline is to find a pixel coordinate on the screen for the current projected vertex. This process is known as screen space mapping, and there are some simple concepts regarding screen resolution to grasp here.
 
 
    - Once all the vertices of a polygon are in screen space, we can draw lines between consecutive points to render a simple, primitive wire-frame object on screen.
 
 
  - When I began to understand how polygons are rendered, I realized why high-polygon models drawn in 3D modeling packages cause a significant drop in application performance: it comes down to the sheer number of calculations the CPU/GPU has to perform to get those vertices onto the screen. Hence the need for techniques that reduce the number of calculations. One such optimization technique is called back face culling. It assumes the player will never be allowed to see the back of a polygon, so a polygon facing away from the player does not have to be rendered. 3D rendering engines usually perform a fast, cheap test to see if a polygon is facing the player. To perform that test, I have to know how to determine which way a polygon is facing, and the answer lies in the order in which we specify the vertices of the polygon. This is known as the polygon winding order (see the sketch after this list).
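To make the data-structure idea concrete, here is a minimal C++ sketch of how a vertex, a polygon and a mesh might be stored in memory. The names (Vertex, Polygon, Mesh) and the triangle-only polygon are my own illustrative choices, not taken from any particular engine or API.

```cpp
#include <vector>

// A single point in 3D space, stored as plain floats.
struct Vertex {
    float x, y, z;
};

// A polygon is just indices into the mesh's vertex list,
// given in a consistent winding order.
struct Polygon {
    int indices[3];   // a triangle: three vertex indices
};

// A mesh ties the two together: one shared pool of vertices
// and the polygons that connect them.
struct Mesh {
    std::vector<Vertex>  vertices;
    std::vector<Polygon> polygons;
};
```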
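The object-to-world step, done with plain trigonometry rather than matrices, might look something like the sketch below: a rotation about the Y axis followed by a translation. The function name and parameters are hypothetical.

```cpp
#include <cmath>

struct Vertex { float x, y, z; };

// Rotate a vertex about the Y axis by 'angle' radians, then translate it
// by (tx, ty, tz) to place the object somewhere in the world.
Vertex objectToWorld(Vertex v, float angle, float tx, float ty, float tz) {
    Vertex out;
    // Rotation about Y: x and z change, y is untouched.
    out.x =  v.x * std::cos(angle) + v.z * std::sin(angle);
    out.y =  v.y;
    out.z = -v.x * std::sin(angle) + v.z * std::cos(angle);
    // Translation: slide the rotated vertex to its world position.
    out.x += tx;  out.y += ty;  out.z += tz;
    return out;
}
```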
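The world-to-view step re-expresses every world-space vertex relative to the camera. This is a minimal sketch, assuming a deliberately tiny camera described only by a position and a yaw angle; a real camera structure would also carry pitch, roll and field of view.

```cpp
#include <cmath>

struct Vertex { float x, y, z; };

// A deliberately tiny camera: position plus a yaw (rotation about Y).
struct Camera { float x, y, z; float yaw; };

// Move the world so the camera sits at the origin, then undo the
// camera's rotation so it looks straight down the Z axis.
Vertex worldToView(Vertex v, const Camera& cam) {
    // Translate by the inverse of the camera position.
    float x = v.x - cam.x;
    float y = v.y - cam.y;
    float z = v.z - cam.z;
    // Rotate by the inverse (negative) of the camera's yaw.
    Vertex out;
    out.x =  x * std::cos(-cam.yaw) + z * std::sin(-cam.yaw);
    out.y =  y;
    out.z = -x * std::sin(-cam.yaw) + z * std::cos(-cam.yaw);
    return out;
}
```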
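The heart of the projection step is the perspective divide: X and Y are divided by depth so that distant points crowd toward the center of the view. A minimal sketch; the single scale factor d standing in for the field-of-view handling is my own simplification.

```cpp
struct Vertex { float x, y, z; };

// Projected 2D coordinates (still abstract, not pixels yet).
struct Projected { float x, y; };

// Divide X and Y by the depth Z so distant points shrink toward the center.
// 'd' scales the result and plays the role a field-of-view term would.
Projected project(Vertex v, float d) {
    Projected p;
    p.x = (v.x * d) / v.z;   // assumes v.z > 0, i.e. in front of the camera
    p.y = (v.y * d) / v.z;
    return p;
}
```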
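Screen-space mapping then turns those projected coordinates into pixel positions. A minimal sketch, assuming (on my part) that the projected x and y land in the range -1 to 1 and that pixel (0, 0) is the top-left corner of the screen.

```cpp
struct Projected { float x, y; };
struct Pixel { int x, y; };

// Map projected coordinates in [-1, 1] onto a width-by-height screen.
// Y is flipped because screen Y usually grows downward.
Pixel toScreen(Projected p, int width, int height) {
    Pixel px;
    px.x = static_cast<int>((p.x + 1.0f) * 0.5f * (width  - 1));
    px.y = static_cast<int>((1.0f - p.y) * 0.5f * (height - 1));
    return px;
}
```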
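With every vertex of a polygon in screen space, a wire-frame is just a matter of drawing lines between consecutive points. A sketch with a stand-in drawLine routine; a real engine would use a proper line rasterizer such as Bresenham's algorithm.

```cpp
#include <cstdio>
#include <vector>

struct Pixel { int x, y; };

// Stand-in for a real line rasterizer; here it just reports the
// endpoints it would connect.
void drawLine(int x0, int y0, int x1, int y1) {
    std::printf("line (%d, %d) -> (%d, %d)\n", x0, y0, x1, y1);
}

// Connect each screen-space vertex to the next, wrapping around
// from the last vertex back to the first.
void drawWireframe(const std::vector<Pixel>& pts) {
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const Pixel& a = pts[i];
        const Pixel& b = pts[(i + 1) % pts.size()];
        drawLine(a.x, a.y, b.x, b.y);
    }
}
```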
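Finally, the winding-order test behind back face culling can be done in screen space with a single 2D cross product: its sign tells you whether the triangle's vertices appear clockwise or counter-clockwise on screen. A minimal sketch; which sign counts as "front-facing" is a convention you pick for your engine.

```cpp
struct Pixel { int x, y; };

// Signed area of the screen-space triangle (a, b, c) via a 2D cross product.
// Its sign reveals the winding order of the vertices as seen on screen.
bool isFrontFacing(Pixel a, Pixel b, Pixel c) {
    int cross = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    // With screen Y growing downward, a positive value corresponds to
    // clockwise winding on screen -- my assumed front-facing convention.
    return cross > 0;
}
```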
  
   I was somewhat shocked to realize how much math one needs to know just to comprehend how a basic 3D polygon is rendered on the screen. Furthermore, because of the huge amount of calculation the CPU/GPU needs to perform, no 3D API applies the trigonometric calculations we just discussed directly in its transformation pipeline; instead, another mathematical construct is used: matrices (the plural of matrix).
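To see why matrices are so attractive, here is a minimal sketch of a 4x4 matrix applied to a vertex. Once the world, view and projection steps are each written as a matrix, they can be multiplied together once per object, and every vertex then needs only this single routine. The row-major layout is my own illustrative choice.

```cpp
struct Vertex { float x, y, z; };

// A 4x4 matrix stored row-major: m[row][col].
struct Mat4 {
    float m[4][4];
};

// Multiply a vertex (treated as the column vector [x, y, z, 1]) by a matrix.
// Translation, rotation and projection can all be packed into one Mat4,
// so the whole pipeline collapses to this one multiply per vertex.
Vertex transform(const Mat4& M, Vertex v) {
    float x = M.m[0][0] * v.x + M.m[0][1] * v.y + M.m[0][2] * v.z + M.m[0][3];
    float y = M.m[1][0] * v.x + M.m[1][1] * v.y + M.m[1][2] * v.z + M.m[1][3];
    float z = M.m[2][0] * v.x + M.m[2][1] * v.y + M.m[2][2] * v.z + M.m[2][3];
    float w = M.m[3][0] * v.x + M.m[3][1] * v.y + M.m[3][2] * v.z + M.m[3][3];
    // The perspective divide from earlier shows up naturally as dividing by w.
    return Vertex{ x / w, y / w, z / w };
}
```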
   Phew, that is a lot to learn. I wonder where John Carmack learnt all of this before he programmed Doom. In addition to all these 3D concepts (in minute detail), I had to refresh a lot of math. I will continue with the math in the next post.