I've been looking around a bit and can't seem to find just what I'm looking for. I've found "canonical formulas," but what's the best way to use these? Do I have to scale every single vertex down? Or is there a better way?
A formula would really help me out, but I'm also looking for an explanation of the near and far z planes relative to the viewer's position.
Here is a reasonable source that derives an orthographic projection matrix:
Consider a few points: First, in eye space, your camera is positioned at the origin and looking directly down the z-axis. And second, you usually want your field of view to extend equally far to the left as it does to the right, and equally far above the z-axis as below. If that is the case, the z-axis passes directly through the center of your view volume, and so you have r = –l and t = –b. In other words, you can forget about r, l, t, and b altogether, and simply define your view volume in terms of a width w, and a height h, along with your other clipping planes f and n. If you make those substitutions into the orthographic projection matrix above, you get this rather simplified version:
All of the above gives you a matrix that looks like this (add rotation and translation as appropriate if you'd like your resulting transformation matrix to treat an arbitrary camera position and orientation).
(source: codeguru.com)
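As a rough sketch of the simplified matrix described in the quoted passage, here is one way it could be built and applied in plain Python. The function names `ortho_matrix` and `transform` are made up for illustration, and the depth convention is an assumption: the camera sits at the origin looking down +z, column vectors are used, and depth is mapped to [0, 1] (Direct3D-style). OpenGL's convention maps depth to [-1, 1] instead, so the third row would differ there. Here n and f are the distances from the viewer to the near and far planes along the view axis.

```python
def ortho_matrix(w, h, n, f):
    """Row-major 4x4 orthographic projection for a view volume centered on the z-axis.

    Assumptions (not confirmed by the quoted source): column vectors,
    camera at the origin looking down +z, depth mapped to [0, 1].
    w, h: width and height of the view volume; n, f: near and far plane distances.
    """
    return [
        [2.0 / w, 0.0,     0.0,           0.0],
        [0.0,     2.0 / h, 0.0,           0.0],
        [0.0,     0.0,     1.0 / (f - n), -n / (f - n)],
        [0.0,     0.0,     0.0,           1.0],
    ]


def transform(m, point):
    """Apply the 4x4 matrix m to a 3D point (treated as a column vector with w = 1)."""
    x, y, z = point
    v = (x, y, z, 1.0)
    out = [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]
    return tuple(out[:3])  # w remains 1 for an orthographic projection


m = ortho_matrix(w=8.0, h=6.0, n=1.0, f=11.0)
near_corner = transform(m, (4.0, 3.0, 1.0))   # top-right corner on the near plane
far_center = transform(m, (0.0, 0.0, 11.0))   # center of the far plane
```

Under these assumptions, the corner of the near plane lands at roughly (1, 1, 0) and the center of the far plane at roughly (0, 0, 1). You don't scale vertices by hand: the matrix is built once and every vertex is multiplied by it, usually on the GPU.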