I have read that GLSL compilers (in my case GLSL ES 1.0, revision 1.0.17, since my application runs under WebGL) will optimize away redundant assignments such as:
gl_Position = ProjectionMatrix * ModelViewMatrix * VertexPosition;
. . .
gl_Position = ProjectionMatrix * ModelViewMatrix * VertexPosition;
Is the compiler also smart enough to perform the same optimization across function calls? For example:
void doSomething1(void) {
    . . .
    gl_Position = ProjectionMatrix * ModelViewMatrix * VertexPosition;
}

void doSomething2(void) {
    . . .
    gl_Position = ProjectionMatrix * ModelViewMatrix * VertexPosition;
}

void main(void) {
    doSomething1();
    doSomething2();
}
I downloaded the GPU ShaderAnalyzer from AMD and fed the following GLSL program into it:
uniform mat4 ModelViewMatrix;
attribute vec4 VertexPosition;

void doSomething1(void) {
    gl_Position = ModelViewMatrix * VertexPosition;
}

void doSomething2(void) {
    gl_Position = ModelViewMatrix * VertexPosition;
}

void main(void) {
    doSomething1();
    doSomething2();
}
This produced the following disassembly (or equivalent) on every card from Radeon HD 2400 to Radeon HD 6970:
; -------- Disassembly --------------------
00 CALL_FS NO_BARRIER
01 ALU: ADDR(32) CNT(16) KCACHE0(CB0:0-15)
0 x: MUL ____, R1.w, KC0[3].w
y: MUL ____, R1.w, KC0[3].z
z: MUL ____, R1.w, KC0[3].y
w: MUL ____, R1.w, KC0[3].x
1 x: MULADD R127.x, R1.z, KC0[2].w, PV0.x
y: MULADD R127.y, R1.z, KC0[2].z, PV0.y
z: MULADD R127.z, R1.z, KC0[2].y, PV0.z
w: MULADD R127.w, R1.z, KC0[2].x, PV0.w
2 x: MULADD R127.x, R1.y, KC0[1].w, PV1.x
y: MULADD R127.y, R1.y, KC0[1].z, PV1.y
z: MULADD R127.z, R1.y, KC0[1].y, PV1.z
w: MULADD R127.w, R1.y, KC0[1].x, PV1.w
3 x: MULADD R1.x, R1.x, KC0[0].x, PV2.w
y: MULADD R1.y, R1.x, KC0[0].y, PV2.z
z: MULADD R1.z, R1.x, KC0[0].z, PV2.y
w: MULADD R1.w, R1.x, KC0[0].w, PV2.x
02 EXP_DONE: POS0, R1
03 EXP_DONE: PARAM0, R0.____
04 ALU: ADDR(48) CNT(1)
4 x: NOP ____
05 NOP NO_BARRIER
END_OF_PROGRAM
Then I commented out the doSomething2() function and its call in main. The result was exactly the same: every card target in AMD's tool produced identical disassembly, with the redundant math optimized out. So the answer to this question is yes: in the general case, GLSL compilers are smart enough to perform this optimization, with the caveat that @nicol-bolas pointed out in his comment: optimizations are specific to each compiler, so there is no 100% guarantee that every compiler will do it. The safest bet is, of course, to perform such optimizations yourself whenever possible -- but it's nice to know that this is generally the case when, for whatever reason, you can't.
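For completeness, here is a minimal sketch of what performing the optimization yourself might look like, reusing the names from the test program above; the transformedPosition global is my own illustrative addition, not part of the original code:
uniform mat4 ModelViewMatrix;
attribute vec4 VertexPosition;

// Illustrative global: holds the shared product so it is computed only once.
vec4 transformedPosition;

void doSomething1(void) {
    gl_Position = transformedPosition;
}

void doSomething2(void) {
    gl_Position = transformedPosition;
}

void main(void) {
    // Do the matrix multiply once, then reuse the result in both helpers.
    transformedPosition = ModelViewMatrix * VertexPosition;
    doSomething1();
    doSomething2();
}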
UPDATE: I compiled the same program, both with and without the second function call, under Cg (one of NVIDIA's compilers), and in both cases it produced the following:
mul r0, v0.y, c1
mad r0, v0.x, c0, r0
mad r0, v0.z, c2, r0
mad oPos, v0.w, c3, r0
So yes, NVIDIA optimizes it too -- or at least, the Cg compiler does. I have seen claims that Cg-compiled code runs on Intel GPUs as well, but that is outside my area of expertise, so take it for what it's worth.
If anyone wants to add test cases to this, feel free, but at this point I feel the question has been answered suitably.