This simply baffles me. I've just downloaded a 1.5GB tarball of the Chrome source code. The same code compiled compresses to about 50MB.
Why is there such a discrepancy between the size of the source code and the size of the executable?
Here is a list of things that could cause this:
The executable has no need of whitespace, comments or any of the nice formatting stuff. The source code might have TONS of documentation and whitespace just to make the code readable and all this takes up space.
The source code might bring along with it a LOT of other code to test the application. But this test code doesn't ever make it to the final application.
Documentation that is included with the code. Depending on the format (.doc or .docx files, for example), the documentation might be huge.
Someone else mentioned that source control comments might be in the code as well. Including commit messages in the source code can make the files large too.
I don't know how or when you did the file comparison, but if you did it AFTER compiling then you might have included the compile artifacts (the *.o files) in your calculation as well. So you might be perceiving the source code to be 1.5GB when it's really only 750MB (roughly speaking).
Depending on the compiler and how good it is, it might generate less machine code and thus create a smaller file. Although I think most compilers today are reasonable and this shouldn't account for too much size variance. (But I could be wrong, I'm not a compiler person.)
If the application is being statically linked with all its libraries, it would be bigger because now it has to contain its dependencies within it. However, if the libraries are dynamically linked/loaded, the executable itself might be drastically smaller since it will just link to the libraries at runtime and only load them as needed.
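If you want to check some of these guesses against your own checkout, here is a rough Python sketch that totals file sizes by category. The extension groups and the rule that anything under a directory named test or tests counts as test code are my own assumptions, not anything specific to Chrome, so adjust them for the tree you are measuring.

```python
import os
from collections import defaultdict

# Hypothetical extension groups (an assumption, not Chrome's real layout).
CATEGORIES = {
    "source": {".c", ".cc", ".cpp", ".h", ".hpp"},
    "build artifacts": {".o", ".obj", ".a", ".so"},
    "docs": {".doc", ".docx", ".pdf", ".html", ".md", ".txt"},
}

def categorize(rel_path):
    """Return a category name for a path relative to the tree root."""
    parts = rel_path.lower().replace("\\", "/").split("/")
    # Assumption: anything under a directory named test/tests is test code.
    if any(p in ("test", "tests") for p in parts[:-1]):
        return "tests"
    ext = os.path.splitext(parts[-1])[1]
    for name, exts in CATEGORIES.items():
        if ext in exts:
            return name
    return "other"

def size_by_category(root):
    """Walk root and return {category: total size in bytes}."""
    totals = defaultdict(int)
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            full = os.path.join(dirpath, fname)
            if os.path.islink(full):
                continue  # skip symlinks so nothing is counted twice
            rel = os.path.relpath(full, root)
            totals[categorize(rel)] += os.path.getsize(full)
    return dict(totals)
```

Calling size_by_category() on the directory where you unpacked the tarball gives a dictionary of byte counts. If "build artifacts" or "tests" dominates, that accounts for a good part of the gap.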
Was the tarball 1.5GB or was the expanded tarball 1.5GB?
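One way to answer that question without unpacking anything is to read the member sizes out of the archive. This is a minimal sketch using Python's standard tarfile module. The function name tarball_sizes is just something I made up for illustration.

```python
import os
import tarfile

def tarball_sizes(path):
    """Return (compressed_bytes, expanded_bytes) for a tar archive."""
    compressed = os.path.getsize(path)
    expanded = 0
    # Mode "r:*" lets tarfile detect gzip, bzip2, or xz compression itself.
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                expanded += member.size  # uncompressed size of this file
    return compressed, expanded
```

Source code is mostly text and tends to compress well, so the expanded number can be several times the size of the tarball on disk.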
Anyway, lots of factors could be at play here.