Given an input image, predict an output image that has been altered by some matrix transformation.
The important part is that, given an input image the network has not seen before, it should apply the same matrix transformation to that image as was applied to the training examples.
I have tried experimenting with an autoencoder; however, I find it overfits quite significantly. The network essentially ends up learning mappings between pixels in the input and output rather than the transformation that turns the input into the output.
What's the best approach to this task of identifying the matrix transformation?
This sounds like a task that is both eminently doable (you want to learn a linear warp from examples) and one for which a neural network is eminently redundant. Neural networks (particularly the deep variety) are useful for modeling transformations whose functional form is a priori unknown, highly non-linear, very complex, and varies significantly from one part of the input space to another. None of these conditions seems to apply to the problem you state.
The hard part of predicting a linear warp is not the warp itself; it's finding which output image points correspond to which input ones. Once that is achieved, estimating the warp is a trivial application of linear least-squares.
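To make the least-squares step concrete, here is a minimal sketch with NumPy, assuming point correspondences are already available. The specific transform, point counts, and noise level are illustrative inventions; the idea is that appending a column of ones lets you recover the translation together with the linear part in one solve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth affine warp: 2x2 linear part A and translation t.
A_true = np.array([[0.9, -0.3],
                   [0.2,  1.1]])
t_true = np.array([5.0, -2.0])

# Simulated correspondences: input points X and their warped outputs Y,
# with a little measurement noise added.
X = rng.uniform(0, 100, size=(50, 2))
Y = X @ A_true.T + t_true + rng.normal(scale=0.1, size=(50, 2))

# Homogeneous coordinates: append a column of ones so the translation
# is estimated alongside the linear part in a single least-squares solve.
X_h = np.hstack([X, np.ones((len(X), 1))])          # shape (N, 3)
params, *_ = np.linalg.lstsq(X_h, Y, rcond=None)    # shape (3, 2)

A_est, t_est = params[:2].T, params[2]
print(A_est)  # close to A_true
print(t_est)  # close to t_true
```

With clean correspondences this recovers the warp almost exactly; with real, noisy matches you would typically wrap the same solve in a robust estimator such as RANSAC to reject outliers.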
The point correspondence problem, by contrast, can be as complex as you can make it (imagine matching an aerial photo of London today to one taken at ground level in 1940 during the Blitz), and it is very hard to express in functional or rule-based form.