AMD provides a lot of resources describing which instructions their integrated GPUs can execute: http://developer.amd.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf
However, they do not explain how kernels get dispatched to the GPU in the first place. How does that work? Is it done with assembly instructions, or by a driver that is controlled through a library?
In short: What would an assembly-only Hello World for an AMD GPU look like?
In broad terms, you allocate some memory in the GPU's RAM, load the program binary into it, and then issue an execution command to the GPU. The details of those operations are the responsibility of the GPU driver and are exposed to user space through APIs like OpenCL or Vulkan compute. Instead of raw, GPU-specific instructions, those APIs consume a machine-independent representation (think Java bytecode; SPIR-V in the case of Vulkan) which the driver compiles to the actual GPU instructions in situ with a just-in-time compiler.
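To make the OpenCL route concrete, here is a minimal sketch of such a host program in C. It assumes an OpenCL runtime for the GPU is installed (Mesa's or AMD's, for example); the kernel name "square", the file name and the buffer size are just illustrative, and error checking is omitted for brevity. It shows the three steps above: allocate GPU memory, hand the driver a machine-independent program to JIT-compile, and dispatch the execution.

```c
/* hello_gpu.c - minimal OpenCL host program (sketch, not production code). */
#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <CL/cl.h>

/* Device code: the driver JIT-compiles this to the GPU's own ISA at runtime. */
static const char *kernel_src =
    "__kernel void square(__global float *buf) {"
    "    size_t i = get_global_id(0);"
    "    buf[i] = buf[i] * buf[i];"
    "}";

int main(void)
{
    float data[64];
    for (int i = 0; i < 64; ++i) data[i] = (float)i;

    /* Pick a platform and a GPU device exposed by the installed driver. */
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

    /* Step 1: allocate GPU memory and copy the input data over. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, NULL);

    /* Step 2: let the driver JIT-compile the kernel source for this GPU. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "square", NULL);

    /* Step 3: dispatch the kernel and read the result back. */
    clSetKernelArg(kernel, 0, sizeof(buf), &buf);
    size_t global = 64;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);

    printf("data[5] = %f\n", data[5]);   /* expect 25.0 */

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```

On Linux this would typically build with something like gcc hello_gpu.c -lOpenCL. The important point is that the host code never contains a single GCN instruction: the driver generates those from the kernel source when clBuildProgram is called.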
The most elegant approach, of course, would be heterogeneous fat binaries, where the OS takes care of the details and you issue a kernel execution by calling a trampoline function – this is in fact how CUDA does it. Unfortunately, CUDA isn't open source.
However, the drivers for AMD GPUs are open source, so if you want to get into the gory details of how this actually works at the lowest levels, your best bet would probably be reading the source code of the GCN drivers that are part of Mesa, specifically their implementations of Vulkan compute and OpenCL.
The shortest "Hello GPU" program would either be:
if you want to use the "regular" drivers, a simple OpenCL or Vulkan compute program.
if you want to do everything yourself: Implementing the whole of what Mesa does for basic infrastructure support of GCN3, plus a minimal frontend that uses that to load the binary and dispatch the execution.