Most of the implementations I find require a hardware instruction to do this. However I strongly doubt this is required (if it is, I can't figure out why...)
You don't need a test and set instruction to get mutual exclusion locking, if thats what you're asking. Dijkstra described the first mutual exclusion algorithm I am aware of, in 1965. The title of the paper was "Solution of a problem in concurrent programming control", search Google for a copy near you. The original algorithm required no special support from the hardware at all, but providing an atomic instruction in the CPU dramatically improves the performance.
Test-and-set, atomic swap, and load-linked + store-conditional are all common primitives for CPUs to provide. All can be used to implement mutual exclusion, which can then be used to implement whatever locking semantics you want.