So I need to create a literal representation of a TM that finds the most occurring char on the tape and erases everything else. The TM has only one tape and inputs would look like this:
#a# => #a#
#aaabbaac# => #a#
The alphabet is {a,b,c,d}.
I need some hints with this cause I found myself stuck. My first idea was to delete chars in order (eg. for any 'a', try to delete another 'b' then 'c' then 'd' if possible) so at the end there will only remain the most occurring one but this seems way too complicated.
Any ideas?
As with any programming task, the idea is to break it into manageable pieces you know how to encode in the machine that will run your program. Once we have a plan, we can worry about how to simplify it and write down the answer.
Your idea is not a bad one: we can see what the first non-erased symbol is, then scan right and remove up to one instance of every other non-erased symbol until we get to a blank; then, reset and repeat until you end up not erasing any symbol in the last pass. Then, find the last remaining non-erased symbol, copy it to the front of the tape, and blank out the rest until you find blanks on the right. This is the design.
Now we can talk about implementation. We need an encoding of states and transitions that is going to have the effect of doing what is described above. The first thing we need to do is read the tape symbol to figure out what the first non-erased symbol is. We know the TM needs an initial state anyway, so that's what this can be. We will call the state q0
.
If we are in q0
and we see a blank, we can always take this to mean that the tape is empty and we can halt accept. Note that the problem description does not cover two edge cases: the empty string; strings with equal numbers of multiple symbols. If some symbols occur the same maximal number of times, do we show them all or do we show nothing? This derivation assumes we are OK with showing nothing.
If in q0
we see a
on the tape, we need to scan right and remove up to one b
, c
and d
. We might even call the state qbcd
to remind ourselves of what we're looking for. Upon reading a
, we need to erase it; we don't necessarily want to overwrite it with blank, so we can use a special symbol B
to indicate erasure. We get similar transitions to qacd
, qabd
and qabc
for other tape possibilities.
In state qxyz
, where x
, y
and z
stand in for some combination of three of a
, b
, c
and d
, we are looking to erase one of x
, y
or z
but not the remaining symbol. So, if we see the remaining symbol, we leave the tape alone and move right; if we see symbol x
, y
or z
then we transition to the state corresponding to the two symbols remaining to be removed. There will be six such states: qab
, qac
, qad
, qbc
, qbd
, qcd
.
In state qxy
, where x
and y
stand in for some combination of two of a
, b
, c
and d
, we are looking to erase one of x
or y
but not the other two symbols. So, if we see the other symbols, we leave the tape alone and move right; if we see symbol x
or y
then we transition to the state corresponding to the only symbol remaining to be removed. There will be four such states: qa
, qb
, qc
, qd
.
In state qx
, where x
stands in for one of a
, b
, c
and d
, we are looking to erase x
but none of the other three symbols. So, if we see other three symbols, we leave the tape alone and move right; if we see symbol x
then we transition to the state indicating one of each symbol has been removed. Call this state qR
.
If in states qxyz
, qxy
or qx
you find a real blank symbol - meaning input has been exhausted - that means that you didn't find some of the symbols you were looking to remove. This is fine; in this case, we can transition to the same state as mentioned in the preceding paragraph: the one indicating we have completed a pass and are ready to repeat the process.
In state qR
, you rewind the tape back to the beginning of the tape where the first real blank is, and then transition to another state we'll call qF
. The only job of qF
is to scan right and find the first symbol that isn't B
. Then, it behaves exactly like the initial state and repeats the process above. Note that all states should ignore B
s when reading the tape, and pass over them. We can even just reuse q0
since the input tape won't have any B
s on it initially; it is safe to overload q0
with this extra functionality.
If you reach the right of the tape - the blank after all the original input - in one of the states qxyz
, that means you erased the fourth symbol w
but didn't find any of the other symbols to erase in the same pass. That means that w
was the symbol with the most instances originally. When we detect this special condition, we can transition to new states qa'
, qb'
, qc'
and qd'
which return to the beginning of the tape, then transition to states qa''
, qb''
, qc''
and qd''
to write their inputs down, then transition to qE
to finally erase the rest of the tape, halt-accepting when the blank originally at the end of the input is reached.
What does this look like in a TM?
state tape | new state new tape head direction
-------------------------------------------------------------
// find and erase the first symbol /////////////////////////
-------------------------------------------------------------
q0 # | halt_accept # same
q0 B | q0 B right
q0 a | qbcd B right
q0 b | qacd B right
q0 c | qabd B right
q0 d | qabc B right
-------------------------------------------------------------
// look for any of three targets ////////////////////////////
-------------------------------------------------------------
qbcd # | qa' # left
qbcd B,a | qbcd B,a right
qbcd b | qcd B right
qbcd c | qbd B right
qbcd d | qbc B right
-------------------------------------------------------------
qacd # | qb' # left
qacd B,b | qacd B,b right
qacd a | qcd B right
qacd c | qad B right
qacd d | qac B right
-------------------------------------------------------------
qabd # | qc' # left
qabd B,c | qabd B,c right
qabd a | qbd B right
qabd b | qad B right
qabd d | qab B right
-------------------------------------------------------------
qabc # | qd' # left
qabc B,d | qabc B,d right
qabc a | qbc B right
qabc b | qac B right
qabc c | qab B right
-------------------------------------------------------------
// look for any of two targets //////////////////////////////
-------------------------------------------------------------
qab # | qR # left
qab B,c,d | qab B,c,d right
qab a | qb B right
qab b | qa B right
-------------------------------------------------------------
qac # | qR # left
qac B,b,d | qac B,b,d right
qac a | qc B right
qac c | qb B right
-------------------------------------------------------------
qad # | qR # left
qad B,b,c | qad B,b,c right
qad a | qd B right
qad d | qa B right
-------------------------------------------------------------
qbc # | qR # left
qbc B,a,d | qbc B,a,d, right
qbc b | qc B right
qbc c | qb B right
-------------------------------------------------------------
qbd # | qR # left
qbd B,a,c | qbd B,a,c right
qbd b | qd B right
qbd d | qb B right
-------------------------------------------------------------
qcd # | qR # left
qcd B,a,b | qcd B,a,b right
qcd c | qd B right
qcd d | qc B right
-------------------------------------------------------------
// look for single target ///////////////////////////////////
-------------------------------------------------------------
qa #,a | qR #,B left
qa B,b,c,d | qa B,b,c,d right
-------------------------------------------------------------
qb #,b | qR #,B left
qb B,a,c,d | qb B,a,c,d right
-------------------------------------------------------------
qc #,c | qR #,B left
qc B,a,b,d | qc B,a,b,d right
-------------------------------------------------------------
qd #,d | qR #,B left
qd B,a,b,c | qd B,a,b,c right
-------------------------------------------------------------
// scan back to the beginning of the tape ///////////////////
-------------------------------------------------------------
qR # | q0 # right
qR B,a,b,c,d| qR B,a,b,c,d left
-------------------------------------------------------------
qa' # | qa'' # right
qa' B,a,b,c,d| qa' B,a,b,c,d left
-------------------------------------------------------------
qb' # | qb'' # right
qb' B,a,b,c,d| qb' B,a,b,c,d left
-------------------------------------------------------------
qc' # | qc'' # right
qc' B,a,b,c,d| qc' B,a,b,c,d left
-------------------------------------------------------------
qd' # | qd'' # right
qd' B,a,b,c,d| qd' B,a,b,c,d left
-------------------------------------------------------------
// write the output if we found one /////////////////////////
-------------------------------------------------------------
qa'' # | halt_accept # same
qa'' B,a,b,c,d| qE a right
-------------------------------------------------------------
qb'' # | halt_accept # same
qb'' B,a,b,c,d| qE b right
-------------------------------------------------------------
qc'' # | halt_accept # same
qc'' B,a,b,c,d| qE c right
-------------------------------------------------------------
qd'' # | halt_accept # same
qd'' B,a,b,c,d| qE d right
-------------------------------------------------------------
// erase the rest of the input tape /////////////////////////
-------------------------------------------------------------
qE # | halt_accept # same
qE B,a,b,c,d| qE # right
If you'd rather leave the tape head at the front of the tape, you can write special tape symbols like A
, B
, C
and D
in the double-prime states, scan to the end, and erase backwards until you find #
or one of the above symbols. That means a couple extra states but is conceptually straightforward.
This TM has 24 states, 96 transitions (if fully expanded) and erases at least one symbol in every pass; therefore, its runtime is quadratic in the worst case: input size 4n, n of each symbol, algorithm performs n passes and on the order of n steps in each pass, plus some o(n^2) stuff at the end to erase the tape.
Perhaps this is what you had in mind and thought was too complicated. I grant that it took a long time to write down. However, this is conceptually very simple and likely optimal for a single tape TM.