turing-machines

Turing machine to find most occurring char on tape


So I need to create a literal representation of a TM that finds the most occurring char on the tape and erases everything else. The TM has only one tape and inputs would look like this:

#a# => #a# #aaabbaac# => #a#

The alphabet is {a,b,c,d}.

I need some hints with this cause I found myself stuck. My first idea was to delete chars in order (eg. for any 'a', try to delete another 'b' then 'c' then 'd' if possible) so at the end there will only remain the most occurring one but this seems way too complicated.

Any ideas?


Solution

  • As with any programming task, the idea is to break it into manageable pieces you know how to encode in the machine that will run your program. Once we have a plan, we can worry about how to simplify it and write down the answer.

    Your idea is not a bad one: we can see what the first non-erased symbol is, then scan right and remove up to one instance of every other non-erased symbol until we get to a blank; then, reset and repeat until you end up not erasing any symbol in the last pass. Then, find the last remaining non-erased symbol, copy it to the front of the tape, and blank out the rest until you find blanks on the right. This is the design.

    Now we can talk about implementation. We need an encoding of states and transitions that is going to have the effect of doing what is described above. The first thing we need to do is read the tape symbol to figure out what the first non-erased symbol is. We know the TM needs an initial state anyway, so that's what this can be. We will call the state q0.

    If we are in q0 and we see a blank, we can always take this to mean that the tape is empty and we can halt accept. Note that the problem description does not cover two edge cases: the empty string; strings with equal numbers of multiple symbols. If some symbols occur the same maximal number of times, do we show them all or do we show nothing? This derivation assumes we are OK with showing nothing.

    If in q0 we see a on the tape, we need to scan right and remove up to one b, c and d. We might even call the state qbcd to remind ourselves of what we're looking for. Upon reading a, we need to erase it; we don't necessarily want to overwrite it with blank, so we can use a special symbol B to indicate erasure. We get similar transitions to qacd, qabd and qabc for other tape possibilities.

    In state qxyz, where x, y and z stand in for some combination of three of a, b, c and d, we are looking to erase one of x, y or z but not the remaining symbol. So, if we see the remaining symbol, we leave the tape alone and move right; if we see symbol x, y or z then we transition to the state corresponding to the two symbols remaining to be removed. There will be six such states: qab, qac, qad, qbc, qbd, qcd.

    In state qxy, where x and y stand in for some combination of two of a, b, c and d, we are looking to erase one of x or y but not the other two symbols. So, if we see the other symbols, we leave the tape alone and move right; if we see symbol x or y then we transition to the state corresponding to the only symbol remaining to be removed. There will be four such states: qa, qb, qc, qd.

    In state qx, where x stands in for one of a, b, c and d, we are looking to erase x but none of the other three symbols. So, if we see other three symbols, we leave the tape alone and move right; if we see symbol x then we transition to the state indicating one of each symbol has been removed. Call this state qR.

    If in states qxyz, qxy or qx you find a real blank symbol - meaning input has been exhausted - that means that you didn't find some of the symbols you were looking to remove. This is fine; in this case, we can transition to the same state as mentioned in the preceding paragraph: the one indicating we have completed a pass and are ready to repeat the process.

    In state qR, you rewind the tape back to the beginning of the tape where the first real blank is, and then transition to another state we'll call qF. The only job of qF is to scan right and find the first symbol that isn't B. Then, it behaves exactly like the initial state and repeats the process above. Note that all states should ignore Bs when reading the tape, and pass over them. We can even just reuse q0 since the input tape won't have any Bs on it initially; it is safe to overload q0 with this extra functionality.

    If you reach the right of the tape - the blank after all the original input - in one of the states qxyz, that means you erased the fourth symbol w but didn't find any of the other symbols to erase in the same pass. That means that w was the symbol with the most instances originally. When we detect this special condition, we can transition to new states qa', qb', qc' and qd' which return to the beginning of the tape, then transition to states qa'', qb'', qc'' and qd'' to write their inputs down, then transition to qE to finally erase the rest of the tape, halt-accepting when the blank originally at the end of the input is reached.

    What does this look like in a TM?

    state    tape     |   new state    new tape    head direction
    -------------------------------------------------------------
    // find and erase the first symbol  /////////////////////////
    -------------------------------------------------------------
    q0       #        |   halt_accept  #           same
    q0       B        |   q0           B           right
    q0       a        |   qbcd         B           right
    q0       b        |   qacd         B           right
    q0       c        |   qabd         B           right
    q0       d        |   qabc         B           right
    -------------------------------------------------------------
    // look for any of three targets ////////////////////////////
    -------------------------------------------------------------
    qbcd     #        |   qa'          #           left
    qbcd     B,a      |   qbcd         B,a         right
    qbcd     b        |   qcd          B           right
    qbcd     c        |   qbd          B           right
    qbcd     d        |   qbc          B           right
    -------------------------------------------------------------
    qacd     #        |   qb'          #           left
    qacd     B,b      |   qacd         B,b         right
    qacd     a        |   qcd          B           right
    qacd     c        |   qad          B           right
    qacd     d        |   qac          B           right
    -------------------------------------------------------------
    qabd     #        |   qc'          #           left
    qabd     B,c      |   qabd         B,c         right
    qabd     a        |   qbd          B           right
    qabd     b        |   qad          B           right
    qabd     d        |   qab          B           right
    -------------------------------------------------------------
    qabc     #        |   qd'          #           left
    qabc     B,d      |   qabc         B,d         right
    qabc     a        |   qbc          B           right
    qabc     b        |   qac          B           right
    qabc     c        |   qab          B           right
    -------------------------------------------------------------
    // look for any of two targets //////////////////////////////
    -------------------------------------------------------------
    qab      #        |   qR           #           left
    qab      B,c,d    |   qab          B,c,d       right
    qab      a        |   qb           B           right
    qab      b        |   qa           B           right
    -------------------------------------------------------------
    qac      #        |   qR           #           left
    qac      B,b,d    |   qac          B,b,d       right
    qac      a        |   qc           B           right
    qac      c        |   qb           B           right
    -------------------------------------------------------------
    qad      #        |   qR           #           left
    qad      B,b,c    |   qad          B,b,c       right
    qad      a        |   qd           B           right
    qad      d        |   qa           B           right
    -------------------------------------------------------------
    qbc      #        |   qR           #           left
    qbc      B,a,d    |   qbc          B,a,d,      right
    qbc      b        |   qc           B           right
    qbc      c        |   qb           B           right
    -------------------------------------------------------------
    qbd      #        |   qR           #           left
    qbd      B,a,c    |   qbd          B,a,c       right
    qbd      b        |   qd           B           right
    qbd      d        |   qb           B           right
    -------------------------------------------------------------
    qcd      #        |   qR           #           left
    qcd      B,a,b    |   qcd          B,a,b       right
    qcd      c        |   qd           B           right
    qcd      d        |   qc           B           right
    -------------------------------------------------------------
    // look for single target ///////////////////////////////////
    -------------------------------------------------------------
    qa       #,a      |   qR           #,B         left
    qa       B,b,c,d  |   qa           B,b,c,d     right
    -------------------------------------------------------------
    qb       #,b      |   qR           #,B         left
    qb       B,a,c,d  |   qb           B,a,c,d     right
    -------------------------------------------------------------
    qc       #,c      |   qR           #,B         left
    qc       B,a,b,d  |   qc           B,a,b,d     right
    -------------------------------------------------------------
    qd       #,d      |   qR           #,B         left
    qd       B,a,b,c  |   qd           B,a,b,c     right
    -------------------------------------------------------------
    // scan back to the beginning of the tape ///////////////////
    -------------------------------------------------------------
    qR       #        |   q0           #           right
    qR       B,a,b,c,d|   qR           B,a,b,c,d   left
    -------------------------------------------------------------
    qa'      #        |   qa''         #           right
    qa'      B,a,b,c,d|   qa'          B,a,b,c,d   left
    -------------------------------------------------------------
    qb'      #        |   qb''         #           right
    qb'      B,a,b,c,d|   qb'          B,a,b,c,d   left
    -------------------------------------------------------------
    qc'      #        |   qc''         #           right
    qc'      B,a,b,c,d|   qc'          B,a,b,c,d   left
    -------------------------------------------------------------
    qd'      #        |   qd''         #           right
    qd'      B,a,b,c,d|   qd'          B,a,b,c,d   left
    -------------------------------------------------------------
    // write the output if we found one /////////////////////////
    -------------------------------------------------------------
    qa''     #        |   halt_accept  #           same
    qa''     B,a,b,c,d|   qE           a           right
    -------------------------------------------------------------
    qb''     #        |   halt_accept  #           same
    qb''     B,a,b,c,d|   qE           b           right
    -------------------------------------------------------------
    qc''     #        |   halt_accept  #           same
    qc''     B,a,b,c,d|   qE           c           right
    -------------------------------------------------------------
    qd''     #        |   halt_accept  #           same
    qd''     B,a,b,c,d|   qE           d           right
    -------------------------------------------------------------
    // erase the rest of the input tape /////////////////////////
    -------------------------------------------------------------
    qE       #        |   halt_accept  #           same
    qE       B,a,b,c,d|   qE           #           right
    

    If you'd rather leave the tape head at the front of the tape, you can write special tape symbols like A, B, C and D in the double-prime states, scan to the end, and erase backwards until you find # or one of the above symbols. That means a couple extra states but is conceptually straightforward.

    This TM has 24 states, 96 transitions (if fully expanded) and erases at least one symbol in every pass; therefore, its runtime is quadratic in the worst case: input size 4n, n of each symbol, algorithm performs n passes and on the order of n steps in each pass, plus some o(n^2) stuff at the end to erase the tape.

    Perhaps this is what you had in mind and thought was too complicated. I grant that it took a long time to write down. However, this is conceptually very simple and likely optimal for a single tape TM.