algorithmoeis

First appearance in Stern's Diatomic Sequence


You get an integer n and you need to find the index of its first appearance in Stern's Diatomic Sequence.

The sequence is defined like this:

a[0]     = 0
a[1]     = 1
a[2*i]   = a[i]
a[2*i+1] = a[i] + a[i+1]

See MathWorld.

Because n can be up to 400000, it's not a good idea to brute-force it, especially since the time limit is 4000 ms.

The sequence is pretty odd: first occurrence of 8 is 21, but first occurrence of 6 is 33.

Any ideas how to solve this?

Maybe this might help: OEIS


Solution

  • We can easily solve for the first occurrence of a number in the range of 400000 in under four seconds:

    Prelude Diatomic> firstDiatomic 400000
    363490989
    (0.03 secs, 26265328 bytes)
    Prelude Diatomic> map firstDiatomic [400000 .. 400100]
    [363490989,323659475,580472163,362981813,349334091,355685483,346478235,355707595
    ,291165867,346344083,347155797,316314293,576398643,315265835,313171245,355183267
    ,315444051,315970205,575509833,311741035,340569429,313223987,565355925,296441165
    ,361911645,312104147,557145429,317106853,323637939,324425077,610613547,311579309
    ,316037811,311744107,342436533,348992869,313382235,325406123,355818699,312128723
    ,347230875,324752171,313178421,312841811,313215645,321754459,576114987,325793195
    ,313148763,558545581,355294101,359224397,345462093,307583675,355677549,312120731
    ,341404245,316298389,581506779,345401947,312109779,316315061,315987123,313447771
    ,361540179,313878107,304788843,325765547,316036275,313731751,355635795,312035947
    ,346756533,313873883,349358379,357393763,559244877,313317739,325364139,312128107
    ,580201947,358182323,314944173,357403987,584291115,312158827,347448723,363246413
    ,315935571,349386085,315929427,312137323,357247725,313207657,320121429,356954923
    ,557139285,296392013,576042123,311726765,296408397]
    (2.45 secs, 3201358192 bytes)
    

    The key to it is the Calkin-Wilf tree.

    Starting from the fraction 1/1, it is built by the rule that for a node with the fraction a/b, its left child carries the fraction a/(a+b), and its right child the fraction (a+b)/b.

                            1/1
                           /   \
                          /     \
                         /       \
                      1/2         2/1
                     /   \       /   \
                   1/3   3/2   2/3   3/1
    

    etc. The diatomic sequence (starting at index 1) is the sequence of numerators of the fractions in the Calkin-Wilf tree, when that is traversed level by level, each level from left to right.

    If we look at the tree of indices

                             1
                            / \
                           /   \
                          /     \
                         2       3
                        / \     / \
                       4   5   6   7
                      / \ 
                     8   9 ...
    

    we can easily verify that the node at index k in the Calkin-Wilf tree carries the fraction a[k]/a[k+1] by induction.

    That is obviously true for k = 1 (a[1] = a[2] = 1), and from then on,

    All positive reduced fractions occur exactly once in the Calkin-Wilf tree (left as an exercise for the reader), hence all positive integers occur in the diatomic sequence.

    We can find the node in the Calkin-Wilf tree from the index by following the binary representation of the index, from the most significant bit to the least, for a 1-bit we go to the right child and for a 0-bit to the left. (For that, it is nice to augment the Calkin-Wilf tree with a node 0/1 whose right child is the 1/1 node, so that we need have a step for the most significant set bit of the index.)

    Now, that doesn't yet help very much to solve the problem at hand.

    But, let us first solve a related problem: For a reduced fraction p/q, determine its index.

    Suppose that p > q. Then we know that p/q is a right child, and its parent is (p-q)/q. If also p-q > q, we have again a right child, whose parent is (p - 2*q)/q. Continuing, if

    p = a*q + b, 1 <= b < q
    

    then we reach the p/q node from the b/q node by going to the right child a times.

    Now we need to find a node whose numerator is smaller than its denominator. That is of course the left child of its parent. The parent of b/q is b/(q-b) then. If

    q = c*b + d, 1 <= d < b
    

    we have to go to the left child c times from the node b/d to reach b/q.

    And so on.

    We can find the way from the root (1/1) to the p/q node using the continued fraction (I consider only simple continued fractions here) expansion of p/q. Let p > q and

    p/q = [a_0, a_1, ..., a_r,1]
    

    the continued fraction expansion of p/q ending in 1.

    For p < q, we must end going to the left, hence start going to the left for even r and start going to the right for odd r.

    We have thus found a close connection between the binary representation of the index and the continued fraction expansion of the fraction carried by the node via the path from the root to the node.

    Let the run-length-encoding of the index k be

    [c_1, c_2, ..., c_j]           (all c_i > 0)
    

    i.e. the binary representation of k starts with c_1 ones, followed by c_2 zeros, then c_3 ones etc., and ending with c_j

    Then [c_j, c_(j-1), ..., c_2, c_1] is the continued fraction expansion of a[k]/a[k+1] whose length has the same parity as k (every rational has exactly two continued fraction expansions, one with odd length, the other with even length).

    The RLE gives the path from the 0/1 node above 1/1 to a[k]/a[k+1]. The length of the path is

    1. the number of bits necessary to represent k, and
    2. the sum of the partial quotients in the continued fraction expansion.

    Now, to find the index of the first occurrence of n > 0 in the diatomic sequence, we first observe that the smallest index must necessarily be odd, since a[k] = a[k/2] for even k. Let the smallest index be k = 2*j+1. Then

    1. the length of the RLE of k is odd,
    2. the fraction at the node with index k is a[2*j+1]/a[2*j+2] = (a[j] + a[j+1])/a[j+1], hence it is a right child.

    So the smallest index k with a[k] = n corresponds to the left-most ending of all the shortest paths to a node with numerator n.

    The shortest paths correspond to the continued fraction expansions of n/m, where 0 < m <= n is coprime to n [the fraction must be reduced] with the smallest sum of the partial quotients.

    What kind of length do we need to expect? Given a continued fraction p/q = [a_0, a_1, ..., a_r] with a_0 > 0 and sum

    s = a_0 + ... + a_r
    

    the numerator p is bounded by F(s+1) and the denominator q by F(s), where F(j) is the j-th Fibonacci number. The bounds are sharp, for a_0 = a_1 = ... = a_r = 1 the fraction is F(s+1)/F(s).

    So if F(t) < n <= F(t+1), the sum of the partial quotients of the continued fraction expansion (either of the two) is >= t. Often there is an m such that the sum of the partial quotients of the continued fraction expansion of n/m is exactly t, but not always:

    F(5) = 5 < 6 <= F(6) = 8
    

    and the continued fraction expansions of the two reduced fractions 6/m with 0 < m <= 6 are

    6/1 = [6]          (alternatively [5,1])
    6/5 = [1,4,1]      (alternatively [1,5])
    

    with sum of the partial quotients 6. However, the smallest possible sum of partial quotients is never much larger (the largest I'm aware of is t+2).

    The continued fraction expansions of n/m and n/(n-m) are closely related. Let's assume that m < n/2, and let

    n/m = [a_0, a_1, ..., a_r]
    

    Then a_0 >= 2,

    (n-m)/m = [a_0 - 1, a_1, ..., a_r]
    

    and since

    n/(n-m) = 1 + m/(n-m) = 1 + 1/((n-m)/m)
    

    the continued fraction expansion of n/(n-m) is

    n/(n-m) = [1, a_0 - 1, a_1, ..., a_r]
    

    In particular, the sum of the partial quotients is the same for both.

    Unfortunately, I'm not aware of a way to find the m with the smallest sum of partial quotients without brute force, so the algorithm is (I assume n > 2

    1. for 0 < m < n/2 coprime to n, find the continued fraction expansion of n/m, collecting the ones with the smallest sum of the partial quotients (the usual algorithm produces expansions whose last partial quotient is > 1, we assume that).

    2. Adjust the found continued fraction expansions [those are not large in number] it the following way:

      • if the CF [a_0, a_1, ..., a_r] has even length, convert it to [a_0, a_1, ..., a_(r-1), a_r - 1, 1]
      • otherwise, use [1, a_0 - 1, a_1, ..., a_(r-1), a_r - 1, 1]

      (that chooses the one between n/m and n/(n-m) leading to the smaller index)

    3. reverse the continued fractions to obtain the run-length-encodings of the corresponding indices

    4. choose the smallest among them.

    In step 1, it is useful to use the smallest sum found so far to short-cut.

    Code (Haskell, since that's easiest):

    module Diatomic (diatomic, firstDiatomic, fuscs) where
    
    import Data.List
    
    strip :: Int -> Int -> Int
    strip p = go
      where
        go n = case n `quotRem` p of
                 (q,r) | r == 0    -> go q
                       | otherwise -> n
    
    primeFactors :: Int -> [Int]
    primeFactors n
        | n < 1             = error "primeFactors: non-positive argument"
        | n == 1            = []
        | n `rem` 2 == 0    = 2 : go (strip 2 (n `quot` 2)) 3
        | otherwise         = go n 3
          where
            go 1 _ = []
            go m p
                | m < p*p   = [m]
                | r == 0    = p : go (strip p q) (p+2)
                | otherwise = go m (p+2)
                  where
                    (q,r) = m `quotRem` p
    
    contFracLim :: Int -> Int -> Int -> Maybe [Int]
    contFracLim = go []
      where
        go acc lim n d = case n `quotRem` d of
                           (a,b) | lim < a -> Nothing
                                 | b == 0  -> Just (a:acc)
                                 | otherwise -> go (a:acc) (lim - a) d b
    
    fixUpCF :: [Int] -> [Int]
    fixUpCF [a]
        | a < 3     = [a]
        | otherwise = [1,a-2,1]
    fixUpCF xs
        | even (length xs) = case xs of
                               (1:_) -> fixEnd xs
                               (a:bs) -> 1 : (a-1) : bs
        | otherwise        = case xs of
                               (1:_) -> xs
                               (a:bs) -> 1 : fixEnd ((a-1):bs)
    
    fixEnd :: [Int] -> [Int]
    fixEnd [a,1] = [a+1]
    fixEnd [a] = [a-1,1]
    fixEnd (a:bs) = a : fixEnd bs
    fixEnd _ = error "Shouldn't have called fixEnd with an empty list"
    
    cfCompare :: [Int] -> [Int] -> Ordering
    cfCompare (a:bs) (c:ds) = case compare a c of
                                EQ -> cfCompare ds bs
                                cp -> cp
    
    fibs :: [Integer]
    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
    
    toNumber :: [Int] -> Integer
    toNumber = foldl' ((+) . (*2)) 0 . concat . (flip (zipWith replicate) $ cycle [1,0])
    
    fuscs :: Integer -> (Integer, Integer)
    fuscs 0 = (0,1)
    fuscs 1 = (1,1)
    fuscs n = case n `quotRem` 2 of
                (q,r) -> let (a,b) = fuscs q
                         in if r == 0
                              then (a,a+b)
                              else (a+b,b)
    diatomic :: Integer -> Integer
    diatomic = fst . fuscs
    
    firstDiatomic :: Int -> Integer
    firstDiatomic n
        | n < 0     = error "Diatomic sequence has no negative terms"
        | n < 2     = fromIntegral n
        | n == 2    = 3
        | otherwise = toNumber $ bestCF n
    
    bestCF :: Int -> [Int]
    bestCF n = check [] estimate start
      where
        pfs = primeFactors n
        (step,ops) = case pfs of
                       (2:xs) -> (2,xs)
                       _      -> (1,pfs)
        start0 = (n-1) `quot` 2
        start | even n && even start0 = start0 - 1
              | otherwise             = start0
        eligible k = all ((/= 0) . (k `rem`)) ops
        estimate = length (takeWhile (<= fromIntegral n) fibs) + 2
        check candidates lim k
            | k < 1 || n `quot` k >= lim = if null candidates
                                             then check [] (2*lim) start
                                             else minimumBy cfCompare candidates
            | eligible k = case contFracLim lim n k of
                             Nothing -> check candidates lim (k-step)
                             Just cf -> let s = sum cf
                                        in if s < lim
                                             then check [fixUpCF cf] s (k - step)
                                             else check (fixUpCF cf : candidates) lim (k-step)
            | otherwise = check candidates lim (k-step)