algorithmlanguage-agnosticcountclarion

How to count each digit in a range of integers?


Imagine you sell those metallic digits used to number houses, locker doors, hotel rooms, etc. You need to find how many of each digit to ship when your customer needs to number doors/houses:

The obvious solution is to do a loop from the first to the last number, convert the counter to a string with or without zeros to the left, extract each digit and use it as an index to increment an array of 10 integers.

I wonder if there is a better way to solve this, without having to loop through the entire integers range.

Solutions in any language or pseudocode are welcome.


Edit:

Answers review
John at CashCommons and Wayne Conrad comment that my current approach is good and fast enough. Let me use a silly analogy: If you were given the task of counting the squares in a chess board in less than 1 minute, you could finish the task by counting the squares one by one, but a better solution is to count the sides and do a multiplication, because you later may be asked to count the tiles in a building.
Alex Reisner points to a very interesting mathematical law that, unfortunately, doesn’t seem to be relevant to this problem.
Andres suggests the same algorithm I’m using, but extracting digits with %10 operations instead of substrings.
John at CashCommons and phord propose pre-calculating the digits required and storing them in a lookup table or, for raw speed, an array. This could be a good solution if we had an absolute, unmovable, set in stone, maximum integer value. I’ve never seen one of those.
High-Performance Mark and strainer computed the needed digits for various ranges. The result for one millon seems to indicate there is a proportion, but the results for other number show different proportions.
strainer found some formulas that may be used to count digit for number which are a power of ten. Robert Harvey had a very interesting experience posting the question at MathOverflow. One of the math guys wrote a solution using mathematical notation.
Aaronaught developed and tested a solution using mathematics. After posting it he reviewed the formulas originated from Math Overflow and found a flaw in it (point to Stackoverflow :).
noahlavine developed an algorithm and presented it in pseudocode.

A new solution
After reading all the answers, and doing some experiments, I found that for a range of integer from 1 to 10n-1:

The first formula was found by strainer (and probably by others), and I found the other two by trial and error (but they may be included in other answers).

For example, if n = 6, range is 1 to 999,999:

These numbers can be checked using High-Performance Mark results.

Using these formulas, I improved the original algorithm. It still loops from the first to the last number in the range of integers, but, if it finds a number which is a power of ten, it uses the formulas to add to the digits count the quantity for a full range of 1 to 9 or 1 to 99 or 1 to 999 etc. Here's the algorithm in pseudocode:

integer First,Last //First and last number in the range
integer Number     //Current number in the loop
integer Power      //Power is the n in 10^n in the formulas
integer Nines      //Nines is the resut of 10^n - 1, 10^5 - 1 = 99999
integer Prefix     //First digits in a number. For 14,200, prefix is 142
array 0..9  Digits //Will hold the count for all the digits

FOR Number = First TO Last
  CALL TallyDigitsForOneNumber WITH Number,1  //Tally the count of each digit 
                                              //in the number, increment by 1
  //Start of optimization. Comments are for Number = 1,000 and Last = 8,000.
  Power = Zeros at the end of number //For 1,000, Power = 3
  IF Power > 0                       //The number ends in 0 00 000 etc 
    Nines = 10^Power-1                 //Nines = 10^3 - 1 = 1000 - 1 = 999
    IF Number+Nines <= Last            //If 1,000+999 < 8,000, add a full set
      Digits[0-9] += Power*10^(Power-1)  //Add 3*10^(3-1) = 300 to digits 0 to 9
      Digits[0]   -= -Power              //Adjust digit 0 (leading zeros formula)
      Prefix = First digits of Number    //For 1000, prefix is 1
      CALL TallyDigitsForOneNumber WITH Prefix,Nines //Tally the count of each 
                                                     //digit in prefix,
                                                     //increment by 999
      Number += Nines                    //Increment the loop counter 999 cycles
    ENDIF
  ENDIF 
  //End of optimization
ENDFOR  

SUBROUTINE TallyDigitsForOneNumber PARAMS Number,Count
  REPEAT
    Digits [ Number % 10 ] += Count
    Number = Number / 10
  UNTIL Number = 0

For example, for range 786 to 3,021, the counter will be incremented:

Total: 28 cycles Without optimization: 2,235 cycles

Note that this algorithm solves the problem without leading zeros. To use it with leading zeros, I used a hack:

If range 700 to 1,000 with leading zeros is needed, use the algorithm for 10,700 to 11,000 and then substract 1,000 - 700 = 300 from the count of digit 1.

Benchmark and Source code

I tested the original approach, the same approach using %10 and the new solution for some large ranges, with these results:

Original             104.78 seconds
With %10              83.66
With Powers of Ten     0.07

A screenshot of the benchmark application:
alt text
(source: clarion.sca.mx)

If you would like to see the full source code or run the benchmark, use these links:

Accepted answer

noahlavine solution may be correct, but l just couldn’t follow the pseudo code, I think there are some details missing or not completely explained.

Aaronaught solution seems to be correct, but the code is just too complex for my taste.

I accepted strainer’s answer, because his line of thought guided me to develop this new solution.


Solution

  • To reel of the digits from a number, we'd only ever need to do a costly string conversion if we couldnt do a mod, digits can most quickly be pushed of a number like this:

    feed=number;
    do
    { digit=feed%10;
      feed/=10; 
      //use digit... eg. digitTally[digit]++;
      }
    while(feed>0)
    

    that loop should be very fast and can just be placed inside a loop of the start to end numbers for the simplest way to tally the digits.

    To go faster, for larger range of numbers, im looking for an optimised method of tallying all digits from 0 to number*10^significance (from a start to end bazzogles me)

    here is a table showing digit tallies of some single significant digits.. these are inclusive of 0, but not the top value itself, -that was an oversight but its maybe a bit easier to see patterns (having the top values digits absent here) These tallies dont include trailing zeros,

      1 10 100 1000 10000 2 20 30 40 60 90 200 600 2000  6000
    
    0 1 1  10  190  2890  1  2  3  4  6  9  30 110  490  1690
    1 0 1  20  300  4000  1 12 13 14 16 19 140 220 1600  2800
    2 0 1  20  300  4000  0  2 13 14 16 19  40 220  600  2800
    3 0 1  20  300  4000  0  2  3 14 16 19  40 220  600  2800
    4 0 1  20  300  4000  0  2  3  4 16 19  40 220  600  2800
    5 0 1  20  300  4000  0  2  3  4 16 19  40 220  600  2800
    6 0 1  20  300  4000  0  2  3  4  6 19  40 120  600  1800
    7 0 1  20  300  4000  0  2  3  4  6 19  40 120  600  1800
    8 0 1  20  300  4000  0  2  3  4  6 19  40 120  600  1800
    9 0 1  20  300  4000  0  2  3  4  6  9  40 120  600  1800
    

    edit: clearing up my origonal thoughts:

    from the brute force table showing tallies from 0 (included) to poweroTen(notinc) it is visible that a majordigit of tenpower:

    increments tally[0 to 9] by md*tp*10^(tp-1)
    increments tally[1 to md-1] by 10^tp
    decrements tally[0] by (10^tp - 10) 
    (to remove leading 0s if tp>leadingzeros)
    can increment tally[moresignificantdigits] by self(md*10^tp) 
    (to complete an effect)
    

    if these tally adjustments were applied for each significant digit, the tally should be modified as though counted from 0 to end-1

    the adjustments can be inverted to remove preceeding range (start number)

    Thanks Aaronaught for your complete and tested answer.