phpalgorithmword-wraptext-processing

Balanced word wrap (Minimum raggedness) in PHP


I'm going to make a word wrap algorithm in PHP. I want to split small chunks of text (short phrases) in n lines of maximum m characters (n is not given, so there will be as much lines as needed). The peculiarity is that lines length (in characters) has to be much balanced as possible across lines.

Example of input text:

How to do things

Wrong output (this is the normal word-wrap behavior), m=6:

How to
do
things

Desired output, always m=6:

How 
to do 
things

Does anyone have suggestions or guidelines on how to implement this function? Basically, I'm searching something for pretty print short phrases on two or three (as much as possible) equal length lines.


Update: It seems I'm searching exactly for a Minimum raggedness word wrap algorithm. But I can't find any implementation in a real programming language (anyone, then I can convert it in PHP).


Update 2: I started a bounty for this. Is it possible that do not exist any public implementation of Minimum raggedness algorithm in any procedural language? I need something written in a way that can be translated into procedural instructions. All I can find now is just a bounch of (generic) equation that however need a optimal searching procedure. I will be grateful also for an implementation that can only approximate that optimal searching algorithm.


Solution

  • Quick and dirty, in c++

    #include <sstream>
    #include <iostream>
    #include <vector>
    #include <cstdlib>
    #include <memory.h>
    
    using namespace std;
    
    int cac[1000][1000];
    string res[1000][1000];
    vector<string> words;
    int M;
    
    int go(int a, int b){
        if(cac[a][b]>= 0) return cac[a][b];
        if(a == b) return 0;
    
        int csum = -1;
        for(int i=a; i<b; ++i){
        csum += words[i].size() + 1;
        }
        if(csum <= M || a == b-1){
        string sep = "";
            for(int i=a; i<b; ++i){
                res[a][b].append(sep);
                res[a][b].append(words[i]);
                sep = " ";
        }
        return cac[a][b] = (M-csum)*(M-csum);
        }
    
        int ret = 1000000000;
        int best_sp = -1;
        for(int sp=a+1; sp<b; ++sp){
        int cur = go(a, sp) + go(sp,b);
        if(cur <= ret){
            ret = cur;
            best_sp = sp;
        }
        }
        res[a][b] = res[a][best_sp] + "\n" + res[best_sp][b];
        return cac[a][b] = ret;
    }
    
    
    int main(int argc, char ** argv){
        memset(cac, -1, sizeof(cac));
        M = atoi(argv[1]);
        string word;
        while(cin >> word) words.push_back(word);
        go(0, words.size());
        cout << res[0][words.size()] << endl;
    }
    

    Test:

    $ echo "The quick brown fox jumps over a lazy dog" |./a.out 10
    The quick
    brown fox
    jumps over
    a lazy dog
    

    EDIT: just looked at the wikipedia page for minimum raggedness word wrap. Changed algorithm to the given one (with squared penalties)