Tags: java, algorithm, pagerank

Page Rank algorithm computing wrong page ranks


I am trying to implement a page rank algorithm.

I have a total of 5 web pages (see image below). The following image represents a graph and shows which web page contains links to which page.

[Image: directed graph showing which of the 5 web pages links to which]

I have stored this linkage of web pages in a HashMap in such a way that each web page's unique link is stored as the key, and a HashSet containing the links of all the web pages that the given page points to is stored as that key's value (see image below).

[Image: HashMap contents, mapping each page's URI to the set of URIs it links to]

Each web page is represented by its unique link. The above-mentioned HashMap is represented in code as

HashMap<URI, HashSet<URI>> graph = new HashMap<>();
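
For the five pages constructed in the main method below, the resulting map would contain entries along these lines (shown informally; the actual keys and values are URI objects):

    www.a.com -> [www.b.com, www.d.com]
    www.b.com -> [www.c.com, www.d.com]
    www.c.com -> []
    www.d.com -> [www.a.com]
    www.e.com -> [www.d.com]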

I have selected a decay value of 0.85 and an epsilon of 0.00001.
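
For context, the update rule I am trying to implement is the standard damped PageRank update, where N is the total number of pages and out(q) is the number of outgoing links of page q (a page with no outgoing links spreads its rank evenly over all N pages):

    PR(p) = (1 - decay) / N  +  decay * sum over all pages q linking to p of ( PR(q) / out(q) )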

Problem

After generating the above-mentioned HashMap, I compute the page rank of each web page.

The final converged page ranks should be:

[Image: expected converged page ranks]

but my actual page ranks are

Page A = 0.3170604814815385
Page B = 0.18719407056490575
Page C = 0.13199010955519944
Page D = 0.31131469834360015
Page E = 0.05244064005475638

The actual values are fine for every page except Page D: for the other pages, the difference between the actual and expected value is less than the selected epsilon.

I have tried different inputs to this page rank algorithm, and no matter what I try, there are always one or two web pages whose page rank values are not correct. The algorithm returns before all of the pages have converged, i.e. before the difference between the old and new rank for every page is less than the epsilon value.

Question

What am I doing wrong? Why is my page rank algorithm returning page ranks before all of them have converged?

Code

The following function generates the HashMap shown in the images above.

private static HashMap<URI, HashSet<URI>> makeGraph(HashSet<WebPage> webpages) {
        HashMap<URI, HashSet<URI>> webPagesGraph = new HashMap<>();
        HashSet<URI> singleWebPageLinks;

        HashSet<URI> availableWebPages = new HashSet<>();

        // add all the web pages available in data set in a collection
        for (WebPage doc : webpages) {
            availableWebPages.add(doc.getUri());
        }

        for (WebPage doc : webpages) {
            singleWebPageLinks = new HashSet<>();
            for (URI link : doc.getLinks()) {
                // if link is not pointing to the web page itself and is available in data set
                if (!link.equals(doc.getUri()) && availableWebPages.contains(link)) {
                    singleWebPageLinks.add(link);
                }
            }

            webPagesGraph.put(doc.getUri(), singleWebPageLinks);
        }

        return webPagesGraph;
}

The following function computes the page ranks.

private static HashMap<URI, Double> makePageRanks(HashMap<URI, HashSet<URI>> graph,
                                                   double decay,
                                                   int limit,
                                                   double epsilon) {

        // Step 1: The initialize step should go here
        HashMap<URI, Double> oldPageRanks = new HashMap<>();
        HashMap<URI, Double> newPageRanks = new HashMap<>();

        double singleWebPageNewRank;
        int numLinkedPagesBySinglePage;
        double singleWebPageOldRank;
        boolean haveConverged = true;
        double rank;

        // provide ranks to each web page
        // initially the rank given to each page is 1/(total no. of web pages).
        // also give new page rank to each page equal to zero
        for (URI key : graph.keySet()) {
            oldPageRanks.put(key, (double) 1 / graph.size());
            newPageRanks.put(key, 0.0);
        }

        for (int i = 0; i < limit; i++) {
            // Step 2: The update step should go here

            for (URI uri : graph.keySet()) {

                singleWebPageOldRank = oldPageRanks.get(uri);

                numLinkedPagesBySinglePage = graph.get(uri).size();

                // if any web page doesn't have any outgoing links to any other
                // web page, increase the new page rank for every web page
                if (numLinkedPagesBySinglePage == 0) {
                    for (URI u : newPageRanks.keySet()) {
                        singleWebPageNewRank = decay * (singleWebPageOldRank / graph.size());
                        saveNewRank(newPageRanks, u, singleWebPageNewRank);
                    }
                } // increase the new page rank of every web page that is pointed to
                // by current web page
                else {
                    for (URI linkedWebPageURI : graph.get(uri)) {
                        singleWebPageNewRank = decay * (singleWebPageOldRank / numLinkedPagesBySinglePage);
                        saveNewRank(newPageRanks, linkedWebPageURI, singleWebPageNewRank);
                    }
                }
            }

            // account for random user/surfer by adding (1 - decay) / (total no. of web pages)
            // to each web page's new rank
            for (URI uri : newPageRanks.keySet()) {
                rank = newPageRanks.get(uri);
                rank = rank + ((1 - decay) / graph.size());
                newPageRanks.put(uri, rank);

                // check for convergence
                // check if difference b/w old rand and new rank for each web page
                // is less than epsilon or not
                // if difference between old and new ranks is greater than or
                // equal to epsilon even for one web page, ranks haven't converged
                if (oldPageRanks.get(uri) - newPageRanks.get(uri) >= epsilon) {
                    haveConverged = false;
                }
            }

            if (haveConverged) {
                return oldPageRanks;
            } else {
                haveConverged = true;
                overWriteOldRanksWithNewRanks(oldPageRanks, newPageRanks);
            }
        }

        return oldPageRanks;
    }

The following two utility functions are called from within the makePageRanks function.

// save the new page rank for a given web page by adding the passed new page rank to
// its previously saved page rank and then saving the new rank
private static void saveNewRank(HashMap<URI, Double> newPageRanks, URI pageURI, double pageNewRank) {
      pageNewRank += newPageRanks.get(pageURI);
      newPageRanks.put(pageURI, pageNewRank);
}

// overwrite old page ranks for next iteration
private static void overWriteOldRanksWithNewRanks(HashMap<URI, Double> oldRanks, HashMap<URI, Double> newRanks) {
    for (URI key : newRanks.keySet()) {
        oldRanks.put(key, newRanks.get(key));
        // make new rank for each web page equal to zero before next iteration
        newRanks.put(key, 0.0);
    }
}

The following is the simple WebPage class.

public class WebPage {

    private ArrayList<String> words;
    private URI uri;
    private ArrayList<URI> links;

    WebPage(URI uri, ArrayList<String> words, ArrayList<URI> links) {
        this.words = words;
        this.uri = uri;
        this.links = links;
    }

    public ArrayList<String> getWords() {
        return words;
    }

    public URI getUri() {
        return uri;
    }

    public ArrayList<URI> getLinks() {
        return links;
    } 
}

Finally, here is the main method, for anyone wanting to see what input I am giving to the page rank algorithm:

public static void main(String[] args) {
        ArrayList<URI> pageALinks = new ArrayList<>();
        pageALinks.add(createURI("www.b.com"));
        pageALinks.add(createURI("www.d.com"));
        URI pageAURI = createURI("www.a.com");
        WebPage pageA = new WebPage(pageAURI, new ArrayList<>(), pageALinks);


        ArrayList<URI> pageBLinks = new ArrayList<>();
        pageBLinks.add(createURI("www.c.com"));
        pageBLinks.add(createURI("www.d.com"));
        URI pageBURI = createURI("www.b.com");
        WebPage pageB = new WebPage(pageBURI, new ArrayList<>(), pageBLinks);


        ArrayList<URI> pageCLinks = new ArrayList<>();
        URI pageCURI = createURI("www.c.com");
        WebPage pageC = new WebPage(pageCURI, new ArrayList<>(), pageCLinks);


        ArrayList<URI> pageDLinks = new ArrayList<>();
        pageDLinks.add(createURI("www.a.com"));
        URI pageDURI = createURI("www.d.com");
        WebPage pageD = new WebPage(pageDURI, new ArrayList<>(), pageDLinks);


        ArrayList<URI> pageELinks = new ArrayList<>();
        pageELinks.add(createURI("www.d.com"));
        URI pageEURI = createURI("www.e.com");
        WebPage pageE = new WebPage(pageEURI, new ArrayList<>(), pageELinks);


        HashSet<WebPage> pages = new HashSet<>();
        pages.add(pageA);
        pages.add(pageB);
        pages.add(pageC);
        pages.add(pageD);
        pages.add(pageE);


        HashMap<URI, HashSet<URI>> graph = makeGraph(pages);
        HashMap<URI, Double> map = makePageRanks(graph, 0.85, 100, 0.00001); 
}
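
createURI is a small helper that wraps URI construction. A minimal sketch of such a helper (hypothetical; the real one is not shown here) would be:

    // hypothetical helper: builds a URI from the given host string
    // (the actual createURI used above may differ)
    private static URI createURI(String host) {
        return URI.create("http://" + host);
    }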

Solution

  • Summary: You're testing for the wrong value. You have to reduce your code's epsilon value to make the page rank get within 0.00001 of the desired value. Two successive guesses within 0.00001 do not imply that result.

    In addition to the problems I've noted in comments, I believe I see your problem. It's a conceptual problem in convergence. It appears that the requirement of the unit test is to converge to within epsilon of a predetermined value. You have not written an algorithm for that. Your test

    if (oldPageRanks.get(uri) - newPageRanks.get(uri) >= epsilon)
    

    checks whether two successive approximations are within that value. This does not guarantee that the new page rank is within epsilon of the ultimate (converged) value. The calculus / topological definition of a "close" neighborhood reads something like the following, for a guess x and a reference (correct) point z:

    abs(x - z) < delta  ==>  abs(f(x) - f(z)) < epsilon
    

    You may have confused delta and epsilon.

    If the gradient of your approximation function is outside the range [-1, +1], then you will likely trip over this mistake. You need to find the delta value for which this holds, and then use that quantity in place of your current epsilon. This is a simple change in the epsilon value that you feed to your function.
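
    As a rough sketch of what that change looks like in practice: assuming each iteration shrinks the remaining error by roughly the decay factor, two successive rank vectors that differ by less than epsilon can still be about epsilon / (1 - decay) away from the converged ranks, so the tolerance passed into the existing function needs to be scaled down by roughly (1 - decay):

    // Assumption: the iteration contracts the error by about the decay factor per step,
    // so a successive-difference tolerance of targetEpsilon * (1 - decay) is needed to
    // end up within targetEpsilon of the true converged ranks.
    double decay = 0.85;
    double targetEpsilon = 0.00001;                        // desired distance from the converged ranks
    double iterationEpsilon = targetEpsilon * (1 - decay); // = 1.5e-6 for decay = 0.85
    HashMap<URI, Double> ranks = makePageRanks(graph, decay, 100, iterationEpsilon);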