graph-theoryrelationshipgenealogyfamily-tree

Calculate Family Relationship from Genealogical Data


I would like to be able to calculate the family relationship between two individuals in a family tree, given the following data schema (simplified from my actual data schema, only showing columns that directly apply to this problem):

individual
----------
id
gender

child
----------
child_id
father_id
mother_id

With this structure, how can one calculate the relationship between two individual id's (i.e. cousin, great grand uncle, etc.).

Also, as there are actually two relationships (i.e. A-B may be nephew whereas B-A is uncle), how can one generate the complement to the other (given uncle, and assuming we know gender, how might we generate nephew?). This is more of a trivial question, the former is what I'm really interested in.

Thanks all!


Solution

  • Below is my PHP implementation of my algorithm to calculate relationship. This is based upon the data schema I outlined in the original question. This only finds the "closest" i.e. shortest-path relationship between the two individuals, it does not resolve compound relationships such as half-siblings or double cousins.

    Note that data access functions such as get_father and get_gender are written in the style of a database abstraction layer I always use. It should be fairly straightforward to understand what is going on, basically all dbms-specific functions such as mysql_query are replaced with generalized functions such as db_query; it is not very complicated at all, especially in the examples in this code, but feel free to post questions in comments if it isn't clear.

    <?php
    /* Calculate relationship "a is the ___ of b" */
    
    define("GENDER_MALE", 1);
    define("GENDER_FEMALE", 2);
    
    function calculate_relationship($a_id, $b_id)
    {
      if ($a_id == $b_id) {
        return 'self';
      }
    
      $lca = lowest_common_ancestor($a_id, $b_id);
      if (!$lca) {
        return false;
      }
      $a_dist = $lca[1];
      $b_dist = $lca[2];
    
      $a_gen = get_gender($a_id);
    
      // DIRECT DESCENDANT - PARENT
      if ($a_dist == 0) {
        $rel = $a_gen == GENDER_MALE ? 'father' : 'mother';
        return aggrandize_relationship($rel, $b_dist);
      }
      // DIRECT DESCENDANT - CHILD
      if ($b_dist == 0) {
        $rel = $a_gen == GENDER_MALE ? 'son' : 'daughter';
        return aggrandize_relationship($rel, $a_dist);
      }
    
      // EQUAL DISTANCE - SIBLINGS / PERFECT COUSINS
      if ($a_dist == $b_dist) {
        switch ($a_dist) {
          case 1:
            return $a_gen == GENDER_MALE ? 'brother' : 'sister';
            break;
          case 2:
            return 'cousin';
            break;
          default:
            return ordinal_suffix($a_dist - 2).' cousin';
        }
      }
    
      // AUNT / UNCLE
      if ($a_dist == 1) {
        $rel = $a_gen == GENDER_MALE ? 'uncle' : 'aunt';
        return aggrandize_relationship($rel, $b_dist, 1);
      }
      // NEPHEW / NIECE
      if ($b_dist == 1) {
        $rel = $a_gen == GENDER_MALE ? 'nephew' : 'niece';
        return aggrandize_relationship($rel, $a_dist, 1);
      }
    
      // COUSINS, GENERATIONALLY REMOVED
      $cous_ord = min($a_dist, $b_dist) - 1;
      $cous_gen = abs($a_dist - $b_dist);
      return ordinal_suffix($cous_ord).' cousin '.format_plural($cous_gen, 'time', 'times').' removed';
    } //END function calculate_relationship
    
    function aggrandize_relationship($rel, $dist, $offset = 0) {
      $dist -= $offset;
      switch ($dist) {
        case 1:
          return $rel;
          break;
        case 2:
          return 'grand'.$rel;
          break;
        case 3:
          return 'great grand'.$rel;
          break;
        default:
          return ordinal_suffix($dist - 2).' great grand'.$rel;
      }
    } //END function aggrandize_relationship
    
    function lowest_common_ancestor($a_id, $b_id)
    {
      $common_ancestors = common_ancestors($a_id, $b_id);
    
      $least_distance = -1;
      $ld_index = -1;
    
      foreach ($common_ancestors as $i => $c_anc) {
        $distance = $c_anc[1] + $c_anc[2];
        if ($least_distance < 0 || $least_distance > $distance) {
          $least_distance = $distance;
          $ld_index = $i;
        }
      }
    
      return $ld_index >= 0 ? $common_ancestors[$ld_index] : false;
    } //END function lowest_common_ancestor
    
    function common_ancestors($a_id, $b_id) {
      $common_ancestors = array();
    
      $a_ancestors = get_ancestors($a_id);
      $b_ancestors = get_ancestors($b_id);
    
      foreach ($a_ancestors as $a_anc) {
        foreach ($b_ancestors as $b_anc) {
          if ($a_anc[0] == $b_anc[0]) {
            $common_ancestors[] = array($a_anc[0], $a_anc[1], $b_anc[1]);
            break 1;
          }
        }
      }
    
      return $common_ancestors;
    } //END function common_ancestors
    
    function get_ancestors($id, $dist = 0)
    {
      $ancestors = array();
    
      // SELF
      $ancestors[] = array($id, $dist);
    
      // PARENTS
      $parents = get_parents($id);
      foreach ($parents as $par) {
        if ($par != 0) {
          $par_ancestors = get_ancestors($par, $dist + 1);
          foreach ($par_ancestors as $par_anc) {
            $ancestors[] = $par_anc;
          }
        }
      }
    
      return $ancestors;
    } //END function get_ancestors
    
    function get_parents($id)
    {
      return array(get_father($id), get_mother($id));
    } //END function get_parents
    
    function get_father($id)
    {
      $res = db_result(db_query("SELECT father_id FROM child WHERE child_id = %s", $id));
      return $res ? $res : 0;
    } //END function get_father
    
    function get_mother($id)
    {
      $res = db_result(db_query("SELECT mother_id FROM child WHERE child_id = %s", $id));
      return $res ? $res : 0;
    } //END function get_mother
    
    function get_gender($id)
    {
      return intval(db_result(db_query("SELECT gender FROM individual WHERE id = %s", $id)));
    }
    
    function ordinal_suffix($number, $super = false)
    {
      if ($number % 100 > 10 && $number %100 < 14) {
        $os = 'th';
      } else if ($number == 0) {
        $os = '';
      } else {
        $last = substr($number, -1, 1);
    
        switch($last) {
          case "1":
            $os = 'st';
            break;
          case "2":
            $os = 'nd';
            break;
          case "3":
            $os = 'rd';
            break;
          default:
            $os = 'th';
        }
      }
    
      $os = $super ? '<sup>'.$os.'</sup>' : $os;
    
      return $number.$os;
    } //END function ordinal_suffix
    
    function format_plural($count, $singular, $plural)
    {
      return $count.' '.($count == 1 || $count == -1 ? $singular : $plural);
    } //END function plural_format
    
    ?>
    

    As I had mentioned previously, the algorithm to determine LCA is far less than optimal. I plan to post a separate question to optimize that, and another to address the problem of calculating compound relationships such as double cousins.

    Many thanks to everyone who helped prod me in the right direction! With your tips, this turned out to be much easier than I originally thought.