statisticschi-squared

Trying to rule out astrology but something is wrong


I am trying to rule out a possible astrology effect on populations as a statistically insignificant effect but to no avail. I am using Pearson's Chi Square test on two distributions of sun signs from two different populations one of astronaut pilots and the other one of celebrities. Something must be wrong but I failed to find it, probably on the statistics side.

import numpy as np
import pandas as pd
import ephem
from collections import Counter, namedtuple
import matplotlib.pyplot as plt
from scipy import stats
models = pd.read_csv('models.csv', delimiter=',')
astronauts = pd.read_csv('astronauts.csv', delimiter=',')
models = models.sample(229)
astronauts = astronauts.sample(229)

sun = ephem.Sun()


def get_planet_constellation(planet, dataset):
    person_planet_constellation = []
    for person in dataset['Birth Date']:
        planet.compute(person)
        person_planet_constellation += [ephem.constellation(planet)[1]]
    return person_planet_constellation


def plot_bar_group(planet, data1, data2):
    fig, ax = plt.subplots()
    plt.bar(data1.keys(), data1.values(), alpha=0.5)
    plt.bar(data2.keys(), data2.values(), alpha=0.5)
    plt.legend(['astronauts', 'models'])
    ylabel = 'Percentages of ' + planet.name + ' in constellation'
    ax.set_ylabel(ylabel)
    title = 'Histogram of ' + planet.name + ' in constellation by group'
    ax.set_title(title)
    plt.show()


astronaut_sun_constellation = Counter(
    get_planet_constellation(sun, astronauts))
model_sun_constellation = Counter(get_planet_constellation(sun, models))


plot_bar_group(sun, astronaut_sun_constellation, model_sun_constellation)

a = list(astronaut_sun_constellation.values())
b = list(model_sun_constellation.values())
s = np.array([a, b])

stat, p, dof, expected = stats.chi2_contingency(s)
print(stat, p, dof, expected)

prob = 0.95
critical = stats.chi2.ppf(prob, dof)
if abs(stat) >= critical:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

# interpret p-value
alpha = 1.0 - prob
if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')

https://www.dropbox.com/s/w7rye6m5lbihjlh/astronauts.csv https://www.dropbox.com/s/xlxanr0pxqtxcvv/models.csv


Solution

  • I have eventually found the bug, it was on passing the counter as a list to the chisquare function, it must be sorted first, otherwise chisquare sees a major difference in the counters values. All astrology effects now are insignificant as expected at the level of 0.95