I am back with a more mathematical problem in R. I am suspicious of my own code, which is why I am posting this question here. I am trying to apply some formulas to create indexes that I will use for statistical analysis. The first measures the position of each observation; the second is a distance-based cluster index that measures how clustered the observations are within each group (basically, polarization within a group). I apply both to two position frames, so to speak; I mention this only to give you an idea. The formulas come from previous research, so I doubt I will be changing them (at least for now).
First, the position formula:
Then, the polarization one:
The output of creating the new columns with these formulas is just crazy. The position formula sometimes yields Inf and -Inf, which completely destroys the measurement and keeps me from using it for analysis; this in turn affects the polarization index, which becomes NaN. I do not know why this is the case (maybe I have made a terrible mistake, but I do not know where exactly). The only reasonable explanation I could think of is that the denominator can be 0, and dividing by 0 could create such problems (since it is mathematically undefined). Below I leave an MWE of the data so you can see what is happening, as well as the basic code I am using to create the variables.
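The suspicion about zeros can be checked in isolation. Below is a minimal base R sketch with made-up values (not the real data); `num` and `den` are hypothetical stand-ins for a numerator such as right and a denominator such as left. It suggests that a zero on either side of the ratio is enough to produce a non-finite log.

```r
# Toy values (not from the real data) covering the problem cases
num <- c(0.2, 0.0, 0.2, 0.0)   # numerator, e.g. `right` or `tan`
den <- c(0.1, 0.1, 0.0, 0.0)   # denominator, e.g. `left` or `gal`

ratio <- num / den             # finite, 0, Inf, NaN
log_ratio <- log(ratio)        # finite, -Inf, Inf, NaN
print(log_ratio)

# A group mean that mixes -Inf and Inf is NaN, so every deviation
# from that mean (and any index built on it) is NaN as well
group_mean <- mean(log_ratio[1:3])
print(group_mean)
```

So a zero numerator gives -Inf, a zero denominator gives Inf, and 0/0 gives NaN.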
The MWE of the data (dput(head(data,10))). Just as a clarification, the error happens for both position frames, even though in the head of the data it only appears for one of them.
structure(list(countryname = c("Sweden", "Sweden", "Sweden",
"Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden", "Sweden"
), partyname = c("Green Ecology Party", "Left Party", "Social Democratic Labour Party",
"Liberal People’s Party", "Christian Democratic Community Party",
"Moderate Coalition Party", "Centre Party", "New Democracy",
"Green Ecology Party", "Left Party"), partyabbrev = c("MP", "V",
"SAP", "FP", "KdS", "MSP", "CP", "NyD", "MP", "V"), edate = structure(c(7927,
7927, 7927, 7927, 7927, 7927, 7927, 7927, 9026, 9026), class = "Date"),
date = c(199109L, 199109L, 199109L, 199109L, 199109L, 199109L,
199109L, 199109L, 199409L, 199409L), pervote = c(3.383, 4.513,
37.705, 9.128, 7.135, 21.924, 8.503, 6.732, 5.023, 6.174),
per101_bal = c(0.0131533759186757, 0, 0, 0, 0, 0.00552656143923965,
0, 0.0610648953779551, 0, 0), per102_bal = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0), per103_bal = c(0, 0.0241945356407545,
0, 0, 0, 0, 0, 0, 0, 0), per104_bal = c(0, 0, 0, 0, 0, 0.0331487610319077,
0, 0.0152688765332541, 0, 0), per105_bal = c(0.0526363990200792,
0.0161259807467587, 0, 0, 0, 0, 0, 0, 0.027027027027027,
0), per106_bal = c(0.0131533759186757, 0.00806855489399588,
0, 0, 0.0238122675606307, 0, 0.0083311833505332, 0.00381987182207886,
0, 0), per107_bal = c(0.0526363990200792, 0.112904123309777,
0.0784338716050012, 0.0688056308360199, 0.0952385941019329,
0.0110531228784793, 0.0166623667010664, 0.00762913288909639,
0, 0.0440225813593093), per108_bal = c(0, 0.0161259807467587,
0.0392169358025006, 0.0963304519824807, 0.0238122675606307,
0.0828772063815343, 0.0083311833505332, 0.0228980094223505,
0, 0), per109_bal = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), per110_bal = c(0.10526135036747,
0.0322630905347504, 0, 0, 0, 0, 0, 0, 0.0675675675675676,
0.056608368386097), per201_bal = c(0.0921079744487946, 0,
0, 0.174306741847233, 0.0952385941019329, 0.0828772063815343,
0.0250042999656003, 0.0954225202665422, 0, 0.00628735886650432
), per202_bal = c(0.10526135036747, 0.0806410327750264, 0.0392169358025006,
0.0825744634393825, 0.0476140589806715, 0.0110531228784793,
0.0416666666666667, 0.0458066295997623, 0.0135135135135135,
0.0754704449856099), per203_bal = c(0, 0, 0, 0, 0, 0, 0,
0.0114490047111752, 0.0135135135135135, 0), per204_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), per301_bal = c(0, 0, 0, 0, 0.0238122675606307,
0, 0.0333354833161335, 0.0114490047111752, 0, 0), per302_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), per303_bal = c(0, 0, 0, 0, 0.0238122675606307,
0.0331487610319077, 0, 0.114500657866814, 0, 0), per304_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), per305_bal = c(0, 0, 0.0392169358025006,
0, 0, 0, 0.0083311833505332, 0, 0, 0), per401_bal = c(0,
0, 0, 0.0688056308360199, 0, 0.0828772063815343, 0.0083311833505332,
0.0343470141335257, 0, 0), per402_bal = c(0, 0, 0.0196020953084972,
0.0412808096895591, 0, 0.099446283095723, 0.0666709666322669,
0.0572556343109376, 0, 0), per403_bal = c(0.0131533759186757,
0.0403205163875132, 0, 0.00458532951436608, 0, 0, 0, 0.019088748355333,
0.108108108108108, 0.100630949745406), per404_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0.0125747177330086), per405_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), per406_bal = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0), per407_bal = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0), per408_bal = c(0, 0, 0.00980104765424861, 0.0137559885430982,
0, 0, 0.0083311833505332, 0, 0.0135135135135135, 0), per409_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0.0125747177330086), per410_bal = c(0.0131533759186757,
0.0403205163875132, 0.0392169358025006, 0.0183541621177287,
0, 0, 0.0499978500171999, 0, 0.0135135135135135, 0.0943435908789019
), per411_bal = c(0, 0.00806855489399588, 0.0490179834567492,
0.00917065902873216, 0, 0, 0.0833333333333333, 0.0534357624888587,
0, 0.00628735886650432), per412_bal = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0), per413_bal = c(0, 0.00806855489399588, 0, 0,
0, 0, 0, 0, 0, 0), per414_bal = c(0.0131533759186757, 0,
0.0686328239507526, 0.00458532951436608, 0.0952385941019329,
0.160217243720299, 0.0250042999656003, 0.206103306311277,
0.135135135135135, 0.0314478636263006), per415_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), per416_bal = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0), per501_bal = c(0.434221672733933, 0.177419175338045,
0.147053950370248, 0.0550496422929216, 0.119050861662564,
0.0386753224711473, 0.241668816649467, 0.0381668859556046,
0.364864864864865, 0.144653531104716), per502_bal = c(0.0263181995100396,
0.0161259807467587, 0, 0.00458532951436608, 0.0476140589806715,
0.0165796843177189, 0.0166623667010664, 0, 0, 0.0251605047597963
), per503_bal = c(0.0526363990200792, 0.0887095876690223,
0.196084679012503, 0.100915781496847, 0.0714263265413022,
0.027622199592668, 0.125, 0.0267178812444294, 0.175675675675676,
0.232709763117113), per504_bal = c(0, 0.104835568415781,
0.0588190311109978, 0.0321101506608269, 0.142852653082604,
0.0220956381534284, 0.0666709666322669, 0.00381987182207886,
0.0540540540540541, 0.0628957272526013), per505_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), per506_bal = c(0, 0.0241945356407545,
0.00980104765424861, 0.0321101506608269, 0, 0.0883931602172437,
0.0250042999656003, 0.0267178812444294, 0, 0.0125747177330086
), per507_bal = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), per601_bal = c(0,
0, 0.0490179834567492, 0.00458532951436608, 0, 0.027622199592668,
0.0166623667010664, 0.00762913288909639, 0, 0), per602_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), per603_bal = c(0, 0, 0, 0, 0.0714263265413022,
0.0220956381534284, 0.0083311833505332, 0.064884767200034,
0, 0), per604_bal = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), per605_bal = c(0,
0, 0, 0, 0, 0.0883931602172437, 0.0666709666322669, 0.0687046390221128,
0, 0), per606_bal = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), per607_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), per608_bal = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0), per701_bal = c(0.0131533759186757, 0.0564464971342719,
0.0686328239507526, 0.0229394916320947, 0, 0, 0.0083311833505332,
0, 0.0135135135135135, 0.0314478636263006), per702_bal = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), per703_bal = c(0, 0.00806855489399588,
0, 0, 0, 0.0331487610319077, 0.0083311833505332, 0, 0, 0),
per704_bal = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), per705_bal = c(0,
0.0241945356407545, 0.029415888148252, 0.0183541621177287,
0.0952385941019329, 0.0110531228784793, 0.0083311833505332,
0, 0, 0), per706_bal = c(0, 0.112904123309777, 0.0588190311109978,
0.146794764761036, 0.0238122675606307, 0.0220956381534284,
0.0250042999656003, 0.00381987182207886, 0, 0.0503099402258136
), per103_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per103_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per201_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per201_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per202_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per202_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per202_3_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per202_4_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per305_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per305_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per305_3_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per305_4_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per305_5_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per305_6_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per416_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per416_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per601_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per601_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per602_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per602_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per605_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per605_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per606_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per606_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per607_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per607_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per607_3_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per608_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per608_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per608_3_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per703_1_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), per703_2_bal = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
)), row.names = c(NA, 10L), class = "data.frame")
The code:
library(dplyr)

mod_data <- data %>%
  # Sum the category shares that make up each side of the two scales
  mutate(
    right = rowSums(across(c(per102_bal, per104_bal, per109_bal, per110_bal, per204_bal, per302_bal, per407_bal, per414_bal, per601_bal, per603_bal, per608_bal, per702_bal, per505_bal, per507_bal))),
    left = rowSums(across(c(per101_bal, per105_bal, per107_bal, per108_bal, per203_bal, per301_bal, per406_bal, per409_bal, per602_bal, per604_bal, per607_bal, per701_bal, per504_bal, per506_bal))),
    gal = rowSums(across(c(per501_bal, per602_bal, per604_bal, per502_bal, per607_bal, per416_bal, per705_bal, per706_bal, per201_bal, per202_bal))),
    tan = rowSums(across(c(per305_bal, per601_bal, per605_bal, per608_bal, per606_bal)))
  ) %>%
  # This is the first formula, which I think might be the problem
  mutate(
    lr = log(right / left),
    galtan = log(tan / gal)
  ) %>%
  # The second formula, computed per country and election date.
  # As written, ^2 squares the whole sum, so sqrt(...) returns its absolute value.
  group_by(countryname, edate) %>%
  mutate(
    pol_lr = sqrt(sum(pervote * (lr - mean(lr)) / 5)^2),
    pol_galtan = sqrt(sum(pervote * (galtan - mean(galtan)) / 5)^2)
  )
Can anyone think of a solution to this problem? Is there an inherent problem in the first formula that makes the measurement impossible? Or have I made mistakes in the code that produce this problem? I know these are odd questions, but I definitely need help with this one; I cannot crack it.
Stopping the pipeline after the first step (the first mutate) and looking at the results:
mod_data |> select(right, left, tan, gal) |> summary()
     right               left             tan                gal
 Min.   :0.009171   Min.   :0.1081   Min.   :0.000000   Min.   :0.1823
 1st Qu.:0.059512   1st Qu.:0.1484   1st Qu.:0.000000   1st Qu.:0.2813
 Median :0.118033   Median :0.1867   Median :0.002293   Median :0.3684
 Mean   :0.132189   Mean   :0.2040   Mean   :0.037683   Mean   :0.3658
 3rd Qu.:0.193693   3rd Qu.:0.2543   3rd Qu.:0.085260   3rd Qu.:0.4242
 Max.   :0.293886   Max.   :0.3306   Max.   :0.116015   Max.   :0.6579
we see that tan contains zero values (in fact, at least the lowest 25% of the values are zero, since the first quartile as well as the minimum is zero). Thus, in your next expression, tan/gal will be zero for those rows, so log(tan/gal) will be -Inf. As the commenter above says, it's a subject-area rather than a technical question to decide whether you can work around this problem in a sensible way ...
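To make the diagnosis concrete, here is a minimal base R sketch on a toy data frame (illustrative values, not the real data). It first locates the rows that produce non-finite logs, then shows two possible ways of handling them. Both are assumptions on my part rather than part of the original formulas: the constant `eps` is a hypothetical choice, and whether smoothing or dropping rows is defensible is exactly the subject-area decision mentioned above.

```r
# Toy stand-in for the summed category shares (illustrative values only)
toy <- data.frame(
  tan = c(0, 0.002, 0.085, 0),
  gal = c(0.18, 0.37, 0.42, 0.66)
)

# 1. Locate the rows that produce non-finite logs
toy$galtan <- log(toy$tan / toy$gal)
bad <- !is.finite(toy$galtan)
n_bad <- sum(bad)
print(n_bad)

# 2. Possible workaround: add a small constant before taking the ratio.
#    `eps` is a hypothetical value, not part of the original formula.
eps <- 0.005
toy$galtan_smooth <- log((toy$tan + eps) / (toy$gal + eps))

# 3. Alternative: mark undefined positions as NA and skip them explicitly
toy$galtan_na <- ifelse(bad, NA_real_, toy$galtan)
mean_defined <- mean(toy$galtan_na, na.rm = TRUE)

print(toy)
```

The smoothed scores depend on the size of `eps` relative to the shares, so it would be worth checking how sensitive the downstream polarization index is to that choice.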