I am analyzing repeatability across a variety of cognitive tests (and repetitions of those tests) in birds, using the rptR package in R to estimate individual repeatability. However, regardless of the model or the test in question, every run produces a warning and R = 0, and I am trying to understand why.
My dataframe contains an ID (repeated twice for each individual), and each ID repetition is accompanied by a score for the test in question. These scores are first log-transformed to approximate normality and then converted to Z-scores, so that I can compare tests that measure the same trait on different scales. However, no matter how I set up my model, my data always yield a repeatability of R = 0. While this is technically possible, I find it unlikely to be this low for every parameter (I make comparisons both between different tests and within the same test measured twice). Moreover, every model I run produces the warning 'boundary (singular) fit: see ?isSingular'. From what I've gathered, this may mean that the variance in my data is too small, though I am not certain about this, and I worry that it is connected to my R = 0.
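For context, the preprocessing looks roughly like this (just a sketch; logTTC is a made-up intermediate column name, and in my real pipeline the Z-scores are computed per test):

    # Log-transform the raw scores to approximate normality, then standardize
    # so that tests scored on different scales become comparable.
    rpt_Assoc_A_df$logTTC <- log(rpt_Assoc_A_df$TTC)
    rpt_Assoc_A_df$TTC_Z  <- as.numeric(scale(rpt_Assoc_A_df$logTTC))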
A snippet of my dataframe looks as follows:
        RNR_ID RoundNR TTC         TTC_Z Test_date
    2        1       1  28  0.0966013973     43423
    114      1       2  14 -0.8138678026     43543
    5        2       1  48  0.8045891472     43425
    122      2       2  31  0.2302959586     43549
Two example variations of my models. Unadjusted R:
    Rep1_Assoc_A <- rpt(TTC_Z ~ RoundNR + (1 | RNR_ID), grname = "RNR_ID",
                        data = rpt_Assoc_A_df, datatype = "Gaussian",
                        nboot = 10, npermut = 10)
Adjusted R (in which I control for test date, hoping to account for individuals learning between repetitions of the same test):
    Rep2_Assoc_A <- rpt(TTC_Z ~ RoundNR + Test_date + (1 | RNR_ID), grname = "RNR_ID",
                        data = rpt_Assoc_A_df, datatype = "Gaussian",
                        nboot = 10, npermut = 10)
Note: RNR_ID, RoundNR and TTC_Z are numeric variables. Test_date is stored in Date format, though I am not sure how the model handles this. RoundNR indicates the "treatment" (whether a test was the first or the second time an individual was scored), and TTC_Z is an individual's Z-score.
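In case it matters, this is a sketch of how the columns could be coerced explicitly before fitting (as far as I know, lme4, which rptR uses under the hood, converts grouping variables to factors on its own, so I am not sure this changes anything):

    # Treat the ID as a grouping factor rather than a number.
    rpt_Assoc_A_df$RNR_ID <- as.factor(rpt_Assoc_A_df$RNR_ID)
    # Turn the Date into a plain numeric covariate (days since 1970-01-01).
    rpt_Assoc_A_df$Test_date <- as.numeric(rpt_Assoc_A_df$Test_date)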
And the respective outputs:
    Repeatability estimation using the lmm method

    Repeatability for RNR_ID
    R  = 0
    SE = 0.107
    CI = [0, 0.283]
    P  = 1 [LRT]
         1 [Permutation]

    Repeatability estimation using the lmm method

    Repeatability for RNR_ID
    R  = 0
    SE = 0.12
    CI = [0, 0.337]
    P  = 1 [LRT]
         1 [Permutation]
As stated before, running this code prints several 'boundary (singular) fit: see ?isSingular' messages to the console.
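To see where the warning comes from, the underlying mixed model can be fitted directly with lme4 (which rptR uses internally for Gaussian data) and its variance components inspected; this is only a diagnostic sketch, not a fix:

    library(lme4)
    m <- lmer(TTC_Z ~ RoundNR + (1 | RNR_ID), data = rpt_Assoc_A_df)
    isSingular(m)  # TRUE when a variance component is estimated at the boundary
    VarCorr(m)     # the RNR_ID variance is what collapses to ~0 here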
I have also tried a fake dataset in which I adjusted all values so that the two repetitions of each individual were nearly identical, which indeed results in a high R (around 0.9).
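A sanity check of that kind could look like the sketch below: simulate data with a known between-individual variance and check that rpt recovers a high R (all numbers are made up for illustration):

    library(rptR)
    set.seed(1)
    n_id <- 30
    id   <- factor(rep(1:n_id, each = 2))
    # Individual effects dominate the residual noise, so the true
    # repeatability is 0.9^2 / (0.9^2 + 0.3^2) = 0.9.
    y    <- rnorm(n_id, sd = 0.9)[id] + rnorm(2 * n_id, sd = 0.3)
    fake <- data.frame(RNR_ID = id, TTC_Z = y)
    rpt(TTC_Z ~ 1 + (1 | RNR_ID), grname = "RNR_ID", data = fake,
        datatype = "Gaussian", nboot = 10, npermut = 10)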
While this fake-data check suggests my R = 0 might actually be correct, I remain skeptical, both because the result is unexpected (I would expect at least a low but measurable R) and because my limited understanding of the model makes me fear that something else might be going wrong as well.
To summarize, my questions are:
Q1: Are the formulas for my models correct, and are the variables of the right data types?
Q2: What does 'boundary (singular) fit: see ?isSingular' mean in this situation, and can I "fix" it?
Q3: What could be causing my R = 0? Am I analyzing my data incorrectly, or is my R really 0?
While not yet a complete answer, I at least have partial answers to my questions after talking to some colleagues.
Q1: Yes and no. The way I set up my formulas is completely fine, but I included some fixed effects that were (in my case) unnecessary. I initially added RoundNR to try to correct for learning, but this makes little sense with only two rounds, and I believe much of my variation was attributed to this factor. Taking the Z-scores was enough. As for Test_date, that might have been interesting had it not been heavily confounded with the tests themselves. More generally (for other readers): the broad outline of the model was fine; just be careful which fixed effects you include.
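So for my case the stripped-down model looks like this (the same call as before, just without the fixed effects I now consider unnecessary; the object name is only illustrative):

    Rep_Assoc_A_simple <- rpt(TTC_Z ~ 1 + (1 | RNR_ID), grname = "RNR_ID",
                              data = rpt_Assoc_A_df, datatype = "Gaussian",
                              nboot = 10, npermut = 10)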
Q2: I am still not entirely clear on its meaning, so a clearer explanation from someone else would be appreciated. As I understand it, though, the warning reflects a variance component being estimated at the boundary of its allowed range (i.e. zero), which here is simply a consequence of my data and not a problem with, for example, my model.
Q3: The rather obvious answer: my own data. A colleague ran a quick analysis via another method (one that produces rougher but quicker repeatability estimates) and also found R = 0 (or at least very, very close to it).
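I don't know exactly which method my colleague used, but with only two repeats per individual, a quick sanity check in the same spirit is the correlation between round 1 and round 2 scores, which should land in the same ballpark as R:

    # Reshape to one row per individual, with the two rounds as columns.
    wide <- reshape(rpt_Assoc_A_df[, c("RNR_ID", "RoundNR", "TTC_Z")],
                    idvar = "RNR_ID", timevar = "RoundNR", direction = "wide")
    cor(wide$TTC_Z.1, wide$TTC_Z.2)  # near 0 here, consistent with R = 0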
Not a complete answer but I hope it helps others in the future.