r dataframe mlogit

error with dfidx: the two indexes don't define unique observations


I have collected data from a survey in order to perform a choice-based conjoint analysis. I preprocessed and cleaned the data with Python so that I could use them in R. However, when I apply the function dfidx to the dataset I get the following error: the two indexes don't define unique observations. I really do not understand why. Before creating the .csv file I checked for duplicates with the pandas call final_df.duplicated().sum(), and its output was 0, meaning that there were no duplicate rows. Can someone please help me understand what I am doing wrong?

Here is the code:

library(dfidx)

df <- read.csv('.../survey_results.csv')
df <- df[, -1]  # drop the first column
# convert the attribute columns to factors
df$Platform <- as.factor(df$Platform)
df$Deposit <- as.factor(df$Deposit)
df$Fees <- as.factor(df$Fees)
df$Financial_Instrument <- as.factor(df$Financial_Instrument)
df$Leverage <- as.factor(df$Leverage)
df$Social_Trading <- as.factor(df$Social_Trading)
# this is the call that raises the error
df.mlogit <- dfidx(df, idx = list(c("resp.id", "ques"), "position"), shape = 'long')

Here is the link to the dataset that I am using: https://github.com/AlbertoDeBenedittis/conjoint-survey-shiny/blob/main/survey_results.csv

Thank you in advance for your time


Solution

  • The function dfidx() is built for data frames "for which observations are defined by two (potentially nested) indexes" (ref).

    I don't think this function is built to take more than two index columns in this way. In your df, rows are unique ONLY when all three of the columns you mention above (resp.id, ques and position) are considered together; any two of them on their own contain repeated combinations.

    One solution to this problem is to "combine" the two columns resp.id and ques into one (called for example resp.id.ques) with paste(...).

    df$resp.id.ques <- paste(df$resp.id, df$ques, sep="_")
    

    Then you can write the following line which should work just fine:

    df.mlogit <- dfidx(df, idx = list("resp.id.ques", "position"))
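
The uniqueness argument above can be checked without dfidx at all. The following is only a sketch in base R: the toy data frame and the helper n_dup() are made up for illustration and are not the survey data, but the column names mirror the ones in the question.

```r
# Toy long-format data (hypothetical): 2 respondents x 2 questions x 3 alternatives.
toy <- expand.grid(position = 1:3, ques = 1:2, resp.id = 1:2)

# Hypothetical helper: count rows that repeat an earlier row
# when only the given columns are compared.
n_dup <- function(data, cols) sum(duplicated(data[, cols, drop = FALSE]))

# Any two of the original columns repeat across the third one.
dup_resp_pos <- n_dup(toy, c("resp.id", "position"))
dup_ques_pos <- n_dup(toy, c("ques", "position"))

# All three columns together identify each row.
dup_all <- n_dup(toy, c("resp.id", "ques", "position"))

# Combined choice-situation id, as suggested in the answer.
toy$resp.id.ques <- paste(toy$resp.id, toy$ques, sep = "_")
dup_combined <- n_dup(toy, c("resp.id.ques", "position"))

print(c(resp_pos = dup_resp_pos, ques_pos = dup_ques_pos,
        all_three = dup_all, combined = dup_combined))
```

Running the same kind of check on the real df before calling dfidx() should show whether the pair of columns passed as idx is unique. A whole-row check such as pandas' duplicated() compares every column, so it can report 0 duplicates even when the two index columns alone are not unique.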