python, python-3.x, pandas, limesurvey

How to select columns of data from a DataFrame


I'm retrieving survey results from LimeSurvey via its RemoteControl API:

[screenshot]

I managed to get it into a DataFrame, but it has just one column per row:

[screenshot]

The data looks like this:

[screenshot]

What I want to do is compute averages of the data by question and by category. In the example, q10[wor1], q10[wor2], ..., q10[wor7] are the 7 questions that belong to category q10.

How do I first select all the data for wor1, wor2, ..., wor7 separately, so that I can compute statistics on each individual question?

Then, how do I select all the data for q10* so that I can compute statistics for the entire group?

Even without trying to separate the category from the question, I haven't been able to select just all the 'q10[wor1]' data.


Solution

  • Check out jq: https://stedolan.github.io/jq/

    You can pass the JSON in df['responses'] to jq, extract the fields you need, and store each one as a separate DataFrame column.

    Then you can compute the average of each of those columns from the DataFrame.
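
    If you would rather stay in pandas than shell out to jq, the same idea (one column per question, then averages) can be sketched as below. This is only a rough sketch, not jq itself: it assumes each cell of df['responses'] already holds a dict mapping question codes such as 'q10[wor1]' to answer strings. The sample rows, the 'id' key and the 'q11[abc1]' code are made up for illustration, so adjust them to whatever your export actually contains.

```python
import pandas as pd

# Hypothetical sample data: a single "responses" column, one dict per row.
# The real LimeSurvey export may be shaped differently.
df = pd.DataFrame({
    "responses": [
        {"id": "1", "q10[wor1]": "4", "q10[wor2]": "5", "q11[abc1]": "1"},
        {"id": "2", "q10[wor1]": "2", "q10[wor2]": "3", "q11[abc1]": "1"},
    ]
})

# Expand each dict into its own set of columns (one column per question code).
flat = pd.DataFrame(df["responses"].tolist())

# Answers in this sample are strings; convert them to numbers.
# Anything that is not numeric becomes NaN and is skipped by mean().
answers = flat.drop(columns="id").apply(pd.to_numeric, errors="coerce")

# 1) A single question: select its column by the full code.
wor1 = answers["q10[wor1]"]
wor1_mean = wor1.mean()

# 2) The whole category: keep every column whose name starts with "q10[".
q10 = answers.loc[:, answers.columns.str.startswith("q10[")]

# Average per question within the category (a Series indexed by question code).
q10_question_means = q10.mean()

# Average over all q10 answers pooled together.
q10_group_mean = q10.melt(value_name="answer")["answer"].mean()

print(q10_question_means)
print(q10_group_mean)
```

    Matching on the "q10[" prefix rather than plain "q10" avoids accidentally picking up codes such as q100 or q10other, if your survey has any.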