I have a dataset that looks like the following:
year Gender VP
2023 Total 3
2023 F 2
2023 M 1
2022 Total 6
2022 F 4
2022 M 2
2021 Total 2
...
What I want to do is generate a variable VP_per
that gives me the female percentage of VPs for every year.
Basically something that looks like
bysort year gender : gen VP_per = VP if gender==1 / VP if gender==2
where 1 is the numeric value for Female and 2 is the numeric value for Total.
Your data layout -- in which subtotals are held in separate observations -- is not especially suitable for most Stata purposes, but it can be accommodated.
Here is some technique, which can be modified for numeric variables with value labels.
clear
input int Year str5 Gender byte VP
2023 "Total" 3
2023 "F" 2
2023 "M" 1
2022 "Total" 6
2022 "F" 4
2022 "M" 2
end
* this hinges on sorting F M Total
bysort Year (Gender) : gen wanted = 100 * VP[1] / VP[3]
* need to modify for numeric variables
egen num = total(cond(Gender == "F", VP, .)), by(Year)
egen den = total(cond(Gender == "Total", VP, .)), by(Year)
gen wanted2 = 100 * num/den
If Gender
were numeric with 1 for Female and 2 for Total
egen num = total(cond(Gender == 1, VP, .)), by(Year)
egen den = total(cond(Gender == 2, VP, .)), by(Year)