I have a table of areas with data. For a particular operation, I want to exclude the top and bottom 1% of areas, as they include extreme outliers.
Seems to me that a way forward is:
SORT CASES BY theVariableIwantToAnalyse (A) .
NUMERIC id (F12.0) . * create a casenum label "id"
COMPUTE id = $CASENUM. * populate it with casenum
EXECUTE.
NUMERIC idmax (F12.4) . * create a variable to contain the highest value for "id"
NUMERIC id1perc (F12.4) . * create a variable to contain 1% of the highest value for "id"
COMPUTE idmax = MAX(id) . * determine the highest value for id. This 'mock-syntax' line does not work.
COMPUTE id1perc = idmax / 100 . * 1% of the highest value for "id"
SELECT CASES WHERE ID >= id1perc or ID <= idmax - id1perc .
Draw graphs etc. I then need to
SORT CASES BY theNextVariableIwantToAnalyse (A) .
COMPUTE id = $CASENUM. * populate it with the NEW casenum order
EXECUTE.
etc ...
Try this to simply filter out the top and bottom 1% - just add FILTER BY filter.
to turn off all extreme cases, or SELECT IF filter.
... EXECUTE.
to delete them
EDIT: note that repeated values will be condensed by the RANK
method (specifically the /TIES
option). This might not be ideal if you have the possibility of repeated values. Change the /TIES
option if that's the case.
************* GENERATE RANDOM DATA *****************.
INPUT PROGRAM.
- LOOP #I = 1 TO 1000.
- COMPUTE Y = RV.NORMAL(100,10).
- END CASE.
- END LOOP.
- END FILE.
END INPUT PROGRAM.
dataset name exampleData WINDOW=front.
EXECUTE.
************* RANK DATA *************.
DATASET ACTIVATE exampleData.
RANK VARIABLES=Y (A)
/RFRACTION INTO fractile
/TIES=CONDENSE.
************* MAKE A FILTER *************.
COMPUTE filter = (fractile>0.01 AND fractile < 0.99).
EXECUTE.
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Y filter MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Y=col(source(s), name("Y"))
DATA: filter=col(source(s), name("filter"), unit.category())
GUIDE: axis(dim(1), label("Y"))
GUIDE: axis(dim(2), label("Frequency"))
GUIDE: legend(aesthetic(aesthetic.color.interior), label("filter"))
ELEMENT: interval.stack(position(summary.count(bin.rect(Y))), color.interior(filter),
shape.interior(shape.square))
END GPL.