Univariate Graphs:
I am analysing the Gapminder dataset, whose variables are quantitative. I am looking at the variables incomeperperson, urbanrate and co2emissions. To prepare univariate graphs, I binned each of the three variables into a categorical variable, then added the code for a vertical bar chart of each.
My program is:
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
DATA new; SET mydata.gapminder;
LABEL incomeperperson="Per capita GDP"
      co2emissions="CO2 emissions (in metric tons)"
      urbanrate="Percentage of people in urban areas";
/* Bin per capita GDP into three income categories */
IF incomeperperson LE 2000 THEN Incomecategory=1;
ELSE IF incomeperperson LE 14000 THEN Incomecategory=2;
ELSE Incomecategory=3;
/* Bin CO2 emissions into three categories */
IF co2emissions LE 20000000 THEN Emissionrate=1;
ELSE IF co2emissions LE 50000000 THEN Emissionrate=2;
ELSE Emissionrate=3;
/* Bin urban rate into five bands of 20 percentage points */
IF urbanrate LE 20 THEN Urbanisation=1;
ELSE IF urbanrate LE 40 THEN Urbanisation=2;
ELSE IF urbanrate LE 60 THEN Urbanisation=3;
ELSE IF urbanrate LE 80 THEN Urbanisation=4;
ELSE Urbanisation=5;
RUN;
PROC SORT; BY COUNTRY;
RUN;
/* Frequency tables for the three binned variables */
PROC FREQ; TABLES Incomecategory Emissionrate Urbanisation;
RUN;
/* Vertical bar charts; the Urbanisation chart shows percentages (type=PCT) */
PROC GCHART; VBAR Incomecategory/discrete Width=20;
RUN;
PROC GCHART; VBAR Emissionrate/discrete Width=20;
RUN;
PROC GCHART; VBAR Urbanisation/discrete type=PCT Width=10;
RUN;
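One caveat about the binning in this program: SAS treats a missing numeric value as smaller than any number, so a country with no recorded value passes the first LE test and is counted in category 1. Below is a minimal sketch of a missing-aware version for one variable. It assumes the same mydata libref, the dataset name new_checked is hypothetical, and the frequencies it produces may differ from the tables in this post.

```sas
/* Sketch only: new_checked is a hypothetical dataset name */
DATA new_checked; SET mydata.gapminder;
  /* Leave the category missing when the source value is missing */
  IF MISSING(incomeperperson) THEN Incomecategory = .;
  ELSE IF incomeperperson LE 2000 THEN Incomecategory = 1;
  ELSE IF incomeperperson LE 14000 THEN Incomecategory = 2;
  ELSE Incomecategory = 3;
RUN;

/* Count non-missing (N) and missing (NMISS) values for each variable */
PROC MEANS DATA=mydata.gapminder N NMISS;
  VAR incomeperperson co2emissions urbanrate;
RUN;

/* PROC FREQ leaves missing categories out of the table by default */
PROC FREQ DATA=new_checked; TABLES Incomecategory;
RUN;
```

The same MISSING() check could be placed ahead of the Emissionrate and Urbanisation tests.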
Result tables and graphs:
Incomecategory | Frequency | Percent | Cumulative Frequency | Cumulative Percent
1 | 103 | 48.36 | 103 | 48.36
2 | 72 | 33.80 | 175 | 82.16
3 | 38 | 17.84 | 213 | 100.00
The largest group of countries (103, or 48.36%) is in the lowest Incomecategory. With only 3 categories in this variable, the shape is simple: a unimodal distribution that peaks in category 1, with counts falling as income rises (103, 72, 38).
It is difficult to describe the skew or spread of the graph, as it has only 3 categorical values.
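Skew and spread are easier to judge on the original quantitative variable than on three bins. Here is a minimal sketch, assuming the dataset new created by the program above. PROC UNIVARIATE reports the standard deviation and skewness of incomeperperson, and the HISTOGRAM statement plots its distribution.

```sas
/* Sketch: summary statistics and a histogram for the unbinned variable */
PROC UNIVARIATE DATA=new;
  VAR incomeperperson;
  HISTOGRAM incomeperperson;
RUN;
```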
Emissionrate | Frequency | Percent | Cumulative Frequency | Cumulative Percent
1 | 50 | 23.47 | 50 | 23.47
2 | 25 | 11.74 | 75 | 35.21
3 | 138 | 64.79 | 213 | 100.00
The Emissionrate graph is bimodal and skewed left. The largest group of countries (138) has co2emissions above 50000000. Interestingly, more countries fall in the lowest emission category (50) than in the middle category (25).
Urbanisation | Frequency | Percent | Cumulative Frequency | Cumulative Percent
1 | 23 | 10.80 | 23 | 10.80
2 | 46 | 21.60 | 69 | 32.39
3 | 46 | 21.60 | 115 | 53.99
4 | 58 | 27.23 | 173 | 81.22
5 | 40 | 18.78 | 213 | 100.00
I decided to run the program for rate of urbanisation as a pie chart with percentages, since I had five categories that could easily be split.
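The program listed in this post draws Urbanisation with a VBAR statement and type=PCT, which gives a vertical bar chart of percentages. A pie chart of the same percentages could be sketched with the PIE statement of PROC GCHART, assuming the dataset new from that program:

```sas
/* Sketch: one slice per Urbanisation category, sized by percentage */
PROC GCHART DATA=new;
  PIE Urbanisation / DISCRETE TYPE=PCT;
RUN;
```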
Bivariate graphs on a separate page