Sunday, October 30, 2016

Creating graphs

Univariate Graphs:

 I am analysing the Gapminder dataset, which has quantitative variables. I am looking at the variables incomeperperson, urbanrate and co2emissions. To prepare univariate graphs, I binned the values of the three variables and converted them into categorical variables. After this, I added the code for vertical bar charts for each of the variables.

My program is:

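LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

DATA new; SET mydata.gapminder;
LABEL incomeperperson="Per capita GDP"
    co2emissions="CO2 emissions (in metric tons)"
    urbanrate="Percentage of people in urban areas";

/* Bin incomeperperson into 3 categories. SAS treats a missing numeric
   value as smaller than any number, so missing values fall into
   category 1 here; the same applies to the two variables below. */
IF incomeperperson LE 2000 THEN Incomecategory=1;
ELSE IF incomeperperson LE 14000 THEN Incomecategory=2;
ELSE Incomecategory=3;

/* Bin co2emissions into 3 categories */
IF co2emissions LE 20000000 THEN Emissionrate=1;
ELSE IF co2emissions LE 50000000 THEN Emissionrate=2;
ELSE Emissionrate=3;

/* Bin urbanrate into 5 categories, each 20 percentage points wide */
IF urbanrate LE 20 THEN Urbanisation=1;
ELSE IF urbanrate LE 40 THEN Urbanisation=2;
ELSE IF urbanrate LE 60 THEN Urbanisation=3;
ELSE IF urbanrate LE 80 THEN Urbanisation=4;
ELSE Urbanisation=5;
RUN;

PROC SORT DATA=new; BY country; RUN;

/* Frequency tables for the three binned variables */
PROC FREQ DATA=new; TABLES Incomecategory Emissionrate Urbanisation; RUN;

/* Vertical bar charts; DISCRETE draws one bar per category value */
PROC GCHART DATA=new; VBAR Incomecategory / DISCRETE WIDTH=20; RUN;
PROC GCHART DATA=new; VBAR Emissionrate / DISCRETE WIDTH=20; RUN;
PROC GCHART DATA=new; VBAR Urbanisation / DISCRETE TYPE=PCT WIDTH=10;
RUN; QUIT;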
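
One thing to watch in the binning above: SAS compares a missing numeric value as smaller than any number, so a test such as incomeperperson <= 2000 is also true for missing values. Below is a minimal sketch of one way to keep missing values out of category 1, shown for incomeperperson only. It assumes the dataset new from the program above already exists; the dataset name new_checked is hypothetical and is not part of my original program.

```sas
/* Sketch only: new_checked is a hypothetical dataset name.
   Assumes the dataset new from the program above already exists. */
DATA new_checked; SET new;
  /* Leave the category missing when the source value is missing */
  IF MISSING(incomeperperson) THEN Incomecategory=.;
  ELSE IF incomeperperson LE 2000 THEN Incomecategory=1;
  ELSE IF incomeperperson LE 14000 THEN Incomecategory=2;
  ELSE Incomecategory=3;
RUN;

/* The MISSING option counts missing values as their own level in the table */
PROC FREQ DATA=new_checked; TABLES Incomecategory / MISSING; RUN;
```

The same pattern would apply to co2emissions and urbanrate. The frequencies from such a run would differ from the tables shown below wherever a variable has missing values.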

Result tables and graphs:




Incomecategory  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1               103        48.36    103                   48.36
2               72         33.80    175                   82.16
3               38         17.84    213                   100.00


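The largest group of countries (103, or 48.36%) is in the lowest Incomecategory, i.e. per capita GDP of 2000 or less. Because SAS treats a missing numeric value as smaller than any number, any countries with missing incomeperperson are also counted in this category. With only three categories the shape is simple: the distribution is unimodal, with counts falling from category 1 to category 3.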

It is difficult to define the skew or spread of the graph as it has only 3 categorical values.

Emissionrate  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1             50         23.47    50                    23.47
2             25         11.74    75                    35.21
3             138        64.79    213                   100.00


 
The Emissionrate graph is bimodal, with peaks in categories 1 and 3, and is skewed left. The largest group of countries (138, or 64.79%) has co2emissions above 50000000 metric tons. Interestingly, more countries fall in the lowest emission category (50) than in the middle category (25).

Urbanisation  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1             23         10.80    23                    10.80
2             46         21.60    69                    32.39
3             46         21.60    115                   53.99
4             58         27.23    173                   81.22
5             40         18.78    213                   100.00





I decided to run the program for rate of urbanisation as a pie chart with percentages, since I had five categories that could easily be split. 
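
The listing above charts Urbanisation with a VBAR statement and TYPE=PCT. A pie chart with percentages would use the PIE statement of PROC GCHART instead. Below is a minimal sketch of that variant, assuming the dataset new from the program above; to the best of my knowledge the PIE statement accepts the same DISCRETE and TYPE=PCT options, but not the WIDTH= option used for bars.

```sas
/* Sketch only: assumes the dataset new from the program above. */
PROC GCHART DATA=new;
  /* DISCRETE gives one slice per category value;
     TYPE=PCT sizes the slices by percentage of observations */
  PIE Urbanisation / DISCRETE TYPE=PCT;
RUN; QUIT;
```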

Bivariate graphs on a separate page
