Sunday, September 25, 2016

Managing data

For managing data, I wanted to categorise the variable incomeperperson in the Gapminder dataset into Low, Middle and High income groups. I created a new variable, 'Incomecategory', with three categories: Low income (1), Middle income (2) and High income (3).

My program is as follows:

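LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

/* Read the Gapminder data and label the variables of interest */
DATA new;
    SET mydata.gapminder;
    LABEL incomeperperson="Per capita GDP"
          co2emissions="CO2 emissions (in metric tons)"
          urbanrate="Percentage of people in urban areas";
    /* Recode per capita GDP into three income groups */
    IF incomeperperson <= 2000 THEN Incomecategory=1;        /* Low income */
    ELSE IF incomeperperson <= 14000 THEN Incomecategory=2;  /* Middle income */
    ELSE Incomecategory=3;                                   /* High income */
RUN;

PROC SORT DATA=new; BY country; RUN;

/* Frequency tables for the new category and the three original variables */
PROC FREQ DATA=new;
    TABLES Incomecategory incomeperperson co2emissions urbanrate;
RUN;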
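
One caveat with the IF/ELSE recode above: SAS treats a missing numeric value as smaller than any number, so a country with no incomeperperson value satisfies the first condition and is placed in category 1. A minimal sketch of a guarded version follows, assuming the same mydata.gapminder dataset; the dataset name new_guarded is hypothetical, and the MISSING option is added here only to display the missing group. Neither is part of the original program.

```sas
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

/* Sketch only: same cutoffs, but missing incomes stay missing */
DATA new_guarded;   /* hypothetical dataset name, not in the original program */
    SET mydata.gapminder;
    IF MISSING(incomeperperson) THEN Incomecategory = .;
    ELSE IF incomeperperson <= 2000 THEN Incomecategory = 1;
    ELSE IF incomeperperson <= 14000 THEN Incomecategory = 2;
    ELSE Incomecategory = 3;
RUN;

/* MISSING shows the missing group as its own row in the table */
PROC FREQ DATA=new_guarded;
    TABLES Incomecategory / MISSING;
RUN;
```

Without the MISSING option, PROC FREQ would leave the missing group out of the table and report it underneath as "Frequency Missing".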

This is what the grouped table looks like:



The FREQ Procedure

                                          Cumulative    Cumulative
Incomecategory     Frequency    Percent    Frequency       Percent
1                        103      48.36          103         48.36
2                         72      33.80          175         82.16
3                         38      17.84          213        100.00
 



This shows that over 48% (nearly half) of the sample countries fall in the low income category, with per capita GDP equal to or less than USD 2,000; 33.8% are in the middle income category, with per capita GDP above USD 2,000 and up to USD 14,000; and a little under 18% fall in the high income category, with per capita GDP above USD 14,000. One caveat: SAS ranks missing values below any number, so the low income count of 103 appears to include the 23 countries with missing per capita GDP reported in the frequency table for incomeperperson, which lists 80 countries at or below USD 2,000.
Categorising this quantitative variable will help me analyse the Gapminder data further and interpret the trends and correlations between income, urbanisation and emission levels.
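
The numeric codes 1, 2 and 3 could also be displayed with their group names in the output. A minimal sketch using PROC FORMAT follows, assuming the dataset new created by the program above; the format name incfmt is a hypothetical choice for illustration.

```sas
/* Sketch only: attach descriptive labels to the income codes */
PROC FORMAT;
    VALUE incfmt            /* hypothetical format name */
        1 = "Low income"
        2 = "Middle income"
        3 = "High income";
RUN;

/* The FORMAT statement applies the labels for this procedure only */
PROC FREQ DATA=new;
    TABLES Incomecategory;
    FORMAT Incomecategory incfmt.;
RUN;
```

The stored values stay numeric, so the codes remain usable in later steps; only the printed table changes.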

Frequency tables for the three variables (urbanrate, co2emissions and incomeperperson) are provided on the following pages:

Managing data - 2a

Managing data - 2

Since the Gapminder dataset does not have qualitative or categorical variables, I decided not to run a program for coding out missing data or coding in valid data.

Tuesday, September 20, 2016

Per capita GDP table





Per capita GDP

                                          Cumulative    Cumulative
incomeperperson    Frequency    Percent    Frequency       Percent
103.77585724               1       0.64            1          0.64
115.3059959                1       0.64            2          1.28
131.79620701               1       0.64            3          1.92
155.03323123               1       0.64            4          2.56
161.3171371                1       0.64            5          3.21
180.083376                 1       0.64            6          3.85
184.14179659               1       0.64            7          4.49
220.89124792               1       0.64            8          5.13
239.51874937               1       0.64            9          5.77
242.67753416               1       0.64           10          6.41
268.25944951               1       0.64           11          7.05
268.3317903                1       0.64           12          7.69
269.89288112               1       0.64           13          8.33
275.88428653               1       0.64           14          8.97
276.20041296               1       0.64           15          9.62
279.18045256               1       0.64           16         10.26
285.22444925               1       0.64           17         10.90
320.77188995               1       0.64           18         11.54
336.36874948               1       0.64           19         12.18
338.26639123               1       0.64           20         12.82
354.59972629               1       0.64           21         13.46
358.9795398                1       0.64           22         14.10
369.57295374               1       0.64           23         14.74
371.42419752               1       0.64           24         15.38
372.728414                 1       0.64           25         16.03
377.03969946               1       0.64           26         16.67
377.42111326               1       0.64           27         17.31
389.76363425               1       0.64           28         17.95
411.50144725               1       0.64           29         18.59
432.22633697               1       0.64           30         19.23
456.38571165               1       0.64           31         19.87
468.69604356               1       0.64           32         20.51
495.73424694               1       0.64           33         21.15
523.95015149               1       0.64           34         21.79
544.59947666               1       0.64           35         22.44
554.87984007               1       0.64           36         23.08
557.9475126                1       0.64           37         23.72
558.06287663               1       0.64           38         24.36
561.70858483               1       0.64           39         25.00
591.06794434               1       0.64           40         25.64
595.87453452               1       0.64           41         26.28
609.13120592               1       0.64           42         26.92
610.35736732               1       0.64           43         27.56
668.54794304               1       0.64           44         28.21
713.63930272               1       0.64           45         28.85
722.80755883               1       0.64           46         29.49
736.2680538                1       0.64           47         30.13
744.23941319               1       0.64           48         30.77
760.26236504               1       0.64           49         31.41
772.9333448                1       0.64           50         32.05
786.70009815               1       0.64           51         32.69
895.31833964               1       0.64           52         33.33
948.35595202               1       0.64           53         33.97
952.82726083               1       0.64           54         34.62
1036.8307249               1       0.64           55         35.26
1143.8315135               1       0.64           56         35.90
1144.1021934               1       0.64           57         36.54
1194.7114334               1       0.64           58         37.18
1200.6520749               1       0.64           59         37.82
1232.794137                1       0.64           60         38.46
1253.2920151               1       0.64           61         39.10
1258.7625963               1       0.64           62         39.74
1295.7426861               1       0.64           63         40.38
1324.1949063               1       0.64           64         41.03
1326.7417572               1       0.64           65         41.67
1381.0042677               1       0.64           66         42.31
1383.4018689               1       0.64           67         42.95
1392.4118285               1       0.64           68         43.59
1525.7801159               1       0.64           69         44.23
1543.9564567               1       0.64           70         44.87
1621.1770776               1       0.64           71         45.51
1714.9428899               1       0.64           72         46.15
1728.0209761               1       0.64           73         46.79
1784.0712838               1       0.64           74         47.44
1810.2305326               1       0.64           75         48.08
1844.3510276               1       0.64           76         48.72
1860.753895                1       0.64           77         49.36
1914.9965509               1       0.64           78         50.00
1959.8444724               1       0.64           79         50.64
1975.5519059               1       0.64           80         51.28
2025.2826649               1       0.64           81         51.92
2062.1251524               1       0.64           82         52.56
2146.3585931               1       0.64           83         53.21
2161.5465097               1       0.64           84         53.85
2183.344867                1       0.64           85         54.49
2221.185664                1       0.64           86         55.13
2222.3350522               1       0.64           87         55.77
2230.6763741               1       0.64           88         56.41
2231.9933352               1       0.64           89         57.05
2344.8969162               1       0.64           90         57.69
2425.4712933               1       0.64           91         58.33
2437.2824454               1       0.64           92         58.97
2481.7189179               1       0.64           93         59.62
2534.00038                 1       0.64           94         60.26
2549.5584738               1       0.64           95         60.90
2557.4336378               1       0.64           96         61.54
2636.7877999               1       0.64           97         62.18
2667.2467097               1       0.64           98         62.82
2668.0205189               1       0.64           99         63.46
2712.5171987               1       0.64          100         64.10
2737.6703794               1       0.64          101         64.74
2923.1443548               1       0.64          102         65.38
3164.9276933               1       0.64          103         66.03
3180.4306118               1       0.64          104         66.67
3233.4237801               1       0.64          105         67.31
3545.6521739               1       0.64          106         67.95
3665.3483686               1       0.64          107         68.59
3745.6498521               1       0.64          108         69.23
4038.857818                1       0.64          109         69.87
4049.1696291               1       0.64          110         70.51
4180.7658207               1       0.64          111         71.15
4189.4365875               1       0.64          112         71.79
4495.0462615               1       0.64          113         72.44
4699.4112621               1       0.64          114         73.08
4885.0467014               1       0.64          115         73.72
5011.2194563               1       0.64          116         74.36
5182.1437206               1       0.64          117         75.00
5184.7093276               1       0.64          118         75.64
5188.9009352               1       0.64          119         76.28
5248.5823215               1       0.64          120         76.92
5330.401612                1       0.64          121         77.56
5332.2385914               1       0.64          122         78.21
5348.5971919               1       0.64          123         78.85
5528.3631139               1       0.64          124         79.49
5634.003948                1       0.64          125         80.13
5900.6169445               1       0.64          126         80.77
6105.280743                1       0.64          127         81.41
6147.7796098               1       0.64          128         82.05
6238.5375062               1       0.64          129         82.69
6243.5713183               1       0.64          130         83.33
6334.105194                1       0.64          131         83.97
6338.4946677               1       0.64          132         84.62
6575.7450439               1       0.64          133         85.26
6746.6126318               1       0.64          134         85.90
7381.3127508               1       0.64          135         86.54
7885.4680369               1       0.64          136         87.18
8445.5266887               1       0.64          137         87.82
8614.1202192               1       0.64          138         88.46
8654.5368449               1       0.64          139         89.10
9106.3272342               1       0.64          140         89.74
9175.7960147               1       0.64          141         90.38
9243.5870526               1       0.64          142         91.03
9425.3258698               1       0.64          143         91.67
10480.817203               1       0.64          144         92.31
10749.419238               1       0.64          145         92.95
11066.784145               1       0.64          146         93.59
11191.811007               1       0.64          147         94.23
11744.834167               1       0.64          148         94.87
11894.464075               1       0.64          149         95.51
12505.212545               1       0.64          150         96.15
12729.4544                 1       0.64          151         96.79
13577.879885               1       0.64          152         97.44
14778.163929               1       0.64          153         98.08
15313.859347               1       0.64          154         98.72
15461.758372               1       0.64          155         99.36
15822.112141               1       0.64          156        100.00

Frequency Missing = 23
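
Because nearly every value of a continuous variable such as incomeperperson is unique, a frequency table like the one above ends up with one row per country. A more compact view could come from summary statistics. The following is a minimal sketch with PROC MEANS, assuming the same mydata.gapminder dataset; the choice of statistics is illustrative and not taken from the original program.

```sas
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

/* Sketch only: count, missing count and quartiles of per capita GDP */
PROC MEANS DATA=mydata.gapminder N NMISS MIN P25 MEDIAN P75 MAX;
    VAR incomeperperson;
RUN;
```

The NMISS statistic reports the number of missing values, which can be compared against the "Frequency Missing" line of the PROC FREQ output.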