Data management & visualisation: Managing data

For this data management exercise, I categorised the variable incomeperperson in the Gapminder dataset into low, middle and high income groups. I created a new variable, Incomecategory, with three values: Low income (1), Middle income (2) and High income (3).

My program is as follows:

/* Point to the course data library (read-only) */
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

DATA new; SET mydata.gapminder;
  LABEL incomeperperson="Per capita GDP"
        co2emissions="CO2 emissions (in metric tons)"
        urbanrate="Percentage of people in urban areas";
  /* Group per capita GDP: 1=Low, 2=Middle, 3=High.
     SAS ranks missing values below any number, so a missing
     incomeperperson also falls into category 1 here. */
  IF incomeperperson <= 2000 THEN Incomecategory=1;
  ELSE IF incomeperperson <= 14000 THEN Incomecategory=2;
  ELSE Incomecategory=3;
RUN;

PROC SORT; BY COUNTRY;
RUN;

PROC FREQ; TABLES Incomecategory incomeperperson co2emissions urbanrate;
RUN;

This is what the grouped table looks like:

The FREQ Procedure

Incomecategory	Frequency	Percent	Cumulative Frequency	Cumulative Percent
1	103	48.36	103	48.36
2	72	33.80	175	82.16
3	38	17.84	213	100.00

This shows that just over 48% (nearly half) of the sample countries fall in the low income category, with per capita GDP of USD 2,000 or less, and 33.8% are in the middle income category, with per capita GDP above USD 2,000 and up to USD 14,000. A little under 18% of countries fall in the high income category, with per capita GDP above USD 14,000.
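One caveat about the cut-offs: SAS treats a missing numeric value as smaller than any number, so a test such as incomeperperson <= 2000 is also true for countries with no income figure. The sketch below is one possible variant, not the program I actually ran. It keeps the same cut-offs but leaves Incomecategory missing when incomeperperson is missing, and attaches readable labels. The names new_checked and incfmt are made up for illustration.

```sas
/* Hypothetical format name (incfmt): labels for the three groups */
PROC FORMAT;
  VALUE incfmt 1="Low income (up to 2,000)"
               2="Middle income (above 2,000 to 14,000)"
               3="High income (above 14,000)";
RUN;

/* Hypothetical dataset name (new_checked): same cut-offs as before,
   but a missing income stays missing instead of landing in group 1 */
DATA new_checked; SET new;
  IF MISSING(incomeperperson) THEN Incomecategory=.;
  ELSE IF incomeperperson <= 2000 THEN Incomecategory=1;
  ELSE IF incomeperperson <= 14000 THEN Incomecategory=2;
  ELSE Incomecategory=3;
  FORMAT Incomecategory incfmt.;
RUN;

/* The MISSING option lists missing values as their own row */
PROC FREQ DATA=new_checked;
  TABLES Incomecategory / MISSING;
RUN;
```

If the counts from this variant differ from the table above, the difference would be the countries without an income value.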

Categorising this quantitative variable will help me analyse the Gapminder data further and interpret the trends and correlations between income, urbanisation and emission levels.
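As a rough sketch of what that further analysis could look like (an assumption about a possible next step, not something run for this post), the grouped variable can be used to compare urbanisation and emissions across income groups, and PROC CORR can report the pairwise correlations between the three original variables. It assumes the dataset new created by the program above.

```sas
/* Summary statistics of urbanrate and co2emissions within each income group */
PROC MEANS DATA=new N MEAN MEDIAN MIN MAX MAXDEC=2;
  CLASS Incomecategory;
  VAR urbanrate co2emissions;
RUN;

/* Pairwise Pearson correlations between the quantitative variables */
PROC CORR DATA=new;
  VAR incomeperperson urbanrate co2emissions;
RUN;
```

PROC MEANS skips missing values of the analysis variables within each group, so the N column shows how many countries contribute to each mean.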

Frequency tables for the three variables urbanrate, co2emissions and incomeperperson are provided in the following pages:

Managing data - 2a

Managing data - 2

Since the Gapminder dataset does not have qualitative or categorical variables, I decided not to run a program for coding out missing data or coding in valid data.

Sunday, September 25, 2016
