Looking at the Gapminder dataset, my research question was “How do urbanisation and income levels impact CO2 emissions?”
After doing a literature review on the question, my hypothesis was that while higher levels of urbanisation might lead to greater economic activity and therefore higher incomes, they do not necessarily lead to higher rates of CO2 emissions.
Since the variables I selected are all quantitative, I ran the program to create two separate scatter plots, with incomeperperson and urbanrate as the explanatory variables and co2emissions as the response variable in both. I then added the code for a new categorical variable (Incomecategory), created by binning incomeperperson into 3 categories, and ran the code for a bar chart with Incomecategory as the explanatory variable and co2emissions as the response variable. My program was:
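LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

/* Read the Gapminder data, label the variables and bin income into 3 categories */
DATA new; SET mydata.gapminder;
LABEL incomeperperson="Per capita GDP"
      co2emissions="CO2 emissions (in metric tons)"
      urbanrate="Percentage of people in urban areas";
/* SAS treats a missing value as smaller than any number, so handle it first */
IF incomeperperson = . THEN Incomecategory=.;
ELSE IF incomeperperson <= 2000 THEN Incomecategory=1;
ELSE IF incomeperperson <= 14000 THEN Incomecategory=2;
ELSE Incomecategory=3;
RUN;

PROC SORT; BY COUNTRY;
/* Frequency table for the new categorical variable */
PROC FREQ; TABLES Incomecategory;
/* Centre and spread of the three quantitative variables */
PROC UNIVARIATE; VAR incomeperperson co2emissions urbanrate;
/* Scatter plots, written as response*explanatory */
PROC GPLOT; PLOT co2emissions*incomeperperson co2emissions*urbanrate;
/* Bar chart of mean CO2 emissions for each income category */
PROC GCHART; VBAR Incomecategory/discrete TYPE=MEAN SUMVAR=co2emissions;
RUN;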
The tables are on the following page:
Tables for variability and frequency
As the two scatter plots show, CO2 emissions do not rise noticeably with either income or the rate of urbanisation. In fact, the second plot suggests a slightly negative relationship between the rate of urbanisation and CO2 emissions, which is consistent with the literature I reviewed for my research question.
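The impression from the scatter plots could also be checked numerically. The following is only a minimal sketch and was not part of my program: it assumes the data set new created by the DATA step is still available in the session, and uses PROC CORR to print the Pearson correlation of co2emissions with each explanatory variable.

```sas
/* Sketch only, not part of the original program.                      */
/* Assumes the data set NEW from the DATA step exists in WORK.         */
/* Prints Pearson correlations of CO2 emissions with income and with   */
/* urban rate, together with the number of observations used.          */
PROC CORR DATA=new;
    VAR co2emissions;
    WITH incomeperperson urbanrate;
RUN;
```

A coefficient close to zero would agree with the flat pattern in the plots, while a clearly positive or negative value would suggest the plots are being distorted by a few extreme observations.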
To get a better idea of the relationship between income and CO2 emissions, I created a bar chart with the categorical variable Incomecategory.
With this categorical-to-quantitative bar chart, we clearly see a positive association between income and CO2 emissions. The jump in mean emissions for the high income category is particularly large.
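The heights of the bars could be read off as numbers as well. This is a hedged sketch rather than something I ran: it again assumes the data set new from the program above, and uses PROC MEANS to list the count, mean and median of co2emissions within each Incomecategory.

```sas
/* Sketch only, not part of the original program.                      */
/* Assumes the data set NEW with Incomecategory exists in WORK.        */
/* CLASS groups the output by income category; observations with a     */
/* missing Incomecategory are left out by default.                     */
PROC MEANS DATA=new N MEAN MEDIAN;
    CLASS Incomecategory;
    VAR co2emissions;
RUN;
```

Comparing the mean with the median in each category would show whether the high bar for category 3 reflects the group as a whole or only a few very large emitters.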