J Lab 3

Linear Regression with Jamovi

Author

Yurk

Portions of this lab are based on the R lab from Chapter 10 of the Introduction to Modern Statistics (2e) textbook by Mine Çetinkaya-Rundel and Johanna Hardin.

Human Freedom Index

In this lab we will explore the hfi data set, available here or on our Moodle page. The data set is from the Human Freedom Index report, published annually by the Cato Institute, the Fraser Institute, and the Liberales Institut at the Friedrich Naumann Foundation for Freedom. The report includes several variables that measure different aspects of freedom for countries around the world.

Download the hfi.csv file to your computer. Next, open the file in Jamovi. How many cases/observations are there? What does each row represent?

We will only use a subset of the variables for this lab, described in the following table. In Jamovi, set up the Measure type and Data type for each variable as shown in the table.

| Variable | Measure type | Data type | Description |
|---|---|---|---|
| year | Ordinal | Integer | Year |
| pf_expression_control | Continuous | Decimal | Political pressures and controls on media content |
| pf_score | Continuous | Decimal | Personal freedom score |
| ef_regulation | Continuous | Decimal | Economic freedom regulation score |
| hf_score | Continuous | Decimal | Total human freedom score |

Filtering the data and scatter plot of pf_score vs pf_expression_control

There are many years in the hfi data, but we will focus only on the year 2016 for this lab. Filter the data set so that it only includes observations from 2016. If you forgot how to filter data in Jamovi, you can refer to the instructions in J Lab 2. The filtered data should include 162 rows.
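If you want to double-check the filter outside Jamovi, the same idea can be sketched in a few lines of Python. This is only an illustration: the small CSV text below is made up (it borrows the column names `year`, `pf_expression_control`, and `pf_score` from the lab, but none of the values come from hfi.csv), and the helper name `filter_year` is hypothetical.

```python
import csv
import io

# Made-up stand-in for hfi.csv: the column names match the lab, the values do not.
SAMPLE_CSV = """year,pf_expression_control,pf_score
2015,5.0,7.1
2016,5.5,7.3
2016,3.0,5.9
2015,3.5,6.0
"""

def filter_year(rows, year):
    """Keep only the rows whose `year` column equals the requested year."""
    return [row for row in rows if int(row["year"]) == year]

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rows_2016 = filter_year(rows, 2016)
print(len(rows_2016))  # number of rows that survive the filter
```

With the real file, you would read hfi.csv instead of `SAMPLE_CSV` and expect the count to match the number of rows Jamovi reports after filtering.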

Create a scatter plot of pf_score (the response variable) vs pf_expression_control (the predictor). In the scatter plot, pf_score should be on the \(y\)-axis and pf_expression_control should be on the \(x\)-axis.

If you forgot how to create a scatter plot in Jamovi, you can refer back to the instructions in J Lab 1. Your scatter plot should look like the following:

Does the relationship between pf_score and pf_expression_control appear to be linear? Is the association positive or negative? Is it strong or weak?

pf_expression_control is on a scale of 0 to 10, with higher values indicating fewer political pressures and controls on media content. If you know a country’s pf_expression_control score, would you be comfortable predicting its personal freedom score using a linear model?

Adding a least squares regression line

Next, we will add the least squares regression line to the scatter plot. Recall that the least squares regression line is the line that minimizes the sum of the squared residuals (the vertical distances between the observed values of the response variable and the predicted values of the response variable).
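In symbols, the quantity being minimized can be written as follows. The notation is an assumption added for illustration: \(x_i\) and \(y_i\) are the observed predictor and response values for observation \(i\), \(\hat{y}_i\) is the predicted value, \(e_i\) is the residual, and \(b_0\) and \(b_1\) are the intercept and slope of a candidate line.

\[\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2\]

The least squares regression line is the choice of \(b_0\) and \(b_1\) that makes this sum as small as possible.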

We can add a regression line to the scatter plot by selecting the Linear option in the Regression Line menu in the Scatterplot interface. Figure 1 shows the Scatterplot interface with the correct option selected.

Figure 1: Adding a regression line to a scatter plot.

Correlation

The correlation coefficient is a measure of the strength and direction of a linear relationship between two variables. The correlation coefficient is always between -1 and 1. A correlation coefficient of 1 indicates a perfect positive linear relationship, a correlation coefficient of -1 indicates a perfect negative linear relationship, and a correlation coefficient of 0 indicates no linear relationship.
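To make the definition concrete, here is a minimal Python sketch of the usual formula for Pearson’s r. The function name `pearson_r` and the four data points are made up for illustration; they are not taken from the hfi data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# Made-up toy data: a strong, but not perfect, positive linear relationship.
r_toy = pearson_r([1, 2, 3, 4], [2, 3, 5, 6])
print(round(r_toy, 3))  # 0.99

# Points that fall exactly on a decreasing line give r = -1.
r_down = pearson_r([1, 2, 3], [6, 4, 2])
print(r_down)  # -1.0
```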

We can calculate the correlation coefficient between pf_score and pf_expression_control in Jamovi. In the Analysis tab, click on the Regression icon. Then, select Correlation Matrix. In the interface, drag the pf_score and pf_expression_control variables into the box on the right. The correlation coefficient we are interested in is called Pearson’s r, and it is listed in the table that is created. Figure 2 shows the Correlation Matrix interface with the correct variables selected.

Figure 2: Calculating the correlation coefficient between two variables.

The correlation between pf_score and pf_expression_control is \(r=0.845\). What does this value tell you about the relationship between the two variables?

Linear model and coefficient of determination

The equation of the least squares regression line is \(\widehat{pf\_score} = b_0 + b_1\times pf\_expression\_control\), where \(b_0\) is the intercept and \(b_1\) is the slope. We can calculate the coefficients of this linear model using Jamovi.

In the Analysis tab, click on the Regression icon. Then, select Linear Regression. In the interface, drag the pf_score variable into the Dependent Variable box, and drag the pf_expression_control variable into the Covariates box. Figure 3 shows the Linear Regression interface with the correct variables selected.

Figure 3: Fitting a linear model to two variables.

Two tables will be created, as seen on the right in Figure 3. The coefficients of the linear model are listed in the second table in the Estimate column. The first value is the intercept, and the second value is the slope. Using the coefficients from the regression table, we can write the equation of the least squares regression line, \[\widehat{pf\_score} = 4.284 + 0.542\times pf\_expression\_control.\]
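If you are curious where values like these come from, the following Python sketch computes a least squares intercept and slope from the standard formulas. It is a rough illustration under stated assumptions: the function name `fit_line` is hypothetical, and the four data points are made up, so the output will not match the Jamovi table for the hfi data.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least squares line for paired lists x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx                # b1
    intercept = y_bar - slope * x_bar  # b0: the line passes through (x_bar, y_bar)
    return intercept, slope

# Made-up toy data, NOT values from hfi.csv.
x = [1, 2, 3, 4]
y = [2, 3, 5, 6]
b0, b1 = fit_line(x, y)
print(round(b0, 3), round(b1, 3))  # 0.5 1.4
```

Applied to the 162 filtered rows, with pf_expression_control as `x` and pf_score as `y`, the same formulas should give the intercept and slope that Jamovi lists in the Estimate column.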

What does the slope of the regression line tell you about the relationship between pf_score and pf_expression_control? Use the linear model to predict the personal freedom score for a country with a pf_expression_control score of 5.
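As an illustration of how a prediction is made, suppose a country had a pf_expression_control score of 8 (a value chosen only for this example). Substituting into the fitted equation gives

\[\widehat{pf\_score} = 4.284 + 0.542\times 8 = 4.284 + 4.336 = 8.620.\]

The same substitution works for any other pf_expression_control value within the range of the data.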

The coefficient of determination, \(R^2\), measures the proportion of the variability in the response variable that is explained by the predictor variable. The value of \(R^2\) is always between 0 and 1. The first table in Figure 3 lists the value of \(R^2\) for the linear model. Since \(R^2 = 0.714\), we know that 71.4% of the variability in pf_score is explained by pf_expression_control.
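For a linear model with a single predictor, \(R^2\) is the square of the correlation coefficient \(r\). Using the correlation between pf_score and pf_expression_control reported earlier, we can check that the two values agree:

\[R^2 = r^2 = (0.845)^2 \approx 0.714.\]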

Model diagnostics

The residual for an observation is the observed value of the response variable minus the predicted value for that observation. A residual plot is a scatter plot that shows the residuals on the \(y\)-axis and the predicted values of the response variable on the \(x\)-axis. For a linear model to be appropriate, the residuals should be randomly scattered around 0, and there should be no clear pattern in the residuals.
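The calculation behind each point in a residual plot can be sketched in Python as follows. The data are made up, the function name `residuals` is hypothetical, and the intercept 0.5 and slope 1.4 are simply the least squares coefficients for these four toy points, not values from the lab.

```python
def residuals(x, y, intercept, slope):
    """Residual = observed response minus predicted response, one per observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Made-up toy data and the least squares line that fits it.
x = [1, 2, 3, 4]
y = [2, 3, 5, 6]
res = residuals(x, y, intercept=0.5, slope=1.4)
print([round(r, 2) for r in res])  # [0.1, -0.3, 0.3, -0.1]
```

A positive residual means the observed value lies above the line, and a negative residual means it lies below the line. Residuals from a least squares line with an intercept always sum to 0, apart from rounding.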

We can create a residual plot in Jamovi by selecting the Residual plots option in the Assumptions menu in the Linear Regression interface, as shown in Figure 4.

Figure 4: Creating a residual plot for a linear model.

This creates three scatter plots, as seen in Figure 4. The first plot is the one we want. It shows a scatter plot of the residuals vs the predicted values of the response variable.

What does the residual plot tell you about the appropriateness of the linear model for predicting pf_score based on pf_expression_control? There is no apparent curve in the residuals, so the relationship appears to be linear. However, there are two things in the residual plot that may be cause for concern. First, if we imagine a horizontal line crossing the \(y\)-axis at 0, we see that the residuals are more spread out below 0 than they are above 0. This suggests that the distribution of the residuals may be skewed left, although only a few points contribute to this apparent skewness. Second, and probably more concerning, the residual plot exhibits a fanning pattern: the variability of the residuals appears to decrease as the predicted values of the response variable increase.

Later in the course we will use linear regression to make inferences about the relationship between two variables. The fanning that appears in this plot violates an assumption of linear regression, and we would need to address this issue before we can make valid inferences.
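One informal way to put a number on fanning is to compare the spread of the residuals for small and large predicted values. The Python sketch below does this with made-up numbers. The helper name `spread_by_half` is hypothetical, and the comparison is a rough heuristic for illustration only, not a formal test and not something Jamovi reports.

```python
from statistics import pstdev

def spread_by_half(fitted, residuals):
    """Sort observations by fitted value, split them into a lower and an upper
    half, and return the standard deviation of the residuals in each half."""
    pairs = sorted(zip(fitted, residuals))
    mid = len(pairs) // 2
    lower = [r for _, r in pairs[:mid]]
    upper = [r for _, r in pairs[mid:]]
    return pstdev(lower), pstdev(upper)

# Made-up fitted values and residuals whose spread shrinks as fitted values grow.
fitted = [4.0, 4.5, 5.0, 5.5, 7.0, 7.5, 8.0, 8.5]
resid = [-1.6, 1.2, -0.9, 1.3, -0.3, 0.2, -0.1, 0.2]
low_sd, high_sd = spread_by_half(fitted, resid)
print(low_sd > high_sd)  # True: residuals are more variable at low fitted values
```

If the two standard deviations were similar, the plot would show a band of roughly constant width instead of a fan.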

Saving your work

You can save your work in Jamovi by clicking on the hamburger menu and selecting Save. You can save your work as a .omv file, which is a file that can be opened in Jamovi. However, you will not turn this file in for your lab report. Instead, you will turn in a PDF of your lab report that includes screenshots of the Jamovi interface, scatter plots, tables, and your answers to questions at the end of the lab. Even though you are not turning it in, you should save your Jamovi file in case you need to refer back to it later.

What you need to turn in

This section includes questions that you will turn in for this lab. You will continue to work with the filtered hfi data set for this part of the lab, so make sure you still have the data filtered to only include observations from 2016.

In this part of the lab we will use hf_score as the response variable. hf_score is the total human freedom score. We will explore the relationships between hf_score and pf_expression_control and between hf_score and ef_regulation, the economic freedom regulation score.

  1. Use Jamovi to calculate the correlation coefficient between hf_score and pf_expression_control, and the correlation coefficient between hf_score and ef_regulation. Include your Correlation Matrix tables in your lab report. List the two correlation coefficients. Are both variables positively associated with hf_score? Which variable has a stronger linear association with hf_score?
  2. Use Jamovi to create a scatter plot of hf_score vs ef_regulation, with hf_score on the \(y\)-axis and ef_regulation on the \(x\)-axis, and add a least squares regression line to your plot. Include the scatter plot with the regression line in your lab report. Does the relationship between hf_score and ef_regulation appear to be linear? Is the association positive or negative? Is it strong or weak?
  3. Use Jamovi to fit a linear model to predict hf_score based on ef_regulation. Include the regression table (includes the model coefficients) and the model fit table (includes the coefficient of determination) in your lab report.
  4. Using the values from the regression table you created in problem 3, write the equation of the least squares regression line. What does the slope of the regression line tell you about the relationship between hf_score and ef_regulation? Use the linear model to predict the total human freedom score for a country with an ef_regulation score of 7. Show the calculation you used to make the prediction.
  5. Suppose that a country with an ef_regulation score of 7 has an observed hf_score of 5.8. What is the value of the residual for this country? Show the calculation you used to find the residual.
  6. What is the value of the coefficient of determination for the linear model you fit in problem 3? What does this value tell you about the proportion of the variability in hf_score that is explained by ef_regulation?
  7. Use Jamovi to create a residual plot for the linear model you fit in problem 3. Include the residual plot in your lab report. What does the residual plot tell you about the appropriateness of the linear model for predicting hf_score based on ef_regulation? Explain.

You may create your lab report in a Word document or a Google Doc. You may organize your report as numbered answers to the questions listed above. Include the screenshots, plots, and tables in your report, making sure that they are positioned under the correct question number. You should also include your answers to the questions in your report, and your answers should refer to the relevant plots or tables when applicable. Save your report as a PDF and submit it using the appropriate submission link on the course Moodle page. Check the PDF before you submit it to make sure it is readable and complete.