Scatterplots

A scatterplot is a visual display of bivariate data that can be used to judge whether there is a relationship between two variables. There are several things to look for in the pattern of points: direction, form and strength.

Note: The independent variable is typically plotted on the x-axis and the dependent variable on the y-axis. Each point represents a single case.

Direction: positive or negative

Graph 1: no relationship

There is no observable pattern in this set of data, suggesting there is no relationship between happiness and age.

Graph 2: positive relationship

In contrast, Graph 2 suggests a positive relationship between the two variables: as age increases, height tends to increase.

Graph 3: negative relationship


Graph 3 shows a negative relationship between grades and school absences: the more school a student misses, the lower their grades tend to be.
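As a rough numerical cross-check of direction, the sign of \sum(x-\bar{x})(y-\bar{y}) (the numerator of Pearson's formula given under Strength) is positive when y tends to rise with x and negative when it tends to fall. The sketch below assumes a hypothetical helper `direction` and made-up absence/grade data; it is not the data behind Graph 3.

```python
from statistics import mean

def direction(xs, ys):
    """Hypothetical helper: classify direction by the sign of the sum of
    products of deviations from the means.

    A sketch only: real data almost never give a sum of exactly zero, so a
    value near zero (as in Graph 1) still has to be judged from the scatterplot.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    s = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"

# Hypothetical data in the spirit of Graph 3: school absences vs grades.
absences = [0, 2, 4, 6, 8]
grades = [90, 85, 78, 70, 65]
print(direction(absences, grades))  # negative
```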

Form: linear or non-linear

In the core section of the Further Mathematics syllabus, we are only concerned with whether or not the relationship between the variables is linear. If the points in a scatterplot fluctuate around a straight line, the relationship can be described as having a linear form. If there is a clear pattern but it does not follow a straight line, the form is described as non-linear.

Graphs 2 and 3 above both display a linear form, indicating that the dependent variable changes at a roughly constant rate as the independent variable changes.

Graph 4: non-linear relationship

In contrast, Graph 4 displays a clear pattern, but one that is most likely not linear. The green line drawn on the graph shows a more likely pattern for the data than a straight line would.
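One way to see the difference between linear and non-linear form is to look at the rate of change between neighbouring points. The points below are hypothetical, noise-free values (Graph 4's data are not available), and `consecutive_slopes` is an illustrative helper, not a standard function. For points exactly on a line the slope between neighbours never changes; for a curved pattern it drifts steadily.

```python
def consecutive_slopes(xs, ys):
    """Hypothetical helper: change in y per unit change in x between
    neighbouring points. Assumes xs is sorted with no repeated values."""
    return [(y2 - y1) / (x2 - x1)
            for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])]

xs = [1, 2, 3, 4, 5]
linear = [3 * x + 2 for x in xs]  # hypothetical points on a straight line
curved = [x ** 2 for x in xs]     # hypothetical points on a curve

print(consecutive_slopes(xs, linear))  # [3.0, 3.0, 3.0, 3.0]
print(consecutive_slopes(xs, curved))  # [3.0, 5.0, 7.0, 9.0]
```

Real data scatter around their pattern, so these slopes will never be exactly equal even for a linear relationship; what suggests a non-linear form is a systematic trend in them rather than small irregular fluctuations.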

Strength: the correlation coefficient, r

The strength of a linear relationship indicates how closely the points in the scatterplot lie to the line of best fit. If there is little variation about the line, the scatterplot can be described as showing a strong linear relationship. To measure the strength of a linear relationship, Pearson’s correlation coefficient, r, is calculated. Note that r should only be used when the relationship is linear and there are no outliers.

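The correlation coefficient is a number between -1 and 1 inclusive, that is, -1 ≤ r ≤ 1. The sign of r gives the direction of the relationship. Values close to either extreme indicate a strong linear relationship, with r = 1 or r = -1 indicating a perfect one, while values close to 0 indicate little or no linear relationship. Below is a representation of the range of values and what they indicate.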

Figure: the range of values of r and the strength of linear relationship each indicates

Examples:

Figure: example scatterplots showing different strengths of linear relationship

To calculate Pearson’s correlation coefficient by hand (without a calculator or software), the following formula is used:

r = \dfrac{\sum(x-\bar{x})(y-\bar{y})}{(n-1)s_xs_y}
where \bar{x} and s_x are the mean and standard deviation of the x values,
\bar{y} and s_y are the mean and standard deviation of the y values,
and n is the number of data points.
