A scatterplot is a visual display of bivariate data which can be used to deduce whether or not there is a relationship between variables. There are several things to look for in the pattern of points.
Note: The independent variable is typically plotted on the x-axis, whereas the dependent variable is plotted on the y-axis. Each point represents a single case.
Graph 1
There is no observed pattern in this set of data, which suggests that there is no relationship between happiness and age.
In contrast, Graph 2 suggests a positive relationship between the two variables: as age increases, a person's height tends to increase.
Graph 3 shows a negative relationship between grades and school absences. This indicates that these two variables are related, in that grades steadily decrease as the number of absences from school increases.
In the core section of the Further Mathematics syllabus, we are only concerned with whether or not the relationships between variables are linear. If the points in a scatterplot appear to fluctuate around a straight line, then the relationship can be described as having a linear form. If there is an observed pattern but it is not linear, then the relationship is referred to as non-linear.
Graphs 2 and 3 above both display a linear relationship, indicating that the dependent variable changes steadily as the independent variable changes.
In contrast, Graph 4 displays a clear pattern, but one that is most likely not linear. The green line drawn is a more likely pattern for the data.
The strength of a linear relationship is an indication of how close the points in the scatterplot are to the line of best fit. If there is little variation from the line, then the scatterplot can be described as having a strong linear relationship. To measure the strength of a linear relationship, Pearson's correlation coefficient, r, is calculated. Note that r is only appropriate when the relationship is linear and there are no outliers.
The correlation coefficient is a number between -1 and 1 inclusive, where a value close to either extreme represents a strong linear relationship. That is, -1 ≤ r ≤ 1. Below is a representation of the range of r values and what they indicate.
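As a rough guide to reading an r value, the short Python sketch below turns a number into a description such as "strong positive". The cut-off values (0.75, 0.5 and 0.25) are an assumption based on a commonly used classroom convention and are not taken from this page, so check them against your own course materials. The function name describe_strength is made up for illustration.

```python
def describe_strength(r):
    """Describe the strength and direction of a linear relationship.

    The cut-offs 0.75, 0.5 and 0.25 are assumed conventions,
    not values given on this page.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1 inclusive")

    size = abs(r)  # strength depends only on the distance from 0
    if size >= 0.75:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.25:
        strength = "weak"
    else:
        return "no linear relationship"

    # The sign of r gives the direction of the relationship.
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_strength(0.8))
print(describe_strength(-0.6))
print(describe_strength(0.1))
```

The sign of r is used only for the direction, and its size only for the strength, so r = -0.8 and r = 0.8 are treated as equally strong.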
Examples:
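When calculating Pearson's correlation coefficient by hand, the following formula is used:
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
where $n$ is the number of data points,
$\bar{x}$ and $s_x$ = mean and standard deviation of the x values
and
$\bar{y}$ and $s_y$ = mean and standard deviation of the y values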
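To illustrate the hand calculation, here is a minimal Python sketch. It standardises each x and y value using its mean and standard deviation, multiplies the standardised pairs, adds the products and divides by n - 1. It assumes the sample standard deviation (divisor n - 1) is intended. The function name pearson_r and the data values are made up for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from means and sample standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sample standard deviations (divisor n - 1), an assumption of this sketch.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable does not vary")

    # Sum the products of the standardised values, then divide by n - 1.
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Made-up data for illustration only.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 1, 4, 3, 5]
print(round(pearson_r(x_values, y_values), 2))  # prints 0.8
```

For these made-up values the points rise together fairly closely, so r comes out positive and close to 1.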