Scatterplot Chart
A scatterplot places individual data points on a two-axis grid, where the horizontal position encodes one numeric variable and the vertical position encodes another. The resulting pattern of dots reveals whether - and how strongly - the two variables are related.
Scatterplots excel at revealing correlations between two numeric variables and at exposing outliers that fall far from the rest of the data.
An example of an embedded scatterplot chart
Creating an Effective Scatterplot Chart
Recommended data types for each axis:
- X-Axis - numerical values (independent variable)
- Y-Axis - numerical values (dependent variable)
Description
- Points - each dot represents one row of data; its position is determined by the X and Y values for that row
- X-Axis - the independent or explanatory variable
- Y-Axis - the dependent or outcome variable
- Trend - the overall direction of the point cloud (upward, downward, or flat) indicates the correlation direction
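One way to make the "trend" of a point cloud concrete is to fit a least-squares line and look at the sign of its slope. The sketch below is illustrative only: the function name `trend_direction` and the `tolerance` parameter are assumptions introduced here, not part of any charting product, and a fitted slope is just one of several reasonable summaries of direction.

```python
def trend_direction(rows, tolerance=1e-9):
    """Classify a point cloud as upward, downward, or flat.

    rows: list of (x, y) pairs, one per data row (hypothetical input shape).
    tolerance: slopes within +/- tolerance count as flat (assumed default).
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two points")

    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n

    # Sum of squared X deviations; zero means every point shares one X value.
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")

    # Least-squares slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx

    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "flat"


rows = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2)]
print(trend_direction(rows))
```

An upward result corresponds to a positive correlation direction and a downward result to a negative one; the slope alone says nothing about how tightly the points follow the line.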
When to Use a Scatterplot
- Identify correlations - see whether two numeric variables move together (positive correlation) or in opposite directions (negative correlation)
- Detect outliers - points far from the main cluster stand out immediately
- Explore relationships before modeling - scatterplots are a standard first step in statistical analysis
- Compare two numeric variables - when both axes are continuous, a scatterplot is the natural choice
- Show distribution density - dense clusters reveal where most data points concentrate
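The first two uses above can also be checked numerically alongside the chart. The following is a minimal sketch, assuming the data is available as two equal-length lists of numbers: `pearson_r` computes the Pearson correlation coefficient, and `flag_outliers` marks points whose standardized distance from the center of the cloud exceeds a threshold. Both function names and the default threshold of 3.0 are assumptions made for illustration; the distance rule is a simple heuristic, not a definitive outlier test.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def flag_outliers(xs, ys, threshold=3.0):
    """Return indices of points far from the center in standardized units.

    Heuristic (an assumption of this sketch): convert each coordinate to a
    z-score and flag points whose combined distance exceeds the threshold.
    """
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("need two equal-length, non-empty lists")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)

    flagged = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        zx = (x - mean_x) / sd_x if sd_x else 0.0
        zy = (y - mean_y) / sd_y if sd_y else 0.0
        if math.hypot(zx, zy) > threshold:
            flagged.append(i)
    return flagged


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
ys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
print(pearson_r(xs, ys))
print(flag_outliers(xs, ys))
```

A coefficient near +1 or -1 suggests a strong linear relationship and a value near 0 suggests a weak one. Note that a single extreme point can inflate both the mean and the standard deviation, so this heuristic may miss outliers in small datasets.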
When to Avoid a Scatterplot
- One categorical axis - if one variable is categorical, use a bar chart instead
- Showing trends over time - use a line chart when the X-axis is time and the sequence matters
- Too many overlapping points - with very large datasets, points can overplot and obscure patterns; consider aggregating or using a heatmap
- Multiple categories - a plain scatterplot cannot distinguish groups; use a grouped scatterplot to color-code points by a third categorical variable
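For the overplotting case above, aggregation usually means counting how many points fall into each cell of a regular grid; those counts can then drive a heatmap. The sketch below shows one possible way to do that in plain Python. The function name `bin_points` and the bin-width parameters are hypothetical, and fixed-width square bins are only one of several binning schemes.

```python
import math
from collections import Counter


def bin_points(points, x_bin_width, y_bin_width):
    """Count points per grid cell so dense regions can be drawn as a heatmap.

    points: iterable of (x, y) pairs.
    Returns a Counter mapping (column_index, row_index) to a point count.
    """
    if x_bin_width <= 0 or y_bin_width <= 0:
        raise ValueError("bin widths must be positive")

    counts = Counter()
    for x, y in points:
        # floor() keeps negative coordinates in the correct cell.
        cell = (math.floor(x / x_bin_width), math.floor(y / y_bin_width))
        counts[cell] += 1
    return counts


points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (1.7, 0.1), (1.9, 0.8)]
for cell, count in sorted(bin_points(points, 1.0, 1.0).items()):
    print(cell, count)
```

Smaller bin widths preserve more detail but leave more cells sparsely populated, so the widths are worth tuning to the size of the dataset.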
Further Reading
When to Use a Scatterplot Chart - a deeper look at scatterplot use cases, common mistakes, and alternatives.