How to create a Box and Whisker Plot in Tableau
mengjiao.fu
This article is an excerpt from a book written by Shweta Sankhe-Savale, titled Tableau Cookbook – Recipes for Data Visualization. With the recipes in this book, learn to create beautiful data visualizations in no time on Tableau.
In today’s tutorial, we will learn how to create a Box and Whisker plot in Tableau.
The Box plot, or Box and Whisker plot as it is popularly known, is a convenient statistical representation of the variation in a statistical population. It is a great way of showing a number of data points as well as showing the outliers and the central tendencies of data.
This visual representation of the distribution within a dataset was first introduced by American mathematician John W. Tukey in 1969. A box plot is significantly easier to plot than, say, a histogram, and it does not require the user to make assumptions about the bin sizes or the number of bins. Yet it still gives significant insight into the distribution of the dataset.
The box plot primarily consists of four parts:
The median provides the central tendency of our dataset. It is the value that divides our dataset into two parts, values that are either higher or lower than the median. The position of the median within the box indicates the skewness in the data as it shifts either towards the upper or lower quartile.
The upper and lower quartiles, which form the box, represent the degree of dispersion or spread of the data between them. The difference between the upper and lower quartile is called the Interquartile Range (IQR), and it indicates the mid-spread within which 50 percent of the points in our dataset lie.
The upper and lower whiskers in a box plot can be plotted either at the maximum and minimum values in the dataset, or at 1.5 times the IQR beyond the upper and lower quartiles. Plotting the whiskers at the maximum and minimum values includes 100 percent of the values in the dataset, including all the outliers, whereas plotting the whiskers at 1.5 times the IQR leaves the outliers visible as points beyond the whiskers.
The points lying between the lower whisker and the lower quartile are the lower 25 percent of values in the dataset, whereas the points lying between the upper whisker and the upper quartile are the upper 25 percent of values in the dataset.
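To make the parts described above concrete, here is a minimal Python sketch that computes them for a small list of numbers. The function name `box_plot_summary`, the `whisker_factor` parameter, and the sample weights are all hypothetical and are not taken from the book or its dataset. The sketch uses the 1.5 times IQR convention and ends each whisker at the most extreme data point that still lies inside that limit. Quartile definitions vary between tools, so Tableau's numbers may differ slightly from the ones computed here.

```python
from statistics import quantiles


def box_plot_summary(values, whisker_factor=1.5):
    """Return the median, quartiles, IQR, whiskers, and outliers of a list of numbers."""
    data = sorted(values)
    # With n=4, quantiles() returns three cut points: Q1, the median, and Q3.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - whisker_factor * iqr
    upper_limit = q3 + whisker_factor * iqr
    # Whiskers end at the most extreme data points that are still within the limits.
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "median": med,
        "lower_quartile": q1,
        "upper_quartile": q3,
        "iqr": iqr,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": outliers,
    }


# Hypothetical weights, chosen so that one value falls beyond the upper whisker.
weights = [44, 48, 52, 55, 58, 60, 63, 67, 75, 120]
summary = box_plot_summary(weights)
print(summary)
```

Setting `whisker_factor` to a very large number effectively reproduces the other convention, in which the whiskers sit at the minimum and maximum values and no points are reported as outliers.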
In a symmetric distribution such as the normal distribution, the box plot is symmetric about the median: the median sits in the middle of the box and the two whiskers are of equal length. In most cases, however, the box plot quickly shows the underlying variations and trends in data and allows for easy comparison between datasets.
Getting ready
We will create a Box and Whisker plot in a new sheet in a workbook.
For this purpose, we will connect to an Excel file named Data for Box plot & Gantt chart, which has been uploaded to https://1drv.ms/f/s!Av5QCoyLTBpnhkGyrRrZQWPHWpcY.
Let us save this Excel file in the Documents | My Tableau Repository | Datasources | Tableau Cookbook data folder.
The data contains information about customers in terms of their gender and recorded weight. The data contains 100 records, one record per customer. Using this data, let us look at how we can create a Box and Whisker plot.
How to do it
Once we have downloaded and saved the data from the link provided in the Getting ready section, we will create a new worksheet in our existing workbook and rename it to Box and Whisker plot.
1. Since we haven’t connected to the new dataset yet, establish a new data connection by pressing Ctrl + D on our keyboard.
2. Select the Excel option and connect to the Data for Box plot & Gantt chart file, which is saved in our Documents | My Tableau Repository | Datasources | Tableau Cookbook data folder.
3. Next let us select the table named Box and Whisker plot data by double-clicking on it.
4. Let us go ahead with the Live option to connect to this data.
5. Next let us multi-select the Customer and Gender fields from the Dimensions pane and the Weight field from the Measures pane by holding Ctrl while clicking each one. Refer to the following image:
6. Next let us click on the Show Me! button and select the box-and-whisker plot. Refer to the highlighted section in the following image:
7. Once we click on the box-and-whisker plot option, we will see the following view:
How it works
In the preceding chart, we get two box and whisker plots: one for each gender. The whiskers are the maximum and minimum extent of the data. Furthermore, in each category we can see some circles, each of which represents a customer. Thus, within each gender category, the graph shows the distribution of customers by their respective weights. When we hover over any of these circles, we can see the details of the customer in terms of name, gender, and recorded weight in the tooltip. Refer to the following image:
However, when we hover over the box (gray section), we will see the details in terms of median, lower quartiles, upper quartiles, and so on. Refer to the following image:
Thus, a summary of the box plot that we created is as follows:
In simpler terms, for the female category, the majority of the population lies in the weight range of 44 to 75, whereas for the male category, the majority of the population lies in the weight range of 44 to 82.
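The per-gender summary can also be sketched outside Tableau. The following Python example groups records by gender and computes the values that the box tooltip reports for each group. The records are hypothetical and are not the book's dataset, so the numbers will not match the ranges quoted above, and Tableau may use a slightly different quartile definition than the one assumed here.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (customer, gender, weight) records, one record per customer.
records = [
    ("C01", "Female", 44), ("C02", "Female", 52), ("C03", "Female", 60),
    ("C04", "Female", 68), ("C05", "Female", 76),
    ("C06", "Male", 50), ("C07", "Male", 60), ("C08", "Male", 70),
    ("C09", "Male", 80), ("C10", "Male", 90),
]

# Group the weights by gender, mirroring one box plot per gender.
weights_by_gender = defaultdict(list)
for _customer, gender, weight in records:
    weights_by_gender[gender].append(weight)

summary = {}
for gender, weights in weights_by_gender.items():
    q1, med, q3 = quantiles(weights, n=4, method="inclusive")
    summary[gender] = {
        "min": min(weights),        # lower whisker when whiskers span the full data
        "lower_quartile": q1,       # bottom edge of the box
        "median": med,
        "upper_quartile": q3,       # top edge of the box
        "max": max(weights),        # upper whisker when whiskers span the full data
    }

for gender, stats in summary.items():
    print(gender, stats)
```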
Please note that in our visualization, even though the Row shelf displays SUM(Weight), since we have Customer in the Detail shelf, there’s only one entry per customer, so SUM(Weight) is actually the same as MIN(Weight), MAX(Weight), or AVG(Weight).
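A quick way to convince ourselves of this is to aggregate a single value in several ways. The Python sketch below is only an illustration of the arithmetic, not of how Tableau evaluates its aggregations. The helper `aggregates` and the customer names and weights are hypothetical.

```python
def aggregates(values):
    """Compute the four aggregations mentioned above for a list of weights."""
    return {
        "SUM": sum(values),
        "MIN": min(values),
        "MAX": max(values),
        "AVG": sum(values) / len(values),
    }


# Hypothetical data with exactly one weight record per customer.
weights_by_customer = {"Customer A": [61], "Customer B": [74], "Customer C": [58]}

for customer, values in weights_by_customer.items():
    result = aggregates(values)
    # With a single record, every aggregation collapses to that record's value.
    assert len(set(result.values())) == 1
    print(customer, result)

# With more than one record per customer, the aggregations would diverge.
print(aggregates([61, 63]))
```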
We learnt the basics of the Box and Whisker plot and how to create one using Tableau. If you had fun with this recipe, do check out our book Tableau Cookbook – Recipes for Data Visualization to create interactive dashboards and beautiful data visualizations with Tableau.