Presenting Data in Graphic Form
Ashley Crossman
Many people find frequency tables, crosstabs, and other forms of numerical statistical results intimidating. The same information can usually be presented in graphical form, which makes it easier to understand and less intimidating. Graphs tell a story with visuals rather than in words or numbers and can help readers understand the substance of the findings rather than the technical details behind the numbers.
There are numerous graphing options when it comes to presenting data. Here we will take a look at the most commonly used: pie charts, bar graphs, statistical maps, histograms, and frequency polygons.
A pie chart is a graph that shows the differences in frequencies or percentages among categories of a nominal or ordinal variable. The categories are displayed as segments of a circle whose pieces add up to 100 percent of the total frequencies.
Pie charts are a great way to graphically show a frequency distribution. In a pie chart, the frequency or percentage is represented both visually and numerically, so it is typically quick for readers to understand the data and what the researcher is conveying.
Like a pie chart, a bar graph is also a way to visually show the differences in frequencies or percentages among categories of a nominal or ordinal variable. In a bar graph, however, the categories are displayed as rectangles of equal width with their height proportional to the frequency or percentage of the category.
Unlike pie charts, bar graphs are very useful for comparing categories of a variable among different groups. For example, we can compare marital status among U.S. adults by gender. This graph would, thus, have two bars for each category of marital status: one for males and one for females. The pie chart does not allow you to include more than one group. You would have to create two separate pie charts, one for females and one for males.
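As a rough sketch of the grouped comparison described above, the counts behind such a bar graph can be tallied per (gender, marital status) pair. The responses below are hypothetical, and the text "bars" only stand in for what plotting software would draw.

```python
from collections import Counter

# Hypothetical survey responses as (gender, marital status) pairs.
responses = [
    ("female", "married"), ("male", "married"), ("female", "single"),
    ("male", "single"), ("male", "single"), ("female", "divorced"),
    ("female", "married"), ("male", "married"), ("female", "widowed"),
]

counts = Counter(responses)
statuses = ["married", "single", "divorced", "widowed"]
genders = ["female", "male"]

# Two bars per marital-status category: one for females, one for males.
for status in statuses:
    for gender in genders:
        n = counts[(gender, status)]
        print(f"{status:<9} {gender:<7} {'#' * n} {n}")
```

Keeping both genders inside each category is what lets the reader compare groups side by side, which a single pie chart cannot do.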
Statistical Maps
Statistical maps are a way to display the geographic distribution of data. For example, let’s say we are studying the geographic distribution of elderly persons in the United States. A statistical map would be a great way to visually display our data. On our map, each category is represented by a different color or shade and the states are then shaded depending on their classification into the different categories.
In our example of the elderly in the United States, let’s say we had four categories, each with its own color: Less than 10 percent (red), 10 to 11.9 percent (yellow), 12 to 13.9 percent (blue), and 14 percent or more (green). If 12.2 percent of Arizona’s population is aged 65 and older, Arizona would be shaded blue on our map. Likewise, if Florida has 15 percent of its population aged 65 and older, it would be shaded green on the map.
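The shading rule in this example can be sketched as a small lookup function. The function name `shade_for` is made up for illustration, and the cut-offs assume that values between the listed ranges (for instance 11.95 percent) fall into the lower category.

```python
def shade_for(percent_65_plus):
    """Return the map color for a state's share of residents aged 65 and older."""
    if percent_65_plus < 10:
        return "red"
    if percent_65_plus < 12:   # 10 to 11.9 percent
        return "yellow"
    if percent_65_plus < 14:   # 12 to 13.9 percent
        return "blue"
    return "green"             # 14 percent or more


# The two states and percentages used in the example above.
states = {"Arizona": 12.2, "Florida": 15.0}
shades = {name: shade_for(pct) for name, pct in states.items()}
print(shades)
```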
Maps can display geographical data on the level of cities, counties, city blocks, census tracts, countries, states, or other units. This choice depends on the researcher’s topic and the questions they are exploring.
A histogram is used to show the differences in frequencies or percentages among categories of an interval-ratio variable. The categories are displayed as bars, with the width of the bar proportional to the width of the category and the height proportional to the frequency or percentage of that category. The area that each bar occupies on a histogram tells us the proportion of the population that falls into a given interval. A histogram looks very similar to a bar chart; however, in a histogram the bars touch and may not be of equal width. In a bar chart, the space between the bars indicates that the categories are separate.
Whether a researcher creates a bar chart or a histogram depends on the type of data he or she is using. Typically, bar charts are created with qualitative data (nominal or ordinal variables) while histograms are created with quantitative data (interval-ratio variables).
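When the intervals of a histogram have unequal widths, one common convention (assumed here, not stated in the text) is to set each bar's height to its proportion divided by its width, so that the bar's area equals the proportion of cases in the interval. A minimal sketch with hypothetical intervals and counts:

```python
# Hypothetical intervals (lower, upper) and the number of cases in each.
intervals = [(0, 10), (10, 20), (20, 40), (40, 80)]
frequencies = [20, 30, 40, 10]

total = sum(frequencies)
bars = []
for (low, high), freq in zip(intervals, frequencies):
    width = high - low
    proportion = freq / total
    height = proportion / width  # chosen so that height * width == proportion
    bars.append({"interval": (low, high), "width": width,
                 "height": height, "area": height * width})

for bar in bars:
    print(bar)
```

With equal-width intervals this reduces to heights that are simply proportional to the frequencies.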
Frequency Polygons
A frequency polygon is a graph showing the differences in frequencies or percentages among categories of an interval-ratio variable. Points representing the frequencies of each category are placed above the midpoint of the category and are joined by straight lines. A frequency polygon is similar to a histogram; however, instead of bars, a point is used to show each frequency and all the points are then connected with a line.
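The construction just described can be sketched by computing the points a frequency polygon would connect. The class intervals and frequencies below are hypothetical.

```python
# Hypothetical class intervals (lower, upper) with their frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [4, 9, 6, 1]

# Each point sits above the midpoint of its class, at the class frequency.
points = [((low + high) / 2, freq)
          for (low, high), freq in zip(classes, frequencies)]

# Consecutive points are joined by straight line segments.
segments = list(zip(points, points[1:]))
print(points)
print(segments)
```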
Distortions in Graphs
When a graph is distorted, it can quickly deceive the reader into thinking something other than what the data really says. There are several ways that graphs can be distorted.
Probably the most common way that graphs get distorted is when the distance along the vertical or horizontal axis is altered in relation to the other axis. Axes can be stretched or shrunk to create any desired result. For example, if you were to shrink the horizontal axis (X axis), it could make the slope of your line graph appear steeper than it actually is, giving the impression that the results are more dramatic than they are. Likewise, if you expanded the horizontal axis while keeping the vertical axis (Y axis) the same, the slope of the line graph would be more gradual, making the results appear less significant than they really are.
When creating and editing graphs, it is important to make sure the graphs do not get distorted. Oftentimes, it can happen by accident when editing the range of numbers in an axis, for example. Therefore it is important to pay attention to how the data comes across in the graphs and make sure the results are being presented accurately and appropriately, so as to not deceive the readers.
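To make the axis-stretching effect concrete, the sketch below computes the angle at which the same change in the data is drawn when only the plotted width of the horizontal axis changes. The function `apparent_angle` and all the numbers are illustrative assumptions.

```python
import math


def apparent_angle(x_range, y_range, plot_width, plot_height, dx, dy):
    """Angle in degrees at which a change of dy over dx is drawn on the page."""
    x_scale = plot_width / x_range    # page units per data unit on the X axis
    y_scale = plot_height / y_range   # page units per data unit on the Y axis
    return math.degrees(math.atan2(dy * y_scale, dx * x_scale))


# The same hypothetical data (a rise of 5 over 10 units), drawn three ways.
square = apparent_angle(10, 10, plot_width=10, plot_height=10, dx=10, dy=5)
narrow = apparent_angle(10, 10, plot_width=4, plot_height=10, dx=10, dy=5)
wide = apparent_angle(10, 10, plot_width=25, plot_height=10, dx=10, dy=5)
print(round(square, 1), round(narrow, 1), round(wide, 1))
```

The data never change; shrinking the horizontal axis steepens the drawn line and expanding it flattens the line.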
Resources and Further Reading
- Frankfort-Nachmias, Chava, and Anna Leon-Guerrero. Social Statistics for a Diverse Society. SAGE, 2018.
2.1.1.2 - Visual Representations
Frequency tables, pie charts, and bar charts can all be used to display data concerning one categorical (i.e., nominal- or ordinal-level) variable. Below are descriptions for each along with some examples. At the end of this lesson you will learn how to construct each of these using Minitab.
Frequency Tables
A frequency table contains the counts of how often each value occurs in the dataset. Some statistical software, such as Minitab, will use the term tally to describe a frequency table. Frequency tables are most commonly used with nominal- and ordinal-level variables, though they may also be used with interval- or ratio-level variables if there are a limited number of possible outcomes.
In addition to containing counts, some frequency tables may also include the percent of the dataset that falls into each category, and some may include cumulative values. A cumulative count is the number of cases in that category and all previous categories. A cumulative percent is the percent in that category and all previous categories. Cumulative counts and cumulative percentages should only be presented when the data are at least ordinal-level.
The first example is a frequency table displaying the counts and percentages for Penn State undergraduate student enrollment by campus. Because this is a nominal-level variable, cumulative values were not included.
Penn State Fall 2019 Undergraduate Enrollments
The next example is a frequency table for an ordinal-level variable: class standing. Because ordinal-level variables have a meaningful order, we sometimes want to look at the cumulative counts or cumulative percents, which tell us the number or percent of cases at or below that level.
As an example, let's interpret the values in the "Sophomore" row. There are 22 sophomore students in this sample. There are 27 students who are sophomore or below (i.e., first-year or sophomore). In terms of percentages, 34.4% of students are sophomores and 42.2% of students are sophomores or below.
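The cumulative columns can be reproduced with a running total. In this sketch the first-year and sophomore counts follow from the example above (27 - 22 = 5 first-year students, and a total of 64 is implied by the reported percentages), while the junior and senior counts are hypothetical placeholders chosen only so that the total comes to 64.

```python
# First-year and sophomore counts follow the worked example in the text;
# the junior and senior counts are hypothetical placeholders.
counts = [("First-year", 5), ("Sophomore", 22), ("Junior", 20), ("Senior", 17)]
total = sum(n for _, n in counts)

rows = []
running = 0
for level, n in counts:
    running += n  # cases in this category and all previous categories
    rows.append({
        "level": level,
        "count": n,
        "cumulative_count": running,
        "percent": round(100 * n / total, 1),
        "cumulative_percent": round(100 * running / total, 1),
    })

for row in rows:
    print(row)
```

The running total is only meaningful because class standing has a natural order, which is why cumulative values are not reported for nominal variables.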
Pie Charts
A pie chart displays data concerning one categorical variable by partitioning a circle into "slices" that represent the proportion in each category. When constructing a pie chart, pay special attention to the colors being used to ensure that it is accessible to individuals with different types of colorblindness.
- University Park (48.5%)
- Commonwealth Campuses (34.9%)
- PA College of Technology (6.5%)
- World Campus (10.1%)
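As a small sketch of how these percentages become slices, each category's share of the circle is its percentage of 360 degrees. The percentages are the ones listed above; the variable names are illustrative.

```python
# Enrollment percentages from the pie chart legend above.
shares = {
    "University Park": 48.5,
    "Commonwealth Campuses": 34.9,
    "PA College of Technology": 6.5,
    "World Campus": 10.1,
}

total = sum(shares.values())

# Each slice's central angle is its share of the full 360-degree circle.
angles = {campus: 360 * pct / total for campus, pct in shares.items()}
for campus, angle in angles.items():
    print(f"{campus}: {angle:.1f} degrees")
```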
Bar Charts
A bar chart is a graph that can be used to display data concerning one nominal- or ordinal-level variable. The bars, which may be vertical or horizontal, symbolize the number of cases in each category. Note that the bars on a bar chart are separated by spaces; this communicates that this is a categorical variable.
The first example below is a bar chart with vertical bars. The second example is a bar chart with horizontal bars. Both examples are displaying the same data. On both charts, the size of the bar represents the number of cases in that category.
Penn State Fall 2019 Undergraduate Enrollments
Considerations
Pie charts tend to work best when there are only a few categories. If a variable has many categories, a pie chart may be difficult to read. In those cases, a frequency table or bar chart may be more appropriate. Each visual display has its own strengths and weaknesses. When first starting out, you may need to make a few different types of displays to determine which most clearly communicates your data.
Nominal Data
Nominal data is a type of categorical data that is qualitative in nature. Labels and tags that do not possess any numerical properties are used to classify nominal data. Grouping of nominal data is done with the help of a nominal variable and there is no intrinsic ordering within these groups.
Nominal data can be analyzed using non-parametric statistical tests such as the Chi-squared test and Cochran's Q test. In this article, we will learn more about nominal data, its analysis, and its characteristics, and see various associated examples.
What is Nominal Data?
Nominal data and ordinal data are the two types of categorical data. Categorical data is qualitative in nature as logical and arithmetic operations cannot be performed on such data. Non-parametric statistics are used in the analysis of ordinal and nominal data because they are categorical in nature. Other types of data include ratio and interval data, which are quantitative in nature.
Nominal Data Definition
Nominal data can be defined as a type of data that can be divided into mutually exclusive groups that do not overlap using labels and tags. These tags could be numerical in nature but do not possess any quantitative properties. Furthermore, nominal data cannot be ranked or ordered.
Nominal Data Characteristics
The most important characteristics of nominal data are given as follows:
- Nominal data is qualitative in nature.
- Groups of nominal data are mutually exclusive and these categories do not overlap with each other.
- Descriptive tags and labels are used to categorize nominal data.
- Nominal data is not quantitative in nature; thus, arithmetic and logical operations cannot be performed.
- Nominal data has a mode but does not have a mean or median.
- A definite order cannot be assigned to nominal data. In other words, such data cannot be ranked or ordered.
- Graphs and charts are used to visualize nominal data.
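To illustrate the point that nominal data has a mode but no mean or median, the sketch below counts hypothetical hair-colour labels and picks the most frequent one. Computing an average of such labels would be meaningless, so none is attempted.

```python
from collections import Counter

# Hypothetical nominal observations (labels only, no numeric meaning).
hair = ["black", "brown", "black", "blonde", "brown", "black", "red"]

counts = Counter(hair)
mode, mode_count = counts.most_common(1)[0]

# Frequencies and percentages are the summaries available for nominal data.
percentages = {label: 100 * n / len(hair) for label, n in counts.items()}
print(mode, mode_count)
print(percentages)
```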
Nominal Data Analysis
Nominal data cannot be analyzed using parametric statistics as it does not possess any quantitative property. One way of analyzing nominal data is by dividing it into different categories using nominal variables. The mode, frequency, and percentage can be calculated for such groups and the results can be displayed in the form of graphs.
Another way of analyzing nominal data is by using certain hypothesis tests. Tests such as the Chi-squared test, Cochran's Q test, Fisher's Exact test, and the McNemar test can be used to make inferences about the population data.
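As one illustration of such a test, here is a minimal hand computation of a Chi-squared goodness-of-fit statistic. The counts are hypothetical, and the null hypothesis assumed for the sketch is that all three categories are equally popular.

```python
import math

# Hypothetical counts: each respondent picked exactly one favourite fruit.
observed = {"apples": 30, "oranges": 20, "bananas": 10}

n = sum(observed.values())
expected = n / len(observed)  # equal popularity under the assumed null hypothesis

# Sum of (observed - expected)^2 / expected over the categories.
chi_squared = sum((count - expected) ** 2 / expected
                  for count in observed.values())
degrees_of_freedom = len(observed) - 1

# With exactly 2 degrees of freedom, the upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_squared / 2)
print(chi_squared, degrees_of_freedom, p_value)
```

The closed-form p-value used here holds only for two degrees of freedom; for other table sizes a statistics library or a Chi-squared table would be needed.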
Nominal Data Graph
A bar graph and a pie chart are the most common ways of representing nominal data. Suppose a survey was conducted. Participants were required to choose which fruits they liked among apples, oranges, and bananas. The frequency distribution table for this nominal data is given as follows:
The bar graph and pie chart for this nominal data can be given as follows:
Nominal Data Examples
Nominal data can be expressed in words or numbers; however, the values cannot be ordered and they do not have any numerical properties. Given below are a few examples of nominal data.
- Hair Colour - Black, brown, blonde, red, silver.
- Pin code - 482001, 400056, 49375
- Test Status - Pass, Fail
- Movie Genres - Comedy, musical, horror, drama, satire
Nominal Data vs Ordinal Data
Categorical data can be divided into both nominal data and ordinal data. The table given below lists the difference between nominal data and ordinal data.
Important Notes on Nominal Data
- Nominal data is a type of categorical data that does not possess any intrinsic ordering.
- Bar graphs and pie charts can be used to represent nominal data.
- Nominal data can be analyzed using non-parametric statistical tests such as the Chi-squared test.
Examples on Nominal Data
Example 1: Is the following survey an example of ordinal or nominal data?
Q.1 What is your gender? a) female, b) male, c) prefer not to specify
Q.2 What is your favorite movie genre? a) horror, b) romance, c) comedy
Solution: Both questions are examples of nominal data because the responses are qualitative in nature and cannot be ordered.
Answer: Nominal Data
Example 2: On a scale of 1 to 5 rate your experience at XYZ restaurant. Is this nominal data or ordinal data?
Solution: Because the experience ratings can be ordered, this is an example of ordinal data.
Answer: Ordinal Data
Example 3: Use a pie chart to represent the following nominal data.
FAQs on Nominal Data
What is Nominal Data in Statistics?
Nominal data in statistics can be defined as categorical data that is qualitative in nature and cannot be ordered or ranked. Thus, arithmetic and logical operations cannot be used on nominal data.
What are the Characteristics of Nominal Data?
The characteristics of nominal data are as follows:
- Nominal data groups are mutually exclusive.
- It is qualitative in nature.
- Nominal data can have a mode.
- Labels and tags are used for nominal data.
How to Represent Nominal Data?
Nominal data can be represented using bar graphs and pie charts. Frequency and percentage distribution tables can also be used to show nominal data.
What Hypothesis Tests can be Conducted on Nominal Data?
Four hypothesis tests commonly used on nominal data are the Chi-squared test, Cochran's Q test, Fisher's Exact test, and the McNemar test.
Can You Find the Mean of Nominal Data?
The mean of nominal data cannot be determined. This is because nominal data is not quantitative in nature and statistical computations cannot be performed on it.
How to Analyze Nominal Data?
Nominal data can be analyzed by grouping the data. The frequency and percentage can be calculated for such groups which can further be represented using graphs.
What is the Difference Between Nominal Data and Ordinal Data?
Nominal data and ordinal data are both types of categorical data. The former cannot be ranked while the latter can be intrinsically ordered.
Data Analysis for Leadership & Public Affairs:
Chapter 3 Visualizing Data
One of the first tasks of data analysis should be to look at our data. Certainly, you should open up the dataset and eyeball it to make sure everything looks okay, that the file isn’t mysteriously corrupted, etc., but that is not what I mean when I say “look.” Nor am I talking about data visualization in the sense of the masterpieces of stalwarts such as Alberto Cairo, Mona Chalabi, Andy Kirk, Robert Kosara, Giorgia Lupi, David McCandless, Cole Nussbaumer Knaflic, Randy Olson, Lisa Charlotte Rost, Jon Schwabish, Nathan Yau, Stephanie Evergreen, Martin Wattenberg, Fernanda Viégas, and many others. We are not going to create visuals that rival these artists’ stunning works. Rather, our goal will be simpler: to use appropriate tables and graphics when analyzing and describing our data, both so that we understand what the data are telling us and so that we can communicate the resulting story to our audience. To accomplish these tasks we will obey simple rules that help us make wise decisions.
3.1 Visualizing Nominal/Ordinal Data
Nominal -> no hierarchy to the levels, values of the categorical variable (A = B = C = …)
Ordinal -> some hierarchy to the levels, values of the categorical variable (A > B > C > …)
3.1.1 Bar-charts and Frequency Tables
If a variable is measured in nominal or ordinal terms, either (i) a bar-chart or (ii) a frequency table is a very effective display of how the variable is distributed: What value seems to be most common? Are high values as likely as low values? You can see both these visuals below, drawn from data about the relative abundance of different species. First, the frequency table.
FIGURE 3.1: Factors associated with human-killing tigers in Chitwan National Park, Nepal
Notice how the frequency table is constructed. You have each human activity listed, followed by the frequency (the number of times someone was killed by a tiger while engaging in each activity), then a column with this frequency converted into a proportion, and finally the proportion converted into a percentage. None of this should be a mystery to you:
\[proportion = \dfrac{frequency}{Total}\] \[percentage = \left(\dfrac{frequency}{Total}\right) \times 100 = proportion \times 100\]
The last row in the table shows you the total frequency (we have a total of 88 humans killed), the total proportion (which must sum to \(1\)), and the total percentage (which should sum to \(100\)). I love frequency tables such as these because they show all the data and the story; most kills occurred while the individuals were cutting grass for cattle fodder, followed by while they were out gathering other forest products, and least of all while they were going to the toilet. We could improve upon this table and bar-chart by organizing it in such a way that the most dangerous activity is listed/plotted first. This would have the added benefit of quickly drawing the reader’s/viewer’s eyes to the most dangerous activity.
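The proportion and percentage columns, along with the suggested reordering, can be sketched in a few lines. The activity labels and counts below are placeholders, not the figures from the tiger data.

```python
# Placeholder categories and counts (not the actual tiger data).
frequencies = {"Activity A": 12, "Activity B": 30, "Activity C": 6, "Activity D": 2}
total = sum(frequencies.values())

# List the most frequent category first, as suggested for nominal data.
table = []
for activity, freq in sorted(frequencies.items(),
                             key=lambda item: item[1], reverse=True):
    proportion = freq / total
    percentage = proportion * 100
    table.append((activity, freq, proportion, percentage))

for row in table:
    print(row)
print("Total", total, sum(r[2] for r in table), sum(r[3] for r in table))
```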
What if the variable was an ordinal level variable, say something like an individual’s frequency of praying?
FIGURE 3.2: Frequency of Prayer
We would be facing the same options, a bar-chart or a frequency table. However, we would have to be cautious here in making sure both the table and the bar-chart categories are logically assigned. That is, notice the Response Category; we start with the option “Never” and each category that follows reflects praying more often than the preceding category. This order has to be maintained, which means we cannot arrange the table or bar-chart in ascending/descending order of the “Frequency” column. If we made that mistake we would be destroying the natural order that exists in the ordinal variable we have before us.
You might be wondering, what about pie charts? Why not use those with nominal/ordinal data? There are two camps, those who hate them and those who think they may be useful. If you are interested, read “What do you mean I’m not supposed to use Pie Charts?!” but I for one do not use them since I find them less useful than bar-charts.
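A short sketch of the ordering point: for an ordinal variable the rows follow the defined category order, not the frequencies. The category labels are the prayer categories used in this chapter; the counts are hypothetical.

```python
# The natural order of the ordinal categories.
order = ["Never", "Once a week or less", "A few times a week",
         "Once a day", "Several times a day"]

# Hypothetical counts, stored in no particular order.
counts = {"Once a day": 40, "Never": 25, "Several times a day": 55,
          "A few times a week": 30, "Once a week or less": 35}

# Correct for ordinal data: rows follow the category order.
ordered_rows = [(category, counts[category]) for category in order]

# Sorting by frequency scrambles the natural order of the categories.
by_frequency = sorted(counts.items(), key=lambda item: item[1], reverse=True)

print(ordered_rows)
print(by_frequency)
```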
3.1.2 Contingency Tables and Bar-charts
Frequency tables and bar-charts are also useful when you have two nominal/ordinal variables to work with and are interested in exploring difference between the two or more groups reflected in one variable versus whatever is being measured by the second variable. Let us use a specific example, one where we ask if religiosity differs between Liberals, Moderates, and Conservatives. First the table.
If you look at Table 3.3, you see the frequencies reported in what we call a contingency table or a crosstabulation, where the distributions of two nominal/ordinal variables are jointly displayed. Reading such a table should be simple. In brief, you see how many Liberals said they prayed “Never,” “Once a week or less,” “A few times a week,” “Once a day,” or “Several times a day,” and likewise for Moderates and Conservatives. What pattern is evident from this table? Of the 447 conservatives we see most of them saying they pray several times a day, followed by once a day, and the fewest saying they never pray. The pattern is similar for liberals and moderates, although the difference between the numbers responding “Never” and “Several times a day” is smaller for liberals than it is for conservatives.
The story could be helped a great deal if we calculated percentages for these frequencies. We have two choices when calculating these percentages. We could calculate row percentages, where we ask “what percent of those who said Never were Liberal, Moderate, and Conservative, respectively?” We could then repeat this for the other categories of religiosity. The result is shown in Table 3.4. This table shows quite clearly that 51.16% of those who say they pray several times a day are Conservative. Likewise, the largest share (43.06%) of those who say they never pray are Liberal.
If we calculated column percentages we would be able to answer such questions as: “What percent of Liberals said they never pray, pray once a week or less, a few times a week, once a day, several times a day?” The same could then be asked of Moderates and Conservatives, respectively. If we used the column percentages shown in Table 3.5 it would be obvious that Moderates and Conservatives tend to be more religious than Liberals. The essential takeaway here is that how you calculate the percentages (row versus column) depends upon what story you want to highlight, the question you want to ask and answer.
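The two ways of computing percentages can be sketched on a small contingency table. The counts below are hypothetical and use only two prayer categories to keep the example short; rows are prayer categories and columns are ideologies, matching the layout described above.

```python
# Hypothetical contingency table: rows are prayer categories, columns ideologies.
table = {
    "Never": {"Liberal": 30, "Moderate": 25, "Conservative": 15},
    "Several times a day": {"Liberal": 20, "Moderate": 40, "Conservative": 70},
}
ideologies = ["Liberal", "Moderate", "Conservative"]

# Row percentages: of those giving this answer, what percent hold each ideology?
row_pct = {
    answer: {ideology: 100 * n / sum(cells.values())
             for ideology, n in cells.items()}
    for answer, cells in table.items()
}

# Column percentages: of those with this ideology, what percent gave each answer?
col_totals = {i: sum(table[answer][i] for answer in table) for i in ideologies}
col_pct = {
    answer: {i: 100 * table[answer][i] / col_totals[i] for i in ideologies}
    for answer in table
}

print(row_pct)
print(col_pct)
```

Row percentages sum to 100 across each row, while column percentages sum to 100 down each column, so the two versions answer different questions about the same counts.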
Graphing these tables is easily done as well, and very effective; see Figure 3.3.
FIGURE 3.3: Political Ideology and Religiosity
Categorical variables (one or two) -> bar-chart
If you have cross-tabulations, choose between stacked versus dodged bar-charts
3.2 Visualizing Interval/Ratio Data
We have several visuals we could draw with numeric data that are either interval or ratio levels of measurement. Let us see these first before we look at the one frequency table that could be used with interval/ratio data.
3.2.1 The Histogram
A histogram is used with a single numeric variable and looks like a bar-chart except there are no gaps between consecutive bars unless there are missing data. The example that follows uses a popular dataset known as hsb2, which contains information about 200 randomly selected students from a national survey of high school seniors called the High School and Beyond survey. The variables in this dataset include:
- id = student id
- female = (0/1)
- race = ethnicity (1=hispanic 2=asian 3=african-amer 4=white)
- ses = (1=low 2=middle 3=high)
- schtyp = type of school (1=public 2=private)
- prog = type of program (1=general 2=academic 3=vocational)
- read = standardized reading score
- write = standardized writing score
- math = standardized math score
- science = standardized science score
- socst = standardized social studies score
I’ll use read (Reading scores on a standardized test) to plot a histogram. Notice a few things about the histogram in Figure 3.4. The height of the bars, representing how often a particular score occurs, varies a great deal. A few students have done poorly while a few have done very well, but the rest are distributed over the middle range of test scores.
FIGURE 3.4: Histogram of Reading Scores
We could further break this histogram apart by asking if male and female students (female), private versus public students (schtyp), or students of different races/ethnicities (race) perform differently on the reading test.
FIGURE 3.5: Histogram of Reading Scores by Various Groups
These plots are hard to read for a number of reasons. First, with just \(n = 200\) students in all we have more students in some groups than in others and as a result the histograms look very thin for the groups with fewer data points, making it difficult to tease out any patterns. Second, within any group there are too many different test scores so we don’t see a clear pattern at all. To fix these problems we construct groups of scores, turning what is a numeric variable into an ordinal variable. Let us build this by first creating a grouped frequency table and then plotting this table as a histogram.
Histogram bins must be chosen with some care
3.2.2 Grouped Frequency Tables
Because numeric variables take on too many values it is often easier to see their distribution by grouping the numeric values. For example, we could start by seeing what are the lowest and highest reading scores. These turn out to be 28 and 76, respectively. We can build the groups as follows:
- Calculate difference between maximum and minimum values, which turns out to be \(76 - 28 = 48\)
- Decide how many groups we want. Good practice suggests no fewer than four or five groups and no more than six or seven. Say we go with 5 groups.
- Divide the gap between the maximum and minimum values by the desired number of groups: \(= \dfrac{48}{5} = 9.6\) and round up to the nearest whole number \(= 10\) . This tells us how wide each group should be.
- The groups could thus be \(28-38, 38-48, 48-58, 58-68, 68-78\) .
Notice that these groups span, start to finish, all the values of reading scores. But we’ll have to decide which group to include 38, 48, 58, and 68 in. Should 38 go in 28-38 or in 38-48? The choice doesn’t matter so long as we are consistent. Let us choose a rule that says include 38 in 38-48, 48 in 48-58, 58 in 58-68, and 68 in 68-78. Using this rule we now find each reading score and drop it into its group. Then we calculate how many scores fall in each group, creating our Frequency column.
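The steps above can be sketched as follows, using the minimum of 28, the maximum of 76, and five groups from the text. The function names and the short list of scores are hypothetical; the rule that a boundary value such as 38 belongs to the group that starts at 38 is the one chosen above.

```python
import math
from collections import Counter


def build_bins(minimum, maximum, n_groups):
    """Return (low, high) groups of equal whole-number width covering the range."""
    width = math.ceil((maximum - minimum) / n_groups)  # 48 / 5 = 9.6, rounded up to 10
    return [(minimum + i * width, minimum + (i + 1) * width)
            for i in range(n_groups)]


def group_of(score, bins):
    """Place a score in its group; a boundary value goes to the group it starts."""
    for low, high in bins:
        if low <= score < high:
            return (low, high)
    raise ValueError(f"score {score} falls outside every group")


bins = build_bins(28, 76, 5)

# Hypothetical reading scores, including the boundary values 38, 48, and 68.
scores = [28, 38, 47, 48, 57, 63, 68, 76]
frequency_table = Counter(group_of(s, bins) for s in scores)

for low, high in bins:
    print(f"{low}-{high}: {frequency_table[(low, high)]}")
```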
Now it is easy to see that the largest number of students (68) appear to fall in the 38-48 group of reading scores while the smallest frequency (14) occurs in the 28-38 group. Let us add to this table a percentage column.
The percentages make it even easier to see how the distribution breaks down; 34 percent of the students have scores in the 38-48 range while only 10 percent scored in the highest bracket (68-78). We can also break this down by the three groups we used earlier.
FIGURE 3.6: Histogram of Grouped Reading Scores by Specific Groups
Histograms were once quite popular but there is a better visual for looking at our numeric variables, one called the box-plot that we’ll see in the next chapter. I am not a huge fan of histograms because grouping decisions influence the story being told. My advice would be to use grouped frequency tables instead of histograms to present a summary overview of your numeric data.
Histograms of grouped frequencies of numerical variables can be useful for summary depictions of the distribution
3.2.3 Scatterplots
With two numeric variables, a scatterplot comes in very handy if we want to explore how one variable might be related to another. For example, we may want to ask whether students who score high on the reading test also tend to score high on the mathematics test. This could be visually explored via a scatter-plot, as shown below. The goal should be to look for a pattern: Does one variable increase as the other increases or does one variable decrease as the other increases? Or does there seem to be no relationship at all?
FIGURE 3.7: Scatterplot of Reading Scores and Mathematics Scores
Quite clearly, students who do well in Reading also tend to do well in Mathematics. You can see this by virtue of the upward, left-to-right tilt of the cloud of points.
What about Science scores and Mathematics scores? Is there any relationship between doing well in Science and doing well in Mathematics?
FIGURE 3.8: Scatterplot of Science Scores and Mathematics Scores
Of course, just like everything else we could break this down by any nominal/ordinal variable. The visual below shows you the breakouts by the student’s sex.
FIGURE 3.9: Scatterplot of Science Scores and Mathematics Scores, by Sex
Scatterplots work best with two numerical variables since they show the pattern of association between the two variables
3.2.4 Line Graphs
If we have time-series data, such as the Presidential Approval data we saw earlier, then a line graph works well because it shows you how the outcome/phenomenon varies over time. Let us take another example, median household incomes. Say I am curious about trends in median household incomes in Ohio, Pennsylvania, and West Virginia.
FIGURE 3.10: Trends in Real Median Household Incomes in OH, PA, and WV
These plots don’t just work well for financial or election data; they are ideally suited for any phenomenon that is measured over time. Examples include the size of the immigrant population over the years and even the number of lynx pelts reported in Canada per year from 1752 to 1819.
FIGURE 3.11: Number of Legal Permanent Residents in the USA
FIGURE 3.12: Number of lynx pelts reported in Canada per year from 1752 to 1819
Line graphs are the default choice for showing trends – patterns over time
3.2.5 Polar Charts
These charts are helpful for visualizing data that might have otherwise been explored via a bar-chart. For example, say you want to look at the miles per gallon given by a number of different automobiles. We could use a bar-chart, as shown below:
FIGURE 3.13: Miles per Gallon
Quite obviously, the Toyota Corolla has the best fuel economy, followed by the Fiat 128, while the Cadillac Fleetwood is tied with the Lincoln Continental for the worst fuel economy. A polar chart would present the same information as seen in the bar-chart, albeit in a more aesthetically pleasing manner.
FIGURE 3.14: Polar Chart of Miles per Gallon
3.3 Some Essential Rules for Good Visualizations
There are several rules, some favored by this expert or that, but regardless of the source of the rule, its merit is not in doubt. So here are the ones I try to follow (although I do break these at times, sometimes by design and other times unintentionally). It all starts of course with Edward Tufte’s maxims: show the data, tell the truth, help the viewer think about the information rather than the design, encourage the eye to compare the data, make large data sets coherent.
A more specific listing of the rules I try to keep in mind follows:
- Do not include anything in your visualization that is not very informative. Give titles and subtitles where needed and make these stand on their own. At the same time, do not clutter your charts and tables with too much information otherwise the reader gets lost.
- Use colors wisely. Is this visual to be printed on a color printer? Is it part of a presentation in a poorly lit room? Choose bright colors that stand out from each other and yet are visible on a printed page or on the projection screen in the room. Remember, a fair share of men seeing the visual could be color-blind so use color palettes designed for color-blind individuals. If this is news to you, read “How a dog sees a rainbow, and 12 other images that explain how we see color”. Pay attention to what is being plotted: use sequential colors for ordered data values (that go from low to high or vice-versa); qualitative colors for nominal data; and diverging colors if the goal is to put equal emphasis on mid-range critical values and extremes at both ends of the data range but emphasize the middle with light colors and the low/high extremes with dark colors that have contrasting hues.
- The visualization is not your personal Mona Lisa; it must be effective for the target audience and help you tell the story. Do not get carried away in creating it.
- Use a table if a table would be more effective than a graph, but remember that while tables are useful for audiences that will want to see more (not less) of the actual data you have, graphs are favored by those who want a quick take-away.
- Always have your y-axis (or x-axis if relevant) starting at zero. If you don’t do this you can misrepresent the data. If you are forced to truncate the axis, point this out to the reader so that they can interpret dips and swells in line charts or differences between heights of frequency bars with care. Go back and look at the scatterplots and line charts and try to visualize what these might look like if the y-axis had been forced to start at \(0\); they would certainly look different. For illustration purposes I have let the plotting software pick the starting coordinates.
- Combine multiple graphs/tables into a single figure if you can, so long as it does not lead to information overload.
- For non-technical audiences you should round all percentages/proportions to no more than one decimal place, and ideally to no decimal places unless doing so distorts the picture. For technical audiences, stay with two or four decimal places.
- Above all, start with pencil and paper and consider all alternate visualizations possible by drawing a rough sketch of what the finished product should look like.
- Consider using a choropleth map if geographic variation is one of the key narratives.
- Be prepared to alter your visual if it does not work; we are all hesitant to delete a page or a graphic and start from scratch but this locks you into something that isn’t working to begin with and you will be stuck in a rut.
- Avoid jargon like the plague. We often think we sound intelligent when we use jargon but this distracts from the oral/written presentation. Think of your audience and write and present in a manner that will resonate with them.
- Show as much data as you can; it vastly improves the effectiveness of the visual narration.
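The rule above about starting the axis at zero can be made concrete with a little arithmetic. The sketch below is only an illustration, written in Python (a language this chapter does not otherwise use); the function name `apparent_ratio` and the two values are hypothetical and not taken from any dataset in this chapter. It compares how much taller one bar is drawn than another when the axis starts at zero versus when it is truncated.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of bars for values a and b
    when the axis starts at `baseline` instead of at zero."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must lie below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical values, chosen only for illustration
low, high = 50.0, 55.0

true_ratio = apparent_ratio(high, low)                # axis starts at 0
truncated_ratio = apparent_ratio(high, low, 45.0)     # axis starts at 45

print(round(true_ratio, 2), round(truncated_ratio, 2))
```

With these made-up numbers the second bar is only 10 percent larger in the data, yet it is drawn twice as tall once the axis starts at 45. This is why a truncated axis should always be flagged for the reader.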
3.4 Chapter 3 Practice Problems
Download the monthly Great Lakes water level dataset in SPSS format from here and Excel format from here. Construct an appropriate chart to display the data for Lake Superior. Be sure to label the x- and y-axes, and to title the chart. Note that water level is in meters.
Download the number of births per 10,000 23-year-old women, U.S., 1917-1975, in SPSS format from here and Excel format from here. Construct an appropriate chart to display the trend in the data. Be sure to label the x- and y-axes, and to title the chart.
Download the winning speeds (in kilometers per hour) for several men’s track and field distances at world meets over the 1900-2012 period in SPSS format from here and Excel format from here. Construct an appropriate chart to display the speeds for the 100 meter dash. Be sure to label the x- and y-axes, and to title the chart. Note that the data are monthly and replicate the speed from the preceding month if the fastest speed was not eclipsed.
Use this data-set, also used in Practice Problem 4 in the preceding Chapter, noting these details of each variable. Construct a frequency table for belief in life after death, showing both the frequencies and the relative frequencies (as percentages). Based on the table, what do most people seem to believe? Report the percentage for your answer. Construct an appropriate chart for these data, making sure to label all axes and to provide a title.
Construct an appropriate chart that shows the relationship between high school GPA and college GPA. Label both axes and title the chart. What does this plot show? Are the two positively or negatively related? How strong would you guess the relationship is?
Construct a contingency table of vegetarianism against belief in life after death. What percent of vegetarians believe in life after death? What percent of those who believe in life after death are vegetarians? Use an appropriate chart to show the relationship between vegetarianism and belief in life after death. Label everything.
Construct a grouped frequency table with five groups of the variable age. Report both the frequencies and the relative frequencies (as percentages). Plot the relative frequencies using an appropriate chart. Label everything as usual. What is the modal age group?
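As a hint for grouped frequency tables like the one asked for above, here is a minimal sketch in Python. It assumes equal-width groups, and each group includes its lower edge and excludes its upper edge, except the last group, which also includes the maximum. The function name `grouped_frequency_table` and the ages are made up for illustration and are not taken from the data-set; your software may use a different grouping convention.

```python
def grouped_frequency_table(values, n_groups):
    """Return rows of (lower, upper, frequency, percent) for
    n_groups equal-width groups spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be identical")
    width = (hi - lo) / n_groups
    counts = [0] * n_groups
    for v in values:
        # The maximum value is placed in the last group
        idx = min(int((v - lo) / width), n_groups - 1)
        counts[idx] += 1
    total = len(values)
    rows = []
    for i, c in enumerate(counts):
        lower = lo + i * width
        # Relative frequency as a percentage, rounded to one decimal place
        rows.append((lower, lower + width, c, round(100 * c / total, 1)))
    return rows


# Hypothetical ages, for illustration only
ages = [18, 19, 21, 22, 22, 23, 25, 27, 30, 38]
table = grouped_frequency_table(ages, 5)
for lower, upper, freq, pct in table:
    print(f"{lower:g} to {upper:g}: {freq} ({pct}%)")

# The modal group is the one with the highest frequency
modal_group = max(table, key=lambda row: row[2])
```

The relative frequencies are rounded to one decimal place, in line with the rounding rule for non-technical audiences given earlier in the chapter.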
The next set of questions revolves around the 2016 Boston Marathon race results available here. The dataset contains the following variables:
- Bib = the runner’s bib number
- Name = the runner’s name
- Age = the runner’s age (in years)
- M/F = the runner’s gender (M = Male; F = Female)
- City = the runner’s home city
- State = the runner’s home state
- Country = the runner’s home country
- finishtime = the runner’s time (in seconds) to the finish line
What countries had the largest and second-largest number of runners in the race?
What was the distribution of the runners’ gender? Use a suitable chart to reflect the distribution.
Construct a grouped frequency table of the runners’ age, using the following groupings – 18-25; 25-32; 32-39; 39-46; 46-53; 53-60; 60-67; 67-74; 74-81; 81-88. Also construct a grouped histogram. What is the modal age group?
Draw a scatter-plot of runners’ age and finish times. Does this show any relationship? If it does, what sort of a relationship?
Using a reasonable grouping structure, construct country-specific histograms of finish times for runners from each of the following countries – AUS, BRA, CAN, CHN, FRA, GBR, GER, ITA, JPN, MEX, and USA. Are finish times skewed for each country? Is the direction of the skew similar? What country seems to have the least skew? What country seems to have the most skew?
Use this data-set, also used in Practice Problem 5 in the preceding Chapter, to answer the following questions.
Were employees who had a workplace accident more likely to leave than employees who did not have an accident? Briefly explain your conclusion with appropriate charts/tables.
Are low salary employees more likely to leave than medium/high salary employees? Why do you conclude as you do? Briefly explain your reasoning with appropriate charts/tables.
Construct an appropriate chart that shows the distribution of the number of years employees spent in the company. What patterns do you see in this chart?
Download the 2020 County Health Rankings data in SPSS format from here, CSV format from here, and the accompanying analytic codebook.
You should see the following measures (measurename):
- Adult obesity
- Children in poverty
- High school graduation
- Preventable hospital stays
- Unemployment rate
Looking at pairs of variables in turn, briefly (20 words or fewer) state whether you would expect a positive, negative, or no relationship between the variables in each pair, and why.
Construct scatterplots of all pairs of the five variables.
How do the scatterplots you constructed in the preceding problem stack up against the expectations you listed in Problem 16? Which plots surprised you and how? Be brief.
Indoor plumbing is lacking in many still developing countries, particularly in the rural areas. In some of these countries, particularly India, tigers aren’t the most dangerous animals