Sections
- Introduction
- Plotting Simple \(xy\) Data Sets
- Interpolation Using Graphs
- Identifying Patterns with Best-Fit Lines and Curves
- Logarithmic and Semi-Logarithmic Graphs
- Reversed Axes: An Astronomy Idiosyncrasy
- Quiz Yourself
- Answers
- Graph templates (templates available for download)
Introduction
The ability to interpret and create graphs and charts is a critical skill in all scientific disciplines, as it is in a great many other areas, including reading and understanding newspaper articles. Graphs and charts serve many purposes, perhaps the most important of which are the abilities to:
- display a significant amount of information in a single diagram,
- readily visualize patterns in the data,
- estimate the values of data points that lie between displayed or measured values.
This tutorial is designed to refresh and hone your skills in reading and creating various types of graphs that you will encounter in your astronomy course.
Plotting Simple \(xy\) Data Sets
The simplest form of a graph is often referred to as a linear \(xy\)-plot, or more formally as a Cartesian graph. Suppose you are given a set of four data points:
Table 1: Sample \(xy\) data
\(x\) | \(y\) |
---|---|
3.2 | 5.2 |
4.4 | 7.4 |
5.0 | 8.5 |
7.4 | 12.9 |
Simply looking at the table of data doesn’t give you an immediate sense of the relationship between \(x\) and \(y\). Are the data randomly scattered around the graph, with no apparent correlation? Do the data follow some predictable pattern? If so, what type of pattern? Once the data are graphed, the relationship becomes immediately apparent: the points are part of a straight line. Traditionally, though it is certainly not required, the \(x\) axis is the horizontal axis and the \(y\) axis is the vertical axis. The data in Table 1 are displayed in the following graph, together with a straight line that connects each of the data points.
Since the data in the graph above do not get close to the origin [the point \(\small(0,0)\)], it is not necessary to start the axes at the origin, as is typically taught. Instead, it is common to start the axes wherever the data make the most sense. The same data are represented in the following graph by starting the axes near the first data point.
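The straight-line pattern in Table 1 can also be checked numerically. A minimal Python sketch (not part of the original tutorial) computes the slope, or “rise over the run,” between each pair of neighboring points; for points lying on a single straight line, every slope is the same:

```python
# Table 1 data as (x, y) pairs
points = [(3.2, 5.2), (4.4, 7.4), (5.0, 8.5), (7.4, 12.9)]

# Slope ("rise over the run") between each pair of consecutive points
slopes = [
    (y2 - y1) / (x2 - x1)
    for (x1, y1), (x2, y2) in zip(points, points[1:])
]

for s in slopes:
    print(f"{s:.4f}")  # each slope prints as 1.8333 (that is, 11/6)
```

All three slopes agree, so the four points lie on the line \(y=\tfrac{11}{6}x-\tfrac{2}{3}\), or approximately \(y=1.833x-0.667\).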
Interpolation Using Graphs
Once you have created a graph with a defined pattern, such as the straight line in Figures 1 and 2 above, it becomes possible to interpolate between points, that is, to determine the values that additional data points would have had if they had been included in the initial table of data. Referring to Figure 3 below, suppose you know that the \(x\) value of a data point is \(x=6.3\). You can determine the \(y\) value by starting on the \(x\) axis at \(6.3\) and moving straight up, parallel to the \(y\) axis, until you reach the line created by the known data points. Once you reach the line, move horizontally, parallel to the \(x\) axis, until you reach the \(y\) axis. The location on the \(y\) axis is the \(y\) value of the data point, in this case \(y\approx10.9\) (or more precisely, \(y=10.88\)). (The symbol \(\approx\) means “approximately equal to.”) In Table 1 above, a fifth entry in the table could have been \((x=6.3, y=10.88)\).
Of course, the process could have been reversed as well. If it were known that \(y=10.88\), you could have followed a line parallel to the \(x\) axis until you encountered the line connecting the existing data points. You would then follow a line parallel to the \(y\) axis until reaching the \(x\) axis at \(x=6.3\).
If you are a bit rusty in plotting the data points in the first place, Figure 3 can be used to illustrate the process. If you have the data point \(x=6.3\) and \(y=10.88\), you locate the spot on the graph where the mark should be placed by starting on the \(x\) axis at \(x=6.3\) and drawing a line parallel to the \(y\) axis. Similarly, you draw a line parallel to the \(x\) axis that passes through the \(y\) axis at \(y=10.88\). The point where the two lines you drew intersect is the location of the data point on the graph (the intersection of the two dashed red lines in Figure 3).
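Reading a value off the line between two known points, as described above, is called linear interpolation, and it is easy to do numerically. A minimal Python sketch, assuming the points are sorted in increasing order (the function name `interpolate` is an illustrative choice, not a standard routine):

```python
# Known data points from Table 1, sorted by increasing x (and increasing y)
xs = [3.2, 4.4, 5.0, 7.4]
ys = [5.2, 7.4, 8.5, 12.9]


def interpolate(x, xs, ys):
    """Linear interpolation: read y off the straight line between the two
    known points whose x values bracket x (assumes xs is increasing)."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("value lies outside the range of the data (that would be extrapolation)")


y = interpolate(6.3, xs, ys)    # up from the x axis, then across to the y axis
x = interpolate(10.88, ys, xs)  # the reverse: swap the roles of x and y
print(f"y = {y:.2f}")  # y = 10.88
print(f"x = {x:.2f}")  # x = 6.30
```

Swapping the two lists performs the reverse reading from the \(y\) axis; this works here because \(y\) increases steadily with \(x\).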
Identifying Patterns with Best-Fit Lines and Curves
It is often the case that the data from experiments or observations are not nearly as “clean” as represented above. There are likely to be uncertainties, sometimes significant, in the measurements that are made. In such a case a best fit must be made through the available data to determine what the expected relationship between \(x\) and \(y\) should be if all of the measurements were perfect. Unfortunately, this “best fit” is itself an estimate of what is really going on, but it is often the best that we can do. While there are mathematical techniques that can be used to determine the “best fit” line or curve through the data, for our purposes it is okay to “eyeball” the fit by simply trying to draw a line or curve through the data that minimizes the distances between the points and the curve.
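One of the mathematical techniques mentioned above is the method of least squares, which chooses the line that minimizes the sum of the squared vertical distances between the points and the line. A minimal Python sketch, applied to the sample data in Table 5 from the Quiz Yourself section (the function name `least_squares_line` is illustrative; libraries such as NumPy offer equivalent routines):

```python
# Sample data from Table 5, which are not perfectly "clean"
xs = [1.2, 2.0, 2.6, 3.9, 4.8, 6.2, 7.3, 8.8, 9.3, 9.8]
ys = [3.5, 4.8, 6.1, 7.2, 9.4, 11.1, 13.6, 15.6, 16.5, 17.6]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the straight line that minimizes the sum
    of the squared vertical distances between the points and the line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 1.62x + 1.50
```

A carefully “eyeballed” line through the same points should come out close to this one.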
As an example of plotting data that are not ideal, consider the atmospheric concentrations of carbon dioxide (CO2) above Mauna Loa in Hawai’i for every January from 1959 through 2008. The data are expressed as the number of carbon dioxide molecules per million atoms and molecules within a volume of atmosphere, that is, in parts per million by volume (ppmv).
Table 2: CO2 atmospheric concentrations above Mauna Loa, Hawai’i, in January. [Most recent data available at Global Monitoring Laboratory – Carbon Cycle Greenhouse Gases (noaa.gov).]
Year | Concentration (ppmv) |
---|---|
1959 | 315.62 |
1960 | 316.43 |
1961 | 316.93 |
1962 | 317.94 |
1963 | 318.74 |
1964 | 319.57 |
1965 | 319.44 |
1966 | 320.62 |
1967 | 322.06 |
1968 | 322.57 |
1969 | 324.00 |
1970 | 325.03 |
1971 | 326.17 |
1972 | 326.77 |
1973 | 328.55 |
1974 | 329.35 |
1975 | 330.40 |
1976 | 331.75 |
1977 | 332.93 |
1978 | 334.97 |
1979 | 336.23 |
1980 | 338.01 |
1981 | 339.23 |
1982 | 340.75 |
1983 | 341.37 |
1984 | 343.70 |
1985 | 344.97 |
1986 | 346.30 |
1987 | 348.02 |
1988 | 350.43 |
1989 | 352.76 |
1990 | 353.66 |
1991 | 354.72 |
1992 | 355.98 |
1993 | 356.70 |
1994 | 358.37 |
1995 | 359.97 |
1996 | 362.05 |
1997 | 363.18 |
1998 | 365.33 |
1999 | 368.15 |
2000 | 369.14 |
2001 | 370.28 |
2002 | 372.43 |
2003 | 374.68 |
2004 | 376.79 |
2005 | 378.37 |
2006 | 381.38 |
2007 | 382.45 |
2008 | 385.07 |
Since we will rarely use the labels \(x\) and \(y\) to describe our data, we will refer to the horizontal and vertical axes by their more general names, the abscissa and the ordinate, respectively. In this case, the abscissa corresponds to the years in which each data point was recorded, and the ordinate is the CO2 concentration level. Plotting the entire data set gives:
If we limit our selection of data to the years 1964 to 1966 and draw a “best fit” straight line through the data points, we have the results in Figure 5:
Notice that the “best fit” straight line is tilted up slightly, meaning that the CO2 concentration appears to be increasing slightly over the two-year period between January 1964 and January 1966. You should also note that none of the data points actually lie along the “best fit” line. This is okay, since we are looking for a line that best represents the data trend in its entirety. Note, too, that if we had restricted ourselves even further to only the years 1964 and 1965 in Table 2, the CO2 concentration would have actually decreased. Those who wish to dismiss global warming, to which CO2 is a major contributor, will often focus on such a subset of the data, trying to argue that the trend in global warming isn’t real and, therefore, that there must be a world-wide conspiracy among scientists! Your author feels very comfortable in saying that no such conspiracy exists, and that global warming is a real effect.
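The same comparison can be made numerically. The Python sketch below (the helper name is illustrative) fits a least-squares line to the three January values from 1964 through 1966 in Table 2, and then to 1964 and 1965 alone; the sign of the slope flips, which is exactly why a cherry-picked subset can be misleading:

```python
# January CO2 concentrations (ppmv) above Mauna Loa, from Table 2
years = [1964, 1965, 1966]
co2 = [319.57, 319.44, 320.62]


def least_squares_slope(ts, vs):
    """Slope of the least-squares straight line through the points (ts, vs)."""
    n = len(ts)
    t_mean = sum(ts) / n
    v_mean = sum(vs) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


three_year = least_squares_slope(years, co2)        # 1964-1966
two_year = least_squares_slope(years[:2], co2[:2])  # 1964-1965 only
print(f"1964-1966 trend: {three_year:+.3f} ppmv per year")  # +0.525
print(f"1964-1965 trend: {two_year:+.3f} ppmv per year")    # -0.130
```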
Going back to the full data set and attempting to draw a “best fit” straight line through the points results in Figure 6:
Certainly a “best fit” straight line can be drawn through the data, and the trend is clear: with only one exception (the drop from January 1964 to January 1965), the concentration of CO2 has increased every year. This trend has continued through at least 2021 and shows no sign of diminishing. In fact, if you look at the straight line drawn through the data, there is another obvious trend. The data in the early years of the data set lie above the line, the data in the middle are below the line, and the data in the later years again lie above the line. If you were to draw a “best fit” curve through the data, rather than being restricted to a straight line, you would likely draw a curve like the one shown in Figure 7:
The “best fit” curve through the data points is actually a quadratic function of the form \(\text{CO}_2 \text{ concentration} = at^2+bt+c\), where \(t\) is the year and \(a\), \(b\), and \(c\) are constants adjusted to make the curve fit as well as possible. What the results are telling us is that the CO2 concentration in the atmosphere above Mauna Loa, Hawai’i, is rising faster than linearly; it is increasing more rapidly every year! This is part of the alarming pattern scientists have been warning the world about for decades now.
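Both observations can be checked numerically. The Python sketch below fits a straight line and a quadratic to the January values in Table 2 by least squares. The helper names are illustrative, and measuring time from the middle of the record is a convenience choice (it changes \(b\) and \(c\) but not \(a\)), not part of the original analysis. The straight-line residuals (data minus line) come out positive early, negative in the middle, and positive again late, and the fitted \(a\) is positive, which is what “rising faster than linearly” means:

```python
# January CO2 concentrations (ppmv) above Mauna Loa, 1959-2008, from Table 2
co2 = [
    315.62, 316.43, 316.93, 317.94, 318.74, 319.57, 319.44, 320.62, 322.06, 322.57,
    324.00, 325.03, 326.17, 326.77, 328.55, 329.35, 330.40, 331.75, 332.93, 334.97,
    336.23, 338.01, 339.23, 340.75, 341.37, 343.70, 344.97, 346.30, 348.02, 350.43,
    352.76, 353.66, 354.72, 355.98, 356.70, 358.37, 359.97, 362.05, 363.18, 365.33,
    368.15, 369.14, 370.28, 372.43, 374.68, 376.79, 378.37, 381.38, 382.45, 385.07,
]
years = list(range(1959, 2009))

# Measure time from the middle of the record (1983.5); this shift changes
# b and c but leaves the curvature coefficient a unchanged
t_mid = sum(years) / len(years)
ts = [yr - t_mid for yr in years]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(ts, vs):
    """Least-squares fit of v = a*t**2 + b*t + c via the normal equations,
    solved with Cramer's rule. Returns (a, b, c)."""
    s = [sum(t ** k for t in ts) for k in range(5)]                  # s[k] = sum of t**k
    r = [sum(t ** k * v for t, v in zip(ts, vs)) for k in range(3)]  # r[k] = sum of t**k * v
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [r[2], r[1], r[0]]
    d = det3(m)
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]  # replace one column with the right-hand side
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)


# Straight-line least-squares fit; because the ts sum to zero, the slope is
# sum(t*v)/sum(t*t) and the intercept is the mean concentration
slope = sum(t * v for t, v in zip(ts, co2)) / sum(t * t for t in ts)
intercept = sum(co2) / len(co2)
residuals = [v - (slope * t + intercept) for t, v in zip(ts, co2)]  # data minus line

for yr, res in zip(years, residuals):
    if yr in (1959, 1983, 2008):
        print(f"{yr}: {res:+.2f} ppmv")  # positive, negative, positive

a, b, c = fit_quadratic(ts, co2)
print(f"a = {a:.4f} ppmv per year squared")  # positive: the curve bends upward
```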
Logarithmic and Semi-Logarithmic Graphs
In the last three sections we have reviewed only graphs with linear axes, meaning that every major or minor tick mark represents an equal increase over the previous major or minor tick mark. In Figures 1 through 3 above, the spacing between major tick marks was always \(1\) along both the \(x\) and \(y\) axes, although other even spacings may also be appropriate, as was the case in Figures 4 through 7. A linear axis is certainly the most commonly encountered form of graphical axis, but it is not the only possibility. It is fairly common in the physical sciences and engineering to encounter logarithmic axes as well. In essence, instead of spacing the tick marks evenly in equal intervals, such as \(1\), \(2\), \(3\), etc., the tick marks are spaced evenly in powers of \(10\), such as \(10^{−2}\), \(10^{−1}\), \(10^0\), \(10^1\), \(10^2\), etc. A graph with two logarithmic axes is a logarithmic graph (also known as a “log-log” plot), while a graph with one linear axis and one logarithmic axis is a “semi-logarithmic” (or “semi-log”) plot.
There are two major reasons for creating axes that have evenly spaced powers of \(10\), rather than simple, linear axes:
- logarithmic axes can more easily display data that are spread out over very large ranges of values, and
- data sets that would otherwise appear curved can be displayed as straight lines instead.
Regarding the first reason, suppose you have a series of lengths that range from \(0.000\,001\) to \(\text{1,000,000}\). If you used a traditional linear axis to display the data, any of the data points down near \(0.000\,001\) would be impossible to distinguish from \(0\) on the graph. In fact, even data values as large as \(1000\) would be tucked into the left-most \(0.1\%\) of the axis. Obviously it would be impossible to see anything of significance with data that are so badly bunched together and so close to \(0\). If the powers of \(10\) are evenly spaced instead, as \(10^{−6}\), \(10^{−5}\), \(\ldots\), \(10^5\), \(10^6\), then the data points would be much more spread out. The value \(1000=10^3\) would be three-quarters of the way along the axis from \(10^{−6}\) to \(10^6\), since the exponent \(3\) lies \(9\) of the \(12\) steps from \(−6\) to \(6\).
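The positions in this example follow from a simple rule: on a linear axis a value’s position is proportional to the value itself, while on a logarithmic axis it is proportional to the value’s base-\(10\) logarithm. A short Python check for the value \(1000\) on an axis running from \(10^{−6}\) to \(10^6\) (variable names are illustrative):

```python
import math

lo, hi = 1e-6, 1e6  # axis limits from the example above
value = 1000.0

# Fraction of the way along a linear axis running from lo to hi
linear_fraction = (value - lo) / (hi - lo)

# Fraction of the way along a logarithmic axis: equal steps in log10
log_fraction = (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))

print(f"linear axis: {linear_fraction:.2%}")  # linear axis: 0.10%
print(f"log axis:    {log_fraction:.2%}")     # log axis:    75.00%
```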
To help you see how the logarithms of numbers behave, consider the following table:
Table 3: Some numbers and their base-\(10\) logarithms.
Number | Logarithm |
---|---|
\(10^{−3}=0.001\) | −3.000 |
\(10^{−2}=0.01\) | −2.000 |
\(20^{−1}=0.05\) | −1.301 |
\(10^{−1}=0.1\) | −1.000 |
\(2^{−1}=0.5\) | −0.301 |
\(10^0=1\) | 0.000 |
\(2\) | 0.301 |
\(3\) | 0.477 |
\(4\) | 0.602 |
\(5\) | 0.699 |
\(6\) | 0.778 |
\(7\) | 0.845 |
\(8\) | 0.903 |
\(9\) | 0.954 |
\(10^1=10\) | 1.000 |
\(20\) | 1.301 |
\(10^2=100\) | 2.000 |
\(200\) | 2.301 |
\(10^3=1000\) | 3.000 |
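The logarithms in Table 3 can be reproduced with any calculator or programming language; for instance, with Python’s standard `math.log10` function. Raising \(10\) to each logarithm recovers the original number, because the logarithm is simply the exponent of \(10\):

```python
import math

# The numbers listed in Table 3
numbers = [0.001, 0.01, 0.05, 0.1, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 100, 200, 1000]

for x in numbers:
    b = math.log10(x)  # base-10 logarithm, i.e. the exponent b in x = 10**b
    print(f"{x:>8g}  log10 = {b:+.3f}  10**log10 = {10 ** b:g}")
```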
Plotting the data in the table gives:
Notice all of the data points squeezed together between \(0.001\) and \(1\) along the abscissa. If your data were plotted using the numbers along the abscissa, you would never be able to tell the small numbers apart. However, if the data are “translated” to correspond to the numbers along the ordinate (the abscissa numbers are replaced by their corresponding ordinate numbers, their logarithms), then the data would not range from barely greater than \(0\) to \(1000\); they would range from \(−3\) to \(3\) instead, with the data much more spread out. (Note the spread in the dots vertically even though they are essentially right on top of one another horizontally near the ordinate axis.)
(For the following discussion it may be helpful to review exponents if necessary.)
As for the second reason given above, if you are familiar with the behavior of logarithms in mathematics, recall that if you take the logarithm of a quantity raised to an exponent, such as \(\log\left(a^b\right)\), the exponent comes down to multiply the logarithm of the base number, so that \(\log\left(a^b\right)=b\log(a)\). If the value of \(a\) happens to be \(10\) and the logarithm is base-\(10\), then you have \(\log_{10}\left(10^b\right)=b\log_{10}(10)=b\). The last step follows because \(\log_{10}(10)=1\) by definition, since \(10^1=10\). You see this repeatedly in Table 3 above: for example, \(\log_{10}10^{−3}=−3\log_{10}10=−3\) and \(\log_{10}10^2=2\log_{10}10=2\). [An aside: For any positive base \(a\) other than \(1\), \(\log_a a = 1\). For example, for the famous number \(\pi\) (pi), \(\log_\pi \pi = 1\).]
Now suppose that you have data that depend on a power of the abscissa value, rather than just on the abscissa value itself, such as \(y=x^2\). If you take the logarithm of both sides, you find that \(\log(y)=2\log(x)\). This is actually a straight-line equation if \(\log(y)\) is plotted along the ordinate and \(\log(x)\) is plotted along the abscissa. The slope of the straight line on the graph would be \(2\) (the slope is often referred to as “the rise over the run” of the straight line).
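Both facts can be confirmed numerically. In the Python sketch below (the sample values of \(a\), \(b\), and \(x\) are arbitrary choices), the identity \(\log\left(a^b\right)=b\log(a)\) is checked for one pair of numbers, and the rise over the run between neighboring points of \(y=x^2\) on a log-log graph comes out as \(2\) every time:

```python
import math

# log(a**b) = b*log(a), checked for one arbitrary pair of values
a, b = 7.0, 3.0
print(math.log10(a ** b), b * math.log10(a))  # the two values agree (up to rounding)

# Points on the curve y = x**2, converted to (log10 x, log10 y)
xs = [0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
log_points = [(math.log10(x), math.log10(x ** 2)) for x in xs]

# Rise over run between consecutive points on the log-log graph
slopes = [
    (ly2 - ly1) / (lx2 - lx1)
    for (lx1, ly1), (lx2, ly2) in zip(log_points, log_points[1:])
]
print([round(s, 6) for s in slopes])  # [2.0, 2.0, 2.0, 2.0, 2.0]
```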
If you don’t have a deep understanding of logarithms, that is fine. You won’t need to manipulate logarithms in the course; you simply need to understand that they return an exponent. Since we will be using base-\(10\) logarithms exclusively, just know that any positive number can be written as \(10^b\) for some value of \(b\), and when you take the base-\(10\) logarithm of that number, the result will be the associated exponent of \(10\), namely \(b\). As one example, looking at Table 3, note that \(\log_{10}200=2.301\). This means that \(10^{2.301}\approx200\); check it out with your calculator.
Note that the base-\(10\) logarithm key on your calculator may look like \(\log\), or perhaps \(\log_{10}\), or \(\log10\). If you’re not sure, ask your instructor or someone who is familiar with calculating logarithms on a calculator similar to yours.
As an example of using logarithms in graphing, consider the data for the orbits of planets and dwarf planets around our Sun. There is a well-defined relationship between the average distance of a planet or dwarf planet from the Sun \((a)\) and its orbital period \((P)\). We will write the orbital period in units of Earth years and the average distance from the Sun in units of the average distance of Earth from the Sun [known as an astronomical unit (au)].
Table 4: Orbital data for our Solar System’s planets and two dwarf planets.
Planet | \(a\) (au) | \(P\) (years) | \(\log_{10}a\) | \(\log_{10}P\) |
---|---|---|---|---|
Mercury | \(0.3871\) | \(0.2408\) | \(−0.4122\) | \(−0.6183\) |
Venus | \(0.7233\) | \(0.6152\) | \(−0.1407\) | \(−0.2110\) |
Earth | \(1.0000\) | \(1.0000\) | \(0.0000\) | \(0.0000\) |
Mars | \(1.5236\) | \(1.8808\) | \(0.1829\) | \(0.2743\) |
Ceres | \(2.7675\) | \(4.6040\) | \(0.4421\) | \(0.6631\) |
Jupiter | \(5.2044\) | \(11.8618\) | \(0.7164\) | \(1.0742\) |
Saturn | \(9.5826\) | \(29.4567\) | \(0.9815\) | \(1.4692\) |
Uranus | \(19.2012\) | \(84.0107\) | \(1.2833\) | \(1.9243\) |
Neptune | \(30.0476\) | \(164.79\) | \(1.4778\) | \(2.2169\) |
Pluto | \(39.4817\) | \(247.68\) | \(1.5964\) | \(2.3939\) |
When the average distances from the Sun are plotted as the abscissa values and the orbital periods as the ordinate values, a nice smooth curve results, as shown in Figure 9. A well-defined curve like this clearly indicates that the two quantities are strongly correlated. What is less obvious is just how they are correlated. In science it is critically important to be able to quantify relationships, rather than simply say that some relationship exists. By “quantifying a relationship” we mean that we should be able to develop an equation that describes the relationship. Such an equation allows us to find the average distance from the Sun for any given orbital period, or vice versa.
On the other hand, if we plot the base-\(10\) logarithms of both the average distance from the Sun and the orbital period (columns 4 and 5 in Table 4), we get a straight line rather than an arcing curve, as shown in Figure 10. As was discussed before Table 4, a straight line on a graph using logarithms on both axes means that \(P\) and \(a\) must be related by a power law. Because the line passes through the origin (Earth has \(\log_{10}a=0\) and \(\log_{10}P=0\)), the power law takes the simple form \(P=a^b\), where \(b\) is the slope (the “rise over the run”) of the \(\log_{10}P\) versus \(\log_{10}a\) graph (Figure 10). If you were to calculate the slope from the graph, you would find a value of \(1.5\), or \(3/2\), which means that the equation relating \(P\) and \(a\) is \(P=a^{3/2}\).
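Instead of reading the slope off Figure 10 by eye, it can be computed from the data in Table 4 with a least-squares straight-line fit to the logarithms. A minimal Python sketch (variable names are illustrative):

```python
import math

# Average distance a (au) and orbital period P (years) from Table 4
orbits = {
    "Mercury": (0.3871, 0.2408), "Venus": (0.7233, 0.6152),
    "Earth": (1.0000, 1.0000), "Mars": (1.5236, 1.8808),
    "Ceres": (2.7675, 4.6040), "Jupiter": (5.2044, 11.8618),
    "Saturn": (9.5826, 29.4567), "Uranus": (19.2012, 84.0107),
    "Neptune": (30.0476, 164.79), "Pluto": (39.4817, 247.68),
}

log_a = [math.log10(a) for a, _ in orbits.values()]
log_p = [math.log10(p) for _, p in orbits.values()]

# Least-squares slope and intercept of log10(P) versus log10(a)
n = len(log_a)
mx, my = sum(log_a) / n, sum(log_p) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(log_a, log_p))
         / sum((x - mx) ** 2 for x in log_a))
intercept = my - slope * mx

print(f"slope b = {slope:.3f}, intercept = {intercept:.3f}")
# slope comes out very close to 1.5 and intercept very close to 0, i.e. P = a**1.5
```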
One final method for plotting the orbital data given in Table 4 is to use logarithmic axes directly rather than calculating logarithms and then plotting the data on linear axes. To see what is meant by this take a look at Figure 11 below. The numbers along the axes are the actual orbital periods and average distances from the Sun, but the powers of \(10\) are evenly spaced along the axes. (Note that the powers of \(10\) are represented in “computer speak” rather than standard scientific notation; for example \(1\times10^2\) is represented by \(1.0\text{E}+2\). If this is confusing, check out the scientific notation tutorial.)
A careful inspection of the tick marks on both the horizontal and vertical axes shows that they are not evenly spaced; they get closer and closer together as the next power of \(10\) is approached. The major gridline at each power of \(10\) is actually \(1\) times that power of ten, such as \(1\times10^2\). The next tick mark is \(2\) times that power of ten, such as \(2\times10^2\), then \(3\times10^2\), and so on up to \(9\times10^2\). After that tick mark the next power of ten is reached, \(1\times10^3\), and the nonlinear tick-mark pattern repeats. If you look carefully at Table 3, you can see the progressively closer spacing in the logarithms of \(1\) through \(10\): \(0\), \(0.301\), \(0.477\), \(0.602\), \(0.699\), \(0.778\), \(0.845\), \(0.903\), \(0.954\), and finally \(1.000\). When these values are located on the logarithmic axes, \(2\) lies \(30.1\%\) of the way from one power of ten to the next, \(3\) lies \(47.7\%\) of the way, \(4\) lies \(60.2\%\) of the way, and so on, causing the tick marks to bunch up as the next power of ten is approached.
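The tick-mark percentages come straight from the base-\(10\) logarithms of \(1\) through \(9\). A short Python sketch that generates them (the same spacing applies within every power of ten, which is why the pattern repeats):

```python
import math

# Position of each tick mark 1x, 2x, ..., 9x within one power of ten on a log
# axis, as a percentage of the distance to the next power of ten
positions = {k: 100 * math.log10(k) for k in range(1, 10)}

for k, pct in positions.items():
    print(f"{k} x 10^n lies {pct:4.1f}% of the way to 10^(n+1)")
```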
Although these so-called “log-log” plots are not quite as convenient for finding the exponent of the power law that relates the quantities, they are very quick to show that such a power law exists when the plot yields a straight line. You won’t need to do calculations with “log-log” plots, but your instructor may ask you to interpolate data using them. (If you want to get the “slope” of a straight line from a log-log plot, you would still need to take the logarithms of the numbers that are read off the axes and use those to compute the “rise over the run.”)
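The procedure in the parenthetical note above can be sketched in Python. The two points below are Mars and Neptune, taken from Table 4 as if they had been read off the axes of Figure 11:

```python
import math

# Two points read off a log-log plot such as Figure 11 (values from Table 4)
a1, p1 = 1.5236, 1.8808    # Mars
a2, p2 = 30.0476, 164.79   # Neptune

# Take logarithms first, then compute the rise over the run
slope = (math.log10(p2) - math.log10(p1)) / (math.log10(a2) - math.log10(a1))
print(f"slope = {slope:.2f}")  # slope = 1.50

# Dividing the raw differences instead does NOT give the slope of a log-log line
raw_ratio = (p2 - p1) / (a2 - a1)
print(f"raw-value ratio = {raw_ratio:.2f} (not meaningful on a log-log plot)")
```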
Reversed Axes: An Astronomy Idiosyncrasy
Because astronomy is a science with very old roots, it retains some historical relics in how certain quantities are expressed, most notably brightness and stellar temperatures. The origins of these oddities are discussed fully in the text, but one consequence is that these quantities are sometimes plotted on axes that run in the reverse of the direction you would normally expect.
For example, in a very important diagram about stars, known as the Hertzsprung-Russell diagram (or HR diagram for short), the stars with the hottest surface temperatures are shown on the left side of the diagram, with the coolest stars on the right-hand side. You would normally expect the numbers along the abscissa to increase from left to right, but in the HR diagram temperatures decrease from left to right.
A similar situation occurs with one way of representing the brightnesses of stars. The magnitude scale, which dates back more than two millennia, assigns the smallest values, often negative, to the brightest stars (or other celestial objects). When used in the HR diagram, the brightest stars are at the top of the diagram with their negative values for magnitude, while the dimmest stars are at the bottom of the diagram with their positive values. In other words, when using the magnitude scale on the HR diagram, the largest numbers are at the bottom of the ordinate axis, with the values decreasing up the axis. Again, this is just the opposite of what one would normally expect.
To make matters even more confusing, when temperatures are used along the horizontal axis of the HR diagram, that axis is logarithmic. However, when magnitudes are used on the vertical axis, that axis is linear. Moreover, sometimes brightnesses are specified in more standard units, in which case the numbers increase from bottom to top and the axis is logarithmic! IT IS VERY IMPORTANT TO CHECK THE LABELS ON GRAPH AXES, AND IT IS JUST AS IMPORTANT FOR YOU TO ALWAYS LABEL THE AXES OF GRAPHS THAT YOU CREATE!
When you create your own graphs, always be sure to show the gridlines and tick marks accurately (preferably by using appropriate graph paper or by using a spreadsheet or graphing program) AND clearly identify what is being plotted and the units that are being used in the axis titles.
Quiz Yourself
Answers available at the bottom of the page.
To make sure that you understand how to create and work with graphs, try the problems below.
- Referring to the plot in Figure 2, if \(x=3.9\) for a point of interest, estimate the associated \(y\) value.
- Extrapolation (extending best-fit lines and curves beyond known values) can be used, very cautiously, to forecast future outcomes based on past results. By referring back to Figure 7, extend the best-fit curve until it reaches 400 ppmv. In approximately what year does the curve predict that the CO2 concentration over Mauna Loa, Hawai’i, would reach 400 ppmv? (This actually occurred in May 2013.)
- Suppose that a new asteroid is discovered orbiting the Sun at an average distance of 15.85 au.
- Using a calculator or spreadsheet program, determine the base-\(10\) logarithm of its average distance from the Sun.
- According to Figure 10, what is the base-\(10\) logarithm of the asteroid’s orbital period?
- What is the asteroid’s orbital period in years?
- Figure 12 shows the HR Diagram for a set of data obtained by the Hipparcos satellite. Each dot represents one star from the data set.
- For the star indicated by the large orange-brown dot at the top of the plot, estimate the star’s absolute magnitude.
- Which statement best describes the star represented by the orange-brown dot?
- The star is unusually hot
- The star is unusually cool
- The star is unusually bright
- The star is unusually dim
- There is nothing particularly unusual about the star
- For the star indicated by the large blue dot near the left-hand-side of the plot, estimate the star’s surface temperature.
- Which statement best describes the star represented by the blue dot?
- The star is unusually hot
- The star is unusually cool
- The star is unusually bright
- The star is unusually dim
- There is nothing particularly unusual about the star
- Create a standard Cartesian (also known as an \(xy\)) graph with the data in Table 5 below, then draw a “best fit” straight line through the points. You can download a blank sheet of graph paper here.
- The orbital distances and periods of the four largest moons of Jupiter are given in Table 6.
- Create a log-log plot similar to Figure 11. You can download a blank sheet of log-log graph paper here that provides two powers of \(10\) on each axis. The horizontal axis should run from \(1\times10^5\) to \(1\times10^7\), with \(1\times10^6\) in the middle of the axis. The vertical axis should run from \(1\) to \(100\) with \(10\) in the middle of the axis.
- Suppose a new moon was discovered at a distance of \(\text{900,000}\) km from Jupiter. By drawing a straight line through the data points in your log-log plot, would the orbital period of the newly discovered moon be closest to
- \(5.5~\text{days}\)
- \(6.5~\text{days}\)
- \(9.7~\text{days}\)
- \(12.8~\text{days}\)
- \(51~\text{days}\)
- \(75~\text{days}\)
Table 5: Sample data for a Cartesian graph with an estimated “best fit” straight line.
\(x\) | \(y\) |
---|---|
\(1.2\) | \(3.5\) |
\(2.0\) | \(4.8\) |
\(2.6\) | \(6.1\) |
\(3.9\) | \(7.2\) |
\(4.8\) | \(9.4\) |
\(6.2\) | \(11.1\) |
\(7.3\) | \(13.6\) |
\(8.8\) | \(15.6\) |
\(9.3\) | \(16.5\) |
\(9.8\) | \(17.6\) |
Table 6: Jupiter’s four largest moons
Moon | \(a\) (km) | \(P\) (days) |
---|---|---|
Io | \(\text{422,000}\) | \(1.769\) |
Europa | \(\text{671,000}\) | \(3.551\) |
Ganymede | \(\text{1,070,000}\) | \(7.155\) |
Callisto | \(\text{1,883,000}\) | \(16.689\) |
Answers
- \(y \approx 6.5\)
- See Figure 13
- (a) \(1.2\) (b) \(1.8\) (c) \(63.1\) y
- (a) \(4.7\) (b) The star is unusually bright (c) \(\text{20,000 K}\) (d) The star is unusually hot
- See Figure 14
- (a) See Figure 15 (b) The blue star symbol is the location of the hypothetical newly-discovered moon with \(a = \text{900,000 km}\). The corresponding orbital period is approximately \(5.5~\text{days}\).
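For readers who want to check answers 3 and 6 numerically, here is a Python sketch. It assumes the relation \(P=a^{3/2}\) from Figure 10 for the asteroid, and it replaces the hand-drawn straight line through the log-log plot of Table 6 with a least-squares fit to the logarithms (a substitution for the graphical method the question asks for):

```python
import math

# Question 3: asteroid at an average distance of 15.85 au, using P = a**(3/2)
a_ast = 15.85
log_a_ast = math.log10(a_ast)   # part (a)
log_p_ast = 1.5 * log_a_ast     # part (b): slope 3/2, line through the origin
p_ast = 10 ** log_p_ast         # part (c), in years
print(f"log10 a = {log_a_ast:.2f}, log10 P = {log_p_ast:.2f}, P = {p_ast:.1f} years")

# Question 6: least-squares straight line through the logs of Table 6
moons = {"Io": (422_000, 1.769), "Europa": (671_000, 3.551),
         "Ganymede": (1_070_000, 7.155), "Callisto": (1_883_000, 16.689)}
xs = [math.log10(a) for a, _ in moons.values()]
ys = [math.log10(p) for _, p in moons.values()]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# Predicted period of a hypothetical moon at 900,000 km, in days
p_new = 10 ** (intercept + slope * math.log10(900_000))
print(f"slope = {slope:.2f}, predicted period = {p_new:.1f} days")
```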