An Interesting Christmas Gift
Over the holidays, the New York Times delivered an unusual juxtaposition of headlines and content, with an apparent lack of self-awareness, that elicited a chuckle from its readers hearty enough to make the cheerful Old Saint jealous.
[image originally provided by @ddmeyer on Twitter]
To those imbued with the skill of basic high school Algebra 1, the information in the article about Sony’s revenues for the first four days of release of “The Interview” was enough to solve a unit value problem. If we let R = the number of rentals and S = the number of sales, then:
- R + S = 2 million
- $6*R + $15*S = $15 million
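As a quick check of the point estimate, the two equations can be solved by substitution. The sketch below is only an illustration and not part of the original analysis; the function name `solve_rentals_and_sales` and its parameters are assumptions, and the $6 and $15 prices are taken from the equations above.

```python
def solve_rentals_and_sales(total, value, rental_price=6.0, sale_price=15.0):
    """Solve R + S = total and rental_price*R + sale_price*S = value."""
    # Substitute R = total - S into the revenue equation and isolate S.
    sales = (value - rental_price * total) / (sale_price - rental_price)
    rentals = total - sales
    return rentals, sales


# Reported figures, in millions: 2 million transactions worth $15 million.
rentals, sales = solve_rentals_and_sales(total=2.0, value=15.0)
print(f"rentals = {rentals:.2f} million, sales = {sales:.2f} million")
print(f"rentals per sale = {rentals / sales:.1f}")
```

Taken at face value, the reported figures imply roughly five rentals for every direct sale.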
However, not too far into the sudoku puzzle we might realize that a deeper, more instructive problem exists here, one that permeates our daily lives. That problem is the precision of the information we have to deal with in planning exercises or, say, in garnering market intelligence. A second reading of the article reveals that both sales figures, the total number of transactions and their total value, were reported as approximations. In other words, if the sources at Sony followed some basic rules of rounding, the total number of transactions could range from 1.5 million to 2.4 million, and the total value might range from $14.5 million to $15.4 million. This might not seem like a problem at first consideration. After all, 2 million is in the middleish of its rounding range, as is $15 million. Certainly the values determined by the simple algebra above point to a good enough approximate answer. Right? Right?
To see if this is true, let’s generalize the formulas above, letting T = the total number of transactions and V = their total value, and then solve for S and R.
- R + S = T
- $6*R + $15*S = V
- S = 1/9 * V - 2/3 * T
- R = T - S
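The rearranged formulas can be evaluated at the corners of the two rounding ranges to see how far the answers can move. The sketch below assumes the ranges quoted earlier (1.5 to 2.4 million transactions, $14.5 to $15.4 million in value); the helper name `split_transactions` is hypothetical. Because R, S, and R/S each rise or fall steadily as T or V changes, their extremes sit at the corners.

```python
from itertools import product


def split_transactions(total, value):
    """Return (rentals, sales) using S = V/9 - 2T/3 and R = T - S."""
    sales = value / 9 - 2 * total / 3
    return total - sales, sales


# Rounding ranges quoted in the text, in millions.
totals = (1.5, 2.4)
values = (14.5, 15.4)

# Evaluate every combination of a low/high total with a low/high value.
corners = [split_transactions(t, v) for t, v in product(totals, values)]
rentals = [r for r, _ in corners]
sales = [s for _, s in corners]
ratios = [r / s for r, s in corners]

print(f"rentals: {min(rentals):.2f} to {max(rentals):.2f} million")
print(f"sales:   {min(sales):.2f} to {max(sales):.2f} million")
print(f"ratio:   {min(ratios):.2f} to {max(ratios):.0f}")
```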
[Fig. 1: The distribution of total transaction values for various combinations of rental and direct sales numbers.]
Here we see that the rental numbers could range from about 800 thousand to 2.4 million, while the direct sales could range from nearly 0 to 700 thousand! Maybe more instructive is to consider the range of the ratio of the rentals to direct sales:
[Fig. 2: The distribution of the ratio of rentals to direct sales for various combinations of rental and direct sales numbers.]
Using the sample values underlying these distributions in our last set of formulas, we observe that in all likelihood (that is, within the 80th percentile prediction interval) the actual ratio of rentals to sales falls in a much narrower range: 3 to 9, not 1.11 to 215.
[Fig. 3: The 80th percentile prediction interval for the ratio of the rentals to sales falls in the range of 3 to 9.]
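A prediction interval of this kind can be estimated by simulation. The sketch below is a minimal illustration under loud assumptions: the SME's distributions are not specified in the text, so triangular distributions peaked at the reported figures stand in for them, and the "80th percentile prediction interval" is read as the span between the 10th and 90th percentiles. The name `ratio_interval` is hypothetical, and the output need not match the 3 to 9 range reported above.

```python
import random
import statistics


def ratio_interval(draw_total, draw_value, n=100_000, seed=0):
    """Return the 10th and 90th percentiles of rentals/sales over n draws."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        total, value = draw_total(rng), draw_value(rng)
        sales = value / 9 - 2 * total / 3   # S = V/9 - 2T/3
        ratios.append((total - sales) / sales)  # R/S with R = T - S
    deciles = statistics.quantiles(ratios, n=10)  # nine cut points
    return deciles[0], deciles[-1]


# ASSUMPTION: triangular(low, high, mode) as a stand-in for the SME's judgment.
low, high = ratio_interval(
    lambda rng: rng.triangular(1.5, 2.4, 2.0),     # millions of transactions
    lambda rng: rng.triangular(14.5, 15.4, 15.0),  # millions of dollars
)
print(f"central 80% of rentals/sales: {low:.1f} to {high:.1f}")
```

Any distribution with a peak near the reported values and thinning tails could be swapped in for the two lambdas; a more sharply peaked choice would narrow the interval.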
Our manager may push back on this by saying that our SME doesn’t really have the credibility to use the distributions assessed above. She asks, “What if we stick with maximal uncertainty within the range?” In other words, what if, instead of assessing a central tendency around the reported values with declining tails on each side, we assume a uniform distribution along the range of sales values (i.e., every value in the range is equally probable)?
[Fig. 4a, b: We replace our SME-supplied distributions for (a) total sales transactions and (b) total value with ones that admit insufficient reason to suspect that any value in our range is more likely than any other.]
What is the result? Well, we see that even with the assumption of maximal uncertainty, while the most likely range expands by a factor of 2.7 (i.e., from 3-9 to 1.7-18), it still remains manageable, because the extreme edge cases are ruled out, not as impossible but as fairly unlikely.
[Fig. 5: Replacing our original SME distributions that had peaks with uniform distributions flattens out the distribution of our ratio of rentals to sales, causing the 80th percentile prediction interval to widen. The new range runs from about 1.7 to 18.]
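The maximal-uncertainty case can be sketched the same way by drawing both totals uniformly across their rounding ranges. The helper names (`central_80`, `simulate_ratios`, `uniform`), the sample size, and the seed are illustrative assumptions, and the interval is again read as the 10th to 90th percentiles. Under those assumptions the result should land in the neighborhood of the 1.7 to 18 range described above.

```python
import random
import statistics


def central_80(sample_values):
    """Return the 10th and 90th percentiles of a sample."""
    deciles = statistics.quantiles(sample_values, n=10)
    return deciles[0], deciles[-1]


def simulate_ratios(sample, n=100_000, seed=1):
    """Draw totals and values with sample(rng, low, high); return rentals/sales ratios."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        total = sample(rng, 1.5, 2.4)     # millions of transactions
        value = sample(rng, 14.5, 15.4)   # millions of dollars
        sales = value / 9 - 2 * total / 3
        ratios.append((total - sales) / sales)
    return ratios


def uniform(rng, low, high):
    """Every value between low and high is equally likely."""
    return rng.uniform(low, high)


low, high = central_80(simulate_ratios(uniform))
print(f"central 80% under uniform inputs: {low:.1f} to {high:.1f}")
```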
The following graph displays the full range of sales and rental variation that is possible depending on our degrees of belief (as represented by our choice of distribution) about the range of total transactions and total value.
[Fig. 6: A scatter plot that demonstrates the distribution of direct sales and rental combinations as conditioned by our choice of distribution type.]
By focusing on the 80th percentile range of outcomes in the ratio of rentals to sales, we can significantly narrow the credible range for our estimates of rentals and direct sales, even though we were given only approximate information.
[Fig. 7: A scatter plot that demonstrates the distribution of direct sales and rental combinations as conditioned by our choice of distribution type, constrained only to those values in the 80th percentile prediction interval.]
Precise? Not within a hair’s breadth, no, but by incorporating probabilities into our analysis (as opposed to relying on just a best guess with no understanding of the implications of the range of the assumptions), the degree of precision we obtain improves by a factor of 13.1 (assuming maximum uncertainty) to 35.2 (trusting our SME). If our own planning depends on an understanding of this sales ratio, we can exercise more prudence in allocating the resources required to address it. Now, when our manager asks, “How do you know the actual values aren’t near the edge cases?”, we can respond that we don’t know precisely, but that simple algebra combined with probabilities indicates the actual values most likely are not.