### Bridging the Gap: The Economics of Data Science and Decisions

The official blog site of Incite! Decision Technologies. Thinking about thinking and deciding in a complex, uncertain, and risky world.

Incite! Decision Technologies has recently developed a simple yet sophisticated web-based sales opportunity portfolio analysis tool that is ready for beta testing. Now we're seeking parties that would be interested in participating at __no cost and no obligation__.

Specifically, we are looking for progressive sales managers in firms whose sales team pursues high value, low-frequency sales. Examples of target firms might be...

- Engineering, architecture & construction firms
- Professional service firms
- Capital equipment manufacturers
- Start-ups

The purpose of the tool is to provide

- Improved accuracy of revenue realization and timing forecasts;
- Guidance on how to allocate resources to maximize the likelihood of deal closure;
- Guidance on opportunity selection and prioritization.

Ultimately, you will be able to determine if the sales opportunities you are pursuing are worth the time, effort, and resources.

If you are interested in learning more or know someone who might be, please contact me via LinkedIn message or send me an email from our web form.

Labels: complex sales, portfolio analysis, sales forecasting

Compounding the external uncertainty, we often get in our own way by importing certain kinds of biases into our assessment of the value of the sales opportunities at hand. These biases can include…

- **Unwarranted optimism** or **wishful thinking** – personal enthusiasm or a natural disposition to believe that desired outcomes will most likely occur; or, inflating initial estimates of desired outcomes to appear more effective than is warranted;
- **Sand-bagging** – under-reporting potential outcomes to appear heroic when better-than-anticipated outcomes materialize;
- **False precision** – reporting anticipated outcomes with an unjustified level of certainty, usually as a single-point estimate rather than a range;
- **Availability** – recalling values that are memorable, easily accessible, recent, or extreme;
- **Anchoring** – using the first "best guess" as a starting point for subsequent estimating;
- **Expert over-confidence** – failure of creativity or hubris (e.g., "I know this information and can't be wrong because I'm the expert.");
- **Incentives** – the SME (subject matter expert) experiences some benefit or cost related to the outcome of the term being measured, adjusting his estimate in the direction of the preferred outcome;
- **Entitlement** – the SME provides an estimate that reinforces his sense of personal value.

Without bias-free assessments in our decisions to actively pursue sales opportunities, it's nearly impossible to know how to allocate sales and support resources effectively to maximize the likelihood of capturing sales in a profitable and efficient manner. In short, when given the opportunity to pursue multiple opportunities with limited resources, it’s often difficult to know if any given opportunity is worth the time.

As odd as it may sound in a culture that seems to demand almost endless optimism, the Power of Negative Thinking actually helps us to overcome our biases as well as inform us how to obtain better information about the external uncertainties we face. By “negative thinking” we do not mean cynicism or toxic nay-saying. Rather, we refer to a process that asks us to consider critically the opposite of what we too easily assume (or wish) to be true. While Negative Thinking could lead us to consider the effects of unfortunate outcomes or conditions (the opposite of desired outcomes) on sales opportunities...

The best laid schemes o' Mice an' Salesmen, Gang aft agley

...it could also lead us to consider the possibility of desirable outcomes or conditions (the opposite of the unfortunate) for situations that we often easily dismiss.

No, no, boy, that's no way to make a plane. That'll, I say, that'll never...fly!

- What is the real opportunity?
- What are our goals and objectives?
- What are the client's goals and objectives?
- What are the decision boundaries and open decisions?
- What are the sources of uncertainty?

Answering these questions helps the team confirm that it has the right reasons for pursuing an opportunity, and it reveals which gaps in the team's current knowledge limit its ability to make unambiguous decisions about which opportunities to pursue and how to pursue them.

Probabilistic reasoning helps a sales team then answer these questions:

- What is the likely range of outcomes for the uncertainties?
- What are the effects of uncertainties on sales goals, revenues, and profit?
- How much risk do we face with each opportunity; i.e., how much could we lose by pursuing one opportunity over another?
- What insights can we create for contingency plans or options?
- How do we prioritize our set of current opportunities?

The effect of taking these two steps in a structured way reveals the Power of Negative Thinking so that the sales team can recognize when an opportunity is worth pursuing…or not. Ultimately, not only does the Power of Negative Thinking give the sales team a more accurate assessment of the current state and possibilities they face, they can also develop more effective contingency plans to increase the likelihood of achieving results their organization—and their clients—desire.

Labels: complex sales, decision analysis, sales forecasting

I turned off the television last night at 8 PM. Since I had an analytics problem to work on, I didn't want my attention divided, and I knew that clinging to electoral results was more neurotic than helpful. My attention at the moment was not going to change the results. So, I rolled up my sleeves and got to work.

At 12:30 AM, I turned my television back on...

As I watched the polling results roll in and followed the reactions of establishment pundits and the broader hoi polloi (from both sides) in social media, all I could think was, "What is going on here?" Over and over. I mean, Nate Silver was still giving better than 2:1 odds of a Clinton victory just before I turned off the TV. Could the situation really have been that different than assessed? Could things really have changed that quickly? At 4 AM, I finally captured some thoughts that I think should serve as object lessons for all of us, and not just in politics, but in business, too.

- Never, ever believe your own spin. Humans love narratives that give them comfort. Unfortunately, almost all narratives are constructed from selected evidence that fits a preferred narrative.
- Always question where your biases are coming from. You are biased. Until you recognize it, you will frequently be rudely embarrassed.
- There is no meaningful position in certainty. All beliefs about future events should be treated with degrees of belief.
- Even events that happened in the past are open to interpretation. The real issue about the facts of events is not so much whether events have occurred in the past or whether they will occur in the future. The real issue is our epistemic distance from the events. We generally don't know as much as we think we do.
- We condition our beliefs on the evidence at hand. Thinking that a Clinton victory was highly probable was not a bad position to take. It made sense given much of the evidence. BUT, Prob(Clinton win) > 50% does not mean Prob(Clinton win) = 100%! (I'm actually getting tired of explaining this. I'm getting tired of seeing people make this mistake and the effects it has in real life on real people. Probabilities are degrees of belief, not statements of fact.) Always, always, always consider the disconfirming evidence.
- Trump never showed an insignificant chance of winning. His victory was always plausible. What I see and hear coming from those expressing shocked disappointment about the Clinton loss is that they didn't really explore and consider the edge cases that would lead to a Trump victory. Explore the edge cases. Explore aggressively. Keep exploring.
- Informed accuracy trumps false precision (pun intended). Don't be embarrassed to draw your prediction intervals wide. It's more honest, more informative, and will allow you to do a better job preparing contingency plans. When the preceding exploration of edge cases is performed honestly and aggressively, it should lead you to make your prediction intervals even wider. It's better to be humble and recognize how little you know versus being sure and then being rudely surprised.
- The evolving probability of win curves for this election resemble the curves associated with predicting that a given hypothesis among several is true when there are unaccounted for characteristics at play. Suddenly, a seemingly most likely explanation crashes to be replaced by a previously less likely hypothesis as the unrecognized characteristic manifests itself. This is a long way to say people get caught up in false dichotomies (or n-chotomies) for the possible explanations for what really is the case. It is almost always the case that more explanations are available than the limited set we originally conceived.
- If something really weird happens and somehow the posted results at 4 AM reverse by the time I wake up, all of the above still applies, maybe more so.

Labels: decision analysis

If you are new to data science and learning the R language, let me recommend this new gem of a book, *Business Intelligence with R*, by my friendr (the term I just coined to describe R users who help each other), Dwight Barry: https://leanpub.com/businessintelligencewithr

Also, please consider the personal note that Dwight sent to all of his beta readers:

Perhaps most importantly, I've also decided to give all proceeds to the Agape Girls Junior Guild, which is a group of middle-school girls who do fundraising for mitochondrial disorder research at Seattle Children's Research Institute and Seattle Children's Hospital. While the minimum price for this book will always be free, if you're the type who likes to "buy the author a coffee," know that your donation is supporting a better cause than my already out-of-control coffee habit. :-)

Labels: business intelligence, data analytics, R

I will be speaking at the Georgia Tech Scheller College of Business on February 18, 2016 on the following topic:

In the current rush to adopt data-driven analytics, discussions about algorithms, programming tools, and big data tend to dominate the practice of business analytics. But we are defined by our choices, our values, and preferences. Data and business analytics that do not start with this recognition actually fail to support the human-centered reason for decision making. This is the way of the Sith. A Jedi, however, knows that framing business analytics in terms of the values and preferences of decision makers, and the uncertainty of achieving those, employs the tools of decision and data science in the wisest way. In this discussion, we will think about the principles of high quality decisions, how to frame a business analytics problem, and learn how to use information in the most efficient way to create value and minimize risk.

The discussion will include a demonstration of the Analytica modeling software.

If you're in the Atlanta area, I would love for you to join me in the discussion.

A special thanks to Dr. Beverly Wright for organizing this event!

Recently, Brian McCarthy and I had some fun being interviewed by Ryan McPherson of Atlanta Business Radio.

This post is going to be different from what I've published here before. I'm not going to explain something or attempt to be clever. Instead, I want to share an idea, an open-ended kind of idea for which, at this point, I have no conclusions. First, let me share some background.

The other day I shared a TED Talk by Conrad Wolfram ("Teaching kids real math with computers") as an update on LinkedIn and on my personal Facebook. Please take the time to listen to this if you have not already. I think this is actually vitally important to the well being of our children and how they gain an education.

My friend and colleague, James Mitchell, made the following comment on the original update: "A great talk. My daughter's life would have been so much easier and better with this approach to teaching math. Wolfram talked about all her complaints." They were my complaints, too. A few of the comments made on my Facebook page included "Math is hard" and "I hate math. I never use it." Apparently, the same complaints are shared by more than just two people.

I've been thinking about this TED Talk almost non-stop since I watched it, and I'm beginning to think that one way to achieve the idea here is to provide mathematics education outside of traditional school environments. By that, I don't mean that we should advocate that schools quit teaching math; rather, I think we need to start providing private forums in which kids who are interested in math can learn math in the same way they might learn and participate in extracurricular sports or arts activities that are not offered in a traditional school. I'm currently convinced the program must be private and free from policy driven curricula that "teaches to the test" and arbitrary performance criteria. This is for fun, but a special kind of fun.

[Curiosity photo by Rosemary Ratcliff, provided courtesy of FreeDigitalPhotos.net]

What if there were mathematics/programming academies that taught math this way? Maybe it would be a private academy for self-motivated kids who want to learn math, maybe offered after their normal school day or on the weekends. It would follow the approaches advocated by Conrad Wolfram, Paul Lockhart, and Kevin Devlin. It would not confer a degree, diploma, or certificate of any sort other than a letter that describes the areas of inquiry and completion of certain milestone projects that were self-selected by the student and mentored by the "professors." For older students, these projects might include publishing papers in journals as well as serve as the submissions to more traditional math and science fair projects. This would not be an after school tutoring program for students who want to improve their grades to passing levels or gain extra points on their college admission tests.

In other words, the immediate purpose of the school would only be to satisfy the natural curiosity of self-motivated students. I believe such an academy would eventually provide economic benefits to its students because it would teach both creative and structured thinking that the market would eventually reward, but the near-term benefit would serve to remediate the destruction of natural curiosity created in our current systems and simply help our youngest achieve what they want to achieve. I envision this as a kind of math zendo where children learn the art driven by intrinsic motivation and encouragement from like-minded but more mature leaders.

Of course, as ideas take hold in our minds, so do the doubts. I think the difficult aspect of this idea would be financing the program. Currently, I see the finances being provided in part by student fees, some voluntary time offered by teachers, and private donations. I would want to structure the student fees such that no interested student would be unable to participate because they could not afford the fees.

Much remains to be considered here. Maybe this has been done before or is being done right now. I don't know. Regardless, I welcome any feedback you might offer.

Over the holidays, the New York Times delivered an unusual juxtaposition of headlines and content, with an apparent lack of self-awareness, so as to elicit a chuckle from its readers hearty enough to make the cheerful Old Saint jealous.

[image originally provided by @ddmeyer on Twitter]

To those imbued with the skill of basic high school Algebra 1, the information in the article about Sony's revenues for the first four days of release of "The Interview" was enough to solve a unit value problem. If we let R = the number of rentals, and S = the number of sales; then,

- R + S = 2 million
- $6*R + $15*S = $15 million
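Taking the reported figures at face value, the system solves in a couple of lines. A minimal sketch in plain Python (the variable names follow the article's R and S; the rental and sale prices of $6 and $15 are those reported):

```python
# Reported (rounded) figures from the article, taken at face value.
T = 2_000_000      # total transactions: R + S
V = 15_000_000     # total dollars: 6*R + 15*S

# Substitute R = T - S into the value equation and solve for S:
#   6*(T - S) + 15*S = V  =>  9*S = V - 6*T
S = (V - 6 * T) / 9          # direct sales
R = T - S                    # rentals

print(round(S), round(R))    # -> 333333 1666667
```

So the naive point estimate is roughly 1.67 million rentals and 333 thousand direct sales, a ratio of about 5 to 1.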

However, not too far into the sudoku puzzle we might realize that a deeper, more instructive problem exists here, a problem that actually permeates all of our daily lives. That problem is related to the precision of the information we have to deal with in planning exercises or, say, garnering market intelligence, etc. A second reading of the article reveals that the sales values, both the total transactions and the total value of them, were reported as approximations. In other words, if the sources at Sony followed some basic rules of rounding, the total number of transactions could range from 1.5 million to 2.4 million, and the total value might range from $14.5 million to $15.4 million. This might not seem like a problem at first consideration. After all, 2 million is in the middleish of its rounding range as is $15 million. Certainly the actual values determined by the simple algebra above point to a good enough approximate answer. Right? Right?

To see if this is true, let's rewrite the formulas above in general form.

- R + S = T
- $6*R + $15*S = V

Solving the system for S and R yields:

- S = 1/9 * V - 2/3 * T
- R = T - S
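Under basic rounding, T could be anywhere from 1.5 to 2.4 million and V anywhere from $14.5 to $15.4 million. Evaluating the rearranged formulas at the four corners of those ranges (a sketch; the range endpoints are the article's rounding bounds) shows just how far the answers can swing:

```python
from itertools import product

T_range = (1_500_000, 2_400_000)    # "2 million" under basic rounding
V_range = (14_500_000, 15_400_000)  # "$15 million" under basic rounding

ratios = []
for T, V in product(T_range, V_range):
    S = V / 9 - 2 * T / 3   # direct sales
    R = T - S               # rentals
    ratios.append(R / S)

print(f"ratio of rentals to sales: {min(ratios):.2f} to {max(ratios):.1f}")
# -> ratio of rentals to sales: 1.11 to 215.0
```

The corners alone reproduce the extreme ratios discussed below.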

[Fig. 1: The distribution of total transaction values for various combinations of rental and direct sales numbers.]

Here we see that the rental numbers could range from about 800 thousand to 2.4 million, while the direct sales could range from nearly 0 to 700 thousand! Maybe more instructive is to consider the range of the ratio of the rentals to direct sales:

[Fig. 2: The distribution of the ratio of rentals to direct sales for various combinations of rental and direct sales numbers.]

If we blithely assume that the reported values of sales were precise enough to support believing that the actual value of rentals and unit sales were close to our initial result, we could be astoundingly wrong. The range of this ratio could run from about 1.11 (for 1.5 million in total transactions; 15.4 million in sales) to 215 (for 2.4 million in total transactions; 14.5 million in sales). If we were trying to glean market intelligence from these numbers on which to base our own operational or marketing activities, we would face quite a conundrum. What’s the best estimate to use?

Fortunately, we can turn to probabilistic reasoning to help us out. Let's say we consult a subject matter expert (SME) who gives us a calibrated range and distribution for the sales assumptions such that the range of each distribution stays mostly within the rounding range we specify.

[Fig. 2a, b: The hypothetical distribution of the (a) total sales transactions and (b) total value assessed by our SME.]

Using the sample values underlying these distributions in our last set of formulas, we observe that, with 80% likelihood, the actual ratio of rentals to sales falls in a much narrower range: 3 to 9, not 1.11 to 215.

[Fig. 3: The 80th percentile prediction interval for the ratio of the rentals to sales falls in the range of 3 to 9.]

Our manager may push back on this by saying that our SME doesn't really have the credibility to use the distributions assessed above. She asks, "What if we stick with maximal uncertainty within the range?" In other words, what if, instead of assessing a central tendency around the reported values with declining tails on each side, we assume a uniform distribution along the range of sales values (i.e., every value in the range is equally probable)?

[Fig. 4a, b: We replace our SME supplied distribution for (a) total sales transactions and (b) total value with one that admits an insufficient reason to suspect that any value in our range is more likely than any other.]

What is the result? Well, we see that even with the assumption of maximal uncertainty, while the most likely range expands by a factor of 2.7 (i.e., the range expanded from 3-9 to 1.7-18), it still remains within a manageable range as the extreme edge cases are ruled out, not as impossible but as fairly unlikely.

[Fig. 5: Replacing our original SME distributions that had peaks with uniform distributions flattens out the distribution of our ratio of rentals to sales, causing the 80th percentile prediction interval to widen. The new range runs from about 1.7 to 18.]
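The uniform-distribution variant is easy to sketch as a Monte Carlo simulation. The post's figures were built in Analytica; this standard-library Python version is an illustrative stand-in, and the exact percentiles will wobble a bit with the sample size and random seed:

```python
import random
import statistics

random.seed(42)
N = 100_000

ratios = []
for _ in range(N):
    # Maximal uncertainty: every value in each rounding range equally likely.
    T = random.uniform(1_500_000, 2_400_000)    # total transactions
    V = random.uniform(14_500_000, 15_400_000)  # total value
    S = V / 9 - 2 * T / 3                       # direct sales
    ratios.append((T - S) / S)                  # rentals / direct sales

q = statistics.quantiles(ratios, n=10)          # deciles
print(f"80% prediction interval: {q[0]:.1f} to {q[-1]:.1f}")
```

The 10th-to-90th percentile band lands in the neighborhood of 1.7 to 18, far narrower than the worst-case corner range of 1.11 to 215.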

The following graph displays the full range of sales and rental variation that is possible depending on our degrees of belief (as represented by our choice of distribution) about the range of total transactions and total value.

[Fig. 6: A scatter plot that demonstrates the distribution of direct sales and rental combinations as conditioned by our choice of distribution type.]

By focusing on the 80th percentile range of outcomes in the ratio of rentals to sales, we can significantly improve the credible range to estimate the rentals and direct sales from the approximate information we were given.

[Fig. 7: A scatter plot that demonstrates the distribution of direct sales and rental combinations as conditioned by our choice of distribution type, constrained only to those values in the 80th percentile prediction interval.]

Precise? Not within a hair's breadth, no, but the degree of precision we obtain by incorporating probabilities into our analysis (as opposed to relying on a single best guess with no understanding of the implications of the ranges behind the assumptions) improves by a factor of 13.1 (assuming maximal uncertainty) to 35.2 (trusting our SME). If our own planning depends on an understanding of this sales ratio, we can exercise more prudence in allocating the resources required to address it. Now, when our manager asks, "How do you know the actual values aren't near the edge cases?", we can respond that we don't know precisely, but simple algebra combined with probabilities indicates that the actual values most likely are not.

Labels: Analytica, business forecasting, data analytics, decision making, modeling, Monte Carlo simulation, probability, sales forecasting

I copied the following nineteen zen-like koans from the website devoted to the Python programming language (don't leave yet...this isn't really going to be about programming!).

- Beautiful is better than ugly.
- Explicit is better than implicit.
- Simple is better than complex.
- Complex is better than complicated.
- Flat is better than nested.
- Sparse is better than dense.
- Readability counts.
- Special cases aren't special enough to break the rules.
- Although practicality beats purity.
- Errors should never pass silently.
- Unless explicitly silenced.
- In the face of ambiguity, refuse the temptation to guess.
- There should be one-- and preferably only one --obvious way to do it.
- Although that way may not be obvious at first unless you're Dutch.
- Now is better than never.
- Although never is often better than *right* now.
- If the implementation is hard to explain, it's a bad idea.
- If the implementation is easy to explain, it may be a good idea.
- Namespaces are one honking great idea -- let's do more of those!

The koans are supposed to communicate the essence of the guiding principles of programming. Their zen-like fashion is intended to motivate reflection and discussion more than to state explicit rules. In fact, there is a twentieth unstated (Or is it? How's that for zen-like clarity?) principle that you must discover for yourself.
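Incidentally, Python itself ships these koans as an Easter egg: `import this` prints the full "Zen of Python" (PEP 20) to the console, so you can revisit them without leaving your interpreter:

```python
import io
import contextlib

# `import this` prints the Zen of Python as a side effect of the import,
# so capture stdout to get the text as a string.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    import this
zen = buf.getvalue()

print(zen.splitlines()[0])  # -> The Zen of Python, by Tim Peters
```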

Good aphorisms often find meaning beyond their initial intent. That's the way general, somewhat ambiguous guidance works and why some aphorisms last for so long in common parlance. They're malleable to one's circumstances and provide a kind of structure on which to hinge one's thoughts, concerns, and aspirations (I'm pretty sure horoscopes and Myers-Briggs work this way). Some of these aphorisms, maybe all of them, struck me as useful guiding principles not only for programming but also for decision management in general. Seriously. Go back and consider them again, Grasshopper.

So, let me ask you:

- In what way is decision management like programming?
- How would you interpret these principles, if at all, for use in the role of decision making?
- What do you think is the missing principle?

Labels: decision making, programming

You've probably heard the saying, "It's better to be mostly accurate than precisely wrong." But what does that mean?

Accuracy relates to the likelihood that outcomes fall within a prediction band or measurement tolerance. A prediction/measurement that comprehends, say, 90% of actual outcomes is more accurate than a prediction/measurement that comprehends only 30%. For example, let's say you repeatedly estimate the number of marbles in several Mason jars mostly full of marbles. An estimate of "more than 75 marbles and less than 300 marbles" is probably going to be correct more often than "more than 100 marbles but less than 120 marbles." You might say that's cheating. After all, you can always make your ranges wide enough to comprehend any range of possibilities, and that is true. But the goal of accuracy is just to be more frequently right than not (within reasonable ranges), and wider ranges accomplish that goal. As I'll show you in just a bit, accuracy is very powerful by itself.
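The marble-jar intuition is easy to check by simulation. Assuming, purely for illustration, that the true jar counts vary uniformly between 90 and 250 marbles (any plausible spread makes the same point), the wide estimate is right far more often than the narrow one:

```python
import random

random.seed(7)
# Hypothetical jars: true counts drawn uniformly between 90 and 250 marbles.
jars = [random.randint(90, 250) for _ in range(10_000)]

# How often does each estimate "comprehend" the actual count?
wide   = sum(75  < n < 300 for n in jars) / len(jars)  # "more than 75, less than 300"
narrow = sum(100 < n < 120 for n in jars) / len(jars)  # "more than 100, less than 120"

print(f"wide estimate correct:   {wide:.0%}")
print(f"narrow estimate correct: {narrow:.0%}")
```

Under this assumed spread, the wide range is correct every time while the narrow range is correct only a small fraction of the time, which is exactly the accuracy-versus-precision trade the saying points at.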

Precision relates to the width of the prediction/measurement band relative to the mean of the prediction/measurement. A precision band that varies around a mean by +/- 50% is less precise than one that varies by +/- 10%. When people think about a precise prediction/measurement, they usually think about one that is narrow and exact: a tight band, or a single value quoted to many significant digits.

The canonical target pattern explanation of accuracy and precision.

The problem is that people jump past accuracy before they attempt to be precise, thinking that the two are synonymous. Unfortunately, unrecognized biases can make precise predictions extremely inaccurate, hence the proverbial saying. Skipping the all-too-important step of calibrating accuracy is where the "precisely wrong" comes in.

Good accuracy trucks many more miles in most cases than precision, especially when high quality, formal data is sparse. This is because the marginal cost of improving accuracy is usually much less than the marginal costs of improved precision, but the payoff for improved accuracy is usually much greater. To understand this point, take a look again at the target diagram above. The Accurate/Not Precise score is higher than the Not Accurate/Precise score. In practice, a lot of effort is required to create a measurement situation that effectively controls for the sources of noise and contingent factors that swamp efforts to be reasonably more precise. Higher precision usually comes at the cost of tighter control, heightened attention on fine detail, or advanced competence. There are some finer nuances even here in the technical usages of the terms, but these descriptions work well enough for now.

Be careful, though - being more accurate is not just a matter of going with your gut instinct and letting that be good enough. Our gut instinct is frequently the source of the biases that make our predictions look as if we were squiffy when we made them. We usually achieve improved accuracy through the deliberative process of accounting for the causes and sources of the variation (or range of outcome) we might observe in the events we're trying to measure or predict. The ability to do this reflects the depth of expert knowledge we possess about the system we're addressing, the degree of nuances we can bring to bear to explain the causes of variation, and a recognition of the sources of bias that may affect our predictions. In fact, achieving good accuracy usually begins by assessing that we may be biased at all (and we usually are) and why.

Once we've achieved reasonable accuracy about some measurement of concern, it might then make sense to improve our precision of the measurement if the payoff is worth the cost of intensified attention and control. In other words, we only need to improve our precision when it really matters.

[Image from FreeDigitalPhotos.net by Salvatore Vuono.]

Labels: bias, business forecasting, sensitivity analysis, uncertainty

Mr. Patrick Burns at Burns Statistics (no, not that Mr. Burns) provides an excellent overview of the hidden dangers that lurk in your spreadsheets. Guess what. The problems aren't just programming errors and the potential for their harm, but errors that are inherent to the spreadsheet software itself. That's right. Before your analysts even make an error, the errors are already built in. Do you know what's lurking in your spreadsheets? Well, do you?

Before you answer that question, ask yourself these:

- What quality assurance procedures does our organization employ to ensure that our spreadsheets are free of errors of math, units conversion, and logic?
- What effort does our organization undertake to make sure that the decision makers and consumers of the spreadsheet analysis comprehend the assumptions, intermediate logic, and results in our spreadsheets?
- How do we ensure that spreadsheet templates (or repurposed spreadsheets or previously loved spreadsheets) are actually contextually coherent with the problem framing and subsequent decisions that the spreadsheets are intended to support?

My suspicion is that errors of the first level run amok much more than people are willing to admit, but their prevalence is relatively easy to estimate given our knowledge about the rates at which programming errors occur, why they occur, and how they propagate geometrically through spreadsheets. Mr. Burns recommends the programming language R as a better solution than spreadsheets, one that is easier to adopt than your analysts might currently imagine. I agree. I happen to like R a lot, but I love Analytica as a modeling environment even more. But the solution to our spreadsheet modeling problems isn't going to be completely resolved by our choice of software and our programming mastery of it.

My greater suspicion is that errors of the second and third level are rarely addressed and pose the greatest level of risk to our organizations because we let spreadsheets (which are immediately accessible) drive our thinking instead of letting good thinking determine the structure and use of our spreadsheets. To rid ourselves of the addiction to spreadsheets and their inherent risks, we have to do the hard work first by starting with question 3 and then working our way down to 1. Otherwise, we're being careless at worst and precisely wrong at best.

(Originally published at LinkedIn.)

Labels: Analytica, modeling, R language, spreadsheets

This morning @WSJ posted a link to the story about Microsoft’s announcement of its plans to lay off 18,000 employees. This picture (as captured on my iPhone)...

[click image to enlarge]

...accompanied the tweet, which is presumably available through their paywall link.

While I’m really sorry to hear about the Microsoft employees who will be losing their jobs, I am simply outraged at the miscommunication in the pictured graph. (This news appeared to me first on Twitter, and the seemingly typical response on Twitter is hyperbolic outrage.)

Here’s the problem as I see it: the graph communicates one-dimensional information with two-dimensional images. By doing so, it distorts the actual intensity of the information the reporters are supposed to be conveying in an unbiased manner. In fact, it makes the relationships discussed appear much less dramatic than they actually are.

For example, look at Microsoft’s (MSFT) revenue per employee compared to Apple’s (AAPL). WSJ reports MSFT is $786,400/person; AAPL, $2,128,400. The former is 37% of the latter. But for some reason, WSJ communicates the intensity with an area, a two-dimensional measure, whereas intensity is one-dimensional. Our eyes are pulled to view the length of the side of the square as a proxy for the measurement being communicated. The sides of the squares are proportional to √(786,400) and √(2,128,400); therefore, the sides of the squares visually communicate the ratio of the productivity of MSFT:AAPL as 61%. In other words, the chart visually overstates the relative productivity of MSFT's employees compared to that of AAPL's by a factor of roughly 1.65 (0.61/0.37).
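A few lines of R make the distortion easy to verify. This is just a back-of-the-envelope check using the two figures quoted above:

```r
# Revenue per employee as reported by the WSJ
msft <- 786400
aapl <- 2128400

# The true one-dimensional ratio our eyes should be comparing
true.ratio <- msft / aapl            # ~0.37

# An area-scaled chart makes the sides proportional to square roots,
# so this is the ratio the squares visually communicate
visual.ratio <- sqrt(msft / aapl)    # ~0.61

# Factor by which the chart overstates MSFT's relative productivity
visual.ratio / true.ratio            # ~1.65
```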

If the numbers are confusing there, consider this simpler example. The speed of your car as measured by your speedometer is an intensity. It’s one-dimensional. It tells you how many miles (or kilometers, if you’re from most anywhere else outside the US) you can cover in one hour if your car maintains a constant speed. Your speedometer aptly uses a needle to point to the current intensity as a single number. It does not use a square area to communicate your speed. If it did, 60 miles per hour would look only 1.41 times faster than 30 miles per hour instead of the 2 times faster that it really is. The reason is that the sides of the squares used to display speed would have to be proportional to the square roots of the speeds. The square roots of 60 and 30 are 7.75 and 5.48, respectively.

For your own personal edification, I have corrected the WSJ graph here:

[click image to enlarge]

Do you see, now, how much more dramatic the AAPL employees' productivity is over that of MSFT's?

This may not seem like a big deal to you at the moment, but consider how much quantitative information we communicate graphically. The reason is that, as the cliché goes, a picture is figuratively worth a thousand words. I firmly believe graphical displays of information are powerful methods of communication, and a large part of my professional practice revolves around accurately and succinctly communicating complex analysis in a manner that decision makers can easily consume and digest. But I’m also keenly aware of how analysts and reporters often miscommunicate important information via visual displays, whether by design, by inexperience, or by trying to be too clever. I see these transgressions all the time in the analyses I’m asked to audit.

The way we communicate information is not just a matter of style for business reporters. We often make consequential decisions based on information. If information is communicated in a way that distorts the underlying relationships involved, we risk making serious misallocations of scarce resources. This affects every aspect of the nature of our wealth - money, time, and quality of life. The way we communicate information bears fiduciary responsibilities.

For discussion's sake, I ask:

- How often have you seen, and maybe even been victimized by, graphical information that miscommunicates important underlying relationships and patterns?
- How often have you possibly incorporated ineffective means of graphically communicating important information? (Pie charts, anyone?)

If you want to learn more about the best ways to communicate through the graphical display of quantitative information, I highly recommend these online resources as a starting point:

Labels: bias, data analytics, visual display of quantitative information

During a recent market development planning exercise, my client recognized that his colleagues were making some rather dubious assumptions regarding the customers they were trying to address (i.e., acceptable price, adoption rate, lifecycle, market size, etc.), the costs of development, and the costs of support. Although he frequently asked “How do you know that?”, his questions were met with irritation and mild belligerence from those asked to justify their assumptions. So, together we devised a simple little routine to force the recognition that assumed facts might be shakier than previously thought.

After bringing the development team members together, we went around the room and asked for a list of statements that each believed to be true that must be true for the program to succeed. We wrote each down as a succinct, declarative statement. Then, after everyone had the opportunity to reflect on the statements, we converted each to a question simply by converting the periods to question marks.

Before Western explorers proved that the Earth is round, ships used to sail right off the assumed edges.

We then asked the team to supply a statement that answered each question in support of the original statement. Once this was completed, we then appended the dreaded question mark to each of these responses. We repeated this process until no declarative answers could be supplied in response to the questions. The cognitive dissonance among the team members became palpable as they all had to start facing the uncomfortable situation that what they once advocated as fact was largely unsupportable. Many open questions remained. More uncertainty reigned than was previously recognized. The remaining open questions then became the basis for uncertainties in our subsequent modeling efforts in which we examined value tradeoffs in decisions as a function of the quality of information we possessed. You probably won’t be surprised to learn that the team faced even more surprises as the implications of their tenuous assumptions came to light.

I am interested to know how frequently you find yourself participating in planning exercises at work in which key decisions are made on the basis of largely unsupported or untested assumptions. My belief is that such events happen much more often than we care to admit.

I would also be interested to know if the previously described routine works with your colleagues to force awareness of just how tenuous many preconceived notions really are. I outline the steps below for clarity.

- Write down everything you believe to be true about the issue or subject at hand.
- Each statement should be a single declarative statement.
- Read each out loud, forcing ownership of the statement.
- Convert each statement to a question by changing the period to a question mark.
- Again, read each out loud as a question, opening the door to the tentative nature of the original statement.
- Supply a statement that you believe to be true that answers each question.
- Repeat the steps above until you reach a point with each line of statements-questions where you can no longer supply answers.

You might find that a mind mapping tool such as MindNode or XMind is useful for documenting and displaying the assumptions and branching questions/responses. The visual display may serve to help your team see connections among assumptions that were not previously recognized.

Let me know if you try this and how well it works.


Labels: bias, epistemology, uncertainty

A friend on LinkedIn asks, “Can modeling a business work?” I respond:

For now, or at least until The Singularity occurs, the development of business ideas and plans is a uniquely human enterprise that springs from a combination of intuition, goals, and ambitions. That should not mean, however, that we cannot effectively supplement our intuition and planning with aids to management and decision making. While I think human intuition is a very powerful feature of our species, I’m also convinced it can be led astray or corrupted by biases very quickly, particularly amid the complexities that arise as plans turn into real-life execution. This is not a modern realization. The principles of inventory management, civil engineering, and accounting date back to antiquity. Think of the seagoing Phoenician merchants and the public-works-building Babylonians and Egyptians. In fact, historians now believe that the actual founder of Arthur Andersen LLP was none other than the blind Venetian mathematician and priest, Luca Pacioli (ca. 1494). That's right - that musty odor that emanates from accounting books is due to their being more than 500 years old.

Luca Pacioli doodling circles out of sheer boredom after a day of accounting. I made up the part about his being blind.

Business modeling is a tool similar to accounting in that it aids our thinking in a world whose complexity seems often to exceed the grasp of our comprehension. I look at the value of modeling a business as a means to stress test both the business plan logic and the working assumptions that drive the business plan. In regard to the business plan logic, we're asking if the business has the potential ability to produce the value we think it can; and in regard to the working assumptions, we're testing how sensitively important metrics (i.e., payback time, break-even, required resources, shareholder value) of the business plan respond to conditions in the environment and controllable settings to which our business plan will be subjected.

With such insights from modeling a business, business leaders can modify business plans by changing policies about pricing, products/services offered, costs targeted for reduction or elimination, and contingency or risk mitigation plans that can be adopted, etc.

However, I recommend awareness of at least three caveats with regard to business modeling:

- Think of such models as "what-ifs" more so than precise forecasts. Use the "what if" mindset to make a business plan more robust against the things outside your direct control versus using it to justify a belief in guaranteed success. The latter is almost a sure fire approach to failure.
- Always compare more than one plan with a model to minimize opportunity costs. Oftentimes, the best business plans derive from hybrids of two models that show how value can be created and retained for at least two different reasons.
- Avoid overly complex models as much as, maybe more so than, overly simplistic models. Building a requisite model from an influence diagram first is usually the best way to achieve this happy medium before writing the first formula in a spreadsheet or simulation tool. Richer, more complex models that correspond to the real world with the highest degree of precision are usually not useful for a number of reasons:
- they can be costly to build
- the value frontier of the insights derived decline relative to the cost to achieve them as the degree of complexity increases
- they are difficult to maintain and refactor for other purposes
- they are often used to justify delaying commitment to a decision
- few people will achieve a shared understanding that is useful for collaborating and execution
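To make the first caveat concrete, here is a minimal what-if sketch in R. The model and all of its numbers (price points, adoption rates, market size) are invented for illustration; the point is simply to sweep assumptions over ranges rather than anchor on a single-point forecast:

```r
# Hypothetical two-driver revenue model: revenue = price * adoption * market.
# Every number here is an invented placeholder, not a recommendation.
price    <- c(low = 90,   base = 100,  high = 110)   # unit price scenarios
adoption <- c(low = 0.05, base = 0.10, high = 0.15)  # adoption rate scenarios
market   <- 10000                                    # assumed addressable units

# Cross every price scenario with every adoption scenario
scenarios <- expand.grid(price = price, adoption = adoption)
scenarios$revenue <- scenarios$price * scenarios$adoption * market

# A range of plausible outcomes, not a single-point forecast
range(scenarios$revenue)
```

Even this toy sweep shows a nearly fourfold spread between the worst and best cases, which is exactly the kind of robustness check the "what-if" mindset calls for.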

Labels: bias, epistemology, modeling, sensitivity analysis

This was a great article in The Wall Street Journal today.

For me, the key take-away point can be summed up in this quote from Prof. Goetzmann: "Once people buy in, they start to discount evidence that challenges them..." I relate this not only to investing decisions in the market, but also to making organizational decisions--investments in capital projects, new strategies, the next corporate buzz. We've all seen or been a part of the exuberant irrationality that leads organizations into malinvestments.

Let's consider the complementary action--saying "no." Against the tendency toward the irrational "yes, Yes, YES!", learning to say "no" is a very important skill to master. It's probably one of the hardest skills to master when people request something from us that makes us feel important and liked.

I think, however, we always need to be aware that many of our initial reactions are often driven by biases. Reactively saying "no," once we've learned to say it and it becomes easy to do, can emerge from the same biases that urge us unreservedly to say "yes." Both incur their costs: missed opportunity, waste, and rework.

The skill more important to learn than saying "no" is acquiring the skill to consider disconfirming evidence, especially when that evidence challenges our dearest assumptions about what is going to make us rich. Let's not be so quick to say "yes" or smug when we say "no." Rather, let's learn the practice of asking,

- "what information might disabuse me of my favorite assumptions?"
- "what biases are preventing me from seeing clearly?"

Failing to learn these, we all too often find ourselves concocting a witches' brew.

The following is the first chapter excerpt from my newly published tutorial.

Business opportunities of moderate to even light complexity often expose decision makers to hundreds, if not tens of thousands, of coordinated decision options that should be considered thoughtfully before making resource commitments. That complexity is just overwhelming! Unfortunately, the typical response is either analysis paralysis or "shooting from the hip," both of which expose decision makers to unnecessary loss of value and risk. This tutorial teaches decision makers how to tame option complexity to develop creative, valuable decision strategies that range from "mild to wild" with three simple thinking tools.

Read more here.

Labels: decision analysis, decision making

Developing a competitive price in response to an RFP is difficult and fraught with uncertainty about competitor pricing decisions. "Priced to Win" approaches often lead to declining margins. Our approach and tool set allow you to develop a most likely price neutral position that helps you focus more attention on providing "intangible" benefits that differentiate your offering in a way that is more valuable to your potential client.

Labels: business forecasting, Priced to Win, RFP response

The following is the first chapter excerpt from my newly published book.

Even if you are new to R, you most likely have noticed that R is used almost exclusively for statistical analysis, as it's described at The R Project for *Statistical Computing*. Most people who use R do not frequently employ it for the type of inquiry for which business case analysts use spreadsheets: selecting projects to implement, making capital allocation decisions, or justifying strategic pursuits. The statistical analysis from R might inform those decisions, but most business case analysts don't employ R for those types of activities.

Obviously, as the title of this document suggests, I am recommending a different approach from the status quo. I'm not just suggesting that R might be a useful replacement for spreadsheets; rather, I'm suggesting that better alternatives to spreadsheets be found for doing business case analysis. I think R is a great candidate. Before I explain why, let me explain why I don't like spreadsheets.

Think about how a spreadsheet communicates information. It essentially uses three layers of presentation:

- Tabulation
- Formulation
- Logic

When we open a spreadsheet, usually the first thing we see are tables and tables of numbers. The tables **may** have explanatory column and row headers. The cells **may** have descriptive comments inserted to provide some deeper explanation. Failure to provide these explanatory clues represents more a failing of the spreadsheet developer's communication abilities than a failing of the spreadsheet environment, but even with the best of explanations, the emergent pattern implied by the values in the cells can be difficult to discern. Fortunately, spreadsheet developers can supply graphs of the results, but even those can be misleading chart junk.

To understand how the numbers arise, we might ask about the formulas. By clicking in a cell we can see the formulas used, but unfortunately the situation here is even worse than the prior level of presentation of tables of featureless numbers. Here, we don't see formulas written in a form that reveals underlying meaning; rather, we see formulas constructed by pointing to other cell locations on the sheet. Spreadsheet formulation is inherently tied to the structural presentation of the spreadsheet. This is like saying the meaning of our lives should be dependent on the placement of furniture in our houses.

While the goal of good analysis should not be more complex models, a deeper inquiry into a subject usually does create a need for some level of complexity that exceeds the simplistic. But as a spreadsheet grows in complexity, it becomes increasingly difficult to extend the size of tables (both in the length of the indices that structure them and in the number of indices used to configure their dimensionality) as a direct function of its current configuration. Furthermore, if we need to add new tables, choosing where to place them and how to configure them also depends almost entirely on the placement and configuration of previously constructed tables. So, as the complexity of a spreadsheet increases, it naturally leads to less flexibility in the way the model can be represented. It becomes crystallized by the development of its own real estate.

The cell referencing formulation method also increases the likelihood of error propagation because formulas are generally written in a quasi-fractal manner that requires the formula to be written across every element in at least one index of a table's organizing structure. Usually, the first instance of a required formula is written within one element in the table; then, it is copied to all the appropriate adjacent cells. If the first formula is incorrect, all the copies will be, too. If the formula is sufficiently long and complex, reading it to properly debug it becomes very difficult. Really, the formula doesn't have to be that complicated or the model that complex for this kind of failure to occur, as the recent London Whale VaR model and Reinhart-Rogoff Study On Debt debacles demonstrated.[1]

All of this builds to the most important failure of spreadsheets -- the failure to clearly communicate the underlying meaning and logic of the analytic model. The first layer visually presents the numbers, but the patterns in them are difficult to discern unless good graphical representations are employed. The second layer, which is visible only when requested, uses an arcane formulation language that seems inherently irrational compared to the goal of good analysis. The final layer--the logic, the meaning, the essence of the model--is left almost entirely to the inference capability of any user, other than the developer, who happens to need to use the model. The most important layer is the most ambiguous, the least obvious. I think the order should be the exact opposite.

When I bring up these complaints, the first response I usually get is: "ROB! Can't we just eat our dinner without you complaining about spreadsheets again?" But when the population of my dinner company tends to look more like fellow analysts, I get, "So what? Spreadsheets are cheap and ubiquitous. Everyone has one, and just about anyone can figure out how to put numbers in them. I can give my analysis to anyone, and anyone can open it up and read it."

Then I'm logically--no, **morally**--compelled to point out that carbon monoxide is cheap and ubiquitous, that everyone has secrets, that just about everyone knows how to contribute to the sewage system, that just about everyone can read your diary and add something to it. Free, ubiquitous, and easy to use are all great characteristics of some things in their proper context, but they aren't characteristics that are necessarily universally beneficial.

More seriously, though, I know that what most people have in mind with the common response I receive is the low cost of entry to the use of spreadsheets and the relative ease of use for creating reports (which I think spreadsheets are excellent for, by the way). Considering the shortcomings and failure of spreadsheets based on the persistent errors I've seen in client spreadsheets and the humiliating ones I've created, I think the price of cheap is too high. The answer to the first part of their objection--spreadsheets are cheap--is that R is free. Freer, in fact, than spreadsheets. In some sense, it's even easier to use since the formulation layer can be written directly in a simple text file without intermediate development environments. Of course, R is not ubiquitous, but it is freely available on the internet.

Unlike spreadsheets, R is a programming language with the built-in capacity to operate over arrays as if they were whole objects, a feature that demolishes any justification for the cell-referencing syntax of spreadsheets. Consider the following example.

Suppose we want to model a simple parabola over the interval (-10, 10). In R, we might start by defining an index we call x.axis as an integer series.

x.axis <- -10:10

which looks like this,

> [1] -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10

when we call x.axis.

To define a simple parabola, we then write a formula that we might define as

parabola <- x.axis^2

which produces, as you might now expect, a series that looks like this:

> [1] 100 81 64 49 36 25 16 9 4 1 0 1 4 9 16 25 36 49 64 81 100

Producing this result in R required exactly two formulas. A typical spreadsheet that replicates this same example requires manually typing in 21 numbers and then 21 formulas, each pointing to the particular value in the series we represented with x.axis. The spreadsheet version produces 42 opportunities for error. Even if we use a formula to create the spreadsheet analog of the x.axis values, the number of opportunities for failure remains the same.

Extending the range of parabola requires little more than changing the parameters in the x.axis definition. No additional formulas need be written, which is not the case if we needed to extend the same calculation in our spreadsheet. There, more formulas need to be written, and the number of potential opportunities for error continues to increase.
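For instance, widening the interval to (-100, 100) in R means changing only the index definition; the parabola formula is reused untouched:

```r
# Redefine the index over a wider interval
x.axis <- -100:100

# The dependent formula is unchanged from the original example
parabola <- x.axis^2

length(parabola)   # 201 values produced by the same two formulas
```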

The number of formula errors that are possible in R is directly related to the total number of formula parameters required to correctly write each formula. In a spreadsheet, the number of formula errors is a function of both the number of formula parameters and the number of cell locations needed to represent the full response range of results. Can we make errors in R-based analysis? Of course, but the potential for those errors is exponentially smaller.

As we've already seen, too, R operates according to a linear flow that guides the development of logic. Also, variables can be named in a way that makes sense in the context of the problem[2] so that the program formulation and business logic are more closely merged, reducing the burden of inference about the meaning of formulas for auditors and other users. In Chapter 2, I'll present a style guide that will help you maintain clarity in the definition of variables, functions, and files.

However, while R answers the concerns of direct cost and the propagation of formula errors, its procedural language structure presents a higher barrier to improper use because it demands a more rational, structured logic than spreadsheets do--a rigor that people usually learn from programming and software design. The best aspect of R is that it communicates the formulation and logic layers of an analysis in a more straightforward manner, as the procedural instructions for performing calculations. It preserves the flow of thought that is necessary to move from starting assumptions to conclusions. The numerical layer is presented only when requested, but logic and formulation are more visibly available. As we move forward through this tutorial, I'll explain more about how these features present themselves for effective business case analysis.

This document is a tutorial for learning how to use the statistical programming language R to develop a business case simulation and analysis. I assume you possess at least the skill level of a novice R user.

The tutorial will consider the case in which a chemical manufacturing company considers constructing a new chemical reactor and production facility to bring a new compound to market. There are several uncertainties and risks involved, including the possibility that a competitor brings a similar product online. The company must determine the value of making the decision to move forward and where they might prioritize their attention to make a more informed and robust decision.

The purpose of the book is not to teach you R in a broad manner. There are plenty of resources that do that well now. Rather, it will attempt to show you how to

- Set up a business case abstraction for clear communication of the analysis
- Model the inherent uncertainties and resultant risks in the problem with Monte Carlo simulation
- Communicate the results graphically
- Draw appropriate insights from the results

So, while you will not necessarily become a power user of R, you will gain some insights into how to use this powerful language to escape the foolish consistency of spreadsheet dependency. There is a better way.

To follow this tutorial, you will need to download and install the latest version of R for your particular OS. R can be obtained here. Since I wrote this tutorial with the near beginner in mind, you will only need the base install of R and no additional packages.

1: You will find other examples of spreadsheet errors at Raymond Panko's website. Panko researches the cause and prevalence of spreadsheet errors.

2: Spreadsheets allow the use of named references, but the naming convention can become unwieldy if sections in an array need different names.

Labels: business forecasting, decision analysis, Monte Carlo simulation, R language, risk analysis, uncertainty

In a previous post, I discussed the meaning of expected value (EV) and how it's useful for comparing the values of choices we could make when the outcomes we face with each choice vary across a range of probabilities. The discussion closed by comparing the choice to play two different games, each with different payoffs and likelihoods. Game 1 returns an EV of $5, even though it could never actually produce that outcome; and Game 2 returns an EV of $4, also being incapable of producing that outcome.

But let's say that you hate it when C-3PO tells you the odds, so you commit to Game 2 because you like the upside potential of $15, and you think the potential loss of $5 is tolerable. After all, Han Solo always beat the odds, right? Well, before you so commit, let me encourage you to look into my crystal ball to show you what the future holds…not just in one future, but many.
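Since the crystal-ball charts don't reproduce here, a quick Monte Carlo sketch in R can stand in. Game 2's payoffs ($15 up, $5 down) come from the text, with a 0.45 win probability assumed so its EV works out to the stated $4; Game 1's payoffs ($10 or $0 at even odds, EV $5) are hypothetical stand-ins consistent with the earlier post's description:

```r
set.seed(42)   # a reproducible "crystal ball"

# Payoffs chosen to match the stated EVs; Game 1's payoffs are assumed
play.game1 <- function(n) sample(c(10,  0), n, replace = TRUE, prob = c(0.50, 0.50))  # EV = $5
play.game2 <- function(n) sample(c(15, -5), n, replace = TRUE, prob = c(0.45, 0.55))  # EV = $4

n.plays <- 1000
earnings1 <- cumsum(play.game1(n.plays))   # accrued earnings, Game 1
earnings2 <- cumsum(play.game2(n.plays))   # accrued earnings, Game 2

# After 1000 plays, accrued earnings cluster near n.plays * EV
c(game1 = earnings1[n.plays], game2 = earnings2[n.plays])
```

Rerunning this across many seeds traces out fans of trajectories: Game 1's cluster near $5,000 and Game 2's near $4,000, with the tails of the two fans overlapping.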

Here's what we see. For Game 1, the accrued earnings range from ~$4,500 to ~$5,500 by the 1000th step.

But take a second look. That second-from-the-top band for Game 2 converges on the second-from-the-bottom band in Game 1. These are the upper and lower 5th percentile bands of the outcome, respectively.

So it is in the fantasy of Hollywood that the mere mention of long odds ensures the protagonist's success. Unfortunately, life doesn't always conform to that fantasy. Over a long time and many repeated occasions to play risky games, your accrued outcomes will converge toward the expected value, not toward the upside you hoped for.

So how can you know when you will be lucky? You can't. The odds based on short-term observations of good luck will not long be in your favor. Your fate will likely regress to the mean.

(This post was also simultaneously published at the Lumina Blog.)

Labels: Analytica, expected value, risk analysis, uncertainty