More than a century ago Benjamin Disraeli1 said: “There are three kinds of lies: lies, damned lies and statistics”. This aphorism is still immensely popular. Every time someone wants to ridicule statistics and statistical thinking, he quotes Disraeli. Then, with even more relish, he tells you that other one comparing statistics to a bikini.
Most laypersons think of statistics as something that starts with “mean-mode-median” and then wanders off to “variance or standard deviation”, with probability coming in somewhere. Very few realize, or accept, that statistics can be a powerful analytical tool. Since it’s hard to change perceptions, statistical analysis must be sold under a new name: “Analytics”. This new name works better; top management “buys” it more easily. And, one day, when they stop buying “business analytics”, it will be re-packaged as “business intelligence”.
The major reason why analytics has risen appreciably in public esteem is that computers and computer software can now deliver its power much more effectively. A market researcher, worried that his customer is no longer excited by percentages and pie charts, is now asking for key driver analysis, which is really good old regression in “analytics” garb and which can now be accomplished in a split second. The financial analyst, who doesn’t want to look beyond his Excel spreadsheet, is asking how he can make more intelligent profit predictions. He has seven deterministic variables (such as sales during the last three months, or the Sensex value on 1 Jan 2007) that he can manipulate very well, but also three stochastic variables (e.g. market volatility) that he just can’t get a handle on. So why not simulate these variables, especially now that a very powerful pseudorandom number generator like the Mersenne twister is available?
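A Monte Carlo simulation of this kind takes only a few lines. The sketch below assumes, purely for illustration, that profit is the sum of seven known contributions plus three uncertain ones, each drawn from a normal distribution; the variable names, figures and distributions are all hypothetical. Python’s standard `random` module uses the Mersenne Twister as its core generator, so a seeded `random.Random` instance gives reproducible draws.

```python
import random
import statistics


def simulate_profit(deterministic, stochastic, n_trials=10_000, seed=2007):
    """Monte Carlo sketch of a profit forecast (additive model is an assumption).

    deterministic: dict of known contributions to profit.
    stochastic: dict mapping name -> (mean, standard deviation); each
        uncertain contribution is assumed to be normally distributed.
    """
    rng = random.Random(seed)  # Mersenne Twister, seeded for reproducibility
    known = sum(deterministic.values())
    # One simulated profit per trial: known part plus one draw per uncertain input.
    outcomes = sorted(
        known + sum(rng.gauss(mu, sigma) for mu, sigma in stochastic.values())
        for _ in range(n_trials)
    )
    return {
        "mean": statistics.fmean(outcomes),
        "p05": outcomes[int(0.05 * n_trials)],  # 5th percentile (approx.)
        "p95": outcomes[int(0.95 * n_trials)],  # 95th percentile (approx.)
    }


# Hypothetical inputs: seven known contributions (arbitrary units) ...
known_inputs = {f"item_{i}": 10.0 for i in range(1, 8)}
# ... and three uncertain drivers, each given as (mean, standard deviation).
uncertain_inputs = {
    "market_volatility": (0.0, 5.0),
    "demand_shift": (2.0, 3.0),
    "input_costs": (-1.0, 2.0),
}
forecast = simulate_profit(known_inputs, uncertain_inputs)
print(forecast)
```

Reading off the 5th and 95th percentiles gives the analyst a range rather than a single point forecast. A real model would replace the additive formula and the normality assumption with whatever relationships the data actually support.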
Or look at all those telecom operators who have mushroomed in the last five years. During the heady early days it was all about selling an exciting dream to the common man: telling him how he could carry an affordable mobile telephone in his pocket with no ugly wires sticking out. But, with growing competition, the battle now is to achieve the highest ARPU (“average revenue per unit”, but that’s a very unfashionable expansion!). So operators now worry about “churn” (what fraction of your connections you will lose to your rivals), revenue “leakages” (when someone uses a service without paying for it), customer “segmentation” (is your user a chic socialite, or a rich man’s spoilt daughter who calls up every hour, even from Greece or Turkey, or a taxi driver who only uses his handset to make “missed calls”?) and billing “metrics”: all situations crying out for “analytics”.
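The arithmetic behind ARPU, churn and leakage is simple; the real effort lies in getting clean data. The sketch below uses one common convention for each (revenue divided by connections for ARPU, connections lost divided by connections at the start of the period for churn, and unbilled usage as a share of delivered usage for leakage). The function names and figures are hypothetical, and operators define these metrics in different ways.

```python
def arpu(total_revenue: float, units: int) -> float:
    """Average revenue per unit (connection) over one billing period."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_revenue / units


def churn_rate(units_at_start: int, units_lost: int) -> float:
    """Fraction of the period's opening connections lost (e.g. to rivals)."""
    if units_at_start <= 0:
        raise ValueError("units_at_start must be positive")
    return units_lost / units_at_start


def leakage_rate(usage_delivered: float, usage_billed: float) -> float:
    """Share of delivered usage (e.g. minutes) that was never billed."""
    if usage_delivered <= 0:
        raise ValueError("usage_delivered must be positive")
    return (usage_delivered - usage_billed) / usage_delivered


# Hypothetical month: 10,000 connections, Rs 35 lakh billed, 250 connections
# lost, 10,000 minutes delivered of which 9,800 were billed.
print(arpu(3_500_000, 10_000), churn_rate(10_000, 250), leakage_rate(10_000, 9_800))
```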
This clamour for analytics can only grow as computer networks proliferate and more and more data becomes digital. We are currently living through times when data capture and transmission are getting continually cheaper and more efficient. Soon we will face a data overload, with no one knowing what to do with all this data. The retailers have their SCM (again, it’s not fashionable to say “supply chain management”) data, the “pharma” companies have their clinical trials data, the customs and excise boards have their data, the financial institutions have their investor data and Captain G R Gopinath has all his data about Air Deccan flights and passengers! Only statistical analysis, or analytics, appears capable of spotting the method in this madness.
Indeed analytics can adapt itself admirably to every offering on the plate; for example, to cricket, which is very much the current flavour. The cricket fan is a rather curious specimen: his eye is surreptitiously glancing at the score, the asking rate or the D/L par score, but his public pretence is that he only cares for the artistry of Rahul Dravid or the lazy elegance of Inzamam-ul-Haq, not for cold and unfriendly numbers2. He is also rather gullible: if the ICC suddenly declares that South Africa is the world’s No. 1 ODI team, just because Australia lost a few matches and South Africa won a few in friendly home conditions, he timidly accepts the verdict. The ICC can apparently get away with such lies and damn lies3.
One example of cricket analytics that we are attempting right now involves a “pressure index” (PI), which expresses, ball by ball, the chasing team’s chances of reaching its target in an ODI. If the odds are even at the start of a chase, the PI is exactly 100. If Sehwag hits four boundaries in the first over, the PI might drop to 93 or 94. If he gets out in the following over, the PI might rise to 102 or 105. We find that the PI gives a realistic picture of the “pressure” felt by the chasing team.
Our formulation essentially tracks, ball by ball, the ratio of “runs needed to win” to “runs that the team can potentially score”. The potential score is derived from a resource table not dissimilar to the D/L or Jayadevan tables. The ratio is wrapped in a symmetric function that also ensures that the pressure approaches zero if a win is imminent, and 200 if defeat is imminent. We also add a tweaking parameter to make the PI sufficiently perky and responsive.
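A minimal Python sketch of this kind of formulation follows. Everything numerical in it is an assumption made for illustration: the resource model is a hypothetical toy (scoring capacity saturating with overs left, scaled linearly by wickets in hand), not the D/L, Jayadevan or any real table, and the symmetric function 200·r^k/(1 + r^k), with exponent k standing in for the tweaking parameter, is one convenient choice with the stated properties, not necessarily the function used for the PI itself.

```python
import math

# HYPOTHETICAL constants for the toy resource model; not taken from any real table.
DECAY = 0.04      # per-over decay rate of remaining scoring capacity
FULL_OVERS = 50   # length of an ODI innings


def potential_runs(balls_left: int, wickets_left: int, par_total: float) -> float:
    """Runs the chasing side could still score, from a toy resource model.

    par_total is what a side with all 50 overs and 10 wickets is expected to
    score. Capacity saturates with overs left and is scaled linearly by
    wickets in hand (a deliberately crude assumption).
    """
    overs_left = balls_left / 6
    overs_factor = (1 - math.exp(-DECAY * overs_left)) / (1 - math.exp(-DECAY * FULL_OVERS))
    wicket_factor = wickets_left / 10
    return par_total * overs_factor * wicket_factor


def pressure_index(runs_needed: float, potential: float, k: float = 2.0) -> float:
    """Map r = runs_needed / potential onto 0..200, with PI = 100 at r = 1.

    Computes 200 * r**k / (1 + r**k), written as 100 * (1 + tanh(k * ln(r) / 2))
    so extreme ratios cannot overflow. Symmetric: PI(1/r) = 200 - PI(r).
    Tends to 0 as r -> 0 (win imminent) and to 200 as r -> infinity (defeat
    imminent); a larger k makes the index react more sharply near r = 1.
    """
    if runs_needed <= 0:
        return 0.0    # target reached: no pressure left
    if potential <= 0:
        return 200.0  # runs still needed but no resources left
    r = runs_needed / potential
    return 100.0 * (1.0 + math.tanh(k * math.log(r) / 2.0))


target = 250  # hypothetical target, set equal to par so the odds start even
start = pressure_index(target, potential_runs(300, 10, target))
# First over: four boundaries (16 runs), no wicket.
after_boundaries = pressure_index(target - 16, potential_runs(294, 10, target))
# Second over: 2 runs and a wicket.
after_wicket = pressure_index(target - 18, potential_runs(288, 9, target))
print(round(start, 1), round(after_boundaries, 1), round(after_wicket, 1))
```

In this toy model the index starts at exactly 100, dips below 100 after the boundary-laden first over and climbs back above 100 when the wicket falls in the second.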
The reaction to the PI has been mixed so far, and of course many have called it lies and damn lies. We were also asked: “Why just one number? Can one number hope to catch the variability and romance of ODI match situations?”
It would indeed be ridiculous to make any such claim; no single number is endowed with such magical properties. And yet it’s amazing how often we are guilty of treating such single numbers as hallowed and sacrosanct: the impact factor of publications is one number that enjoys an undeserved halo; the ARPU in the telecom sector is perhaps another; and there was a time when CSIR believed that it could measure the performance of its R&D establishments simply by measuring their external cash flow (ECF).
1There’s some confusion about who really said this; it has also been attributed to Mark Twain and a French politician named Labouchère. I talked about this confusion recently with a visiting English lady. She loyally argued that only an Englishman could have said this; I would have thought it’s not something to be terribly proud of!
2I am reminded of the time when the Indian leg spinner Narendra Hirwani was playing an ODI in Australia in 1991-92. His batting was so pathetic that he could bat only at No. 11. As Hirwani came out to bat, the Australian commentator sharing the radio box with my brother, Harsha Bhogle, asked him: “How bad is Hirwani as a batsman?” Harsha was briefly at a loss for words, but quickly chose to go analytical. “Well, Tim, if you were to make a playing eleven of only No. 11 batsmen, Hirwani would bat No. 11 in this team!”
3The biggest ICC lie was that absurd rain rule in the 1992 World Cup that denied South Africa (SA) the opportunity of reaching the final: SA needed 22 runs to win off 13 balls, with four wickets in hand, when a brief shower interrupted play, and the ICC rule then suddenly revised SA’s target to 21 off 1 ball. This shocking verdict triggered many detailed and scholarly studies on rain rules, including V Jayadevan’s fine effort that Current Science published some years ago.
–This article first appeared in Current Science in 2007.