2013 is the International Year of Statistics. To kick off the Year, the International Indian Statistical Association (IISA) recently organized a conference on Statistics, Science and Society at Chennai, and, to my surprise, invited me to give a talk!
I called my talk “The Statistician’s New Face”, although I could also have called it “Some Thoughts on the Changing Paradigms and Emerging Meta Currents in the Statistical Landscape Over the Past Four Decades” — a title that might have appealed to the retired sub-editors of The Hindu, but practically no one else.
This title aroused enough curiosity when it was posted online; in fact I received multiple emails from obscure publishers offering to publish my “book” … when all I had was a five-line abstract!
Thinking up the title was the easy part. The harder part was to spin a story around this title. If today’s statistician is going to have a new face, what was the face of yesterday’s statistician? I couldn’t think of any … till I finally decided to pretend that yesterday’s statistician was faceless.
Actually, that may well be true. How often do we ‘see’ the modeler or the number cruncher doing his business? How often do we see the statistician come ‘onstage’ to explain his analysis? How often has the cricket lover seen the face of Anandji Dossa, B B Mama, Sudhir Vaidya or even dear old Mohandas Menon?
Who then qualified to be the statistician’s ‘new’ face? I eventually decided that I would pick Nate Silver, who is now a successful blogger and author, and who very confidently predicted an easy victory for Barack Obama in the 2012 election. I wasn’t surprised when many of my friends objected, making comments like “Oh, he merely computes weighted averages!”, or “Can he ever write a research paper for The Annals of Statistics?”
Perhaps Nate could write for the Annals, but I’m guessing that he wouldn’t. Today’s research paper doesn’t travel very far. In fact, an oft-quoted estimate is that the average number of readers of a research paper is less than 2 … and this includes the author!
Indeed, that could be part of the problem: good statisticians write good stuff in serious journals instead of shouting out their good ideas from rooftops and Twitter timelines! Nate has been among the first to spread statistical ideas via blogs and social networks.
As someone who studied statistics back in the 1970s, I’m from the statistician’s silent and faceless era. Our ‘tools’ were squared sheets, pencils with sharp points, a calculating machine (digital calculators had appeared, but were frightfully expensive), and the hallowed statistical table that told us whether we could reject our ‘null’ hypothesis.
We weren’t expected to talk too much. In fact, we weren’t even expected to use too many words; we had to tell our story using only mathematical symbols! This past still haunts statisticians of our generation: most still find it hard to give a popular talk without bringing in the sigmas and rhos.
I sometimes wonder how Statistician Rip Van Winkle would react if he woke up after a 40-year slumber. After a massive mathematical (and cultural) shock, he would figure out that the computer has changed everything. The data scarcity of his era has become a data deluge! And it isn’t just the incredibly large data volume that would stupefy him. He would be bewildered by the data variety (text, images, video, animation) and the data velocity (the signals keep coming in at an astonishing pace).
So what else has changed? For starters, a sample size of 30 would no longer be considered ‘large’; it would be easy to rack up a billion data points with all those electronic sensors piling up information once every second. From a world of “we have tools, but no data”, we’ve quickly moved to a world of “we have lots of data, but not enough tools!”. [It reminds you of that song in Roti Kapada aur Makaan about paisa and shakkar; see this video between 5:10 and 5:30.]
There is also another big paradigm shift: in the past we had to get data to fit known models; now we can get models to describe near-limitless volumes of data. Indeed, we are reaching a point where people like Chris Anderson are questioning the need for theory; the Google way of correlating variables based on the frequency with which they are cited together could well become the only way!
We are also discovering that the best information is embedded in text, not numbers. Another imperative of this data deluge is that we need to get even more visual; nobody can find a story hidden in a heap of data, we need images and animations to really understand what’s going on.
This transition from data to ‘big data’ is going to be a huge game-changer. Tools, mindsets and paradigms will need to change as we prepare for this impending data revolution. In particular, we need to ask how India can get ready to join this big data race. My talk at Chennai contained eight prescriptions that could get India going.
One, instead of the thousand statisticians (or ‘analysts’) that we currently produce every year, we will have to produce a hundred thousand! When I said that at Chennai, to an audience made up predominantly of teachers and researchers, I got that you-must-be-kidding look. They told me that our universities simply can’t do this. Of course they can’t … in fact our universities won’t even know how to do this. We will need new public-private partnership initiatives. After all, universities didn’t produce all the workers India needed to join the IT race either!
Think of the Indian engineering or manufacturing industry. While we need enough engineers to be on top of things, we need significantly more assistants … and even more operators. This sort of training can easily be imparted outside universities: by those mushrooming coaching and training institutions, and by the multitude of online teaching tools that are popping up on the Internet. The Khan Academy, pioneered by another Salman Khan, is already becoming a trend-setter and a harbinger of the shape of things to come. In future I expect universities to earn more by how they teach, not by what they teach.
Indeed, that was my prescription two: teach better! I am worried, even pained, at the way we teach statistics and probability in high schools and early college. When I meet my students before starting a statistics course I begin by asking: “Do you like statistics?” A few hands go up, more out of youthful exuberance and the desire to please the teacher. But when I ask: “Do you love statistics?” I again get that you-must-be-kidding look. Statistics is seen as something with painful formulas and agonizing theorems … and all that horrible mean-median-mode stuff. How could you ever fall in love with it?
Things get even worse when we teach probability. Practically every student thinks probability is “number of favourable cases divided by the total number of cases”, ignoring the caveat that the cases must be equally likely, and no student really understands conditional probability. I’ve found that the best way to explain the concept is to ask: what’s the chance that India will win an ODI if Tendulkar scores a century? Incidentally, Tendulkar is often a victim of conditional probability himself, as Arunabha Sengupta recently explained so brilliantly.
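The ‘favourable over total’ recipe, and the equally-likely caveat it depends on, can be illustrated without any cricket data by using two fair dice. The Python sketch below assumes fair dice; the helper `prob` and the particular events are hypothetical, chosen only for illustration. It contrasts the naive answer for a total of 7 (treating the 11 possible totals as if they were equally likely) with a count over the 36 equally likely ordered pairs, and then shows how conditioning on the first die changes a probability.

```python
from fractions import Fraction
from itertools import product

# The 36 ordered outcomes of rolling two fair dice are equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """P(event | given), computed by counting equally likely outcomes."""
    space = [o for o in outcomes if given(o)]        # restrict to the condition
    favourable = [o for o in space if event(o)]      # outcomes where event holds
    return Fraction(len(favourable), len(space))

# Naive (wrong): treat the 11 possible totals 2..12 as equally likely.
naive_p7 = Fraction(1, 11)

# Correct: count the ordered pairs whose total is 7.
p7 = prob(lambda o: sum(o) == 7)

# Conditional probability: a total of 10 or more, without and with
# the knowledge that the first die shows a 6.
p_high = prob(lambda o: sum(o) >= 10)
p_high_given_six = prob(lambda o: sum(o) >= 10, given=lambda o: o[0] == 6)

print(naive_p7, p7, p_high, p_high_given_six)
```

Running it prints `1/11 1/6 1/6 1/2`: the naive count is wrong, and knowing that the first die shows a 6 triples the chance of a total of 10 or more. That is the same kind of update as asking how a Tendulkar century changes India’s chances.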
Another malaise in teaching statistics is that our books, and our syllabi, haven’t changed. We still teach topics and themes that are no longer relevant. My prescription three therefore is to cut out all the old riffraff. We should stop teaching the old theory and those old derivations. It is now clear that the really useful parts of statistics deal with clustering, classification and regression. And with brutal computing power now available, simulation is the way to go (as an aside, this could change how rain rules in cricket are formulated).
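As a small illustration of simulation standing in for derivation, here is a minimal Python sketch (a hypothetical example, not taken from the talk) that estimates an ordinary and a conditional probability for two fair dice by brute-force sampling. The function name `simulate`, the trial count and the seed are arbitrary choices. The exact answers, 1/6 and 1/2, follow from counting the 36 equally likely outcomes, so the estimates can be checked against them.

```python
import random

def simulate(trials=100_000, seed=2013):
    """Estimate P(total >= 10) and P(total >= 10 | first die is 6) for two fair dice."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    high = first_six = high_and_six = 0
    for _ in range(trials):
        a, b = rng.randint(1, 6), rng.randint(1, 6)  # one roll of two dice
        if a + b >= 10:
            high += 1
        if a == 6:
            first_six += 1
            if a + b >= 10:
                high_and_six += 1
    # Relative frequencies; the conditional one divides by rolls where a == 6.
    return high / trials, high_and_six / first_six

p_high, p_high_given_six = simulate()
print(f"P(total >= 10)              ~ {p_high:.3f}  (exact 1/6 = 0.167)")
print(f"P(total >= 10 | first is 6) ~ {p_high_given_six:.3f}  (exact 1/2)")
```

No formula is derived here; the computer simply plays the game many times and counts, which is the spirit of letting computing power do the work.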
Our training must also focus on the new tools at the workplace. For example, Excel spreadsheets are becoming the universal platform to store and manipulate data, and most of the data processing involves playing around with SAS or R scripts. Our universities never teach such stuff!
Four, we must learn to communicate smart. There is a strong disconnect between users and developers of analytical solutions. I recall an anecdote about a shipping magnate who needed an analytical solution for his scheduling problem. When he sought professional help, his statistical consultant was very cooperative: “No problem. Just write down your governing equations and I will provide you the uniformly most optimal dynamic programming solution!”
The magnate never came back! It is this attitude that makes the statistical analyst a ‘go-away’ person instead of being a ‘go-to’ person like a doctor or a lawyer. Even the delivery of analytical solutions needs to be more stylish: attractive and animated dashboards must be created that can deliver solutions … even on-the-move!
Five, it is very important to get visual. We’ve all heard that old one about a picture being worth a thousand words (or numbers). A corollary of sorts is that if there are billions of numbers, then a picture is the only way to convey the message: medical scans, pressure distributions on aircraft structures, weather predictions … all these are applications crying out to be visualized. In fact, when we were analyzing player performances on rediff.com during the 2007 cricket World Cup, we visualized them using Chernoff faces; Steve Chang did something similar to depict the performance of baseball coaches in 2008.
That is perhaps the way to go. We must, six, integrate analytics more intimately with our daily lives. We have to carry forward the message that statistics and analytics are for everyone … and every day. My old friend Sastry Pantula, who became the first Asian President of the American Statistical Association (ASA), once gifted me a T-shirt listing the A-Z of statistical applications: in retail, telecom, healthcare and finance. Indeed, statistics is also poised to make deeper inroads in gaming, sport and travel.
One way to ensure that statistics stays in the public eye is to devise more and more indexes that appeal to the common man and the opinion maker. There is already enough talk about the Sensex, the 100 crore box-office film and the TRP ratings of TV shows. We should have more indexes, perhaps to measure the popularity of hospitals and colleges (the current surveys tend to be dubious) or the state of an ODI cricket match (we introduced the idea of a pressure index, but it didn’t take off). The big idea is to make analytics sexier, something that Google’s Hal Varian believes has already happened.
Seven, Indian statistics and analytics also need more idols. I ran an impromptu poll on Facebook (18 responses), and the grand old man of Indian statistics, Prof C R Rao, now 92, continues to be the most popular idol, followed by the eminent probabilist Prof S R S Varadhan, who won the coveted Abel Prize in 2007. I know of so many other worthy analysts, but they are never sufficiently admired or visible; a rare exception is Rajeeva Karandikar, who appears on CNN-IBN to make his election seat predictions.
We need more statistical faces in public view; while it would be impossible to match film actors and cricketers, we at least need to match doctors, writers, lawyers or engineers!
Finally, my prescription eight, and certainly the most compelling, is to chase the dollar. Talking to a cross-section of Indian consultants, I find that the Indian marketplace is still unwilling to pay the right sort of money for analytical solutions; consultants have been asked if the same solution or service cannot be offered at one-tenth the cost! It boggles the imagination when you realize that a Ravindra Jadeja can be offered two million dollars to dish out his average cricketing stuff while an analytical solution that can significantly enhance quality or performance is at best priced at a few lakh rupees.
It would therefore make much more sense to chase the dollar. Everything that one reads or sees in the West suggests a growing respect and appreciation for the world of analytics. With big data likely to overwhelm the business space, the best opportunities for India will come from the US … just as they did for India’s IT industry. So the time to come up with killer products and solutions in analytics and statistics is now!
I would like to thank Rajeeva Karandikar for advising me to make one correction, and Sateesh Kumar for correcting a typo.