“Is there any point to which you would wish to draw my attention?”
“To the curious incident of the dog in the night-time.”
“The dog did nothing in the night-time.”
“That was the curious incident,” remarked Sherlock Holmes.
Not everyone can infer like Sherlock Holmes, and even Holmes himself couldn’t have done much without data. Indeed he once exclaimed impatiently: “Data! Data! Data! I cannot make bricks without clay.”
Holmes was fortunate. He lived only in Conan Doyle’s make-believe world, where all the pieces of the jigsaw puzzle were available inside the cardboard box. He had only to reconstruct, ingeniously, the picture pasted on top of the box!
But what if some of the pieces are missing and you have to first go out and find them? What if some of the pieces required actually have to be fabricated? Or what if there are thousands of pieces available when only a few dozen are required to complete the picture?
These are fascinating problems that data analysts have to grapple with before they can make the right inference in the real-life world. ‘Finding’ the missing pieces is a bit like querying an external database; ‘fabricating’ a new piece is like undertaking some specialized data analysis; and choosing a small subset of useful data from a big data set requires compression techniques and evaluation of probabilities.
A technical discussion on such questions would be too tedious for readers of this blog. So let us create a ‘next gen’ Sherlock Holmes and return to the idyllic world of Hindi film music to get a flavour of this business of analytics and inference.
Listen to the song below first, and then we’ll meet our ‘next gen’ Holmes, and, of course, the ‘next gen’ Watson.
Here’s our question for the new avatar of Sherlock Holmes: Who is the music director of this song?
Given that unmistakable beat of horse hooves pounding away, Holmes would surely think of O P Nayyar. But can he prove this rigorously? Here’s how his thought process goes:
“The female voice is Asha, not Lata. O P Nayyar never used Lata’s voice; so it could indeed be OP …”
But Holmes needs irrefutable proof. He replays the video:
The hero is Manoj Kumar. The heroine is Sharmila Tagore … what does the hero-heroine pair of Manoj Kumar-Sharmila Tagore tell us?
At this point Holmes quickly constructs a database query to count the number of Manoj-Sharmila films, and a second later tells Watson that he knows the answer.
“Manoj-Sharmila had only one film together: Sawan Ki Ghata; and O P Nayyar was the music director of Sawan Ki Ghata. So it has to be OP!” QED.
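For readers curious about what such a query might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the function names and the two placeholder rows are assumptions made purely for illustration; only the Sawan Ki Ghata row comes from Holmes's deduction above, and a real film database would look quite different.

```python
import sqlite3

# Toy in-memory catalogue (hypothetical schema).
# Only the "Sawan Ki Ghata" row is taken from the text; the two
# "Placeholder" rows are invented so the query has something to filter out.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE films (title TEXT, hero TEXT, heroine TEXT, music_director TEXT)"
)
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?, ?)",
    [
        ("Sawan Ki Ghata", "Manoj Kumar", "Sharmila Tagore", "O P Nayyar"),
        ("Placeholder Film A", "Manoj Kumar", "Other Heroine", "Composer X"),
        ("Placeholder Film B", "Other Hero", "Sharmila Tagore", "Composer Y"),
    ],
)


def films_with_pair(db, hero, heroine):
    """Return (title, music_director) for every film starring this pair."""
    return db.execute(
        "SELECT title, music_director FROM films WHERE hero = ? AND heroine = ?",
        (hero, heroine),
    ).fetchall()


def unique_music_director(db, hero, heroine):
    """Return the composer only if the pair made exactly one film, else None."""
    matches = films_with_pair(db, hero, heroine)
    if len(matches) == 1:
        return matches[0][1]
    return None


if __name__ == "__main__":
    print(films_with_pair(conn, "Manoj Kumar", "Sharmila Tagore"))
    print(unique_music_director(conn, "Manoj Kumar", "Sharmila Tagore"))
```

The deduction only works because the count is exactly one. Had the pair made two films with different composers, the function would return None and Holmes would need another clue.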
Most whodunit plots provide clues that are ‘complete’ and ‘consistent’; a thoughtful chain of deductions will therefore always uniquely reveal the criminal. The big difference in our example is that the eventual solution required access to an external database.
Since this seems like fun, let us construct another example based on a Hindi film song.
Our question to Holmes this time is: Who is the lady singing this duet with Rafi?
Next morning, Watson can hardly contain his excitement: “Holmes, I have cracked your puzzle! This is a duet from a 1966 film called Mamta. While it is hard to decipher the lady’s voice in the duet, I was fortunate to find a solo version of the same song … which was sung by Lata Mangeshkar.”
Holmes merely smiled in response. “One should always look for a possible alternative, and provide against it”, he replied, even as he submitted an online query on his Android cellphone. A beep announced the response to his query. Holmes chuckled as he saw the response and then fired a second online request. “I would consider the year in which the film was released, Watson”, he said before walking away.
“So it was Lata Mangeshkar, wasn’t it?”, Watson asked as they sat down for dinner.
“Of course not! It was Suman Kalyanpur”, Holmes replied. “I was well aware that in the mid-1960s, at the peak of their careers, Rafi and Lata were not singing duets together. My first database query was to list all the female singers who sang duets with Rafi in Mamta. As I expected, Lata did not figure in this list. But, to my surprise, I found that Rafi had sung duets with both Suman Kalyanpur and Asha Bhosle. I therefore had to submit an online request for a cepstrum analysis to compare the individual voices and confirm that this duet was indeed sung by Suman Kalyanpur.”
Seeing Watson dumbfounded, Holmes told him: “Education never ends.”
This is indeed true. The science of inference is not just a matter of simple logic now: we also need to query databases and undertake digital analysis.
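The blog does not say how Holmes's cepstrum request was actually processed, so what follows is only a toy sketch of what a cepstrum is: the inverse Fourier transform of the logarithm of a signal's magnitude spectrum. The signal here is synthetic (a pulse followed by a quieter copy of itself 20 samples later), and the names dft, real_cepstrum, DELAY and GAIN are illustrative assumptions. It is not a singer-identification system; it merely shows that the cepstrum exposes a repetition period hidden in a signal, which is the property that makes it useful for studying the pitch of a voice.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; fine for a short toy signal."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(signal):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spectrum = dft(signal)
    # The tiny constant guards against log(0); it is not needed for this signal.
    log_magnitude = [math.log(abs(v) + 1e-12) for v in spectrum]
    return [v.real for v in dft(log_magnitude, inverse=True)]


# Synthetic test signal (an assumption, not real audio):
# a unit pulse plus a half-strength copy DELAY samples later.
N = 256
DELAY = 20
GAIN = 0.5
signal = [0.0] * N
signal[0] = 1.0
signal[DELAY] = GAIN

cep = real_cepstrum(signal)

# Search the first half of the cepstrum (skipping index 0) for its largest value.
peak = max(range(1, N // 2), key=lambda q: cep[q])

if __name__ == "__main__":
    print("strongest repetition found at a lag of", peak, "samples")
```

The largest cepstral value sits at the lag of the repeated pulse. A real voice comparison would need recorded audio, windowing and a good deal more care than this sketch suggests.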
Indeed the problem could get even harder, as Holmes and Watson discovered on a wet and dismal monsoon-ravaged evening in Mumbai when a distraught and dishevelled Mr Pestonji burst into their chamber. “Only you can save me, sir”, he implored, “I have bet all my life’s earnings that this song is composed by Jaikishan, and I now have to provide the evidence!”
This Rafi solo is from the 1967 film Brahmachari with the musical score attributed to the Shankar-Jaikishan (SJ) duo. It is however well known that Shankar and Jaikishan composed their tunes separately for most of their career span. So was this song composed by Shankar or Jaikishan?
Holmes spent a troubled night pondering over the problem and playing every SJ song available on his iPod. He found a few leads but they were not enough to conclusively establish Mr Pestonji’s definitive assertion.
The film was released well before Jaikishan’s death in 1971 so it could certainly be Jaikishan
The song was written by Hasrat Jaipuri who collaborated far more frequently with Jaikishan than Shankar
A lengthy background interlude precedes this song; Jaikishan often handled background scores …
The real difficulty here is that this jigsaw puzzle has too many pieces, and it is not even certain if any subset of pieces could ever complete the jigsaw picture. To provide answers we would have to enter the realm of probability theory; the best answer we can perhaps give is that there is less than a 10% chance that this song is not by Jaikishan.
As Holmes would later tell Watson: “We balance probabilities and choose the most likely. It is the scientific use of imagination.”
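To get a feel for how such balancing of probabilities might work, here is a minimal sketch that combines clues using Bayes' rule in its odds form. Every number in it is invented for illustration: the 50-50 starting belief and the likelihood ratios attached to each clue are assumptions, not measurements, and the sketch also assumes the clues are independent of one another. The result therefore says nothing about who really composed the song.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with independent clues, in odds form.

    Each likelihood ratio answers: how many times more likely is this clue
    if Jaikishan composed the song than if Shankar did?
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical starting belief: no idea which of the two composed it.
prior_jaikishan = 0.5

# Hypothetical likelihood ratios, invented purely for illustration.
clues = {
    "lyrics by Hasrat Jaipuri": 3.0,
    "lengthy background interlude": 2.0,
    "released before 1971": 1.0,  # true either way, so it shifts nothing
}

p_jaikishan = posterior_probability(prior_jaikishan, clues.values())

if __name__ == "__main__":
    print(f"probability it was Jaikishan: {p_jaikishan:.3f}")
    print(f"probability it was not:       {1 - p_jaikishan:.3f}")
```

Note that a clue with a ratio of 1.0 leaves the answer untouched: the release date merely fails to rule Jaikishan out. With these made-up ratios the combined odds are 6 to 1, so the clues make Jaikishan likely without making him certain.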
Holmes also envisioned a new wind coming (“such a wind as never blew before”). This “big data” wind would bring data of great volume, velocity and variety. But it would be “God’s own wind none the less, and a cleaner, better, stronger land will lie in the sunshine when the storm has cleared”.
And in that cleaner, better and stronger tomorrow of big data analytics we might finally know for sure that it was indeed Jaikishan who composed that memorable song in Brahmachari.
— Cepstrum analysis was used in the investigation of the Airbus A320 crash at Bangalore airport on February 14, 1990. But that’s another story for another time.