A couple of years ago I was asked to give a talk at the C R Rao Advanced Institute of Mathematics, Statistics and Computer Science in Hyderabad on what we do, or might do in the future, with all the telecom data that we keep accumulating. I spoke for about an hour and felt happy and privileged that Prof C R Rao himself spent a few minutes listening to the talk.

Later I was asked to write up the talk for the proceedings volume, which I did. Some weeks ago the editors advised me that I would need to change my presentation and writing style if I wanted my article to be accepted for publication. I chose to withdraw my article and post it here on my blog instead.


Data avalanche

The last five years have seen a massive upsurge of data; we’ve generated 90% of the entire world’s data in the last two years.

This change is dramatic. 20 years ago, we had all the data analysis tools in place, but data was scarce. Today there is a frightening abundance of data, but nobody quite knows what to do with it.

This is especially so with telecom data. Not very long ago most of the telecom data was probably contained in the humble telephone directory. A telecom switchboard was manually operated and involved little more than plugging a wire in and out of pre-defined sockets.


The digital telecom switch changed all that. Today, a near infinity of data is being generated. In the beginning only a small fraction of the telecom switch data was actually used – and mostly for billing. The rest of the data simply flowed away, because it wasn’t profitable to either store or exploit it. If it cost you $100 to store and analyze data, and you could at best earn $50 from it, why would anyone take the plunge?

But things are quickly changing. Data storage is getting dramatically cheaper, and data retrieval is getting unbelievably faster. So there’s now a good chance that your $100 investment will return you $120 instead, or even $1000.

That’s why everyone is jumping onto the bandwagon and making wild promises. “We’ll make your data talk and sing”, one analytics enthusiast recently told me.

I’m predicting that there will be no easy song and dance. You can’t ‘read’ telecom data as you might read the next book from Jeffrey Archer; it is usually an endless stream of zeros, ones and mumbo-jumbo. There is almost always a great story hidden within the data, but you need immense skill to ferret it out of the mad rubble.

The big challenge

The major challenge arises because the data has immense volume, keeps coming in at great velocity, is characterized by variety (numbers, text, voice, and video) and needs to be checked for veracity. To respond to this challenge we need a combination of computer science, machine learning and statistical analysis.

Most people don’t understand this. Indeed all the big data hype suggests that you only have to pour all this data into a calculating furnace – and exciting and lustrous inferences, that will save millions of dollars, will magically pop up. It is a little harder than that.


The volume and velocity require rapid parallel processing of streaming data. You might want to quickly take action to unlock a traffic jam on a major highway, offer a prompt advisory before an approaching killer cyclone, or quickly seek to curb the transmission of a deadly infectious disease. Speed is essential in every case; you must use parallel algorithms to analyze the data as it comes in because there’s no time to store, reflect and retrieve.
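
To make "analyse the data as it comes in" a little more concrete, here is a minimal sketch of the partition-and-merge (map-reduce) pattern that parallel stream processing typically relies on. The (cell_id, call_seconds) record format, the cell names and the activity threshold are hypothetical. Python threads are used only to keep the example small; CPU-heavy work in CPython would need processes or a distributed stream engine instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_partition(records):
    """Map step: count call events per cell within one micro-batch of the stream."""
    return Counter(cell for cell, _seconds in records)


def parallel_counts(batches, workers=4):
    """Process micro-batches concurrently, then reduce by merging the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_partition, batches):
            total.update(partial)
    return total


# Hypothetical micro-batches of (cell_id, call_seconds) records arriving from switches
batches = [
    [("CELL-A", 60), ("CELL-B", 30)],
    [("CELL-A", 120), ("CELL-C", 45)],
    [("CELL-A", 10)],
]
counts = parallel_counts(batches)
# Cells at or above a hypothetical activity threshold are candidates for immediate action
busy_cells = [cell for cell, n in counts.items() if n >= 3]
```

Because each batch is counted independently, the same shape scales out: more partitions and more workers, with one cheap merge at the end.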

The best analytics in telecom is often achieved by leveraging location data. Today’s smartphones have the wonderful ability of continually sending location information to the service provider. If you marry this information with global positioning systems, you can pinpoint the user’s location at every point in time on a map with an accuracy of 100-200 meters.

Statistical techniques provide the vital cutting edge to telecom analytics in two big ways: retrospective analytics, where, for example, you look at large volumes of subscriber data – demographic, location and preference profiles – to enrich the streaming data; and predictive analytics, where, based on previous behavioral patterns, you might predict that a subscriber is highly likely to switch to another service provider in the near future.

Finally the icing on the telecom analytics cake is provided by attractive and evocative data visualization. Nothing conveys the gravity of a situation, or the depth of the investigation, better than pictures and animations. A growing number of telecom analysts embed their analysis and recommendations on high resolution maps. As always, a picture is worth a thousand and more words.

To be sure, all these techniques must be based on the golden first principles of data analysis. The data must first be correctly ‘gathered’, i.e. by amassing all the useful pieces from the rubble and discarding everything that’s unnecessary or irrelevant. Next, it must be ‘mediated’, so that nonsensical-looking data is transformed into something understandable and usable. For example, we must be able to say that the subscriber with number 123456 spent so much time and money on local calls, long distance calls, texting, data downloads, gaming etc. Finally the data must be ‘enriched’ by adding more information about subscriber 123456. Where does he stay? How old is he? What are his hobbies? Is he even a ‘he’?
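
The gather, mediate and enrich steps can be sketched in a few lines. Everything here is illustrative: the pipe-separated record layout, the service codes and the profile table are hypothetical stand-ins for real switch output and a real subscriber database.

```python
# Hypothetical raw switch records: "subscriber|service|seconds|cost"; malformed lines are rubble
RAW = [
    "123456|LOCAL|300|1.50",
    "123456|STD|120|2.40",
    "garbage###",
    "123456|SMS|0|0.50",
    "999999|LOCAL|60|0.30",
]

# Hypothetical subscriber profile table used for enrichment
PROFILES = {"123456": {"age": 34, "city": "Pune"}}


def gather(lines):
    """Gather: keep only well-formed four-field records and discard the rest."""
    for line in lines:
        parts = line.split("|")
        if len(parts) == 4:
            yield parts


def mediate(records):
    """Mediate: turn raw fields into per-subscriber, per-service totals of time and money."""
    usage = {}
    for sub, service, seconds, cost in records:
        entry = usage.setdefault(sub, {}).setdefault(service, {"seconds": 0, "cost": 0.0})
        entry["seconds"] += int(seconds)
        entry["cost"] += float(cost)
    return usage


def enrich(usage, profiles):
    """Enrich: attach whatever profile attributes we know about each subscriber."""
    return {sub: {"usage": u, "profile": profiles.get(sub, {})} for sub, u in usage.items()}


view = enrich(mediate(gather(RAW)), PROFILES)
```

Subscriber 999999 comes through with an empty profile, which is itself useful: it shows where enrichment data is missing.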

So while the drama, romance and mystery seem to be about models, algorithms, visualizations, animations and inferences, that’s merely the glossy end of the picture. Real success is assured only if the data preparation, aggregation, mediation and enrichment happen correctly. This backroom pain and labor is actually responsible for perhaps 90% of the eventual success.

What are the current big data challenges that telecom service providers are facing? In a widely circulated white paper, IBM identifies the following six big data challenges for telecom service providers:

Identify the location of a potential customer – to better deliver services and promotional offers

Support intelligent marketing campaigns – to, for example, tell a customer that the book he’s desperately searching for is available in a book shop five minutes away

Make next best recommendations – to, for example, placate a customer, angry with frequent call drops, with 10 more hours of free talk time

Glean insights from social media – to quickly determine the responses to a new film release or new product launch

Quickly detect fraud – if the same SIM card is making calls at the same time from countries 8000 km apart, something is surely amiss

Instantly monitor network performance – if a power outage has disrupted connectivity in a remote location quickly arrange for corrective action.
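
The fraud example lends itself to a simple ‘impossible travel’ check. The sketch below assumes each call record carries a SIM identifier, a timestamp in hours and the latitude/longitude of the serving cell; the 900 km/h ceiling (roughly airliner speed) is an assumed threshold, not an industry standard, and the SIM identifiers are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_travel(calls, max_speed_kmh=900.0):
    """Flag consecutive calls on one SIM that would need travel faster than max_speed_kmh."""
    alerts = []
    last_seen = {}
    for sim, t_hours, lat, lon in sorted(calls):  # sorts by SIM, then time
        prev = last_seen.get(sim)
        if prev is not None:
            dist = haversine_km(prev[1], prev[2], lat, lon)
            elapsed = t_hours - prev[0]
            if dist > 0 and (elapsed <= 0 or dist / elapsed > max_speed_kmh):
                alerts.append((sim, round(dist)))
        last_seen[sim] = (t_hours, lat, lon)
    return alerts


calls = [
    ("SIM-1", 10.0, 19.0760, 72.8777),  # Mumbai
    ("SIM-1", 10.0, 51.5074, -0.1278),  # London, at the same instant
    ("SIM-2", 9.0, 12.9716, 77.5946),   # Bangalore
    ("SIM-2", 14.0, 12.9352, 77.6245),  # Bangalore again, five hours later
]
alerts = impossible_travel(calls)
```

Only SIM-1 is flagged, with Mumbai and London a little over 7000 km apart; SIM-2's short hop across Bangalore is perfectly plausible.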

In this note we will describe several examples and situations that highlight how big data is going to completely change the way we do telecom analytics.

Contain churn

Let us start with the mobile phone, now everyone’s ubiquitous companion. Every telecom service provider seeks to increase the number of its mobile phone subscribers, and hates it when a subscriber migrates to a rival service provider. In telecom jargon, this is the problem of customer churn. The problem is serious because empirical studies suggest that gaining a new customer is several times more expensive than retaining an existing customer.

Why does a customer churn, and what does he really want? From the voice perspective the customer wants good connectivity, crisp conversations, seamless handovers and no call drops. From the data perspective he wants quick downloads, richer content and no traffic jams. And, for both voice and data, the customer wants the lowest costs. A customer will churn if he is unhappy with the quality of his connectivity and the associated costs. To mitigate churn, therefore, the service provider must identify customer preferences and continually monitor the customer experience.


The initial naïve view is that there must be scores of customer preference types, and it would be impossible to identify and deal with all of them. After all, the businessman who has long prime time conversations with a customer in Dubai or Alaska is so different from the thrifty man from Pune who only believes in missed calls; likewise, the streaming video addict is so different from someone who checks his mail only once in four hours.

In reality things are much less complicated. Less than a dozen segments usually suffice to classify most customer preferences. We use cluster analysis to identify segments so that customers within a segment share similar preferences, and customers in different segments have markedly different usage preferences.
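
A bare-bones version of this segmentation step is Lloyd's k-means algorithm. In the sketch the two features (monthly voice hours and monthly data gigabytes) and the nine customers are hypothetical. Farthest-first seeding is used so the result is deterministic; in practice one would standardize the features and try several values of k before settling on the segments.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then add the point farthest from all seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, max_iters=100):
    """Lloyd's k-means: alternate assignment and centroid updates until nothing moves."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iters):
        # Assignment step: every customer joins the nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: every centroid moves to the mean of its members
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


customers = [
    (40, 1.0), (42, 2.0), (38, 1.5),  # heavy voice callers
    (5, 30.0), (6, 28.0), (4, 32.0),  # streaming-video addicts
    (2, 1.0), (1, 0.5), (3, 1.5),     # light, thrifty users
]
centroids, labels = kmeans(customers, k=3)
```

Each centroid then summarises a segment, which is exactly what a usage plan can be designed around.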

Once these segments are identified, the service provider devises usage plans that best address the preferences of customers in each segment, thereby significantly reducing the tendency to churn.

Monitoring the customer experience pro-actively is just as important to contain churn. There are always tell-tale signs; for example, has the customer recently had angry interactions with the call centre, made critical observations on Facebook or Twitter, or unexpectedly changed his usage pattern?
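
Those tell-tale signs can be turned into a churn score. The sketch below swaps in plain logistic regression trained by batch gradient descent; the three signals (angry call-centre contacts, negative social media posts and a usage-drop flag) and the eight labelled customers are invented for illustration, and a real model would need far more data and a held-out test set.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w, x):
    """Linear score: bias w[0] plus the weighted signals."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def train_logistic(X, y, lr=0.1, epochs=3000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical signals: (angry calls in 30 days, negative posts, usage dropped 0/1)
X = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 0),
     (3, 2, 1), (4, 1, 1), (2, 3, 1), (5, 2, 0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = customer churned
w = train_logistic(X, y)
p_risky = sigmoid(score(w, (4, 2, 1)))
p_calm = sigmoid(score(w, (0, 0, 0)))
```

Customers whose score crosses a chosen cut-off would be the ones to call before they call you.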

The classical approach to treating such symptoms was to create loyalty programs, offer more discounts, or throw in some gifts and freebies. But as mobile usage data becomes richer and more pervasive, we now have more opportunities to extend that personal touch.

Imagine, for example, the following scenario when you make an angry call to complain. The operator greets you cordially and says “I’m guessing you are calling to report the sudden spurt in call drops from your office location. I’m sorry we are having power failure issues in that tower. We hope to fix the problem by 6 pm this evening, but, to show that we are really sorry, we are offering you Rs 100 more of talk time!”

This is no longer a fantasy scenario. Indeed there are other ways too in which you can provide the personal touch: alerts on mobile data usage, tips on bandwidth hogging apps and services, or bundling in an attractive but unexpected offer for free usage of WhatsApp or Candy Crush.

This was unthinkable even a decade ago, but big data analytics – in particular the use of a wider variety of data available on social networking sites – now makes this both possible and feasible. And we can do more: determine how we can increase revenue from existing loyal customers, identify risky customer behaviour sooner and take corrective steps faster, reduce the customer retention cost, or estimate the lifetime value of a customer.

With big data analytics we can make sure that the grass is never greener on the other side.

From the switch to the bill

How would an inter-state bus transport service decide the price of its ticket to go from, say, Bangalore to Mumbai? It would consider the purchase cost of the bus, fuel costs, staff salaries, road taxes, toll charges and maintenance costs … and then add the expected profit margin. Each of these cost components can be easily estimated, so it is not difficult to determine the final bus ticket price.

If a telecom service provider needs to determine what rates to offer on his rate sheet, the problem is decidedly harder. The calculation involves a detailed breakdown of all the individual costs (connect cost, termination cost, circuit usage cost etc.) and must further factor in all the associated contracts and agreements between the different service providers who participate in the service delivery.

In practice – and this is therefore not such a surprise – the rate is often based on what the competitor charges. But is this rate right? What is the telecom carrier’s true cost?


Determining the true cost requires intelligent use of the different elements of telecom analytics like mediation, costing, enrichment, and billing. Such analysis often throws up unexpected results, and therefore holds the promise of a significant competitive advantage. For example, one may find that the big revenue-earning trunk route is actually not profitable. A lot of money comes in, but much more goes out from the back door!
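
A toy version of this costing exercise shows how a busy route can still lose money. The cost components follow the breakdown above (connect, termination and circuit usage), but every number, including the fixed monthly charge, is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RouteCosts:
    """Per-minute cost components for one route, in dollars (illustrative values)."""
    connect: float
    termination: float
    circuit_usage: float

    def per_minute(self):
        return self.connect + self.termination + self.circuit_usage


def route_pnl(minutes, rate_per_min, costs, fixed_monthly=0.0):
    """Return (revenue, total_cost, profit) for one month of traffic on a route."""
    revenue = minutes * rate_per_min
    total_cost = minutes * costs.per_minute() + fixed_monthly
    return revenue, total_cost, revenue - total_cost


# A high-volume trunk route priced to match a competitor at $0.05 per minute
trunk = RouteCosts(connect=0.005, termination=0.04, circuit_usage=0.01)
revenue, cost, profit = route_pnl(2_000_000, 0.05, trunk, fixed_monthly=20_000)
```

Here about $100,000 comes in each month but about $130,000 goes out: the per-minute cost alone ($0.055) already exceeds the $0.05 rate.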

Big data analytics promises to change all that. With deeper and faster insight into actual costs, and with the availability of streaming real-time usage data, one should expect dynamic, variable pricing schemes to enter the mix. Indeed the old circuit-switched networks will themselves give way to packet-switched networks; all variables will then change, and it will become a whole new ball game.

Let us hub

When you call your brother in Seattle or Secunderabad how is your call routed? There could perhaps be six possible paths to make this connection, so which path must one choose?

In all likelihood there is a default algorithm at work that advises the switch which path to take, but one can never be sure if this path is ‘optimal’. There could well be an alternate path that costs marginally less while providing the same voice quality.

An interesting analytical problem is to identify the optimal routing path. The general idea is to break down the path into its individual components, calculate the cost of each component, and sum the costs for each path to obtain the (optimal) least cost path. Calculating individual costs isn’t trivial; it requires a look-up of multiple rate sheets, and awareness of the many intricate details in contracts and agreements signed among the interacting telecom service providers (e.g., are costs fixed or variable, usage-sensitive or allocated, wholesale or retail, one-time or recurring?).
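
Summing component costs path by path works well for six paths, but the same answer can be found without listing every path. The sketch below swaps in Dijkstra's shortest-path algorithm over a graph whose edge weights are per-minute link costs in cents; the node names and costs are hypothetical, and real weights would come from the rate-sheet and contract look-ups described above.

```python
import heapq


def least_cost_path(graph, source, target):
    """Dijkstra over non-negative per-minute link costs; returns (cost, path)."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, link_cost in graph.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(queue, (cost + link_cost, nxt, path + [nxt]))
    return float("inf"), []


# Hypothetical per-minute costs (cents) between switches and carrier hubs
graph = {
    "Bangalore": {"Mumbai-GW": 0.4, "Chennai-GW": 0.3},
    "Mumbai-GW": {"London-hub": 1.2, "Seattle": 2.5},
    "Chennai-GW": {"Singapore-hub": 0.9},
    "London-hub": {"Seattle": 0.8},
    "Singapore-hub": {"Seattle": 1.1},
}
cost, path = least_cost_path(graph, "Bangalore", "Seattle")
```

The cheapest route here goes via Chennai and Singapore (2.3 cents) rather than the more obvious one via London (2.4 cents). A saving of even 0.1 cents per minute, on a hypothetical 10 million minutes a day, comes to $10,000 a day.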

If every call connected by the service provider is transmitted along the optimal path there is a marginal saving on each call. This small saving, when aggregated over all calls day after day, grows into a significantly large – and recurring – benefit.


The bigger piece of the cake, however, comes from ‘hubbing’ – positioning oneself at the hub where telecom deals are made. All telecom service providers publish rate sheets (e.g., an out payment of $0.04 per minute) that depend on their installed capacity on a given sector and its utilization pattern. These rate sheets contain an abundance of information, and offer opportunities to broker some really attractive low-cost deals if one acts intelligently and with alacrity.
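
At its simplest, acting on published rate sheets means finding, for each destination, the cheapest supplier and checking whether a buyer will pay more than that. The carriers, destinations, rates and bids below are hypothetical.

```python
def best_deals(rate_sheets, bids):
    """Match each buyer bid with the cheapest published supplier rate.

    rate_sheets: {carrier: {destination: $ per minute}}
    bids: {destination: $ per minute a buyer will pay}
    Returns {destination: (carrier, buy_rate, margin_per_min)} for profitable destinations only.
    """
    deals = {}
    for dest, bid in bids.items():
        offers = [(sheet[dest], carrier) for carrier, sheet in rate_sheets.items() if dest in sheet]
        if not offers:
            continue
        rate, carrier = min(offers)  # cheapest supplier for this destination
        if bid > rate:
            deals[dest] = (carrier, rate, round(bid - rate, 4))
    return deals


rate_sheets = {
    "CarrierX": {"UK": 0.040, "UAE": 0.090},
    "CarrierY": {"UK": 0.035, "USA": 0.010},
}
bids = {"UK": 0.045, "UAE": 0.080, "USA": 0.015}
deals = best_deals(rate_sheets, bids)
```

The UAE bid is below every published rate, so no deal is proposed there; speed matters because rate sheets and bids change constantly.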

In tomorrow’s world of big data expect the hubbing to become even more frenetic, and acquire the velocity and frenzy that one associates with stock markets today.

Pump it up

Every US citizen has the right to connectivity – even if he stays in the middle of nowhere. Providing this statutory ‘last mile’ connectivity can be quite a nuisance for long-distance telecom service providers; so they usually prefer to partner with local service providers to travel that difficult last mile.

The local service provider normally charges a very stiff rate for this last mile connectivity (e.g., 5 cents per minute, instead of the normal 0.5 cents per minute), but justifies it by arguing that traffic is expected to be very low and they need to be profitable.


The problem arises when the local partner gets greedy and contrives to ‘pump up’ big traffic along this normally deserted last mile. A favourite ploy is to advertise an adult chat or conferencing service from this remote location. That sends traffic soaring! The loser in this devious plot is the long-distance service provider who gets billed for tens of thousands of dollars more and, worse still, doesn’t even discover this breach till it is too late.

The way out is to use more analytics; enriched switch data can spot such traffic pumping immediately, and the long-distance carrier can initiate corrective action at once. One recalls a celebrated case some years ago in which a US long-distance service provider successfully won a legal battle against this scourge.
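
One simple way to spot pumping in enriched switch data is to compare each day's terminating minutes on a last-mile route against its own recent baseline. This sketch flags a day only when it is both several standard deviations above, and several times larger than, the trailing seven-day mean; the thresholds, route names and traffic figures are assumptions for illustration.

```python
from statistics import mean, pstdev


def pumping_alerts(daily_minutes, window=7, z_threshold=4.0, min_ratio=3.0):
    """Flag (route, day_index, minutes) where traffic jumps far above its trailing baseline.

    daily_minutes: {route: [terminating minutes per day, oldest first]}
    """
    alerts = []
    for route, series in daily_minutes.items():
        for day in range(window, len(series)):
            history = series[day - window:day]
            mu, sigma = mean(history), pstdev(history)
            today = series[day]
            if sigma > 0:
                z = (today - mu) / sigma
            else:
                z = float("inf") if today > mu else 0.0
            if z >= z_threshold and today >= min_ratio * mu:
                alerts.append((route, day, today))
    return alerts


traffic = {
    "rural-LEC-1": [120, 110, 130, 125, 115, 118, 122, 119, 4800, 5200],
    "metro-1": [1000, 1010, 990, 1005, 995, 1000, 1002, 998, 1003, 1001],
}
alerts = pumping_alerts(traffic)
```

Note that once pumping persists, the inflated days enter the baseline and can mask later ones (day 9 here is not flagged), so a production check would probably exclude flagged days from the baseline or compare against a longer history.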

Traffic jam

Managing a telecom network is not very different from managing a public transport network. As usage grows, the network too must grow if we are to avoid traffic congestion.


It is easy to spot such congestion: phones beep and groan before you hear that reassuring ‘it is ringing’ sound, the ‘I’m fetching data’ icon keeps going round and round for what seems like eternity, and your phone battery drops below 50% almost before you can say “Jack Robinson”.

There are multiple reasons for this congestion.

We have many more devices now – phones, tablets, smart TVs, PCs, laptops – and they are all greedy for data. The demand for data is rising, but the bandwidth can’t cope with all this demand.

Today’s data services often operate in the ‘always on’ mode. The device is apparently idle, but it is still connected to the network, and background apps continue to upload or download data.

Data plans are getting cheaper, and download and upload options are becoming unlimited. This encourages a lot of users to set up data transfers and literally go to bed.

The popularity of social media portals like Facebook, Twitter and LinkedIn, and of instant messaging apps like WhatsApp, continues to grow exponentially. Social get-togethers today involve high resolution photography, audio and video recordings and their instant transmission to the rest of the world.

Business and engineering are rapidly becoming data-centric. Almost everything that happens gets computed, plotted, recorded, analysed, replayed and quickly stored ‘upstairs’ on the cloud. This requires massive bandwidth. We are also reading how stock markets can be manipulated by high-velocity data; such subterfuge would require this massive bandwidth to become doubly massive.

Content is getting increasingly globalized. Today’s reader in Bangalore or Mumbai doesn’t just read The Times of India; he also reads The New York Times. He doesn’t just live stream cricket from India; he also live streams football from England and Spain. This forces network carriers to offer significantly more international bandwidth.

While this congestion problem must eventually be resolved by creating more connectivity and building more roads, it is just as important to make sure that existing networks are managed optimally, and any threat of a traffic jam is quickly identified before it gets aggravated.

Typically, networks are grown by positioning base transceiver stations – active devices that facilitate communication between mobile phones and the network – at numerous chosen field locations so that they span the entire region being serviced by the service provider. When the network needs to grow, we simply increase the density of the device installations.

These devices are usually housed in protected shelters and an information system is constantly monitoring the performance and the uptime, rather like a roving helicopter hovering over the region’s road networks.

The analytics challenge is to design an information system that provides more and more insights. Today’s systems continually gather information from the field, store it in databases, run queries to monitor key performance indicators (KPI) and send out alarms when there is some disruption in the shelters – a bit like the hovering helicopter verifying that all is well and sending out messages when it spots some traffic congestion.
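
Today's threshold-and-alarm style of monitoring fits in a few lines. The KPI names, limits and site identifier below are hypothetical.

```python
# Hypothetical KPI limits: ("max", x) alarms above x, ("min", x) alarms below x
KPI_LIMITS = {
    "call_drop_rate": ("max", 0.02),
    "availability": ("min", 0.995),
    "shelter_temp_c": ("max", 45.0),
}


def check_kpis(site, readings, limits=KPI_LIMITS):
    """Return one alarm message per KPI reading that breaches its limit; unknown KPIs are ignored."""
    alarms = []
    for kpi, value in readings.items():
        if kpi not in limits:
            continue
        kind, limit = limits[kpi]
        breached = value > limit if kind == "max" else value < limit
        if breached:
            alarms.append(f"ALARM {site}: {kpi}={value} breaches {kind} limit {limit}")
    return alarms


alarms = check_kpis("BTS-017", {"call_drop_rate": 0.035, "availability": 0.999, "shelter_temp_c": 51.0})
```

This is the hovering helicopter: it reports what has already gone wrong, but says nothing about why, or about what is about to go wrong.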

In tomorrow’s world of big data we should expect much more network intelligence: pattern recognition models to identify the ‘symptoms’ of impending congestion and take corrective action to dissolve this developing ‘network blood clot’, and more correlations to support better root cause analysis, with naïve Bayes methods used to spot richer associations between events.
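
As an illustration of the naïve Bayes idea, the sketch below learns from past incidents which alarm ‘symptoms’ tend to accompany which root cause, and then ranks the causes for a new set of alarms. It is a Bernoulli naïve Bayes classifier with add-one smoothing; the symptom names, causes and incident history are all invented.

```python
import math
from collections import Counter, defaultdict


def train_nb(incidents):
    """incidents: list of (set_of_symptoms, root_cause). Returns counts for a Bernoulli naive Bayes model."""
    cause_counts = Counter(cause for _, cause in incidents)
    symptom_counts = defaultdict(Counter)
    vocabulary = set()
    for symptoms, cause in incidents:
        symptom_counts[cause].update(symptoms)
        vocabulary |= symptoms
    return cause_counts, symptom_counts, vocabulary


def rank_causes(model, observed):
    """Return root causes sorted by log-posterior, most likely first."""
    cause_counts, symptom_counts, vocabulary = model
    total = sum(cause_counts.values())
    scores = {}
    for cause, n in cause_counts.items():
        log_p = math.log(n / total)  # prior
        for s in vocabulary:
            # Add-one smoothed P(symptom present | cause)
            p = (symptom_counts[cause][s] + 1) / (n + 2)
            log_p += math.log(p if s in observed else 1 - p)
        scores[cause] = log_p
    return sorted(scores, key=scores.get, reverse=True)


history = (
    [({"battery_low", "mains_off", "site_down"}, "power_failure")] * 2
    + [({"mains_off", "site_down"}, "power_failure")]
    + [({"link_down", "site_down"}, "fiber_cut")] * 2
    + [({"link_down"}, "fiber_cut")]
    + [({"high_prb_usage", "call_setup_fail"}, "congestion")] * 2
    + [({"call_setup_fail"}, "congestion")]
)
model = train_nb(history)
ranking = rank_causes(model, {"mains_off", "site_down"})
```

Even without a battery alarm, the mains-off and site-down combination points to a power failure rather than a fibre cut.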

Indeed it could even be possible to configure a ‘learning’ network that is continually readapting itself so that it always stays efficient and error-free. In other words, tomorrow’s helicopter won’t hover above just to observe and inform; it will also physically interact with the network, and intervene to banish any congestion at any time.

Azimuthal tilt

Many of us don’t realize that the smartphone that we carry is actually a radio. In fact it can also double up as a probe.

It is a radio because when we make or receive calls, or download or upload data, we are actually using radio frequencies. It also becomes a probe because the smartphone is constantly sending out information about its user’s physical location, signal strength and other relevant telecom parameters.


The planning and optimization of radio access networks (RAN) is a significant analytics challenge. A typical planning challenge is to determine the minimum number of towers – and their optimal locations – needed to cover 90% of the users with a call drop rate never exceeding 20%.
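
Finding the true minimum is a set-cover problem, which is computationally hard in general, so planners often settle for a good approximation. This sketch uses the classic greedy heuristic: keep adding the candidate site that serves the most still-uncovered users until 90% are covered. It assumes each candidate's user set has already been filtered to users it could serve within the call-drop limit; the sites and users are hypothetical.

```python
def greedy_sites(candidates, users, target=0.90):
    """Greedy set cover: add the site serving the most uncovered users until `target` coverage.

    candidates: {site: set of user ids served within the call-drop limit (assumed precomputed)}
    Returns (chosen sites, coverage fraction). Greedy approximates the minimum; it does not guarantee it.
    """
    covered, chosen = set(), []
    remaining = dict(candidates)
    while len(covered) < target * len(users) and remaining:
        site = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining.pop(site) - covered
        if not gain:
            break  # no remaining site adds coverage
        covered |= gain
        chosen.append(site)
    return chosen, len(covered) / len(users)


users = set(range(10))
candidates = {
    "T1": {0, 1, 2, 3, 4},
    "T2": {4, 5, 6},
    "T3": {5, 6, 7, 8},
    "T4": {9},
    "T5": {0, 8, 9},
}
chosen, coverage = greedy_sites(candidates, users)
```

Two towers (T1 and T3) reach the 90% target here; the remaining user would need a third tower that the target does not justify.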

The conventional way to answer such questions is to carry out drive tests: you actually drive through every road in the region with instrumented cars, load the data obtained into a database, and then run models to determine the best locations.

But ‘best’ locations do not always stay best. The number of users might change, new and more buildings may mushroom on hitherto barren terrain, a new highway, mall or a stadium may change the user location or dispersion, or there might be interference from faulty towers. All this will result in degraded performance and call drops.

There will therefore be a need to review and optimize the choice of these locations. The growing number of smartphones – often the real cause for the performance decline – will ironically aid such optimization, because, being also a probe, the smartphone is continually sending such field data. This data can then be successfully mined to draw lessons and inferences – for example that a three-degree azimuthal tilt of the antenna could bring the call drop percentage down from 50% to 20%.

One must marvel that we can make such a recommendation without any of the old trial and error experiments; this writer still recalls how he used to fiddle with antenna positions on his terrace to make the ‘ghosts’ vanish from the TV screen after a wild monsoon storm.

Actually as smartphone usage grows, and more big data machinery is established, things will get even better. We could determine ‘hot’ zones with the highest traffic density and ‘cold’ zones where bandwidth is potentially getting wasted … and then optimize capacity by redistributing the bandwidth. We could provide better connectivity ‘on-the-fly’ when the network ‘discovers’ a massive political rally or a cricket or football World Cup game.
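
The hot-zone and cold-zone idea can be sketched as follows: measure utilisation per zone, label the extremes, and propose capacity in proportion to observed peak demand. The 90% and 30% cut-offs, the zone names and the figures are assumptions; a real scheme would also keep headroom and respect what each site can physically carry.

```python
def rebalance(zones, total_capacity, hot=0.9, cold=0.3):
    """zones: {zone: (allocated_mbps, observed_peak_mbps)}.

    Labels each zone by peak utilisation and proposes capacity proportional to observed peak demand.
    """
    labels = {}
    for zone, (allocated, peak) in zones.items():
        utilisation = peak / allocated
        if utilisation > hot:
            labels[zone] = "hot"
        elif utilisation < cold:
            labels[zone] = "cold"
        else:
            labels[zone] = "normal"
    demand = sum(peak for _allocated, peak in zones.values())
    proposal = {zone: total_capacity * peak / demand for zone, (_allocated, peak) in zones.items()}
    return labels, proposal


zones = {"stadium": (100, 98), "business_park": (100, 20), "residential": (100, 62)}
labels, proposal = rebalance(zones, total_capacity=300)
```

Here the stadium would go from 100 to roughly 163 Mbps and the business park would drop to about 33 Mbps, with the total unchanged.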

Eventually we could even create an intelligent self-organising radio network! The options and opportunities appear endless.

Scarce spectrum

In the wireless context, ‘spectrum’ refers to the radio portion of the electromagnetic spectrum. The radio spectrum has a limited frequency range, and the range of frequencies useful for mobile telephony is even more limited. Spectrum is therefore scarce (and expensive).


The challenge for every service provider is to use his available spectrum efficiently and intelligently while exploiting every possible advantage – for instance, that the same frequency can be used at two locations sufficiently far apart, since radio signals spread out and fade over geographic distance. Data analytics solutions currently offered include automatic frequency planning, neighbour list optimization, and spectrum reduction and re-farming. However, the big breakthrough will come when today’s rigid spectrum allocation regulations give way to a more efficient spectrum sharing and management system. Big data analytics will almost certainly be the catalyst for this inevitable change.
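
Automatic frequency planning is often framed as graph colouring: cells close enough to interfere are joined by an edge, and joined cells must not share a channel. The sketch below uses the greedy Welsh-Powell heuristic (most-constrained cells first); the straight-line cell layout and the 3 km reuse distance are hypothetical, and real planners work with measured interference rather than plain distance.

```python
import math


def assign_channels(cells, reuse_km):
    """Greedy graph colouring: cells closer than reuse_km must not share a channel.

    cells: {cell: (x_km, y_km)}. Returns {cell: channel_index}.
    """
    neighbours = {c: set() for c in cells}
    names = list(cells)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(cells[a], cells[b]) < reuse_km:
                neighbours[a].add(b)
                neighbours[b].add(a)
    channels = {}
    # Welsh-Powell ordering: colour the most-constrained cells first
    for cell in sorted(names, key=lambda c: len(neighbours[c]), reverse=True):
        used = {channels[n] for n in neighbours[cell] if n in channels}
        channels[cell] = next(ch for ch in range(len(names)) if ch not in used)
    return channels


cells = {"A": (0, 0), "B": (2, 0), "C": (4, 0), "D": (6, 0), "E": (8, 0)}
channels = assign_channels(cells, reuse_km=3)
```

Five cells end up sharing just two channels, because cells 4 km apart are far enough to reuse the same frequency.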

We can’t see spectrum, touch it or hear it, but it will change our lives in ways that we still can’t fully understand or even fathom.

Revving up revenues

A couple of years ago the telecom industry – especially in India – appeared to be in the doldrums. The service provider’s average revenue per user (often called ARPU) was sinking, and, with fierce competition, revival prospects appeared bleak.

The outlook today is much more encouraging, because the focus has changed from voice to data services. Service providers now make money by monetizing data, not transport. And things are poised to get even better.

Big data has been the game-changer by creating new revenue streams; Gartner for example estimates that ad revenues alone can bring in an additional $300 million every year.


Consider the example below that was unthinkable even a decade ago. The big harbinger of change is the ubiquity of smartphones that continually provide geo-location information.

Imagine that you are a salesman who drives to the western end of your city on Mondays, Wednesdays, and Fridays. You usually start at 10 o’clock in the morning and – on most days – you stop by for lunch at 1 pm at a big food court at the western periphery.

As you enter the food court your smartphone beeps out exciting offers from different hotels located inside: 5% discount from Subway, 3% from Dominos. Depending on your mood you make an impromptu choice, but you wonder if anyone could ever afford a 10% discount instead.

One Wednesday, you are surprised to actually receive a 10% discount from Subway – and instantly make up your mind that you will lunch there.

This is just one example of big data in action. Based on the geo-location data sent out by your smartphone, the communication service provider has already created a model of your movements. So as soon as you go westwards on a Wednesday, the model determines that you will lunch at the food court with a high probability. It then ‘negotiates’ a profit percentage for itself with Subway. Subway is happy to pay the service provider the extra percentage because of the early booking advantage it gains vis-à-vis its competitors.
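
The "model of your movements" can start out as nothing more than conditional frequencies. This sketch estimates the probability that you lunch at the food court given the weekday and whether you headed west, with add-one smoothing so unseen combinations get a neutral 50%. The observation history is hypothetical, and triggering an offer above some probability cut-off is an assumption about how such a system might work.

```python
from collections import defaultdict


def visit_model(history):
    """history: list of (weekday, went_west, lunched_at_food_court) observations.

    Returns a function estimating P(food court | weekday, went_west) with add-one smoothing.
    """
    counts = defaultdict(lambda: [0, 0])  # (weekday, went_west) -> [visits, days observed]
    for weekday, west, visited in history:
        counts[(weekday, west)][0] += int(visited)
        counts[(weekday, west)][1] += 1

    def prob(weekday, west):
        visits, days = counts.get((weekday, west), (0, 0))
        return (visits + 1) / (days + 2)

    return prob


history = (
    [("Wed", True, True)] * 7
    + [("Wed", True, False)]
    + [("Wed", False, False)] * 2
    + [("Mon", True, True)] * 5
)
p = visit_model(history)
```

With seven food-court lunches on eight westbound Wednesdays, the model puts this Wednesday's probability at 0.8, high enough to be worth an early offer.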

Rich intersections

The telecom service provider industry was traditionally split into Operations Support Systems (OSS) and Business Support Systems (BSS) segments. OSS was concerned with network management, service assurance, RAN planning and, generally speaking, systems closer to the network. BSS was more concerned with product management, revenue management and customer management, and was therefore closer to the business.


Today OSS and BSS are coming closer, with smartphones providing the vital OSS-BSS link. Indeed, the most attractive business opportunities lie at the BSS-OSS intersection.

Consider the example of an entrepreneur wishing to start a video streaming service. This is essentially an operations service, so 10 years ago he would probably have just created his OSS network and waited for customers to somehow ‘find’ him (because the customer data sat with BSS).

With big data systems now in place, things would be very different. Based on demographic and usage data the entrepreneur would know who to target; based on geo-location data he’d know where to target; and by looking at enriched consumer data of users who buy DVDs or play games on their smartphones, he would also obtain a profile of his potential top customer.


This review strongly confirms something that we already knew – Big Data is coming and is poised to dramatically change the way we do business. Telecom analytics in particular will be a big beneficiary.


Paradigms – even cherished ones – are changing. From ‘store first and analyse later’ we are moving to ‘analyse first and store later’. From ‘pick a representative sample to validate your model’ we are moving to ‘just analyse all your data, and see if there is some hidden model’. From the hallowed and sacrosanct parametric world we are poised to enter the fascinating and intriguing non-parametric world.

IBM says: “If you can ingest and analyse networks, location and customer data in real-time or near real-time you have much to gain!” To that I would like to add: “Tomorrow’s wars will not be fought for fair maidens, land or oil. But for data!”

The rich used to hide their money in Swiss banks. It can’t be a coincidence that they are now getting ready to store data in Swiss mountains.


References

Big data analytics for communication service providers. IBM Software White Paper in Telecommunications. 2010.

The big data potential. Nomura, Global Markets Research. 2014.

The future of big data analytics in the telecoms industry. Justin van der Lande. Analysys Mason report for Amdocs. 2014.
