Keeping pace with IT's rapid evolution can be a challenge. Predictive analytics allows companies to follow user trends, see where each is headed and often stay one step ahead of the competition. And large cloud vendors like Google and Microsoft now offer predictive analysis as a service in the cloud. But to get the most out of these cloud-based predictive analytics and avoid problems, cloud admins must know what these services offer and when to use them.
Cloud predictive analytics services build and run statistical and machine learning programs to help users make decisions based on data. For example, a retailer might offer a discount to a customer that the model predicts will likely go to a competitor, or "churn." To determine which customers will stay and which will go, the service collects data for both current and former customers. Then, machine-learning algorithms analyze the data to develop rules, formulas and decision trees to predict the likelihood that a customer will leave. These are known as predictive models.
Predictive analysis services such as Microsoft Azure Machine Learning and Google Prediction API eliminate the hassle of finding, installing and configuring machine learning for particular use cases. You still have to provide the data, however. Most predictive algorithms require labeled data sets. For a study of former customers, you need to provide customer data, including an attribute that indicates whether or not the customer left.
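To make the idea of a labeled data set concrete, here is a minimal sketch that checks a customer file before it is uploaded to a service. The column name `churned`, its `yes`/`no` values and the function name are assumptions made for illustration; they are not part of either vendor's API.

```python
import csv
import io


def load_labeled_rows(csv_text, label_column="churned"):
    """Parse CSV text and confirm every row carries a yes/no label.

    The label column name and its allowed values are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if label_column not in (reader.fieldnames or []):
        raise ValueError(f"missing label column: {label_column}")

    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        label = (row[label_column] or "").strip().lower()
        if label not in {"yes", "no"}:
            raise ValueError(f"line {line_number}: unlabeled or invalid row")
        row[label_column] = label
        rows.append(row)
    return rows
```

A row with an empty label raises an error instead of silently entering the training data, which is usually the safer default.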
Cloud admins may need to help with data preparation by creating a good set of features to include in the data set. For former customers, include such features as the time since last purchase and the amount purchased, as well as average purchase amount and average number of purchases for the last two years, among other things. It's common to have hundreds of features in a data set.
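The features mentioned above can be derived from a plain purchase history. The sketch below is one possible reading, not a prescribed method: the function name, the feature names and the choice of a 730-day window for "the last two years" are all assumptions.

```python
from datetime import date, timedelta


def build_features(purchases, as_of):
    """Derive churn features for one customer.

    purchases: list of (purchase_date, amount) tuples.
    as_of: the date on which the features are computed.
    """
    if not purchases:
        raise ValueError("customer has no purchases")

    # Approximate "the last two years" as 730 days (an assumption).
    window_start = as_of - timedelta(days=730)
    recent = [amount for day, amount in purchases if window_start <= day <= as_of]

    last_date, last_amount = max(purchases, key=lambda p: p[0])

    return {
        "days_since_last_purchase": (as_of - last_date).days,
        "last_purchase_amount": last_amount,
        "avg_purchase_amount_2y": sum(recent) / len(recent) if recent else 0.0,
        "avg_purchases_per_year_2y": len(recent) / 2,
    }
```

Each customer becomes one row of such features, and the label from the data set is attached to that row before training.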
When do I use predictive analytics?
Use predictive analysis tools only when you have sufficient data to understand the full scope of the business problem at hand. For example, don't assume a data sample of customers from the Northeast is sufficient to highlight the purchasing patterns of customers throughout the entire U.S. Data needs to be as clean as possible, but many machine-learning algorithms can still produce reasonable results in spite of the noise. Use some of your data to build the model and some to test it. Consider how long it will take to collect, transform and label data. Generating training data can be time-consuming and costly, so make sure the predictive benefits outweigh the costs associated with building it.
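Holding back part of the data for testing can be as simple as a shuffled split. This is a minimal sketch; the 80/20 ratio, the fixed seed and the function name are illustrative assumptions, and a hosted service may offer its own splitting step.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")

    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)

    # Keep at least one row for testing, even with very small inputs.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

The model is built from the first list only; the second list stays unseen until it is time to check how well the predictions hold up.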
Be prepared to work and rework your models to get the most out of predictive analytics services. Cloud admins may need to work with data model developers to assess the performance of the predictive models.
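When assessing a churn model on held-out data, accuracy alone can mislead if few customers leave, so precision and recall are often reported alongside it. The sketch below computes all three from true labels and predictions; the function name and the 1/0 encoding of "churned" are assumptions for illustration.

```python
def evaluate(actual, predicted):
    """Compare true labels with predictions (1 = churned, 0 = stayed)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("nothing to evaluate")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # churn caught
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarm
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # churn missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # correctly ignored

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Of the customers flagged as likely to leave, how many did?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the customers who left, how many were flagged?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Low precision means discounts go to customers who would have stayed anyway; low recall means departing customers are never offered one.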
Common mistakes with cloud predictive analytics
Mistakes are common when you start using machine learning services. Don't assume a particular algorithm will work well for you because it worked well for others. Azure offers a number of different algorithms, so you may need to experiment to find the best fit. Google Prediction API is a black-box service; you supply the data and it supplies the predictions. There is no need to pick a particular algorithm.
If you succeed in building a predictive model, the next step is to run it in a production environment. Be aware of the expected volume of predictions and the cost of the service at production scale. As with other SaaS offerings, work with predictive analytics modelers to plan for outages. Service outages are less problematic for predictive models run in batch mode than for models used to support transactions, such as offering a discount at checkout time.
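Both planning points can be sketched in a few lines. The first function is simple arithmetic for a monthly bill; the price passed in is a placeholder, not a vendor's rate. The second shows one way a checkout flow might degrade when the prediction service is unreachable. Here `predict_churn` is a hypothetical callable that wraps whatever client the service provides, and the threshold and default are assumptions.

```python
def estimate_monthly_cost(predictions_per_day, price_per_1000, days=30):
    """Rough monthly cost; price_per_1000 is a placeholder value."""
    return predictions_per_day * days / 1000 * price_per_1000


def should_offer_discount(customer, predict_churn, threshold=0.5,
                          default_offer=False):
    """Decide on a checkout discount, falling back if the service is down.

    predict_churn: hypothetical callable returning a churn probability
    between 0 and 1, raising ConnectionError or TimeoutError on outage.
    """
    try:
        probability = predict_churn(customer)
    except (ConnectionError, TimeoutError):
        # The checkout must not fail because the prediction did.
        return default_offer
    return probability >= threshold
```

A batch job can simply retry later, so a fallback like this matters most on the transactional path.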
Predictive analytics services eliminate much of the technical overhead associated with implementing machine-learning systems, but users remain responsible for data preparation and model evaluation. Cloud admins can help analysts who are more accustomed to spreadsheets than to programming languages. That help is effective, however, only when admins understand what the modelers are trying to accomplish.
About the author:
Dan Sullivan holds a Master of Science degree and is an author, systems architect and consultant with more than 20 years of IT experience. He has had engagements in advanced analytics, systems architecture, database design, enterprise security and business intelligence. He has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail and education. Dan has written extensively about topics that range from data warehousing, cloud computing and advanced analytics to security management, collaboration and text mining.