Why is it possible for good data to sometimes result in bad analytics?

Luminoso analytics predicted 'Jackie' would garner a best-picture Oscar nod. It didn't. Understanding why is a good lesson for application developers and corporate decision-makers.

It was a dark and stormy California morning when the Academy of Motion Picture Arts and Sciences announced its nominations for best picture on Jan. 25, 2017. The Kennedy biopic Jackie was not among the nine movies nominated, a cruel plot twist to the prediction made two weeks earlier by Luminoso Technologies, a maker of artificial-intelligence-based analytics software. Is this a case of bad analytics or something else?

"Even though Jackie wasn't nominated, I'm happy with the results," said Dan Mitus, Luminoso's solutions engineer. What was proven, he said, is that an analytics prediction or ratings scale should not be judged as a single point in time and that there is great value in crowdsourced textual content. "You should learn from it and continually improve," he said.

The difficulty in making predictions, even when based on superior data, is that those prognostications do not come with guarantees. "You can never predict with certainty what will happen, only what might happen," said Judith Hurwitz, president of Hurwitz & Associates, a Needham, Mass., IT consultancy. "It is important to understand that this is in the realm of likelihoods, not absolute certainties."

The simple fact is that despite rapid advances in technology, predictive analytics can never be perfect. One need look no further than the presidential election of 2016 to know that.

That's what happened in the case of the Oscar nomination miscue. The prediction was based on the best and broadest information available at the time. Yet sometimes, even when the data is accurate, the prediction derived can simply be off the mark. As sports pundits often say, that's why they play the games.

Of the four types of analytics -- descriptive, diagnostic, predictive and prescriptive -- it is predictive that shifts from pure hindsight to a blend of insight and foresight, attempting to answer the question "what might happen," not what will happen.

Not an exact science

"Analytics is an art form that's driven by conjecture," said Simon James, global lead for performance analytics at SapientNitro, the digital subsidiary of marketing consulting firm Sapient.

Acknowledging in its own corporate blog that predicting Oscar winners is a "messy affair," Luminoso, based in Cambridge, Mass., incorporated more than 84,000 reviews from average moviegoers -- not critics -- posted on the Internet Movie Database website spanning 2013 to 2016. Homing in on the top 50 most-popular movies from each year, Luminoso Analytics examined those reviews to, as the blog notes, "identify correlations between topics discussed in the reviews and the eventual Oscar nominees and winners."

Looking back at prior years was an essential part of the analytics process. Luminoso tuned its algorithms until the analysis of each prior year's user reviews yielded that year's actual winner. Luminoso's underlying technology takes a collection of unstructured text data as the basis for distilling meaningful concepts. The software then identifies items that are "conceptually similar," Mitus said.
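
As a rough illustration of that kind of backward-looking check, the Python sketch below measures how strongly the share of reviews mentioning a topic tracks past nominations. It is not Luminoso's method, which the company's blog does not spell out; the simple keyword matching, the function names `topic_share` and `pearson`, and every sample value are hypothetical.

```python
from statistics import mean


def topic_share(reviews, keywords):
    """Fraction of reviews that mention at least one keyword (case-insensitive)."""
    if not reviews:
        raise ValueError("need at least one review")
    hits = 0
    for review in reviews:
        tokens = set(review.lower().split())
        if tokens & keywords:
            hits += 1
    return hits / len(reviews)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Invented toy history: (reviews for one film, 1 if nominated else 0).
history = [
    (["superb acting throughout", "the acting carried it"], 1),
    (["loud and long", "great acting but thin plot"], 1),
    (["fun popcorn movie", "too many explosions"], 0),
    (["forgettable sequel", "weak acting and weak script"], 0),
]

keywords = {"acting"}
shares = [topic_share(reviews, keywords) for reviews, _ in history]
outcomes = [nominated for _, nominated in history]
print(f"correlation between topic share and nomination: {pearson(shares, outcomes):.2f}")
```

A high correlation on past years shows only that a topic moved with nominations in that sample. It does not guarantee the pattern will hold the following year.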

Only as good as the data

Luminoso's blog went on to say that its prediction methodology, while scientific, "cannot completely account for the whims of marketing efforts and the impact on voting." Providing itself with some wiggle room, the blog entry concludes, "If Jackie does not win Best Picture, that will indicate that the winner's marketing efforts outperformed the film's actual virtues and may not have won on its own merit." In other words, it's not so much a matter of bad analytics as of opinions being swayed by advertising and public-relations campaigns foisted upon the movie-going public. That dynamic explains, in part, why box-office receipts for a legendary flop like Gigli could drop by 81.9% in its second week of wide release -- marketing can pull people into a theater, but it's powerless to compete with subsequent unfavorable word-of-mouth.

Mitus said that the analysis did not take the marketing and advertising aspect into account, because that influence is not captured in the reviews posted to IMDb, the only type of data analyzed.

Another aspect that was not analyzed was potential bias in the underlying user movie reviews. "We didn't investigate to confirm that the sample is representative, but there is tremendous value in crowdsourced information," Mitus said. In a clear case of bias, for example, comments posted to a sports website favoring the Super Bowl-bound Atlanta Falcons are not likely to present a balanced picture of the opposing New England Patriots. "Had we been conducting a study where there was an obvious bias, we would account for that," he said. Again, it's not about bad analytics, but about understanding the framing and completeness of the data.

Given Luminoso's miss on picking the Best Picture winner for 2017, will the company have another go next year? "We will do it again next year and look at additional award categories," Mitus said. "The human element through crowdsourcing is the foundation of the technology and that will continue to improve."

Joel Shore is news writer for TechTarget's Business Applications and Architecture Media Group. Follow him @JshoreTT on Twitter.
