Big data myths and the dangers they pose

Precision is what most people strive for in using analytics. Crunching all those numbers with computing power that could run a small province has to generate an exact measure, right?

No, says an expert, who believes it pays to be vague — particularly in big data projects.

“I’ve seen many end results in files and they have very impressive numbers,” says Alan Khara, director of information at the First Nations Education Steering Committee (FNESC) in Vancouver. But “the end result is the numbers were never achieved or were way off.”

It’s more accurate in a report to give a prediction as a range of percentages, he told a Toronto big data conference Wednesday. Ironically, in many cases the narrower the result, the greater the margin of error.
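Khara didn’t show code, but the point about ranges can be made concrete with a small sketch. The example below is hypothetical: it assumes a holdout set of yes/no outcomes from some model and uses a simple bootstrap to turn the single headline number into a 95 per cent interval, the kind of range he recommends reporting.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for model outcomes on a holdout set (1 = predicted event occurred).
outcomes = rng.binomial(1, 0.23, size=500)

# Resample the holdout outcomes to estimate how much the headline
# number could move under a different sample.
boot_means = [
    rng.choice(outcomes, size=outcomes.size, replace=True).mean()
    for _ in range(10_000)
]
low, high = np.percentile(boot_means, [2.5, 97.5])

print(f"Point estimate: {outcomes.mean():.1%}")
print(f"Report instead: {low:.1%} to {high:.1%} (95% interval)")
```

The point estimate looks impressively exact; the interval is what tells a reader how seriously to take it.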

The belief that huge datasets generate precise results is one of five myths about big data that contribute to the failure of projects, he said.

Khara’s work in analytics for a university, financial institutions and a British Columbia health institution led him to the myths. The others are:

– The more data you have, the more you’ll get out of it

Believing this is why organizations spend so much time sweeping in as much data as they can get their hands on, said Khara. But volume doesn’t guarantee information. He recalled working for an institution that collected 8 billion images of space (each about 10 MB, or roughly 80 petabytes in all) looking for a planet that was allegedly pulling Neptune farther from the sun than its regular orbit. With the technology available at the time, the research team thought it would take three years for their data model to search the images.

The hidden gravitational force wasn’t found. But other teams using similar data found some small but useful particles. What did they do differently? They invested more time in the data model than in collecting data.

“From a business point of view it is not important that you’re collecting big data, it is how you’re analyzing it,” he told the conference. The more diverse skills your team has, the better, he added.

Investing time in a model doesn’t mean having the latest tools, he cautioned.

– Structured data is better than unstructured

Not true in many cases. One problem is that organizations often convert unstructured data (like metadata) into a structured form. But, said Khara, that conversion alters important data from the unstructured file. If the file included time-sensitive or time-related data, the loss can be fatal to an analysis that doesn’t account for it.
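To see why the conversion step can hurt, consider a hypothetical sketch (the field names and records below are invented for illustration): unstructured log entries carry timestamps and timezone hints that a fixed schema simply drops.

```python
import json

# Hypothetical sensor log entries: unstructured JSON whose fields vary.
raw_records = [
    '{"id": 1, "reading": 7.2, "captured_at": "2014-03-01T09:15:00Z"}',
    '{"id": 2, "reading": 6.9, "timezone": "PST", "captured_at": "2014-03-01T01:15:00"}',
    '{"id": 3, "reading": 7.4}',  # no timestamp at all
]

# A naive "structuring" step keeps only the columns the schema expects.
SCHEMA = ("id", "reading")

structured = [
    {col: json.loads(r).get(col) for col in SCHEMA} for r in raw_records
]
print(structured)
# The captured_at and timezone fields are silently gone, so any
# time-sensitive analysis downstream can no longer be done correctly.
```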

He recalled a B.C. construction company complaining that, after investing in big data tools to analyze its decades of data on physical infrastructure, it got no useful results. But, Khara points out, data models can’t be static: data collected into a system years ago under one set of assumptions may not fit the assumptions of today, so today’s data models may not be able to analyze old data. “Everything changes with time.”
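A hypothetical illustration of that point, assuming a record set where the unit of measure changed over the years without being stored alongside the values: a model that treats all history the same silently mixes the two eras, while encoding the old assumption keeps the data usable.

```python
from datetime import date

# Invented example: beam lengths recorded in feet before 2005,
# in metres afterward, with no unit column in the stored rows.
rows = [
    {"recorded": date(1998, 5, 2), "length": 32.8},   # feet
    {"recorded": date(2012, 9, 14), "length": 10.0},  # metres
]

# A static model that assumes one unit for all history mixes scales.
naive_avg = sum(r["length"] for r in rows) / len(rows)

# Making the old assumption explicit restores comparability.
FEET_PER_METRE = 3.28084
lengths_m = [
    r["length"] / FEET_PER_METRE if r["recorded"].year < 2005 else r["length"]
    for r in rows
]
print(f"naive average: {naive_avg:.1f} (mixed units)")
print(f"corrected average: {sum(lengths_m) / len(lengths_m):.1f} m")
```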

