What is Data Science and Why Should I Care?
March 6, 2012
Let’s start with the second question first. Why should you care? You should care about data science because data is quickly becoming one of the most sought-after commodities in the business world. The data that companies are currently storing has the potential to reveal new business opportunities. Data science is the art of using that data to discover those opportunities and gain new insights into your business. Your competitors are already using data science to gain an edge over you and your business model. They have begun performing “experiments” on their data in an attempt to find new ways to better position themselves in the market. And, if you are a direct competitor, these companies may have even used some data analysis to better understand your business as well.
What is data science? Data Science is the process of using data in the wild (unstructured, unformatted, multiple sources, etc.), manipulating the data, discovering what type of story it tells, and presenting it to a group. It’s that simple. To recap:
- Find and store the data
- Analyze the data
- Find meaning in the data
- Present the results
It’s that simple. This isn’t rocket science, or is it? We will dig deeper into the challenges of data analysis in part 2 of this series.
How does Big Data relate to Data Science?
Big Data is, in short, big…data. That is all it is. The importance of big data comes from the fact that the types and amount of data being generated allow for new breakthroughs in the world of data science. Big data is becoming an important topic because, as data grows, new tools are being developed to leverage the power of that data. This is the essence of big data: the management of large data sets.
Due to the amount of data now available and accessible (some companies have petabytes), there have been significant gains in the accuracy of predictive analysis, machine learning algorithms, and cross-industry correlations. In addition, the different types of data being stored are creating opportunities for new insights into relationships between areas that were not previously considered together.
One of the challenges of this exponential growth is that much of the information being created is less structured and more difficult to analyze. As a result, the analysis becomes more complex and less accessible to traditional data analysis practices, which in turn creates a need for new skillsets to work with the data. Because of the size of the data, new tools such as Hadoop are needed for scalability and storage. These storage systems hold the data efficiently while also providing a place where parallel processing can be leveraged to crunch the large amount of information stored. A good example is weblogs. Statistically, the data can be sparse: a specific use case might happen only once in every million hits. To get a good representation, you need a large data set. If you wanted to capture 1,000 occurrences of that use case and analyze them, you would need to store roughly 1 billion hits and all of the user activity behind them.
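The weblog arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes the rare use case occurs at a steady average rate of one in every `one_in` hits.

```python
def hits_needed(target_events: int, one_in: int) -> int:
    """Expected number of hits to store in order to observe
    `target_events` occurrences of an event that happens, on
    average, once in every `one_in` hits (an assumed steady rate)."""
    if target_events < 0 or one_in < 1:
        raise ValueError("target_events must be >= 0 and one_in >= 1")
    return target_events * one_in


def expected_events(total_hits: int, one_in: int) -> float:
    """Expected number of occurrences found in `total_hits` hits."""
    if one_in < 1:
        raise ValueError("one_in must be >= 1")
    return total_hits / one_in


if __name__ == "__main__":
    # The example from the text: 1,000 occurrences of a 1-in-a-million event.
    needed = hits_needed(1000, 1_000_000)
    print(f"Hits to store: {needed:,}")
```

These are expected values only. In a real log the count of rare events varies from sample to sample, so you would likely want to store somewhat more than this estimate.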
We are already doing BI. What’s the difference?
Data Science has been picked up by the BI industry as something they are already doing. While there may be some BI groups performing true data science, that is typically not the case. According to Steven Hillion, vice president of analytics at EMC Greenplum:
“The skill set of the data scientist goes beyond the capabilities of what many would call ‘traditional business intelligence (BI).’ Traditional BI is interested in the ‘what and the where,’ while data scientists are interested in the ‘how and why,’ Hillion says. ‘They’re interested in inferring things that are not already present in the data.’”
As mentioned above, one of the key differences between data science and traditional BI is that data science is more than reporting. Data science uses mathematical models to find the underlying meaning behind the report. This doesn’t mean that BI and data science are competitors. They are complementary, and used together they can create a strong foundation for understanding current business cases.
Hopefully, this introduction helped clear up what data science and big data are. The intent is not to overwhelm you with too much detail early on, but to give you a high-level understanding of the challenges ahead.