introduction to data science - Forum

Introduction to data science

Big data is a blanket term for any collection of data sets so large or complex that it becomes difficult to process with traditional data management techniques such as relational database management systems (RDBMS). The widely adopted RDBMS has long been regarded as a one-size-fits-all solution, but the demands of handling big data have shown otherwise. Data science involves using methods to analyze massive amounts of data and extract the knowledge it contains. You can think of the relationship between big data and data science as being like the relationship between crude oil and an oil refinery. Data science and big data evolved from statistics and traditional data management but are now considered to be distinct disciplines.

The characteristics of big data are often referred to as the three Vs:

- Volume: How much data is there?

- Variety: How diverse are the different types of data?

- Velocity: At what speed is new data generated?

Often these characteristics are complemented with a fourth V, veracity: How accurate is the data? These four properties make big data different from the data found in traditional data management tools. Consequently, the challenges they bring can be felt in almost every aspect: data capture, curation, storage, search, sharing, transfer, and visualization. In addition, big data calls for specialized techniques to extract the insights.

Data science is an evolutionary extension of statistics capable of dealing with the massive amounts of data produced today. It adds methods from computer science to the repertoire of statistics. The main things that set a data scientist apart from a statistician are the ability to work with big data and experience in machine learning, computing, and algorithm building.


priyanka_r on 17th Nov 2017, 10:07 AM

What are the disadvantages of Data Science? 

priyadharshini on 17th Nov 2017, 10:19 AM

Data science often has to deal with heavy noise, i.e. the data may contain many meaningless data points, and the analyst has to work hard to separate the wheat from the chaff. It also raises privacy problems, as can be seen, for instance, in the analysis of social networks.
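To make the noise point a bit more concrete, here is a minimal sketch in Python. It assumes that "meaningless data points" can be treated as outliers, and it uses one common technique for that (the modified z-score based on the median absolute deviation), which is my choice for illustration and not something specific to any tool. The function name `drop_outliers`, the threshold of 3.5, and the sample readings are all hypothetical.

```python
from statistics import median


def drop_outliers(values, threshold=3.5):
    """Return the values whose modified z-score is within the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a few extreme points do not distort the scale.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All points sit at (or very near) the median; nothing to filter on.
        return list(values)
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]


# Hypothetical sensor readings with one obviously bad point (250.0).
readings = [10.1, 9.8, 10.3, 9.9, 250.0, 10.0, 10.2]
clean = drop_outliers(readings)
print(clean)
```

In real work the hard part is deciding what counts as noise in the first place; a fixed threshold like this is only a starting point.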

Arivazhagan on 17th Nov 2017, 10:20 AM

Are there any specific tools for data science work?

priyanka_r on 17th Nov 2017, 10:29 AM

Thank you Priyadharshini for your reply. Your answer is useful for me.

priyadharshini on 17th Nov 2017, 10:30 AM

Yeah, there are several tools available for data science, including RapidMiner, DataRobot, BigML, Paxata, Trifacta, Narrative Science, MLBase, and Automatic Statistician. The most commonly used language for data science is Python.

Arivazhagan on 20th Nov 2017, 05:27 AM

Thank you priyadharshini for your answer. 
