From the course: Data Ingestion with Python
Overview of data scientists' work - Python Tutorial
- [Instructor] When people enter the data science world, they have a certain image in their heads of what the work will look like. They'll try some cool algorithm on some data, tweak some parameters, and produce code that will learn by itself and improve business results, identify cat pictures, or protect servers from new kinds of attacks. However, when you interview data scientists, you'll find out that most of their time is spent on getting and cleaning data. That's why I think that data science should be written as DATAscience. Data dominates in two ways: one is the amount of time you'll spend on it, the second is its effect on the quality of your algorithms. In their influential article "The Unreasonable Effectiveness of Data," Halevy, Norvig, and Pereira show that, given enough data, dumb algorithms can perform much better than smart ones. This implies you'll want a lot of high-quality data available to you. As a data scientist, you'll find yourself doing the following: acquire data from various sources, clean this data, train a model, evaluate your model, realize you need more high-quality data, and go back to step one. At some point, your results will be good enough and you'll be able to ship. However, data changes over time, so you'll always need to train your algorithms on high-quality, relevant data.
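The acquire, clean, train, evaluate, repeat loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the course: the function names (`acquire`, `clean`, `train`, `evaluate`, `run_loop`) are hypothetical, the "data source" is synthetic noisy samples of a straight line, and the "model" is a plain least-squares fit standing in for whatever algorithm you would really use.

```python
import random


def acquire(n, rng):
    """Hypothetical data source: noisy samples of y = 2x + 1.

    About 10% of the rows arrive with a missing target (None),
    standing in for the messy data real sources produce.
    """
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2 * x + 1 + rng.gauss(0, 1)
        rows.append((x, None if rng.random() < 0.1 else y))
    return rows


def clean(rows):
    """Drop rows whose target value is missing."""
    return [(x, y) for x, y in rows if y is not None]


def train(rows):
    """Fit a line with ordinary least squares; returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    spread = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / spread
    return slope, mean_y - slope * mean_x


def evaluate(model, rows):
    """Mean absolute error of the model on the given rows."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


def run_loop(target_error=1.0, batch=20, max_rounds=10, seed=0):
    """Acquire, clean, train, evaluate; go back to step one if not good enough."""
    rng = random.Random(seed)
    held_out = clean(acquire(200, rng))  # data reserved for evaluation only
    training = []
    for round_no in range(1, max_rounds + 1):
        training.extend(clean(acquire(batch, rng)))  # acquire and clean more data
        model = train(training)
        error = evaluate(model, held_out)
        if error <= target_error:  # good enough to ship
            break
    return model, error, round_no, len(training)


if __name__ == "__main__":
    model, error, rounds, n_rows = run_loop()
    print(f"rounds={rounds} rows={n_rows} error={error:.3f} model={model}")
```

The point of the sketch is the shape of the loop rather than the model: the only lever `run_loop` pulls when the results are not good enough is getting more clean data.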