From the course: Rust for Data Engineering

Using public data sets for data science

- [Instructor] A very common way to build machine learning systems is to use public data sets. Let's talk through a few of the common public data sets that are available. A very popular emerging source of public data sets is Hugging Face, and you can use a Hugging Face dataset to fine-tune a model. So let's say you get a pre-trained model from Hugging Face, and you use an environment that has GPU enabled, like GitHub Codespaces or an Amazon SageMaker environment with GPU enabled. You can then take that Hugging Face dataset, fine-tune the pre-trained model on the new data that's available, and then create a new model and put it either into production or back into Hugging Face. Likewise, with Amazon S3, it's a very common scenario to have a big public data set, and you can pull that data set into, let's say, a Jupyter Notebook, do exploratory data analysis on it, find out what it is you're trying to build, and then create a model based on that S3 dataset.…
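As a rough illustration of the exploratory data analysis step, here is a minimal Rust sketch that summarizes one numeric column of a CSV file. It is not from the course: the `SAMPLE_CSV` contents, the `Summary` struct, and the `summarize_column` function are hypothetical names made up for this example, and the embedded text only stands in for a file you would have already downloaded from a public S3 bucket. It uses the standard library only and splits naively on commas, so it assumes there are no quoted fields.

```rust
// Hypothetical sample standing in for a CSV file already downloaded
// from a public S3 bucket. The values are made up for illustration.
const SAMPLE_CSV: &str = "city,temp_c\nOslo,4.0\nCairo,30.0\nLima,20.0\n";

/// Basic descriptive statistics for one numeric column.
#[derive(Debug, PartialEq)]
struct Summary {
    count: usize,
    mean: f64,
    min: f64,
    max: f64,
}

/// Summarize one numeric column of header-first, comma-separated text.
/// Returns None if the column is missing or has no parseable values.
/// Assumption: fields contain no quoted commas.
fn summarize_column(csv: &str, column: &str) -> Option<Summary> {
    let mut lines = csv.lines();
    let header = lines.next()?;
    // Find the position of the requested column in the header row.
    let idx = header.split(',').position(|name| name.trim() == column)?;

    // Keep only the fields in that column that parse as f64.
    let values: Vec<f64> = lines
        .filter_map(|line| line.split(',').nth(idx))
        .filter_map(|field| field.trim().parse::<f64>().ok())
        .collect();

    if values.is_empty() {
        return None;
    }
    let count = values.len();
    let sum: f64 = values.iter().sum();
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    Some(Summary { count, mean: sum / count as f64, min, max })
}

fn main() {
    match summarize_column(SAMPLE_CSV, "temp_c") {
        Some(summary) => println!("{:?}", summary),
        None => println!("no numeric data in column"),
    }
}
```

In a real project you would likely swap the hand-rolled parsing for a CSV or dataframe crate and read the file from disk or object storage. The point here is only the shape of the first pass: load the data, pick a column, and look at its count, mean, and range before deciding what to build.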
