From the course: Rust for Data Engineering
Using public data sets for data science - Rust Tutorial
- [Instructor] A very common way to build machine learning systems is to use public data sets, so let's talk through a few of the common ones that are available. A very popular, emerging source of public data is Hugging Face datasets, which you can use to fine-tune a model. Let's say you get a pre-trained model from Hugging Face and you work in a GPU-enabled environment, like GitHub Codespaces or an Amazon SageMaker environment with a GPU. You can then take a Hugging Face dataset, fine-tune the model on that new data, create a new model, and put it either into production or back onto Hugging Face. Likewise, with Amazon S3, a very common scenario is to take a big public data set, pull it into, say, a Jupyter Notebook, do exploratory data analysis on it, figure out what it is you're trying to build, and then create a model based on that S3 dataset.…
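To make the Hugging Face half of that workflow concrete, here is a minimal sketch in Rust that pulls one file from a public dataset repository on the Hugging Face Hub using the hf-hub crate's synchronous API. The crate version, repository id, and file name are assumptions for illustration only; replace them with a real public dataset and one of the files listed on its "Files" tab.

    use hf_hub::api::sync::Api; // assumes hf-hub = "0.3" in Cargo.toml (default features)

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Anonymous access is enough for public datasets on the Hub.
        let api = Api::new()?;

        // Placeholder repo id and file name: swap in a real public dataset
        // repository and an actual file it contains.
        let repo = api.dataset("username/some-public-dataset".to_string());
        let local_path = repo.get("data/train.csv")?;

        // The file is downloaded once into the local Hugging Face cache; from
        // here you could load it with a DataFrame crate for exploratory
        // analysis or feed it into a fine-tuning job on a GPU machine.
        println!("dataset file cached at {}", local_path.display());
        Ok(())
    }

The S3 scenario follows the same shape: with the aws-sdk-s3 crate you would call get_object with the public bucket and key, then explore the downloaded data locally before building a model on it.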
Contents
- Selecting the correct database on GCP (3m 46s)
- Rust SQLite Hugging Face zero-shot classification (9m 55s)
- Prompt engineering for BigQuery (9m 20s)
- BigQuery to Colab pipeline (5m 32s)
- Exploring data with BigQuery (12m 36s)
- Using public data sets for data science (1m 44s)
- Querying log files with BigQuery (3m 49s)
- There is no one-size database (1m 44s)
- Course conclusion (1m 24s)