From the course: Rust for Data Engineering

Multi-threaded deduplication with Rust

- Let's talk a little bit about data engineering. In my opinion, data engineering is the classic systems programming problem, and Rust is a systems programming language. For a lot of my career, I've built command-line tools that perform some kind of data operation, like moving thousands of files somewhere or using a petabyte-scale file server to build movies, for example, when I worked on Avatar, Sony movies, or Disney movies. What's so exciting about Rust is that you can build tools that are multi-threaded, efficient, low on memory use, portable, and responsive. So that's really the key thing you get with Rust versus a scripting language: the ability to build high-performance tools. Now the question is, is it too difficult to do? What I'm going to show you is how to build a deduplication tool. You can actually replace the deduplication and put in…
