I am a Senior Data Engineer at Amazon with more than 11 years of experience. Here are 5 pieces of advice I would give to people in their 20s who want to build a career in Big Data in 2025:

◄ Stop obsessing over fancy tools [ Master SQL first ]
- Become fluent at writing complex joins, window functions, and optimized queries.
- Deeply understand ETL pipelines: know exactly how data moves, transforms, and lands in your warehouse.
- Practice schema design by modeling real datasets (think e-commerce or user analytics data).

◄ Get hands-on with cloud, not just theory
- Don't just pass AWS certification exams; build projects like a data pipeline from S3 to Redshift or an automated ETL workflow using AWS Glue.
- Learn Kafka by setting up a simple real-time data streaming pipeline yourself.
- Set up an end-to-end analytics stack: ingest real-time data, process it with Kafka and Airflow, and visualize it with QuickSight or Power BI.

◄ System design is your secret weapon
- Don't memorize patterns blindly; sketch systems like a Netflix-style pipeline, complete with partitioning and indexing choices.
- Practice explaining your design to someone non-technical. If you can't, redesign it simpler.
- Understand real trade-offs, like when to pick NoSQL (DynamoDB) vs. SQL (Postgres), with real-world reasons (transaction speed vs. consistency).

◄ Machine learning isn't optional anymore
- Go beyond theory: integrate real ML models into your pipelines using something like Databricks or SageMaker.
- Experiment with ML-based anomaly detection; build a basic fraud detection pipeline using real public datasets.
- Know the basics of feature engineering and prepare the datasets data scientists use; don't wait for them to teach you.

◄ Soft skills will accelerate your career
- Learn to clearly communicate business impact, not just tech specs. Don't say "latency reduced"; say "users see pages load 2x faster."
- Document like your future self depends on it: clearly explain your pipelines, edge cases, and design decisions.
- Speak up early in meetings; your solutions won't matter if no one understands them or knows you created them.

P.S. I'm Shubham, a senior data engineer at Amazon. Follow me for more insights on data engineering. Repost if you learned something new today!
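To make the "master SQL first" advice concrete, here is a minimal sketch of the kind of window-function query the post has in mind, run against an in-memory SQLite database from Python (the table and data are invented for illustration; any warehouse that supports window functions works the same way):

```python
import sqlite3

# In-memory database with a toy orders table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, order_id INTEGER, amount REAL);
    INSERT INTO orders VALUES
        (1, 101, 50.0), (1, 102, 75.0),
        (2, 201, 20.0), (2, 202, 90.0);
""")

# Window function: rank each user's orders by amount, largest first.
rows = conn.execute("""
    SELECT user_id,
           order_id,
           amount,
           RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rnk
    FROM orders
    ORDER BY user_id, rnk
""").fetchall()

for row in rows:
    print(row)
```

Note that `PARTITION BY` restarts the ranking for each user, which is exactly the kind of per-group logic that is painful to express without window functions. (SQLite supports window functions from version 3.25; PostgreSQL and Redshift use the same syntax.)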
Skills for Data Engineering Positions That Matter
Explore top LinkedIn content from expert professionals.
Summary
Building a strong foundation is crucial for excelling in data engineering, where success depends on a blend of technical skills and a strategic understanding of data systems.
- Master SQL fundamentals: Focus on writing complex queries, understanding data joins, and optimizing database performance to handle large-scale datasets effectively.
- Develop cloud and orchestration skills: Build projects that involve cloud platforms like AWS and tools like Apache Airflow to implement end-to-end data pipelines and real-time processing workflows.
- Learn data modeling and integration: Understand how to design robust database schemas, perform seamless data transformations, and gain hands-on experience with tools like Pandas, Polars, and PostgreSQL.
If you want to get a job as a data engineer, don't start by learning tools. Yes, you might read a lot about Databricks, dbt, or some other hot tech solution of the month... but guess what! When I started, everyone was talking about Hadoop and Data Lakes, and there aren't as many people talking about those anymore (at least not in the same context). Learning tools and solutions before understanding the basics is like trying to learn calculus without understanding what numbers are. Instead, here are the basic skills that got me farther than learning how to work in a Hadoop environment or write an Airflow DAG.

Databases/Data Warehouses - I took my first database course in college. It was dreadfully boring. But those basics have likely been some of the most important foundational skills I have learned. They eventually led me to dig into data warehousing and helped me understand solutions that aren't traditional databases far better.

SQL - As much as some data professionals may dislike it, SQL still makes up a good portion of how data is transformed. Whether you work at Facebook or a small start-up, you'll likely be using SQL to transform data.

Programming - Should you learn Python, Go, Rust, Java, Scala, etc.? The best advice I can give is to pick the language that is used frequently for the work you want to do. I learned Java when I started, and I haven't programmed in it for the last 3-4 years.

Data Modeling - For many this means Kimball, but I think Joe Reis 🤓 made a good point a few months back when he said analytical data modeling starts at the transaction layer. Because if that data isn't set up well, good luck trying to create analytics on top of it.

Computer Networking and Sysadmin Work - I don't really know how to classify this, but understanding the basics of networking, working through command lines, etc., will all end up feeding into the cloud, Docker, and so many other skills. It'll honestly feel like superpowers sometimes.

Distributed Computing - Data is getting bigger, and if you know how to manage and process it efficiently, you're way ahead of the curve.

I love tools, and they can make my job easier. But the more solid your foundation, the easier it'll be to pick up all the new solutions that swear they are the answer to all your problems.
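The data modeling point above can be sketched with a tiny Kimball-style star schema: one fact table of events referencing a dimension table of descriptive attributes. The table names, columns, and data here are invented for illustration, built in an in-memory SQLite database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: one row per product, holding descriptive attributes.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        name        TEXT,
        category    TEXT
    );
    -- Fact: one row per sale, referencing the dimension by surrogate key.
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        product_key INTEGER REFERENCES dim_product(product_key),
        sold_at     TEXT,
        amount      REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
    INSERT INTO fact_sales VALUES
        (10, 1, '2024-01-05', 9.99),
        (11, 1, '2024-01-06', 9.99),
        (12, 2, '2024-01-06', 24.50);
""")

# Once the model is in place, analytics collapse to a join plus an aggregate.
totals = conn.execute("""
    SELECT p.category, ROUND(SUM(f.amount), 2) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
""").fetchall()
print(totals)
```

The design choice worth noticing is the separation of concerns: facts stay narrow and append-only, while descriptive attributes live once in the dimension, which is what makes analytics on top of it straightforward.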
For those looking to start a career in data engineering or eyeing a career shift, here's a roadmap to essential areas of focus:

𝗗𝗮𝘁𝗮 𝗜𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗶𝗼𝗻 𝗧𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲𝘀
- 𝗗𝗮𝘁𝗮 𝗘𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻: Learn both full and incremental data extraction methods.
- 𝗗𝗮𝘁𝗮 𝗟𝗼𝗮𝗱𝗶𝗻𝗴:
  - 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲𝘀: Master insert-only, insert-update, and full insert-update-delete loading patterns.
  - 𝗙𝗶𝗹𝗲𝘀: Understand how to replace files or append data within a folder.

𝗗𝗮𝘁𝗮 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗲𝘀
- 𝗗𝗮𝘁𝗮𝗙𝗿𝗮𝗺𝗲𝘀: Acquire skills in manipulating CSV and Parquet file data with tools like Pandas and Polars.
- 𝗦𝗤𝗟: Enhance your ability to transform data within open-source databases such as PostgreSQL. This includes executing complex aggregations with window functions and breaking down transformation logic with Common Table Expressions (CTEs).

𝗗𝗮𝘁𝗮 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗶𝗼𝗻 𝗙𝘂𝗻𝗱𝗮𝗺𝗲𝗻𝘁𝗮𝗹𝘀
- Develop the ability to create a Directed Acyclic Graph (DAG) using Python.
- Gain expertise in generating logs for monitoring code execution, write that logging into databases like PostgreSQL, and learn to trigger alerts for failed runs.
- Familiarize yourself with scheduling Python DAGs using cron expressions.

𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗞𝗻𝗼𝘄-𝗛𝗼𝘄
- Become proficient in using Git for code versioning.
- Learn to deploy an ETL pipeline (comprising extraction, loading, transformation, and orchestration) to cloud services like AWS.
- Understand how to dockerize an application for streamlined deployment to cloud platforms such as AWS Elastic Container Service.

𝗦𝘁𝗮𝗿𝘁 𝗬𝗼𝘂𝗿 𝗝𝗼𝘂𝗿𝗻𝗲𝘆 𝘄𝗶𝘁𝗵 𝗙𝗿𝗲𝗲 𝗥𝗲𝘀𝗼𝘂𝗿𝗰𝗲𝘀 𝗮𝗻𝗱 𝗣𝗿𝗼𝗷𝗲𝗰𝘁𝘀
Begin your learning journey here: https://lnkd.in/e5BxAwEu

Mastering these foundational elements will equip you with the understanding and skills necessary to adapt to modern data engineering tools (aka the modern data stack) more effortlessly. Congratulations, you're now well-prepared to start interviewing for data engineer positions!
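The orchestration fundamentals described above can be sketched without any framework: a hand-rolled DAG of named tasks executed in dependency order, with logging around each step and a hook where a failed run would raise an alert. The task names and bodies are invented for illustration; a real pipeline would hand this structure to a scheduler such as Airflow or a cron job.

```python
import logging
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")

# Hypothetical task bodies; real ones would move actual data.
def extract():   log.info("extracting source data")
def transform(): log.info("transforming data")
def load():      log.info("loading into warehouse")

# The DAG: each task maps to the set of tasks it depends on.
dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
tasks = {"extract": extract, "transform": transform, "load": load}

# Resolve dependencies into a valid execution order, then run each task,
# logging failures where an alert (email, Slack, etc.) would be triggered.
order = list(TopologicalSorter(dag).static_order())
for name in order:
    try:
        tasks[name]()
    except Exception:
        log.exception("task %s failed; trigger alert here", name)
        raise
```

`TopologicalSorter` also rejects cyclic dependencies at resolution time, which is precisely the "acyclic" guarantee the A in DAG refers to.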
While there are undoubtedly more advanced topics to explore, such as data modeling, the courses and key areas highlighted above will give you a solid starting point for interviews.