- Understand the subtle difference between being an Engineer and being a Data Engineer. What sets a Data Engineer apart is an understanding of the importance of data, and of how introducing pollution into a data set can have catastrophic consequences downstream.
- If this is all new to you, Data Warehousing concepts will cover most conversations and give you a foundational understanding of what Data Engineering is trying to achieve. There are lots of nuances, of course, but aim for the general principles: when to use 3rd Normal Form vs. 2nd, dimensional modeling, distributed data, and so on.
- If that sounds somewhat overwhelming, I promise you it’s not. In fact, there are 2 videos, less than an hour each, that will cover pretty much everything
4. Choose a starting point. From entry-level job specs, I see enough commonality across Spark/PowerBI/Python to make me think the Microsoft Azure Data Fundamentals and/or the Data Engineering on Microsoft Azure class and certification would be a great start. It covers pipeline concepts with Azure Synapse, it leverages Databricks, which is Spark & PySpark, and it integrates directly with PowerBI.
Free online Data Fundamentals Training
Free YouTube course claiming it’s all the content you need to pass. I have not reviewed it, but freecodecamp.org bootcamps are usually pretty good
Free online Data Engineering Associate Training – https://docs.microsoft.com/en-us/learn/certifications/azure-data-engineer/
Another free YouTube course. The title seems a little click-baity, but it’s worth checking out
If you currently have student status, you can claim $100 of free Azure credits and a heavily discounted exam fee. This makes the path perfect for hands-on learning.
https://azure.microsoft.com/en-us/free/students/
https://docs.microsoft.com/en-us/learn/certifications/student-discounts
- $69 vs $99 for Fundamentals exams
- $99 vs $165 for Associate exams
- https://wsr.pearsonvue.com/vouchers/pricelist/microsoft.asp#prices
Hands-on work
I’m a big fan of small projects to persist the information in my brain. Here are a few suggestions as a starting point:
- Azure Synapse – https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started
- Azure Databricks (transferable to AWS) – https://databricks.com/discover/free-training/getting-started-with-azure
- Modeling data in PowerBI – https://docs.microsoft.com/en-us/learn/modules/model-data-power-bi/
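
If you want something even smaller before spinning up anything in Azure, the dimensional modeling and data pollution ideas from earlier can be tried locally. The following is only a minimal sketch using Python's built-in sqlite3 module. The table names (`dim_product`, `fact_sales`), the sample rows, and the helper functions are all hypothetical, made up for illustration, and not taken from any of the courses linked above.

```python
import sqlite3


def build_star_schema(conn):
    """Create one dimension table and one fact table (hypothetical names)."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            category   TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            quantity   INTEGER NOT NULL CHECK (quantity > 0),
            amount     REAL NOT NULL CHECK (amount >= 0)
        );
        """
    )


def load_sample_data(conn):
    """Insert a handful of invented rows to query against."""
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO fact_sales (product_id, quantity, amount) VALUES (?, ?, ?)",
        [(1, 2, 20.0), (2, 1, 15.0), (3, 4, 40.0), (1, 1, 10.0)],
    )
    conn.commit()


def sales_by_category(conn):
    """Join the fact table to its dimension and total the amounts per category."""
    rows = conn.execute(
        """
        SELECT d.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        GROUP BY d.category
        ORDER BY d.category
        """
    ).fetchall()
    return dict(rows)


def try_insert_sale(conn, product_id, quantity, amount):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO fact_sales (product_id, quantity, amount) VALUES (?, ?, ?)",
            (product_id, quantity, amount),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    load_sample_data(conn)
    print(sales_by_category(conn))
    # A sale pointing at a product that does not exist is rejected at the door.
    print(try_insert_sale(conn, 99, 1, 5.0))
```

The constraints are the part worth playing with: remove the `REFERENCES` clause or the `CHECK` and the bad rows load silently, which is a small-scale version of polluted data flowing downstream into every report built on the table.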