1. Understand the subtle difference between being an Engineer and being a Data Engineer. What sets a Data Engineer apart is an understanding of the importance of data, and of how introducing pollution into a data set can have catastrophic consequences downstream.
2. If this is all new to you, Data Warehousing concepts can cover most conversations, and they will help you get a foundational understanding of what Data Engineering is trying to achieve. There are lots of nuances of course, but the general principles are enough to start with: when to use 3rd normal form vs. 2nd, dimensional modeling, distributed data, and so on.
3. If that sounds somewhat overwhelming, I promise you it’s not. In fact, there are 2 videos, each less than an hour long, that cover pretty much everything.

4. Choose a starting point. Across entry-level job specs, I see enough commonality in Spark/PowerBI/Python to make me think the Microsoft Azure Data Fundamentals and/or the Data Engineering on Microsoft Azure class and certification would be a great start. This path covers pipeline concepts with Azure Synapse. It leverages Databricks, which is Spark and PySpark, and it integrates directly with PowerBI.
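To make point 1 a little more concrete, here is a minimal Python sketch of how a single polluted record can distort a downstream metric. Everything in it is made up for illustration: the order amounts, the `average_order_value` and `validate` helpers, and the range-check bounds are assumptions, not a recommended validation rule.

```python
from statistics import mean

# Hypothetical order amounts in USD, as an upstream system might deliver them.
clean_orders = [20.0, 25.0, 30.0, 25.0]

# One polluted record: an amount that was sent in cents instead of dollars.
polluted_orders = clean_orders + [2500.0]


def average_order_value(amounts):
    """Downstream metric that trusts whatever it is given."""
    return mean(amounts)


def validate(amounts, lower=0.0, upper=1000.0):
    """Split records into accepted and rejected with a simple range check.

    The bounds are illustrative assumptions only.
    """
    accepted = [a for a in amounts if lower <= a <= upper]
    rejected = [a for a in amounts if not (lower <= a <= upper)]
    return accepted, rejected


clean_avg = average_order_value(clean_orders)
polluted_avg = average_order_value(polluted_orders)

# Validating before aggregating keeps the bad record out of the metric.
accepted, rejected = validate(polluted_orders)
validated_avg = average_order_value(accepted)

print(f"clean average:     {clean_avg}")
print(f"polluted average:  {polluted_avg}")
print(f"validated average: {validated_avg} (rejected: {rejected})")
```

One bad row out of five is enough to push the average from 25 to 520, and every report or model built on that number inherits the error. A real pipeline would use far richer checks, but the habit of validating before data moves downstream is the same.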

Free online Data Fundamentals Training

Free YouTube course that claims to be all the content you need to pass. I have not reviewed it, but freecodecamp.org bootcamps are usually pretty good.

Free online Data Engineering Associate Training – https://docs.microsoft.com/en-us/learn/certifications/azure-data-engineer/

Another free YouTube course. The title seems a little click-baity, but it’s worth checking out.

If you currently have student status, you can claim $100 of free Azure credits through Azure for Students, along with a heavily discounted exam fee. This makes this path perfect for hands-on learning.
https://azure.microsoft.com/en-us/free/students/
https://docs.microsoft.com/en-us/learn/certifications/student-discounts

Hands-on work

I’m a big fan of small projects to persist the information in my brain. Here are a few suggestions as a starting point.