From the course: Azure Data Factory Administration Essential Training: Manage, Secure, and Monitor Environments
What is Azure Data Factory? - Azure Tutorial
- [Instructor] Let's begin with a quick look at what Data Factory is and why it matters in today's data-driven workloads. To understand what Azure Data Factory is, let's start by discussing the challenges it intends to address. Information today is everywhere. It may be hosted on CRM systems such as Salesforce, in your internally developed applications and databases, on ERP software such as Dynamics or SAP, or in even less structured inputs such as social media posts or IoT data. While each one of these platforms on its own is quite significant to any business, there's much more value to be gained by correlating the data from all of them into a single, unified source of truth. For instance, although your ERP platform might show that revenue went up last month, your Salesforce data might identify which advertising effort led to that increase. Building this integrated dataset requires you to extract the data from all of the sources and load it into a destination, typically a data warehouse system such as Azure Synapse Analytics. Additionally, some information, such as that produced by IoT devices or social networks, might not offer much value in its raw format. You may need to transform it through multiple methods to derive actionable insights from it. The process of extracting, transforming, and loading data, when performed in this order, is referred to as ETL. Alternatively, you might opt to apply the transformations after loading the data into the target system. This technique is known as ELT: extract, load, and then transform. As you might expect, managing all the sources, destinations, and transformation steps is quite a complex task, especially considering the large number of databases, file structures, and data formats in today's data landscape. But that's precisely the role that Azure Data Factory is designed to fulfill.
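The difference between the two orderings can be sketched in plain Python. This is purely illustrative; the `extract`, `transform`, and `load` helpers are hypothetical stand-ins, not ADF APIs:

```python
# Illustrative sketch of ETL vs. ELT ordering (hypothetical helpers, not ADF APIs).

def extract():
    # Pull raw rows from a source system (stubbed here as in-memory data).
    return [{"source": "iot", "reading": " 42 "}, {"source": "crm", "reading": " 7 "}]

def transform(rows):
    # Clean the raw readings so they become usable for analysis.
    return [{**r, "reading": int(r["reading"].strip())} for r in rows]

def load(rows, warehouse):
    # Append rows to the destination (a stand-in for a warehouse table).
    warehouse.extend(rows)
    return warehouse

# ETL: transform *before* loading into the warehouse.
etl_warehouse = load(transform(extract()), [])

# ELT: load the raw data first, then transform inside the destination.
elt_warehouse = transform(load(extract(), []))

print(etl_warehouse == elt_warehouse)  # same cleaned rows either way in this toy case
```

In practice, the choice between ETL and ELT depends on where the transformation compute lives: ETL cleans data before it reaches the warehouse, while ELT leans on the warehouse's own engine (such as Synapse) to do the heavy lifting after loading.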
According to the Microsoft documentation, Azure Data Factory is a cloud-based data integration service that orchestrates data movement and transformation. Descriptions like this can sound a bit vague, so let's break it down one part at a time to better understand what ADF can do. First, ADF is a cloud-based platform-as-a-service offering, and as such, it benefits from the chief advantages of PaaS, such as high availability, scalability, and no upfront expenses. You pay only for what you use. Additionally, since Data Factory is serverless, it requires minimal operational maintenance. There's no need to manually install updates or configure high availability as you do with on-premises infrastructure. Next, ADF is a data integration service. As we have discussed before, this is the process of combining data from multiple sources into a consolidated view, enabling insights that would be difficult to obtain if the data were kept in isolation. This process is done through pipelines, which are logical groupings of activities that perform a data engineering task. These pipelines can be created either graphically or through code, and they're the core unit of work in Data Factory, much like a spreadsheet is what you work with in Excel. Now, let's look at orchestration. ADF includes approximately 90 built-in connectors, not only for a wide variety of on-premises and cloud-based data systems, like Oracle and Blob Storage, but also for numerous compute platforms used for data enrichment and transformation. That allows ADF to become the main orchestrator of data-related workflows in Azure. In practice, this means that if ADF itself cannot handle a particular task, it can call an external service to carry it out, further extending the functionality of the service. And the best part is, once your pipelines are built and validated, you can set them up to run automatically on a schedule of your choosing.
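To make the "logical grouping of activities" idea concrete, here is a conceptual sketch in Python. The `Pipeline` class and the activities are hypothetical illustrations of how a pipeline chains activities together; real ADF pipelines are authored in the portal UI or as JSON definitions, not with this code:

```python
# Conceptual sketch of a pipeline as an ordered grouping of activities
# (illustrative only -- not the Azure Data Factory SDK or JSON schema).

class Pipeline:
    def __init__(self, name, activities):
        self.name = name
        self.activities = activities  # ordered list of callables, each one "activity"

    def run(self, data):
        # Each activity receives the previous activity's output, like chained steps.
        for activity in self.activities:
            data = activity(data)
        return data

# Hypothetical activities standing in for ADF's copy and transform steps.
copy_from_source = lambda rows: rows + [{"order_id": 2, "amount": 99}]
filter_large_orders = lambda rows: [r for r in rows if r["amount"] > 50]

pipeline = Pipeline("DailyOrders", [copy_from_source, filter_large_orders])
result = pipeline.run([{"order_id": 1, "amount": 10}])
print(result)  # [{'order_id': 2, 'amount': 99}]
```

The point of the sketch is the composition model: the pipeline itself does no data work, it only sequences activities, which is what lets ADF orchestrate external services alongside its own built-in connectors.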
Finally, Azure Data Factory allows you to perform code-free transformations such as filtering, parsing, or sorting data. This can be done using two distinct activities: mapping data flows, which are based on Apache Spark and intended for high-volume data processing, and Power Query, which uses the M language and is shared across several Microsoft tools, including Power BI and Excel. We'll cover transformations in depth in our data engineering course.
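As a rough Python analogy for the kinds of steps these code-free activities perform, the sketch below parses, filters, and sorts a handful of records. The input strings and helper function are invented for illustration; in ADF you would configure equivalent steps visually in a mapping data flow or Power Query, without writing code:

```python
# Illustrative Python analogy for code-free transformation steps:
# parse raw records, filter out bad ones, then sort the result.

raw = ["temp=21.5", "temp=19.0", "temp=bad", "temp=23.1"]

def parse(record):
    # Parse "key=value" strings into floats; return None for malformed input.
    try:
        return float(record.split("=", 1)[1])
    except ValueError:
        return None

parsed = [parse(r) for r in raw]              # parsing step
valid = [v for v in parsed if v is not None]  # filtering step
ordered = sorted(valid)                       # sorting step

print(ordered)  # [19.0, 21.5, 23.1]
```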