Azure Data Factory

Azure Data Factory Concepts

Azure is Microsoft's cloud computing platform and offers numerous services; Azure Data Factory is one of them. In this blog, we will discuss Azure Data Factory concepts and how it works.

What is Azure Data Factory (ADF)

Azure Data Factory is an ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) service on the Azure cloud. It is a cloud-based data integration service that orchestrates and automates the movement and transformation of data.

Azure Data Factory can handle huge amounts of structured or unstructured data and can move this data from your on-premises data stores to the cloud, or vice versa. It helps you create workflows, and these workflows can be parameterized, scheduled and monitored.
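To illustrate parameterization, here is a rough sketch of a pipeline definition expressed as a Python dictionary mirroring ADF's JSON format. The pipeline and dataset names are made up for this example; the `@pipeline().parameters.*` expression syntax is ADF's way of referencing a pipeline parameter at run time.

```python
import json

# Hypothetical pipeline skeleton showing ADF-style parameters.
# "@pipeline().parameters.fileName" is an ADF expression resolved at run time.
parameterized_pipeline = {
    "name": "ParameterizedPipeline",
    "properties": {
        # Declare a pipeline parameter with a default value.
        "parameters": {"fileName": {"type": "String", "defaultValue": "sales.csv"}},
        "activities": [
            {
                "name": "CopyParameterizedFile",
                "type": "Copy",
                # The pipeline parameter is passed down to the (hypothetical)
                # source dataset, so each run can target a different file.
                "inputs": [
                    {
                        "referenceName": "SourceDataset",
                        "type": "DatasetReference",
                        "parameters": {"fileName": "@pipeline().parameters.fileName"},
                    }
                ],
                "outputs": [{"referenceName": "SinkDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ],
    },
}

print(json.dumps(parameterized_pipeline, indent=2))
```

A trigger or a manual run can then override `fileName` per execution instead of requiring a separate pipeline per file.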

Visit What is Azure Data Lake and how to load data in Azure Data Lake.

Components of Azure Data Factory

Azure Data Factory uses the following components:

  1. Datasets. A dataset is a named reference to the data you want to use in your activities. Azure Data Factory can handle huge amounts of raw data in structured or unstructured format, including data in relational databases and large files. Two kinds of datasets are defined in ADF: Source and Destination.
  2. Storage account. ADF supports data stored in storage accounts such as Azure Blob Storage, Azure Data Lake, and many more.
  3. Pipeline. A data factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task; executing a pipeline executes its activities.
  4. Linked Service. A linked service defines the connection information that Data Factory needs to connect to a data store. If you need to copy data from a Source to a Destination, you define two linked services, one for each dataset.
  5. Activity. An activity in a pipeline defines an action to perform on data. For example, the Copy Activity is used to copy data from an on-premises database to Blob storage. Activities can be divided into 3 groups:
    • Data movement activities
    • Data transformation activities
    • Control activities
  6. Triggers. Triggers determine when a pipeline runs; you define a trigger to schedule a pipeline execution. There are 3 types of triggers:
    • Schedule Trigger: Run a Pipeline at a scheduled time
    • Tumbling window Trigger: Fires at fixed-size, contiguous, non-overlapping time intervals
    • Event-based Trigger: Responds to an event
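In ADF, each of these components is authored as a JSON definition. As a rough sketch (the account name, key, container and file names are placeholders, and the property layout follows ADF's JSON conventions as I understand them), a Blob storage linked service and a source dataset that uses it might look like:

```python
import json

# Hypothetical linked service: tells ADF how to connect to a Blob storage
# account. The connection string is a placeholder, not a real credential.
blob_linked_service = {
    "name": "BlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        },
    },
}

# Hypothetical source dataset: a delimited-text (CSV) file in Blob storage,
# resolved through the linked service defined above.
source_dataset = {
    "name": "SourceCsvDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "BlobStorageLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input",
                "fileName": "sales.csv",
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

print(json.dumps(source_dataset, indent=2))
```

Note how the pieces connect: the dataset does not hold credentials itself; it points at the linked service by name, which is what lets many datasets share one connection definition.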

How Azure Data Factory Works

The following flow shows how Azure Data Factory copies data from a file or table (stored in an Azure storage account such as Blob or Data Lake) to a target database table. This is the "Copy Activity" performed by a Pipeline.
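This copy flow can be sketched as a pipeline with a single Copy activity, plus a schedule trigger that runs it daily. The pipeline, dataset and trigger names below are placeholders, and the structure follows ADF's JSON format as I understand it:

```python
import json

# Hypothetical pipeline: one Copy activity that reads the source dataset
# (e.g. a CSV in Blob storage) and writes to a sink dataset (a SQL table).
copy_pipeline = {
    "name": "CopySalesPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyCsvToSqlTable",
                "type": "Copy",
                "inputs": [{"referenceName": "SourceCsvDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SinkSqlDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ]
    },
}

# Hypothetical schedule trigger that runs the pipeline once a day.
daily_trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopySalesPipeline",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

print(json.dumps(copy_pipeline, indent=2))
```

The trigger references the pipeline by name, the activity references the datasets by name, and each dataset references a linked service: that chain of references is how ADF stitches the components together at run time.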



Author:

Chandraish Sinha

Chandraish Sinha has 20+ years of experience in Information Technology. He is an accomplished author and has published multiple IT books. Please visit the Author's profile.

He has implemented IT solutions in various industries, including Pharmaceutical, Healthcare, Telecom, Financial and Retail.

He coaches organizations and consultants in various technologies.

He blogs regularly on Business Intelligence: http://www.learntableaupublic.com/ and http://www.learnallbi.com/
