Azure Data Lake

What is Azure Data Lake

Azure Data Lake is a storage repository for structured and unstructured data. It places no restriction on data size, so it can hold very large volumes of data. Stored data can be secured with Azure Active Directory. It also supports Azure Data Lake Analytics, which is used to process the stored data.

Learn how to load data into Azure Data Lake with an example.

What are the Components of Azure Data Lake

Azure Data Lake has two components:

  • Azure Data Lake Storage (ADLS). It is a high-performance, scalable store for a variety of data. It is compatible with HDFS (Hadoop Distributed File System). Data can reside in ADLS until it is needed. Depending on how frequently the data is accessed, it can be kept in one of these tiers:
    • Cool: Optimized for storing data that is infrequently accessed and stored for at least 30 days.
    • Hot: Optimized for storing data that is accessed frequently.
  • Azure Data Lake Analytics (ADLA). It is the processing component of Azure Data Lake. It lets you build massively parallel data transformation and processing jobs in U-SQL, R and Python. You can process data on demand, scale instantly and pay as you go.
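
The Hot and Cool descriptions above can be read as a simple rule of thumb. The sketch below is only an illustration of that rule: the helper name `choose_access_tier` and its parameters are hypothetical, and the only figure taken from the text is the 30-day minimum for the Cool tier.

```python
COOL_MIN_RETENTION_DAYS = 30  # taken from the Cool tier description above


def choose_access_tier(accessed_frequently: bool, retention_days: int) -> str:
    """Return "Hot" or "Cool" for a dataset (hypothetical helper, not an Azure API)."""
    if retention_days < 0:
        raise ValueError("retention_days must be non-negative")
    # Cool only fits data that is rarely read AND kept for at least 30 days.
    if not accessed_frequently and retention_days >= COOL_MIN_RETENTION_DAYS:
        return "Cool"
    # Everything else defaults to Hot in this sketch.
    return "Hot"
```

For example, an archive of monthly reports that is rarely opened and kept for a year would come out as Cool, while a dataset queried every day would stay Hot.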

How to Create Azure Data Lake Storage

  1. Log in to the Azure portal with your Azure credentials, or create a free Azure account.
  2. In the Azure portal, search for Storage accounts and select Add.

3. In the Create storage account screen, go through each tab and provide the required information. Under Basics, provide the Subscription and Resource group. If the resource group doesn’t exist, you can select Create new.

ADLS Create Storage Account

4. On the same screen, provide the Storage account name and choose a Location close to you or your users. Leave Performance at its default. Then fill in Account kind, Replication and Access tier:

  • Account kind: A general-purpose v2 storage account provides access to all of the Azure Storage services: blobs, files, queues, tables, and disks.
  • Replication: Azure keeps copies of your data. Locally redundant storage (LRS) copies data synchronously three times within a single physical location in the primary region. LRS is the least expensive option, but it is not recommended for applications that require high availability.
  • Access tier: The Hot option is optimized for storing data that is accessed frequently.
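
The portal rejects storage account names that break Azure's naming rules, so it can help to check a name before starting the wizard. The sketch below assumes the rule as commonly documented (3 to 24 characters, lowercase letters and digits only); the function name is hypothetical, and the check does not cover global uniqueness, which only Azure can confirm.

```python
import re

# Assumed rule: 3-24 characters, lowercase letters and digits only.
_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Check the name's length and character set (hypothetical helper)."""
    # fullmatch ensures the whole string fits the pattern, not just a prefix.
    return _NAME_PATTERN.fullmatch(name) is not None
```

A name such as `mydatalake01` passes this check, while `MyDataLake` (uppercase) and `my-data-lake` (hyphens) do not.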

Deployment Model

5. Leave the defaults on the Networking tab.

6. Click the Advanced tab. Here you provide the settings for Data Lake Storage. For Data Lake Storage Gen2, this typically means enabling the Hierarchical namespace option.

Advanced Tab

7. Click Review + create and wait for validation.

8. Once validation has passed, click Create at the bottom. In a few minutes, you get an alert that your deployment succeeded. Your Azure Data Lake Storage account is now created.
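
The choices made in the portal steps above map onto the settings of a storage account resource. As a rough sketch, the same selections could be written as a request body like the one below. The field names follow the Azure Resource Manager storage account schema as I understand it and should be checked against the current documentation; the function name and the `eastus` region are placeholders, and enabling `isHnsEnabled` (hierarchical namespace) is an assumption about what is selected on the Advanced tab.

```python
import json


def build_storage_account_body(location: str) -> dict:
    """Assemble the portal choices as a dict (hypothetical helper, no Azure call is made)."""
    return {
        "location": location,             # step 4: Location
        "kind": "StorageV2",              # step 4: general-purpose v2 account kind
        "sku": {"name": "Standard_LRS"},  # step 4: locally redundant storage
        "properties": {
            "accessTier": "Hot",          # step 4: Hot access tier
            "isHnsEnabled": True,         # step 6: assumed Data Lake Storage setting
        },
    }


if __name__ == "__main__":
    # "eastus" is only an example region; use the one nearest to you.
    print(json.dumps(build_storage_account_body("eastus"), indent=2))
```

The dict is plain data, so it can be reviewed or version-controlled before being passed to whichever deployment tool you use.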

Now that you have created your Azure Data Lake Storage account, it’s time to load data into it.


Author:

Chandraish Sinha

Chandraish Sinha has 20+ years of experience in Information Technology. He is an accomplished author who has published multiple IT books. Please visit the author’s profile.

He has implemented IT solutions in various industries, including pharmaceutical, healthcare, telecom, financial and retail.

He coaches organizations and consultants in various technologies.

He blogs regularly on Business Intelligence: http://www.learntableaupublic.com/ , http://www.learnallbi.com/
