Data Mining File: Study of ETL process and its tools
EXPERIMENT-1

AIM: Study of ETL process and its tools.

What is ETL?
ETL is an abbreviation of Extract, Transform and Load. In this process, an ETL tool extracts data from different RDBMS source systems, transforms it by applying calculations, concatenations, etc., and then loads it into the Data Warehouse system.

It is tempting to think that creating a Data Warehouse is simply a matter of extracting data from multiple sources and loading it into the Data Warehouse database. This is far from the truth: it requires a complex ETL process. The ETL process demands active inputs from various stakeholders, including developers, analysts, testers and top executives, and is technically challenging. To maintain its value as a tool for decision-makers, a Data Warehouse system needs to change as the business changes. ETL is a recurring activity (daily, weekly, monthly) of a Data Warehouse system and needs to be agile, automated, and well documented.

Why do you need ETL?
There are many reasons for adopting ETL in an organization:
● It helps companies analyze their business data in order to take critical business decisions.
● ETL provides a method of moving data from various sources into a data warehouse.
● As data sources change, scheduled ETL runs keep the Data Warehouse up to date.
● It allows verification of data transformation, aggregation and calculation rules.
● The ETL process allows sample data comparison between the source and the target system.

ETL Process in Data Warehouses

Step 1) Extraction
In this step, data is extracted from the source system into the staging area. Transformations, if any, are done in the staging area so that the performance of the source system is not degraded. Also, if corrupted data were copied directly from the source into the Data Warehouse database, rollback would be a challenge. The staging area gives an opportunity to validate extracted data before it moves into the Data Warehouse. Hence one needs a logical data map before data is extracted and loaded physically; this data map describes the relationship between source and target data.

Three data extraction methods:
1. Full Extraction
2. Partial Extraction - without update notification
3. Partial Extraction - with update notification

Irrespective of the method used, extraction should not affect the performance and response time of the source systems. These source systems are live production databases; any slowdown or locking could affect the company's bottom line. (A sketch of partial extraction appears after Step 2 below.)

Some validations are done during extraction:
● Reconcile records with the source data
● Make sure that no spam/unwanted data is loaded
● Data type check
● Remove all types of duplicate/fragmented data
● Check whether all the keys are in place or not

Step 2) Transformation
Data extracted from the source server is raw and not usable in its original form. Therefore it needs to be cleansed, mapped and transformed. In fact, this is the key step where the ETL process adds value and changes data so that insightful BI reports can be generated.
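To make the Extract, Transform, Load flow above concrete, here is a minimal sketch in Python. It uses sqlite3 in-memory databases as stand-ins for the source RDBMS and the warehouse; the table layout and the 18% tax calculation are hypothetical, not part of any particular ETL tool.

```python
# Minimal sketch of Extract -> Transform -> Load, with sqlite3 in-memory
# databases standing in for the source RDBMS and the Data Warehouse.
import sqlite3

# --- Extract: pull raw rows out of the source system ---
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER, first TEXT, last TEXT, amount REAL)")
source.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                   [(1, "ada", "lovelace", 100.0), (2, "alan", "turing", 250.5)])
rows = source.execute("SELECT id, first, last, amount FROM sales").fetchall()

# --- Transform: apply calculations and concatenations in the staging area ---
transformed = [(rid, f"{first.title()} {last.title()}", round(amount * 1.18, 2))
               for rid, first, last, amount in rows]  # hypothetical 18% tax

# --- Load: write the transformed rows into the warehouse table ---
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_sales (id INTEGER, customer TEXT, gross REAL)")
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", transformed)
warehouse.commit()
print(warehouse.execute("SELECT * FROM fact_sales").fetchall())
```

Real pipelines would of course run the extract query against a production database and load into a dedicated warehouse, but the three phases keep the same shape.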
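The second extraction method, partial extraction without update notification, is commonly implemented with a high-water mark: since the source offers no change feed, each run pulls only rows whose modification timestamp is newer than the last run's watermark. A sketch follows; the orders table and its modified_at column are hypothetical.

```python
# Sketch of partial extraction without update notification, using a
# high-water mark on a (hypothetical) modified_at column.
import sqlite3

def extract_changed_rows(conn, last_watermark):
    """Pull only rows modified since the previous run (incremental extract)."""
    cur = conn.execute(
        "SELECT id, payload, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at", (last_watermark,))
    rows = cur.fetchall()
    # Advance the watermark so the next run skips rows already staged.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, payload TEXT, modified_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "a", "2021-09-01"), (2, "b", "2021-09-05")])
rows, wm = extract_changed_rows(conn, "2021-09-02")
print(rows, wm)  # only row 2 is extracted; watermark advances to 2021-09-05
```

Because only changed rows are read, the query stays cheap and avoids degrading the live production source, which is the concern raised above.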
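Finally, a sketch of the staging-area validations listed under Step 1: a key-presence check, duplicate removal, and a data-type check. The record layout (id and amount fields) is a made-up example, not a prescribed schema.

```python
# Sketch of staging-area validations: key check, duplicate removal,
# and data type check. The record layout is hypothetical.
def validate_staged(records):
    """Partition staged records into (clean, rejected) lists."""
    seen, clean, rejected = set(), [], []
    for rec in records:
        key = rec.get("id")
        if key is None:                                          # key in place?
            rejected.append((rec, "missing key"))
        elif key in seen:                                        # duplicate?
            rejected.append((rec, "duplicate"))
        elif not isinstance(rec.get("amount"), (int, float)):    # data type check
            rejected.append((rec, "bad amount type"))
        else:
            seen.add(key)
            clean.append(rec)
    return clean, rejected

staged = [{"id": 1, "amount": 10.0}, {"id": 1, "amount": 10.0},
          {"id": None, "amount": 5}, {"id": 2, "amount": "oops"}]
clean, rejected = validate_staged(staged)
print(len(clean), "clean;", [reason for _, reason in rejected])
```

Rejected records can be logged or routed to an error table for reconciliation against the source, rather than being silently dropped.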