Skip to main content

ETL testing: Basic Understanding

 
                                         ETL testing: Basic
ETL stands for Extract-Transform-Load
In other words, any form of bulk data movement from a source to a target can be considered ETL
ETL testing is done to ensure that the data that has been loaded from a source to the destination after business transformation is accurate.
We also ensure the verification of data at various middle stages that are being used between source and destination. 
ETL testing is a data centric testing process to validate that the data has been transformed and loaded into the target as expected.
Lets try to understand ELT concept before we jump to ELT testing perspective.
Just take an example:
A manufacturing company having multiple departments such as sales, HR, Material Management, etc. All departments have separate databases which they use to maintain information
Now, if the company wants to analyze historical data and generate reports, all the data from these data sources should be extracted and loaded into a Data Warehouse to save it for analytical work.

An ETL tool extracts the data from all these heterogeneous data sources, transforms the data (like applying calculations, joining fields, keys, removing incorrect data fields, etc.), and loads it into a Data Warehouse. Later, we can use various Business Intelligence (BI) tools to generate meaningful reports, dashboards, and visualizations using this data.

         

Let’s talk about some basic terms when it comes to ETL testing:
1.       BI: Business Intelligence is the process of collecting raw data or business data and turning it into information that is useful and more meaningful.
2.      Data Ware house: The data warehouse is constructed by integrating the data from multiple heterogeneous sources
3.      ETL: An ETL is a process of extracting, transform and load the data from different data sources like transaction systems, applications and other database to Data warehouse so that it can be used for BI reporting



There is 3 Steps in ETL as the name suggests:
1.       Extraction of Data: Extraction of data is a process to extract desired data from multiple sources. the extraction process should be carried out in a way that it does no harm to the original data source with respect to performance and response time.
Depending of type of source system and destination business requirement we can select the extraction type:
Basically, three types of extraction methods are used widely:
Update notification – the system notifies user when a record has been changed. This is typically referred to as the easiest method of extraction.
Incremental extraction – some source systems can identify the changes made in the DB, they allow extraction of the modified data either in the form of new changed table or a source application column. This approach requires keeping a track of data source changes performed after the last extraction.
Full extraction – some systems can’t identify when data has been changed at all, so the only way to get it out of the system is to reload it all. This is usually only recommended for small amounts of data as a last resort. t gives back all the data stored in the data source and thus, there isn’t any necessity of tracking the changes in the data source since the last extraction.
2.     Transforming Data: The next step in the ETL process is transformation. After data is extracted, it must be physically transported to the target destination and converted into the appropriate format. This data transformation may include operations such as cleaning, joining, and validating data or generating calculated data based on existing values. In data transformation, we use a set of functions on extracted data to load it into the target system. Data, which does not require any transformation is known as direct move or pass through data.
3.      Loading Data: Third and final step is in the ETL process involves loading the transformed data into the destination target (a database or a data warehouse)
Basically, there is two methods for this process:
·         The full load method involves an entire data dump that occurs the first time the source is loaded into the warehouse
·         The incremental load, on the other hand, takes place at regular intervals.
Why organization should use ETL:
·         Transforms data from multiple sources and loads it into various targets
·         Provides deep historical context for businesses
·         Allows organizations to analyze and report on data more efficiently and easily
·         Increases productivity as it quickly moves data without requiring the technical skills of having to code it first
·         Evolves and adapts to changing technology and integration guidelines
In next post I would explain ELT testing
Happy testing!!!

Pic credits: Google

Comments