ETL Testing: Basics
ETL stands for Extract-Transform-Load. In other words, any form of bulk data movement from a source to a target can be considered ETL.
ETL testing is done to ensure that the data loaded from a source to the destination after business transformation is accurate. We also verify the data at the various intermediate stages used between source and destination. ETL testing is a data-centric testing process that validates that the data has been transformed and loaded into the target as expected.
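To make this concrete, here is a minimal sketch of one of the simplest ETL tests, a row-count reconciliation between source and target. It is written in Python, and the database files and table names (source.db, sales_orders, and so on) are purely hypothetical:

import sqlite3

def row_count(db_path, table):
    # Count the rows in the given table.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

source_count = row_count("source.db", "sales_orders")
target_count = row_count("warehouse.db", "fact_sales_orders")

# The most basic check: every source record should be accounted for in the target.
assert source_count == target_count, (
    f"Row count mismatch: source={source_count}, target={target_count}"
)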
Let's try to understand the ETL concept before we jump to the ETL testing perspective.
Just take an example: a manufacturing company has multiple departments such as Sales, HR, Material Management, etc. Each department uses a separate database to maintain its information. Now, if the company wants to analyze historical data and generate reports, all the data from these data sources has to be extracted and loaded into a Data Warehouse, where it is kept for analytical work.
An ETL tool extracts the data from all these heterogeneous data sources, transforms it (for example by applying calculations, joining fields and keys, and removing incorrect data fields), and loads it into a Data Warehouse. Later, we can use various Business Intelligence (BI) tools to generate meaningful reports, dashboards, and visualizations from this data.
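To give a feel for how the three steps fit together, here is a highly simplified end-to-end sketch in Python using pandas and SQLite; the database files, table names, and columns (sales.db, orders, and so on) are invented for illustration only:

import sqlite3
import pandas as pd

# Extract: pull raw data from two heterogeneous department databases.
with sqlite3.connect("sales.db") as sales_conn:
    orders = pd.read_sql("SELECT order_id, product_id, quantity, unit_price FROM orders", sales_conn)
with sqlite3.connect("materials.db") as mm_conn:
    products = pd.read_sql("SELECT product_id, product_name FROM products", mm_conn)

# Transform: join the two sources and derive a calculated column.
fact_orders = orders.merge(products, on="product_id", how="left")
fact_orders["order_value"] = fact_orders["quantity"] * fact_orders["unit_price"]

# Load: write the transformed data into the warehouse table that BI tools will query.
with sqlite3.connect("warehouse.db") as dw_conn:
    fact_orders.to_sql("fact_orders", dw_conn, if_exists="replace", index=False)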
Let’s
talk about some basic terms when it comes to ETL testing:
1. BI: Business Intelligence is the process of collecting raw business data and turning it into information that is useful and meaningful.
2. Data Warehouse: A data warehouse is constructed by integrating data from multiple heterogeneous sources.
3. ETL: ETL is the process of extracting, transforming, and loading data from different data sources, such as transaction systems, applications, and other databases, into a data warehouse so that it can be used for BI reporting.
As the name suggests, there are three steps in ETL:
1. Extraction of Data: Extraction is the process of pulling the desired data from multiple sources. The extraction process should be carried out in a way that does not harm the original data source in terms of performance or response time.
Depending on the type of source system and the business requirements of the destination, we can select the extraction type. Three extraction methods are widely used:
Update notification – the source system notifies the user when a record has been changed. This is typically considered the easiest method of extraction.
Incremental extraction – some source systems can identify changes made in the database and allow extraction of the modified data, either through a changed-records table or a column in the source application. This approach requires keeping track of the changes made to the data source after the last extraction (a minimal sketch of this approach follows the list).
Full extraction – some systems can't identify when data has been changed at all, so the only way to get it out of the system is to reload it all. This is usually recommended only for small amounts of data, as a last resort. It gives back all the data stored in the data source, so there is no need to track changes in the data source since the last extraction.
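As a rough illustration of the incremental approach mentioned above, the sketch below extracts only the rows changed since a stored watermark; the table and column names (orders, last_modified) are assumptions, not taken from any particular system:

import sqlite3

def extract_incremental(source_db, table, last_extracted):
    # Pull only the rows modified since the previous extraction run.
    with sqlite3.connect(source_db) as conn:
        cursor = conn.execute(
            f"SELECT * FROM {table} WHERE last_modified > ?", (last_extracted,)
        )
        return cursor.fetchall()

# The watermark would normally be persisted between runs, e.g. in a control table.
changed_rows = extract_incremental("sales.db", "orders", "2024-01-01 00:00:00")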
2. Transforming Data: The next step in the ETL process is transformation. After data is extracted, it must be physically transported to the target destination and converted into the appropriate format. This data transformation may include operations such as cleaning, joining, and validating data, or generating calculated data based on existing values. In data transformation, we apply a set of functions to the extracted data in order to load it into the target system. Data that does not require any transformation is known as direct-move or pass-through data.
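Here is a small sketch of what such transformation functions can look like in Python with pandas, using an invented record layout; the column names and business rules are examples only:

import pandas as pd

raw = pd.DataFrame({
    "customer": ["  Acme Corp ", "Globex", None],
    "quantity": [10, -3, 5],
    "unit_price": [2.5, 4.0, 1.2],
})

# Cleaning: trim whitespace and drop records missing a mandatory field.
clean = raw.assign(customer=raw["customer"].str.strip()).dropna(subset=["customer"])

# Validating: reject rows that violate a simple business rule.
clean = clean[clean["quantity"] > 0]

# Calculated data: derive a new column from existing values.
clean["line_total"] = clean["quantity"] * clean["unit_price"]

# Columns copied unchanged (unit_price here) are the "pass-through" data.
print(clean)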
3. Loading Data: The third and final step in the ETL process involves loading the transformed data into the destination target (a database or a data warehouse). Basically, there are two methods for this process (a short sketch of both follows the list):
· The full load method involves an entire data dump that occurs the first time the source is loaded into the warehouse.
· The incremental load, on the other hand, takes place at regular intervals and applies only the changes made since the previous load.
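The sketch below contrasts the two methods in Python; the warehouse file, table, and column count are assumptions made purely for illustration:

import sqlite3

def full_load(dw_db, table, all_rows):
    # Full load: wipe the target table and reload everything from the source.
    with sqlite3.connect(dw_db) as conn:
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", all_rows)

def incremental_load(dw_db, table, changed_rows):
    # Incremental load: apply only the rows added or changed since the last run.
    with sqlite3.connect(dw_db) as conn:
        conn.executemany(
            f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?)", changed_rows
        )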
Why organizations should use ETL:
· Transforms data from multiple sources and loads it into various targets
· Provides deep historical context for businesses
· Allows organizations to analyze and report on data more efficiently and easily
· Increases productivity, as it quickly moves data without requiring teams to hand-code it first
· Evolves and adapts to changing technology and integration guidelines
In the next post, I will explain ETL testing.
Happy
testing!!!