Skip to main content

ETL testing: Basic Understanding

 
                                         ETL testing: Basic
ETL stands for Extract-Transform-Load
In other words, any form of bulk data movement from a source to a target can be considered ETL
ETL testing is done to ensure that the data that has been loaded from a source to the destination after business transformation is accurate.
We also ensure the verification of data at various middle stages that are being used between source and destination. 
ETL testing is a data centric testing process to validate that the data has been transformed and loaded into the target as expected.
Lets try to understand ELT concept before we jump to ELT testing perspective.
Just take an example:
A manufacturing company having multiple departments such as sales, HR, Material Management, etc. All departments have separate databases which they use to maintain information
Now, if the company wants to analyze historical data and generate reports, all the data from these data sources should be extracted and loaded into a Data Warehouse to save it for analytical work.

An ETL tool extracts the data from all these heterogeneous data sources, transforms the data (like applying calculations, joining fields, keys, removing incorrect data fields, etc.), and loads it into a Data Warehouse. Later, we can use various Business Intelligence (BI) tools to generate meaningful reports, dashboards, and visualizations using this data.

         

Let’s talk about some basic terms when it comes to ETL testing:
1.       BI: Business Intelligence is the process of collecting raw data or business data and turning it into information that is useful and more meaningful.
2.      Data Ware house: The data warehouse is constructed by integrating the data from multiple heterogeneous sources
3.      ETL: An ETL is a process of extracting, transform and load the data from different data sources like transaction systems, applications and other database to Data warehouse so that it can be used for BI reporting



There is 3 Steps in ETL as the name suggests:
1.       Extraction of Data: Extraction of data is a process to extract desired data from multiple sources. the extraction process should be carried out in a way that it does no harm to the original data source with respect to performance and response time.
Depending of type of source system and destination business requirement we can select the extraction type:
Basically, three types of extraction methods are used widely:
Update notification – the system notifies user when a record has been changed. This is typically referred to as the easiest method of extraction.
Incremental extraction – some source systems can identify the changes made in the DB, they allow extraction of the modified data either in the form of new changed table or a source application column. This approach requires keeping a track of data source changes performed after the last extraction.
Full extraction – some systems can’t identify when data has been changed at all, so the only way to get it out of the system is to reload it all. This is usually only recommended for small amounts of data as a last resort. t gives back all the data stored in the data source and thus, there isn’t any necessity of tracking the changes in the data source since the last extraction.
2.     Transforming Data: The next step in the ETL process is transformation. After data is extracted, it must be physically transported to the target destination and converted into the appropriate format. This data transformation may include operations such as cleaning, joining, and validating data or generating calculated data based on existing values. In data transformation, we use a set of functions on extracted data to load it into the target system. Data, which does not require any transformation is known as direct move or pass through data.
3.      Loading Data: Third and final step is in the ETL process involves loading the transformed data into the destination target (a database or a data warehouse)
Basically, there is two methods for this process:
·         The full load method involves an entire data dump that occurs the first time the source is loaded into the warehouse
·         The incremental load, on the other hand, takes place at regular intervals.
Why organization should use ETL:
·         Transforms data from multiple sources and loads it into various targets
·         Provides deep historical context for businesses
·         Allows organizations to analyze and report on data more efficiently and easily
·         Increases productivity as it quickly moves data without requiring the technical skills of having to code it first
·         Evolves and adapts to changing technology and integration guidelines
In next post I would explain ELT testing
Happy testing!!!

Pic credits: Google

Comments

Popular posts from this blog

Pacing Calculation in Load runner

                                        Today i will share my understanding on “Pacing”. Usually people take pacing as think time only. I will try to explain both terms first Think time and Pacing, both are a kind of “wait” or “delay” in terms of time but still both differ from each other. Think time: Think Time is the time or delay which user can take between two actions For example, User opens a URL by pressing “Enter” and login page appears but to enter user name or password user is taking some time like copying from other place or remembering it or something like that. This wait or delay we can term as think time in load runner. Think time can be inserted by user in script or captured by tool itself Whereas, Pacing is the delay between two iterations of test. We can give pacing time by using “Runtime Settings” of load runner tool. We can calculate Pacing/Think...

MI Listener in LoadRunner

In a regular LoadRunner load test scenario (not over a firewall), The Controller has direct access to the LoadRunner agents running on remote machines. This enables the Controller to connect directly to those machines. When running Vusers or monitoring applications over a firewall, this direct connection is blocked by the firewall. The connection cannot be established by the Controller, because it does not have permissions to open the firewall. LoadRunner solves this problem by using a communication configuration based on HTTPS or secured TCP/IP. This configuration uses the standard SSL port on the firewall (port 443). The MI Listener is a component that serves as router between the Controller and the LoadRunner agent. When the LoadRunner agent connects to the MI Listener, the MI Listener keeps a listing of the connection to the agent using a symbolic name that the agent passed to it. When the Controller connects to the MI Listener, it communicates to the MI Li...

Configuring Vuser, Controller and Load Agent using MI Listener

          Configuring Vuser, Controller and Load Agent using MI Listener To Configure Vuser, Controller and Load Agent using MI Listener , make sure MI listener is running Running Vusers over a firewall : To set up your system to run Vusers over a firewall Ø   On each load generator machine that will be running over a firewall, configure the LoadRunner agent to communicate with the MI Listener. Ø   Configure the Controller machine to recognize the load generator and MI Listener machines. Configuring LoadRunner Agents over the Firewall Ø   Stop the LoadRunner agent by right-clicking its icon in the system tray and selecting Close. Ø   Run Agent Configuration from Start > Programs > LoadRunner > Advanced Settings, or run \launch_service\bin\AgentConfig.exe. Ø   Select the Enable Firewall Agent check box, and then click Settings. Ø   The Agent Configuration dialog box opens. Ø ...

Mobile Web(Http/HTML) Protocol IN Loadrunner

Today i will share my learning on performance testing performed on mobile application. We can use Load runner tool for mobile based applications. HP LoadRunner 11.50 has two new protocols for helping to record mobile applications: ·          Mobile Application – HTTP/HTML : recording scripts at the transport level for both browser-based mobile applications and native mobile applications, that communicate with their servers over HTTP ·          Mobile TruClient: recording scripts for browser-based mobile applications through the browser-based user interface.  This protocol is based on Ajax TruClient, using a browser modified to emulate the mobile browser. These protocols are independent of mobile operating systems, so will work on different versions of iOS, Android, Windows Mobile, WebOs (Palm), Blackberry, etc. Let’s see the picture behind the scene: We can run the p...

Performance Testing of Documentum

Before starting details of how to do performance testing on Documentum I will try share a basic of Documentum so you can get an idea why we need a different approach while doing performance testing on it. What is Documentum? Documentum is content management software that provides management of document content and attributes. Documentum product is a three-tier, client-server system built on top of a relational database. Documentum components comprise the following: ·          Content Server - Manage access to the metadata stored in the relational database and the  Content files, usually stored in the file system ·          Webtop/Wdk Server - J2EE-based web application framework Behavior behind the scene: An applet is invoked at the client side when a file within the docbase is accessed and while files are imported into a docbase. Applet/UCF client being client side Java applic...

Run Vugen script using command prompt

We can also run or replay our Vugen script using Dos Command Prompt (CMD) " LoadRunner install dir" \bin\mmdrv.exe -usr "path to .usr file" Note: LoadRunner install dir: Path of directory where Loadrunner has installed Hope this will help you guys. Happy Exploring!!!!

CITRIX Error in Load runner: Failed to get window size, wrong format.

Citrix based Applications are now days very popular. We can perform load testing on Citrix based application using load runner tool also. Quick tips for Citrix applications: 1. You can use "HTTP" and "Citirix" protocol together in load runner. 2. Try to keep windows in full screen mode. Some time while replaying the recorded script can give some error too. one of the error i have encountered was: Error: Failed to get window size, wrong format. Unable to detect client version. Warning: Extension CitrixClientImpl.dll reports error -1 on call to function ExtPerThreadInitialize Error: Vuser failed to initialize extension CitrixClientImpl.dll Resolve the issue: 1. Save the script after recording. 2. Go to Script directory and look for "default.cfg” file 3. Open the file into edit mode 4. Add below mentioned code in file and save it [CITRIX] DesktopColors=32 Colors=True Color (24 bit) Enctyption=128 Bit Window=1024 x 768 ...

Heap Dump : how to generate heap dump

Heap dump is important when it’s come to performance testing. Let’s start with what is heap in an application. Heap: Heap is dynamically collective amount of data /memory in partially binary sorted tree structure. But we need to loop up on heap in term of memory. Heap is part of memory where memory for objects assigned dynamically. Heap dump : Heap Dump is snapshot of memory at certain point which contains below mentioned information: -information of java objects -information of java classes or we can define as A heap dump is a textual or binary representation of the Java heap which is usually written to a file.   Heap dump is always helpful to analyze memory leak or high usage of memory of java application. There are various tools which help us to generate the heap dump. We can also generate heap dump without using tools. What we require is JVM (Java Virtual Machine)on system. There are various vendors for JVM but let’s concentrate on Sun JVM. ...

Software being installed: Cucumber Eclipse Feature 0.0.23.201811220126 Missing requirement: Cucumber Editor 0.0.23.201811220126

Error:    cucumber.eclipse.feature.feature.group 0.0.23.201811220126 While installing cucumber plugin for eclipse, sometimes we faced below mentioned error Error : Cannot complete the install because one or more required items could not be found.   Software being installed: Cucumber Eclipse Feature 0.0.23.201811220126 (cucumber.eclipse.feature.feature.group 0.0.23.201811220126)   Missing requirement: Cucumber Editor 0.0.23.201811220126 (cucumber.eclipse.editor 0.0.23.201811220126) requires 'osgi.bundle; org.eclipse.pde.ui 3.5.0' but it could not be found   Cannot satisfy dependency:     From: Cucumber Eclipse Feature 0.0.23.201811220126 (cucumber.eclipse.feature.feature.group 0.0.23.201811220126)     To: org.eclipse.equinox.p2.iu; cucumber.eclipse.editor [0.0.23.201811220126,0.0.23.201811220126 ] Solution: Clear your eclipse cache with below steps: Open Eclipse and navigate to the ...

Record scripts on mobile applications in LoadRunner

Record scripts on mobile applications in LoadRunner : As LoadRunner 11 in Patch 3, Mobile App (HTTP/HTML) protocol added which can help us to record scripts of mobile apps. There are three recording options in the recording Wizard: 1.Record and Analyze Traffic - Use mobile sniffer agent to capture traffic and then analyze. This allows Vugen to connect to the mobile agent on the server or localhost and carry out the business process for recording. 2. Analyze Traffic - Analyze traffic using existing capture file to generate script. This allows Vugen to scan a pre-captured network traffic (WinPcap) file and generate the code based on the traffic. 3. Record Emulator - Record script using a mobile device emulator. This allows Vugen to record an emulator installed on the Vugen machine. With this option, you need to verify that you are able to bring up the emulator and navigate to a URL outside of Vugen to make sure the emulator settings are correct.