
What’s the Difference Between Extract Transform Load (ETL) and Extract Load Transform (ELT)?

Caroline Kinnen
September 20, 2024

Imagine turning raw data into powerful insights that drive your business forward. No matter the order, most data technologists are likely familiar with extracting, transforming, and loading data. Generally, these terms together refer to the practice of moving data from various sources to a centralized location and changing the state of the data to make it useful for analytics, dashboards, machine learning, and more. Extracting involves gathering data from different sources, transforming is about cleaning and structuring this data, and loading ensures it is correctly placed into a database or data warehouse. Understanding these processes is crucial for anyone looking to harness the full potential of their data, as they form the backbone of efficient and effective data management.

Extract

Extract refers to pulling data from a source. Depending on the source, the data might arrive as an API response, an SFTP file, or any number of other formats. The extracted data may or may not be structured. Given the varied state of the data at this point, some transformation is essential before attempting to make sense of it.

Transform

To use the data, some transformations are necessary. Example transformations might include renaming columns to fit a certain standard, converting time zones, or dropping rows or columns that don’t meet certain criteria. The exact transformations needed depend on the data source and the final business use case.
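The three example transformations can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the field names (itemName, ts, qty), the renaming map, and the "quantity must be positive" rule are all assumptions made up for the example, not part of any particular source system.

```python
from datetime import datetime, timezone

# Hypothetical mapping from source field names to a warehouse naming standard.
COLUMN_RENAMES = {"itemName": "item_name", "ts": "sold_at", "qty": "quantity"}


def transform(rows):
    """Rename columns, normalize timestamps to UTC, and drop invalid rows."""
    cleaned = []
    for row in rows:
        # Rename columns to fit the (assumed) naming standard.
        renamed = {COLUMN_RENAMES.get(key, key): value for key, value in row.items()}
        # Drop rows that fail a simple criterion: quantity must be positive.
        if renamed.get("quantity", 0) <= 0:
            continue
        # Convert an ISO 8601 timestamp that carries a UTC offset to UTC.
        sold_at = datetime.fromisoformat(renamed["sold_at"])
        renamed["sold_at"] = sold_at.astimezone(timezone.utc).isoformat()
        cleaned.append(renamed)
    return cleaned


raw_rows = [
    {"itemName": "box", "ts": "2024-09-20T08:00:00-06:00", "qty": 3},
    {"itemName": "tape", "ts": "2024-09-20T09:30:00-06:00", "qty": 0},
]
print(transform(raw_rows))
```

The second row is dropped because its quantity is zero, and the first row comes out with standardized column names and a UTC timestamp.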

Load

Load refers to moving the data from a staging area into its final location. Companies often have a central data warehouse made up of data from multiple sources. The data is organized into tables that are accessible to stakeholders and easy to make use of within the larger context.

The traditional order of events is Extract Transform Load (ETL) - first pulling raw data, then performing the transformations, and lastly loading the data in its final state to the central repository.


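Extract Load Transform (ELT)

ELT changes the order: raw data is extracted and loaded into storage first, and transformations are applied afterward. Because transformations are applied post-load, ELT offers greater flexibility and potential cost savings. This approach is especially beneficial in dynamic environments with evolving data requirements.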


For the end user, the result is the same. Analysts can still build dashboards, and decision makers can use the same data to drive results. However, ELT can provide flexibility and cost-saving opportunities on the backend.

Consider this example: a shipping company uses Redshift as its data warehouse. The Point of Sale software sends item information every day. Using an ETL process, the company’s data pipeline calls the relevant API endpoint, transforms the response using Python, then uploads the dataframe to a table in Redshift called warehouse_inventory_items. The table's schema includes a price column that represents the cost of an item in US Dollars. The source sends the price in Mexican Pesos, but the Python transformation can easily make the conversion.
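A rough sketch of that ETL pipeline might look like the following. Everything here is a stand-in: extract() returns a hard-coded response instead of calling the real API, a Python list plays the role of the Redshift table, and the exchange rate is a made-up constant. The point is only to show that the peso value never reaches the warehouse.

```python
from decimal import Decimal, ROUND_HALF_UP

MXN_TO_USD = Decimal("0.05")  # hypothetical fixed rate, for illustration only


def extract():
    """Stand-in for calling the Point of Sale API endpoint."""
    return [{"item_id": 1, "price": "10.10"}, {"item_id": 2, "price": "250.00"}]


def transform(response):
    """Convert each price from Mexican Pesos to US Dollars, rounded to cents."""
    return [
        {
            "item_id": item["item_id"],
            "price": (Decimal(item["price"]) * MXN_TO_USD).quantize(
                Decimal("0.01"), rounding=ROUND_HALF_UP
            ),
        }
        for item in response
    ]


def load(rows, table):
    """Stand-in for uploading rows to warehouse_inventory_items in Redshift."""
    table.extend(rows)


warehouse_inventory_items = []
load(transform(extract()), warehouse_inventory_items)
print(warehouse_inventory_items)
```

After load() runs, only the converted US Dollar price is kept. The original peso amount exists nowhere in the pipeline's output.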

The company is growing as part of a dynamic industry, and the business identifies a use case to move internationally. The price column now needs to be represented in Canadian Dollars.

To make this change, the data team would have to update the transformation logic to change the conversion. However, there is the problem of the historical table - the company doesn’t store the original cost in Pesos, and converting the stored US Dollar values to Canadian Dollars can compound the rounding already applied in the first conversion and lead to errors. The company can check the data retention policy of the Point of Sale software, but there’s no guarantee the original data is still available.
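One way the double conversion can go wrong is rounding. The sketch below uses hypothetical exchange rates, deliberately chosen so that the direct peso-to-Canadian rate equals the product of the two intermediate rates. Any difference in the result therefore comes only from rounding the stored US Dollar value to cents.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
# Hypothetical rates, chosen so that MXN_TO_CAD == MXN_TO_USD * USD_TO_CAD.
MXN_TO_USD = Decimal("0.05")
USD_TO_CAD = Decimal("1.35")
MXN_TO_CAD = Decimal("0.0675")


def convert(amount, rate):
    """Apply an exchange rate and round to the nearest cent."""
    return (amount * rate).quantize(CENT, rounding=ROUND_HALF_UP)


original_mxn = Decimal("10.10")
stored_usd = convert(original_mxn, MXN_TO_USD)    # what the ETL table kept
cad_from_usd = convert(stored_usd, USD_TO_CAD)    # converting the stored value
cad_from_mxn = convert(original_mxn, MXN_TO_CAD)  # converting the original value
print(stored_usd, cad_from_usd, cad_from_mxn)
```

With these numbers the script prints 0.51 0.69 0.68: the price derived from the stored US Dollar value is one cent off from the price derived from the original pesos. Real exchange rates also change over time, which this sketch does not model.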

The company would have more success with an ELT process. In this setup, the pipeline would still call the API endpoint, but instead of immediately transforming the response, it would store the raw API response in an AWS S3 bucket. After that, the same Python script would read the stored response, convert the price column to US Dollars, and upload the result to Redshift.

Now imagine the same request - change the transformation to convert the price to Canadian Dollars and apply it to all historical data. The data team would simply update the transformation logic to change the conversion, then clear the table and re-run the script for every response stored in the S3 bucket. This covers the historical data, and the transformation will be applied correctly moving forward.
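The replay step can be sketched as follows. A temporary local directory stands in for the S3 bucket and a returned list stands in for the Redshift table, so no AWS calls are made. The file names, function names, and exchange rates are all hypothetical.

```python
import json
import tempfile
from decimal import Decimal, ROUND_HALF_UP
from pathlib import Path


def store_raw(raw_dir, name, response):
    """Save an untouched API response; stand-in for writing to the S3 bucket."""
    (raw_dir / f"{name}.json").write_text(json.dumps(response))


def transform(response, rate):
    """Convert peso prices with the given rate, rounded to cents."""
    return [
        {
            "item_id": item["item_id"],
            "price": (Decimal(item["price"]) * rate).quantize(
                Decimal("0.01"), rounding=ROUND_HALF_UP
            ),
        }
        for item in response
    ]


def rebuild_table(raw_dir, rate):
    """Start from an empty table and re-run the transform on every stored response."""
    table = []
    for path in sorted(raw_dir.glob("*.json")):
        table.extend(transform(json.loads(path.read_text()), rate))
    return table


raw_dir = Path(tempfile.mkdtemp())
store_raw(raw_dir, "2024-09-19", [{"item_id": 1, "price": "10.10"}])
store_raw(raw_dir, "2024-09-20", [{"item_id": 2, "price": "250.00"}])

usd_table = rebuild_table(raw_dir, Decimal("0.05"))    # original requirement
cad_table = rebuild_table(raw_dir, Decimal("0.0675"))  # new requirement, full history
print(usd_table)
print(cad_table)
```

Because the raw responses are kept, switching currencies is a one-argument change followed by a rebuild, and the Canadian Dollar prices are computed from the original peso amounts rather than from already-rounded US Dollar values.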

ELT can be very useful in dynamic environments with changing requirements and use cases. More general benefits include:

  • Flexibility - ETL often requires predefined transformation logic, which is less adaptable to changing business requirements over time. With ELT, the logic is contained in transformations that run after the raw data has been stored, so changes can be made and applied more quickly.
  • Diversity in data - ELT allows for less structured data in a variety of storage formats.
  • Scalability - ELT better leverages modern cloud-based storage solutions. Large volumes of data can be stored without significant infrastructure overhead.

When deciding if ELT is right for their team, developers and architects should also take these potential downsides into consideration:

  • Cost of more data storage - although the cloud offers affordable storage options, it's not free to store data. Costs could potentially double, as you would be storing two copies of the data - the raw form and the final form.
  • Changes in data security and compliance - in ETL, sensitive data can be masked or encrypted during the transformation phase. In ELT, the rawest version of the data is stored, and that might include sensitive data. Data warehouses and storage services offer robust security features, but teams will have to review those settings before the data is loaded into the target system.
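As an illustration of the masking mentioned in the last point, here is a minimal sketch of a step that could run during an ETL transformation, or before raw data is written to storage in an ELT pipeline. The field names (card_number, customer_email) are hypothetical, and an unsalted hash is only a simple form of pseudonymization. Real compliance requirements may call for encryption or stricter controls.

```python
import hashlib

# Hypothetical set of fields to pseudonymize with a one-way hash.
HASHED_FIELDS = {"customer_email"}


def mask_card(number):
    """Replace all but the last four characters with asterisks."""
    return "*" * (len(number) - 4) + number[-4:]


def mask_record(record):
    """Return a copy of the record with sensitive fields masked or hashed."""
    masked = dict(record)
    if "card_number" in masked:
        masked["card_number"] = mask_card(masked["card_number"])
    for field in HASHED_FIELDS & masked.keys():
        masked[field] = hashlib.sha256(masked[field].encode("utf-8")).hexdigest()
    return masked


record = {
    "item_id": 1,
    "card_number": "4111111111111111",
    "customer_email": "pat@example.com",
}
print(mask_record(record))
```

The original record is left unchanged, and only the masked copy would be passed along to storage.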

Choosing between ETL and ELT depends on factors like cost, data volume, and infrastructure. ELT is ideal for handling unstructured data and adapting to changing requirements. It leverages modern data warehouses to manage large data volumes efficiently and offers greater flexibility and faster processing by loading raw data and transforming it later. This approach helps businesses streamline data workflows and maximize data value.

In summary, understanding ETL vs. ELT is key to optimizing data architecture and staying agile in a dynamic environment.
