Transition from ETL to ELT

Transition from ETL to ELT

ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) are two terms that are often used in data engineering, especially in the context of data capture and transformation. Although these terms are often used interchangeably, they refer to different concepts and have different implications for data pipeline design.

In this post, we’ll clarify the definitions of ETL and ELT processes, point out the differences between them, and discuss the advantages and disadvantages they offer to engineers and data teams in general. And most importantly, I’ll describe how recent changes in the makeup of today’s data teams have affected the landscape of the ETL vs. ELT battle.

Understanding Extract, Load and Transform independently of each other

The main issue when comparing ETL and ELT is obviously the sequence of execution of Extract, Load and Transform steps within the data.

For now, let’s ignore this execution sequence and focus on the terminology itself and discuss what each individual step entails.

Withdrawal: This step refers to the process of extracting data from a persistent source. This data source can be a database, an API endpoint, a file, or anything else that contains any form of data, including structured or unstructured.

The extraction step pulls data from various sources

Transformation: At this stage, the pipeline is expected to perform some changes in the structure or format of the data to achieve a certain goal. A transformation can be a selection of an attribute, a change of records (for example, a transformation 'Великобритания' in 'UK'), validating the data, connecting to another source, or anything else that changes the format of the raw input data.

The transformation step performs a series of transformations on the raw input data

Download: The download step refers to the process of copying data (raw or converted) to the target system. Typically, the target system is a Data Warehouse (that is, an OLAP system used for analytical purposes) or an Application Database (that is, an OLTP system).

Loading data into the target system

Inevitably, the order in which these three steps are performed matters. And as the amount of data to be processed increases, so does the order of execution great value. Let’s discuss why!

Extract Transform Load (ETL)

ETL stands for Extract-Transform-Load, and the term itself refers to a process where a data extraction step follows a transformation step and ends with a load step.


Extract > Transform > Load

The data transformation step in ETL processes takes place in an intermediate environment outside the target system.where the data is transformed just before it is loaded into the destination.



The ETL transformation step takes place in the staging environment/server and then the transformed data is loaded into the target system

Extract Load Transform (ELT)

On the other hand, ELT, which stands for Extract-Load-Transformation, refers to a process where the extraction step follows the loading step and the final data transformation step occurs at the very end.


Extract > Load > Transform

Unlike ETL, ELT does not require a middleware/server because the data transformation is done inside the target system, which is usually a Data Warehouse hosted in the cloud.


In ELT, the transformation step takes place inside the target system

How to choose between ETL and ELT

ETL and ELT have their pros and cons, and you’ll likely come across both in your day-to-day work, given that they’re commonly used for different use cases.

ETL best suited for use cases where the data resides within the enterprise and needs to be structured before being loaded into a target database or repository. Therefore, the ETL process is usually better when smaller amounts of data are involved and/or when complex transformations need to be performed.

Also, because ETL transforms the data before the download stage, sensitive data can be masked, encrypted, or completely removed before download. This aspect of ETL can help companies and teams more easily comply with and implement compliance with various regulations (such as GDPR).

Since the transformation takes place on an intermediate server, there is an additional burden of moving the transformed data to the target system. In addition, the target system will not have raw data (that is, data in its pre-transformation form). This means that if additional transformations are needed, we have to re-extract the raw data.

On the other hand, ELT provides greater flexibility compared to ETL, as the latter was originally intended for structured (relational) data. Modern cloud architectures and data warehouses allow ELT to be used for both structured and unstructured data.

As mentioned earlier, ETL should be used for small amounts of data. ELT provides a faster conversion because it depends on the size of the data and is usually performed on an as-needed basis.

Also, if the data is loaded before it is transformed as part of the ELT process, it means that users and systems still have access to the original data. This means that if additional transformations are needed later, we already have the raw data in the data store that we can refer to at any time. The only downside is the need for additional storage for this raw data, but given the ever-decreasing cost of storage, I don’t think that’s a major problem.

Now that we all have a good understanding of the technical aspects of ETL and ELT processes, let me ask a question. When it comes to choosing between one and the other, is it just a matter of technical implementation?

It’s not just about when to perform conversions

In addition, the field of data is constantly evolving and moving forward, and data roles are no exception to this rapidly changing environment.

ETL vs. ELT is not only about where the transformation step happens – it’s (also) about who should perform them.

The conversion step usually includes some business logic. Traditional ETL processes used to be done by data warehouse engineers (not sure if that’s the case now), which means those people were also responsible for developing the business logic.

On the other hand, ELT processes have evolved due to the nature and landscape of today’s data teams and their formation. The EL (extract-load) steps are usually performed by data engineers, while the transformation step is performed by so-called analytical engineers.

And it seems quite logical to me. A data engineer is a purely technical person who cares about efficiency, scalability, availability, and availability (among a million other things). An analytical engineer, on the other hand, is a technical professional with a better understanding of the business. And so it makes sense for an analytics engineer to be responsible for data transformation, given that (usually) the transformation is aligned with business value.

Modern cloud architectures, data stacks (including cloud OLAP systems) and team building have made ELT processes more relevant and efficient. Based on my personal experience, I would say that there is a shift from ETL to ELT, even though ETL is still relevant and useful.

Today’s data stacks and teams favor ELT processes

Although ETL is not dead, in my opinion, today’s data stacks and technologies favor ELT processes. As an example, let’s look at dbt (data build tool), which is one of the hottest innovations in data and has become a de facto conversion tool for analysts and engineers.

Typically, we want to load raw data (ie. without applying any transformation) into the data warehouse from external or internal data sources. Then, based on these data models (in dbt we usually call them stage models), we build additional models (intermediate and mart) that are the result of some transformation processes that take place inside the data store.

In such workflows, it therefore makes sense to load the data into storage before it is transformed. It also allows raw data to be accessed at any time to support future use cases.

If you are interested in a deeper understanding of how dbt works and how the various components interact to transform raw data and create meaningful data models to support decision making, I would recommend the following article.

Conclusion

Designing data pipelines is a complex task, and many factors must be carefully considered when performing it. When it comes to loading data from data sources into a data warehouse, you can usually choose one of two approaches.

In this article, we discussed how ETL and ELT perform a sequence of steps to transform and load (or load and transform) data into a destination system.

Depending on your organization’s landscape and specific use case, you can choose one of them. I hope this guide has given you all the details you need to choose the best and most efficient approach when it comes to downloading and converting data.


And the most important thing is ours TG channels. No extra spam.

Related posts