May 22, 2023
ETL Data Pipeline at Logward
Logward makes world-class supply chain software to provide logistics professionals with better visibility. How does Logward do it? With three types of solutions to manage supply chains on a single platform:
- Sourcing Solutions – Collaborate with suppliers, manufacturers, operations team, and customers to align sales and purchase orders for freight consolidation
- Procurement Solutions – Manage all rates and tenders in one place, and visualize allocated capacity that has been agreed with carriers in advance
- Transport Solutions – Visualize transport from deciding on routes to booking to tracking delivery, as well as optimize processes by connecting with authorities and managing documentation
To achieve all the above, Logward deals with a huge amount of data, which it receives from different stakeholders in numerous formats through various mediums. One of the biggest hurdles is dealing with fragmented and non-standard data. These massive sets of non-standard data need to be transformed into a standard format to provide better supply chain visibility and analytics. Logward’s data pipeline plays an instrumental role in accomplishing this.
What is a data pipeline?
A data pipeline is a set of steps that moves data from one system to another. Different types of data pipelines perform different operations while the data is in transit. These operations can include data duplication, filtering, migration to the cloud, and enrichment. Pipelines automatically aggregate information from disparate sources, then transform and consolidate it into a single high-performance data store.
Thanks to data consolidation, everyone who uses data to make strategic and operational decisions, or to build and maintain analytical tools, can access it easily and quickly. At Logward, these are data analysts, data quality associates, and operations specialists.
Building and managing infrastructure for data movement, and its strategic usage, is what data engineers at Logward do.
There are two kinds of data pipeline:
- Batch – A batch data pipeline periodically transfers bulk data from source to destination
- Real-time – A streaming data pipeline continually flows data from source to destination while translating the data into a receivable format in real time
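The difference between the two modes can be sketched in a few lines of Python. This is only an illustration: `to_destination_format`, `run_batch`, `run_streaming`, and the record shape are hypothetical names chosen for the example, not part of Logward's code.

```python
from typing import Dict, Iterable, Iterator, List


def to_destination_format(record: Dict) -> Dict:
    """Hypothetical transformation: normalise all keys to lower case."""
    return {key.lower(): value for key, value in record.items()}


def run_batch(records: List[Dict]) -> List[Dict]:
    """Batch mode: process everything collected since the last run in one go."""
    return [to_destination_format(record) for record in records]


def run_streaming(source: Iterable[Dict]) -> Iterator[Dict]:
    """Streaming mode: convert and hand over each record as soon as it arrives."""
    for record in source:
        yield to_destination_format(record)


if __name__ == "__main__":
    data = [{"ID": 1}, {"ID": 2}]
    print(run_batch(data))              # one bulk result
    for item in run_streaming(iter(data)):
        print(item)                     # one result per incoming record
```

The batch function returns only once the whole input has been processed, while the generator hands each record on immediately.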
A data pipeline has the following components:
- Origin – Source where the original data resides
- Destination – Final point to which the data is transferred. Can be a data store, API endpoint, analytics tool, etc.
- Data flow – Movement of the data between origin and destination
- Processing – Manipulation and transformation involved in moving the data
- Storage – All systems used to preserve data through the stages of the data flow
- Workflow – Series of processes to describe the flow of data
- Monitoring – Ensures all stages of the data pipeline are working correctly
One type of data pipeline that Logward uses is an ETL pipeline.
What is an ETL data pipeline?
ETL is an acronym for extract, transform, and load. ETL is the most common data pipeline architecture, one that has been a standard for decades. It extracts raw data from disparate sources, transforms it into a single pre-defined format, and loads it into a target system.
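The three steps can be sketched as a tiny, self-contained Python program. This is a minimal sketch under stated assumptions: the two sources, the field names, and the SQLite target are all hypothetical stand-ins chosen so the example runs anywhere.

```python
import csv
import io
import sqlite3

# Two hypothetical sources that describe the same thing in different shapes.
CSV_SOURCE = "shipment,weight_kg\nA-1,1200\nA-2,800\n"
API_SOURCE = [{"ref": "B-7", "weight_t": 1.5}]


def extract():
    """Extract: read raw rows from each source without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    return csv_rows, list(API_SOURCE)


def transform(csv_rows, api_rows):
    """Transform: bring both shapes into one pre-defined format (ref, weight in kg)."""
    unified = [(row["shipment"], float(row["weight_kg"])) for row in csv_rows]
    unified += [(row["ref"], row["weight_t"] * 1000.0) for row in api_rows]
    return unified


def load(rows, connection):
    """Load: write the unified rows into the target system (here: SQLite)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS shipments (ref TEXT, weight_kg REAL)"
    )
    connection.executemany("INSERT INTO shipments VALUES (?, ?)", rows)
    connection.commit()


def run_etl(connection):
    """Run the three steps in order: extract, then transform, then load."""
    load(transform(*extract()), connection)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    print(conn.execute("SELECT ref, weight_kg FROM shipments ORDER BY ref").fetchall())
```

Note that the target table only ever sees data in the unified format; the source-specific shapes are gone by the time the load step runs.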
Typical use cases for ETL pipelines include:
- Migrating data from legacy systems to a data warehouse
- Pulling user data from multiple touchpoints to have all information on customers in one place
- Consolidating high volumes of data from different types of internal and external sources to provide a holistic view of business operations
- Joining disparate datasets to enable deeper analytics
One downside of the ETL architecture is that the pipeline has to be rebuilt each time business rules or data format requirements change. To address this problem, there is another architecture: the ELT data pipeline (yes, you guessed it: extract, load, transform). Rather than transforming the data on a processing server as ETL does, ELT first loads the data into a data warehouse and then transforms it there. This seemingly minor shift changes a lot. Instead of being converted up front, huge amounts of raw data move directly into a data warehouse or data lake. The data can then be processed and structured as needed, at any moment, fully or partially, once or numerous times.
Logward follows the ETL architecture in batch mode. Let's dig a bit deeper into how each component of ETL works at Logward.
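To illustrate the ELT ordering, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a data warehouse. The table names, the sample rows, and the cleaning rule are assumptions made for the example.

```python
import sqlite3

# Hypothetical raw data; the second row carries an unusable weight.
RAW_ROWS = [("A-1", "1200"), ("A-2", "n/a"), ("B-7", "1500")]


def load_raw(connection, rows):
    """Load first: store the data exactly as received, with no conversion."""
    connection.execute("CREATE TABLE raw_shipments (ref TEXT, weight TEXT)")
    connection.executemany("INSERT INTO raw_shipments VALUES (?, ?)", rows)
    connection.commit()


def transform_in_warehouse(connection):
    """Transform later, inside the warehouse; this step can be re-run at any time."""
    connection.execute("DROP TABLE IF EXISTS shipments")
    connection.execute(
        """
        CREATE TABLE shipments AS
        SELECT ref, CAST(weight AS REAL) AS weight_kg
        FROM raw_shipments
        WHERE weight GLOB '[0-9]*'
        """
    )
    connection.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ROWS)
    transform_in_warehouse(conn)
    print(conn.execute("SELECT * FROM shipments ORDER BY ref").fetchall())
```

Because the raw table is kept untouched, a change in business rules only requires editing and re-running the transformation query, not re-ingesting the data.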
ETL Data Pipeline at Logward
Logward receives data from different stakeholders in numerous formats through various modes. Some of the modes through which Logward receives data:
- Emails – data sent as attachments to a custom-created email address
- API – data sent in JSON or XML formats through REST APIs
- EDI – pipeline with various customers to receive data in standard formats like EDIFACT
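As an illustration of the API mode, the sketch below parses JSON and XML payloads into plain Python dictionaries so that later steps can treat them alike. The function names, the content-type dispatch, and the sample payloads are hypothetical; this is not Logward's actual API.

```python
import json
import xml.etree.ElementTree as ET


def parse_json(payload: str) -> dict:
    """Parse a JSON document into a dict."""
    return json.loads(payload)


def parse_xml(payload: str) -> dict:
    """Flatten a one-level XML document into a dict of tag -> text."""
    root = ET.fromstring(payload)
    return {child.tag: child.text for child in root}


def parse_payload(payload: str, content_type: str) -> dict:
    """Pick a parser based on the declared content type of the request."""
    parsers = {"application/json": parse_json, "application/xml": parse_xml}
    try:
        parser = parsers[content_type]
    except KeyError:
        raise ValueError(f"Unsupported content type: {content_type}") from None
    return parser(payload)


if __name__ == "__main__":
    print(parse_payload('{"container": "ABCU1234567"}', "application/json"))
    print(parse_payload(
        "<shipment><container>ABCU1234567</container></shipment>",
        "application/xml",
    ))
```

Both calls produce the same dictionary, which is the point: once extracted, the data no longer depends on the format in which it was sent.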
Some of the common formats for receiving data:
After extraction from these various sources and formats, the data enters the transformation layer.
Besides arriving in numerous formats, the data has a different structure depending on who sends it. Logward transforms it into the “Logward standard” before saving it in the database. This enables better insights through analytics and better management of shipments, allocations, rates, and other data. For this purpose, Logward has two transformation layers:
- Seeburger – establishes EDI connections with clients and transforms EDIFACT files into JSON, which can be transferred to backend services and saved into the database
- Custom Python pipeline – consumes data in the above-mentioned formats, maps that data to the “Logward standard”, applies business logic, and publishes it to a message queue; it is written in Python using libraries such as pandas (DataFrames) and openpyxl
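The mapping step of such a pipeline might look roughly like the sketch below. It is not Logward's code: the sender names, column names, and the two "standard" fields are invented for illustration, and plain dictionaries are used instead of pandas so the example has no dependencies.

```python
# Hypothetical per-sender column mappings: source column -> standard field.
COLUMN_MAPPINGS = {
    "carrier_a": {"Container No": "container_number", "POL": "port_of_loading"},
    "forwarder_b": {"cntr": "container_number", "load_port": "port_of_loading"},
}

# Hypothetical target schema standing in for the real standard format.
STANDARD_FIELDS = ("container_number", "port_of_loading")


def to_standard(row: dict, sender: str) -> dict:
    """Rename sender-specific columns and return a row in the standard shape."""
    mapping = COLUMN_MAPPINGS[sender]
    # Keep only columns the mapping knows about, under their standard names.
    renamed = {mapping[col]: value for col, value in row.items() if col in mapping}
    # Business-logic placeholder: trim whitespace and upper-case string values.
    cleaned = {
        key: value.strip().upper() if isinstance(value, str) else value
        for key, value in renamed.items()
    }
    # Every standard field is present, even when the sender omitted it.
    return {field: cleaned.get(field) for field in STANDARD_FIELDS}


if __name__ == "__main__":
    print(to_standard({"Container No": " abcu1234567 ", "POL": "Hamburg"}, "carrier_a"))
    print(to_standard({"cntr": "abcu1234567"}, "forwarder_b"))
```

Keeping the mappings in data rather than in code means a new sender can be supported by adding one dictionary entry.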
After the transformation is complete, the data is published to a message queue (Logward uses Amazon SQS) and read by the consumer, which saves it to the database. And voilà! The data is available in the database for customers to visualise and for Logward’s Operations Team to build beautiful dashboards for better insights.
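The publish-and-consume hand-off can be sketched as follows. To keep the example runnable without AWS credentials, Python's in-process `queue.Queue` stands in for Amazon SQS and an in-memory SQLite database stands in for the real database; the function names and the record fields are assumptions.

```python
import json
import queue
import sqlite3


def publish(message_queue, record):
    """Producer side: serialise a transformed record and put it on the queue."""
    message_queue.put(json.dumps(record))


def consume_all(message_queue, connection):
    """Consumer side: drain the queue and save each record; returns the count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS shipments (container_number TEXT, status TEXT)"
    )
    saved = 0
    while True:
        try:
            body = message_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to read
        record = json.loads(body)
        connection.execute(
            "INSERT INTO shipments VALUES (?, ?)",
            (record["container_number"], record["status"]),
        )
        saved += 1
    connection.commit()
    return saved


if __name__ == "__main__":
    q = queue.Queue()  # stand-in for the real message queue
    conn = sqlite3.connect(":memory:")
    publish(q, {"container_number": "ABCU1234567", "status": "DEPARTED"})
    print(consume_all(q, conn))
```

The queue decouples the two sides: the transformation layer does not need to know how or when the consumer writes to the database.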
Some other important components in the ETL pipeline:
- Mapping – Customers add their own custom mapping to transform the value of certain fields, for example, changing port names to LOCODEs
- Backup – Logward creates backups of all files in AWS S3 storage
- Monitoring – There are alerts for each step of the data pipeline to ensure notifications are sent in case something goes wrong, allowing Logward’s Data Engineers to react quickly
- Error Files – Logward generates error files and sends them to customers in case of source file errors
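Two of these components, custom mapping and error files, can be combined in one short sketch. The mapping table, the row layout, and the error-file columns are hypothetical; only the idea of translating port names to LOCODEs and reporting rows that cannot be mapped comes from the text above.

```python
import csv
import io

# Hypothetical customer-defined mapping from port names to UN/LOCODEs.
PORT_TO_LOCODE = {"hamburg": "DEHAM", "rotterdam": "NLRTM", "singapore": "SGSIN"}


def apply_port_mapping(rows):
    """Split rows into mapped rows and error rows with a reason attached."""
    mapped, errors = [], []
    for line_number, row in enumerate(rows, start=1):
        port = row.get("port", "")
        locode = PORT_TO_LOCODE.get(port.strip().lower())
        if locode is None:
            errors.append(
                {"line": line_number, "port": port, "error": "unknown port name"}
            )
        else:
            mapped.append({**row, "port": locode})
    return mapped, errors


def build_error_file(errors):
    """Render the error rows as CSV text that could be sent back to the customer."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["line", "port", "error"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(errors)
    return buffer.getvalue()


if __name__ == "__main__":
    good, bad = apply_port_mapping([{"port": "Hamburg"}, {"port": "Atlantis"}])
    print(good)
    print(build_error_file(bad))
```

Rows that fail the mapping are collected rather than dropped silently, so the sender can correct the source file and resubmit it.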
This is one of the major components in the Logward tech architecture that helps make efficient supply chain software. Want to see your data in one of our dashboards too? Contact our sales team now!
Logward is a Hamburg & Bangalore based logistics technology company.
We build software, move containers, and change mindsets.