This Is How You Can Drive Down Distribution Costs

By Andy Eschbacher, Data Scientist, CARTO 

The logistics industry is under increasing pressure thanks to rising costs, a shrinking labour force, and unrealistic customer expectations for priority deliveries. Even Amazon is not immune to these changes. Between 2015 and 2017, the e-commerce retailer’s shipping costs skyrocketed from $11.5 (approx. RM47.6) billion to $21.7 (approx. RM89) billion.

Logistics optimization problems are notoriously difficult to solve. However, with spatial data science, we can build data models that simulate existing network conditions, providing us insights on existing constraints, inefficient territory assignments, and much more.

In Malaysia, one of the main challenges is higher operating costs (manpower, fuel, and warehouse space expansion). A recent article in The Star reported that “some of their last-mile delivery challenges include the lack of new technology and automation in their operations. This can cause a further loss in data collection and in the correct use of information, which perpetuates their inability to measure up to market demands.”

On top of this, there is also growing competition with the entry of new players such as GoGoVan, Lalamove, Deliveree, Ninja Van and TheLorry, which poses a threat to those who don’t adapt their business models fast enough.

This is why it is more important than ever to prepare contingency logistics plans. Below, the team at CARTO built a network model to enable a more sophisticated approach to logistics planning.

Transportation Problems in First Mile Logistics Network

“How can goods be distributed from supply points to demand points at the lowest possible costs?” drives many logistics optimization projects.

For this scenario, consider the question from the perspective of a retail company tasked with shipping 6,000 parcels across the Greater New York City region using its existing distribution network of 5 warehouses and 13 fulfillment centers. The map below gives a sense of the project’s scope:

Entire Network


The goal is to define distribution routes that move supplies from warehouses to fulfillment centers and finally to delivery addresses. In data science, network models can solve transportation problems by directing the flow of goods along optimal routes subject to the network's constraints. The diagram below presents an overview of the computations involved in determining optimal shipping routes.

Transportation Diagram


In this scenario, the optimization factors are distance traveled, shipping costs, and processing costs. Since the aim is the lowest possible distribution cost, we will be looking to reduce fuel and labour costs.

The road network provides the distance traveled, while the shipping and processing estimates are based on past costs:

  1. Shipments from warehouse to fulfillment center cost on average $0.002 per kilometer per package
  2. Shipments from fulfillment center to delivery address cost on average $0.005 per kilometer per package
  3. Fulfillment centers and warehouses have estimates for the total cost for processing a package, which vary from $0.90 to $1.36 for fulfillment centers and $0.84 to $1.99 for the warehouses
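To make these estimates concrete, here is a minimal sketch of the cost of moving a single package along one route. The two per-kilometer rates come from the list above; the distances and the choice of processing fees are hypothetical values picked for illustration.

```python
# Average per-kilometer rates quoted above (USD per km per package).
WAREHOUSE_TO_FC_RATE = 0.002
FC_TO_ADDRESS_RATE = 0.005


def package_cost(km_wh_to_fc, km_fc_to_addr, wh_processing, fc_processing):
    """Cost of one package: warehouse -> fulfillment center -> delivery address."""
    transport = (WAREHOUSE_TO_FC_RATE * km_wh_to_fc
                 + FC_TO_ADDRESS_RATE * km_fc_to_addr)
    return transport + wh_processing + fc_processing


# Hypothetical route: 120 km to the fulfillment center, 15 km to the customer,
# using the lowest processing fees in the quoted ranges.
example = package_cost(120, 15, wh_processing=0.84, fc_processing=0.90)
print(f"${example:.3f} per package")
```

In this hypothetical route, processing fees account for most of the per-package cost, which is why the choice of facility matters as much as the distance.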

Although it seems sensible to base shipping routes on proximity of warehouse, fulfillment center, and delivery addresses, this approach overlooks fulfillment center and warehouse capacity levels and their respective processing costs.

The primary objective is to deliver goods to customers within a certain period of time. When fulfillment centres are overwhelmed, parcel deliveries can be delayed, damaged, returned, or even lost, which can increase costs and decrease customer satisfaction.

Building an Origin-Destination Matrix

In order to solve this transportation problem, you need to find a global combination of possible routes at the lowest possible costs while also ensuring that the constraints are met:

  1. Fulfillment centres do not exceed capacity limits
  2. All packages are delivered
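The two constraints above can be sketched as a simple feasibility check on a candidate plan. The function name, the data layout, and the sample numbers are all assumptions made for illustration; they are not taken from CARTO's code.

```python
def check_constraints(flows, fc_capacity, demand):
    """Return (over_capacity, undelivered) for a candidate shipping plan.

    flows maps (fulfillment_center, delivery_zone) -> packages shipped.
    """
    fc_load, delivered = {}, {}
    for (fc, zone), n in flows.items():
        fc_load[fc] = fc_load.get(fc, 0) + n
        delivered[zone] = delivered.get(zone, 0) + n
    # Constraint 1: fulfillment centers do not exceed capacity limits.
    over_capacity = {fc: load - fc_capacity[fc]
                     for fc, load in fc_load.items() if load > fc_capacity[fc]}
    # Constraint 2: all packages are delivered.
    undelivered = {zone: need - delivered.get(zone, 0)
                   for zone, need in demand.items()
                   if delivered.get(zone, 0) < need}
    return over_capacity, undelivered


# Hypothetical plan: FC-A is asked to handle more than it can process.
fc_capacity = {"FC-A": 100, "FC-B": 80}
demand = {"zone-1": 90, "zone-2": 70}
flows = {("FC-A", "zone-1"): 90, ("FC-A", "zone-2"): 20, ("FC-B", "zone-2"): 50}
over, missing = check_constraints(flows, fc_capacity, demand)
print("over capacity:", over, "| undelivered:", missing)
```

A plan is feasible only when both returned dictionaries are empty.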

You will need to create an Origin-Destination Matrix (ODM) using the road network to obtain distance and travel-time estimates for every origin-destination pair.

ODMs are useful when solving spatial problems because they give all possible paths between origins and destinations to help decide on which permutation of visit order to pick (e.g., traveling salesman type problems), or which combinations of origin/destinations for logistics delivery problems (e.g., network problems).
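As a rough illustration of what an OD matrix holds, the sketch below builds one from straight-line (haversine) distances. This is a stand-in only: the analysis described here uses road distances and travel times, not great-circle distances. The coordinates are approximate town locations chosen for illustration, not the actual facility sites.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def od_matrix(origins, destinations):
    """Map every (origin, destination) pair to a distance in km."""
    return {(o, d): haversine_km(o_xy, d_xy)
            for o, o_xy in origins.items()
            for d, d_xy in destinations.items()}


# Approximate, illustrative coordinates (lat, lon).
origins = {"North Haven": (41.39, -72.86), "Carteret": (40.58, -74.23)}
destinations = {"Brooklyn": (40.68, -73.94), "Newark": (40.74, -74.17)}

matrix = od_matrix(origins, destinations)
for pair, km in sorted(matrix.items()):
    print(pair, round(km, 1))
```

Every origin is paired with every destination, so the matrix grows as origins times destinations. That is the reason to compute it once and reuse it in the optimization.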

For this problem, the team at CARTO worked mostly in Python, a popular choice for data scientists. Why Python? It makes it easy to connect to services such as Valhalla, the open-source routing engine used to build the OD matrices, and to CARTO through CARTOframes.

Finding all possible transportation routes requires building 2 OD matrices showing paths (1) from warehouses to fulfillment centers, and (2) from fulfillment centers to delivery addresses. Location data on the warehouses, fulfillment centres, and customer addresses will be needed to complete both OD matrices.
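For road-based matrices, Valhalla exposes a matrix action called `sources_to_targets`. The sketch below only builds a request body and flattens a response; it makes no network call. The field names follow the Valhalla matrix API as commonly documented, so verify them against the version you run. The helper names and the sample response values are made up for illustration.

```python
import json


def _loc(point):
    lat, lon = point
    return {"lat": lat, "lon": lon}


def matrix_request(origins, destinations, costing="auto"):
    """Request body for a sources_to_targets (matrix) call."""
    return {"sources": [_loc(p) for p in origins],
            "targets": [_loc(p) for p in destinations],
            "costing": costing}


def parse_matrix(response):
    """Flatten a response into {(from_index, to_index): (distance, seconds)}."""
    return {(cell["from_index"], cell["to_index"]): (cell["distance"], cell["time"])
            for row in response["sources_to_targets"]
            for cell in row}


# Matrix 1 would use warehouses as sources and fulfillment centers as targets;
# matrix 2 would use fulfillment centers as sources and addresses as targets.
body = matrix_request([(41.39, -72.86)], [(40.68, -73.94), (40.74, -74.17)])
print(json.dumps(body, indent=2))

# Made-up response in the expected shape, for illustration only.
sample = {"sources_to_targets": [[
    {"from_index": 0, "to_index": 0, "distance": 125.3, "time": 5400},
    {"from_index": 0, "to_index": 1, "distance": 140.1, "time": 6000},
]]}
parsed = parse_matrix(sample)
print(parsed)
```

Sending `body` to a running Valhalla server is left out on purpose, since the host and deployment details are not given here.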

OD matrices bring together a lot of information. However, as the map of warehouse to fulfillment center paths below illustrates, we’re still not sure which routes are most cost effective (let alone which routes from fulfillment centers to delivery addresses!).


The map draws straight lines between OD pairs only for clearer cartography; the optimization itself uses road distance and travel time.

Designing a Logistics Optimization Algorithm

Now, you need to design an algorithm that applies the constraints defined in section one to the data gathered in section two so that this information is filtered down to the network’s optimal routes.

The objective function, the quantity being minimized, sums the following two terms over every route, one term for each leg of the network:

num packages * (transport cost * distance + transfer processing cost)
num packages * (transport cost * distance + warehouse processing cost)
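One way to minimize an objective of this shape under capacity limits is to treat it as a minimum-cost flow problem. The sketch below uses the successive shortest paths method on a toy network. This is an assumption for illustration: the solver CARTO used is not stated here, and every capacity, distance, and demand in the example is hypothetical. Only the per-kilometer rates and the range of processing fees come from the text.

```python
def min_cost_flow(n, edges, source, sink, required):
    """Successive shortest paths. `edges` holds (u, v, capacity, unit_cost)."""
    graph = [[] for _ in range(n)]   # node -> indices into `arcs`
    arcs = []                        # each arc: [head, residual capacity, unit cost]
    for u, v, cap, cost in edges:
        graph[u].append(len(arcs))
        arcs.append([v, cap, cost])
        graph[v].append(len(arcs))
        arcs.append([u, 0, -cost])   # reverse arc, stored at index i ^ 1
    shipped, total = 0, 0.0
    while shipped < required:
        # Bellman-Ford, because reverse arcs carry negative costs.
        dist = [float("inf")] * n
        via = [None] * n
        dist[source] = 0.0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                for i in graph[u]:
                    v, cap, cost = arcs[i]
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], via[v], changed = dist[u] + cost, i, True
            if not changed:
                break
        if via[sink] is None:
            raise ValueError("demand cannot be met within the capacity limits")
        # Find the bottleneck along the cheapest path, then push flow along it.
        push, v = required - shipped, sink
        while v != source:
            push = min(push, arcs[via[v]][1])
            v = arcs[via[v] ^ 1][0]
        v = sink
        while v != source:
            arcs[via[v]][1] -= push
            arcs[via[v] ^ 1][1] += push
            v = arcs[via[v] ^ 1][0]
        shipped += push
        total += push * dist[sink]
    return total


def leg(rate, km, fee):
    """Per-package cost of one leg: transport cost * distance + processing cost."""
    return rate * km + fee


# Hypothetical toy network: 2 warehouses, 2 fulfillment centers, 2 delivery zones.
# Each fulfillment center is split into IN/OUT nodes so its capacity sits on one arc.
S, W0, W1, F0_IN, F1_IN, F0_OUT, F1_OUT, Z0, Z1, T = range(10)
BIG = 100  # equals total demand, so it never binds

edges = [
    (S, W0, 60, 0.0), (S, W1, 100, 0.0),                    # warehouse capacity
    (W0, F0_IN, BIG, leg(0.002, 100, 0.84)),
    (W0, F1_IN, BIG, leg(0.002, 150, 0.84)),
    (W1, F0_IN, BIG, leg(0.002, 50, 1.99)),
    (W1, F1_IN, BIG, leg(0.002, 50, 1.99)),
    (F0_IN, F0_OUT, 70, 0.0), (F1_IN, F1_OUT, 100, 0.0),    # fulfillment center capacity
    (F0_OUT, Z0, BIG, leg(0.005, 10, 0.90)),
    (F0_OUT, Z1, BIG, leg(0.005, 20, 0.90)),
    (F1_OUT, Z0, BIG, leg(0.005, 20, 1.36)),
    (F1_OUT, Z1, BIG, leg(0.005, 10, 1.36)),
    (Z0, T, 50, 0.0), (Z1, T, 50, 0.0),                     # demand per zone
]
total_cost = min_cost_flow(10, edges, S, T, required=100)
print(f"minimum cost: ${total_cost:.2f}")
```

In the toy network the cheaper warehouse and fulfillment center fill up first, and the remaining packages spill over to the more expensive facilities. A linear programming solver would be a reasonable alternative for a network of realistic size.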

The map below represents CARTO’s optimized network model based on the most cost-effective shipping assignments given our input data:


The map shows the following information:

  • The warehouses (red circles) and fulfillment centers (purple circles) are rendered in proportion to their capacity
  • The black lines represent the number of parcels shipped from warehouses to fulfillment centers
  • The coloured regions represent each fulfillment center’s delivery area
  • For each delivery area, the number of packages and the average delivery cost are calculated

With this information, you can calculate the costs of distributing 6,000 packages from warehouses to fulfillment centers along the most cost-effective routes to be $41,988.37 (approx RM174,126).

At the same time, the team at CARTO noticed that some fulfillment centers are serviced by multiple warehouses, especially the one located in Brooklyn, which handles the bulk of New York City deliveries. The processing fees at each warehouse, however, vary quite a bit:

  • North Haven, Connecticut: $0.84 per package
  • Gouldsboro, Pennsylvania: $0.94 per package
  • Easton, Pennsylvania: $1.34 per package
  • Carteret, New Jersey: $1.72 per package
  • Cranbury, New Jersey: $1.99 per package

If processing capacity were increased at the warehouses with lower processing fees, total distribution costs would fall, because packages would be rerouted through those cheaper warehouses.
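A quick back-of-the-envelope check shows the size of the effect. The fees are the ones listed above; the figure of 1,000 rerouted packages is hypothetical, and the sketch deliberately ignores any change in transport distance, which the full optimization would account for.

```python
# Per-package processing fees listed above (USD).
FEES = {
    "North Haven": 0.84,
    "Gouldsboro": 0.94,
    "Easton": 1.34,
    "Carteret": 1.72,
    "Cranbury": 1.99,
}


def processing_saving(packages, from_wh, to_wh, fees=FEES):
    """Processing-fee saving from rerouting packages between two warehouses."""
    return packages * (fees[from_wh] - fees[to_wh])


# Hypothetical: reroute 1,000 packages from the most to the least expensive site.
saving = processing_saving(1000, "Cranbury", "North Haven")
print(f"${saving:,.2f} saved in processing fees")
```

The saving is negative if packages move to a more expensive warehouse, so the same function also prices the reverse scenario.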


After readjusting the capacity constraints, the team at CARTO re-ran the optimization, which eliminated the Cranbury warehouse from the network because parcels cost the most to ship from this location. With this change, total distribution costs from warehouse to fulfillment center come to about $35,121 (approx. RM145,648), a reduction of $6,866.71 (approx. RM28,473).

The Location Intelligence Imperative in Logistics

With this algorithm, you can extend the analysis beyond optimal delivery assignments. For instance, you can run scenarios that calculate the costs of removing a fulfillment center from the network if there were a power outage or natural disaster impacting operations at a given site.

Another option would be to find optimal routes at certain times of day to help our fleet avoid traffic and high toll prices by adding advanced truck routing and TomTom GPS data.

While these scenarios only go as far as the quality of the data and complexity of the model, they demonstrate the benefits of working with spatial data science when addressing logistics problems.

Article first appeared on the CARTO blog.
