Data Analysis and Visualization

Going from data to action is a recurring challenge in a start up. And the process has never been easier due to the wealth of amazing open source tools including Python (pandas, numpy, matplotlib), iPython Notebook, and D3,js.

I’ve recently worked on a project in the container shipping industry where we had a large database of information about repairs to shipping containers. The challenge was to find actionable opportunities based on insights gleaned from the data. Here’s how I went about the data analysis.

Mungeing and Probing

I started the project by flexing the data this way and that using pandas and the ipython notebook (both amazing tools you should get to know). This took a few passes. First I got it loaded into a DataFrame. Then I altered the structure to make it easier to understand, such as replacing coded names with full text. With that out of the way it was time to explore.  The most helpful chart I made was this pareto chart which reveals the relative significance of various drivers in the data. Below is the code to generate the chart for any data series.

import pandas as pd
import matplotlib.pyplot as plt
def combined_label(perc, tot):
Format a label to include by Euros and %.
return "{0:,.0f}k EUR, {1:.0f}%".format(perc * tot / 1000, perc * 100)
def cost_cum(data, focus, subject):
Accumulate the stats.
– data is a DataFrame,
– focus is the colum to group by
– subject is the column to aggregate.
# Setup data frame
parts = data[[focus, 'cost']].groupby(focus).sum().sort(subject, ascending=False)
parts['percent'] = parts['cost'] / parts.cost.sum()
parts['cum_percent'] = parts['percent'].cumsum()
return parts
def cost_pareto(data, focus_name, limit_percent = 0.75):
# Filter and organize the data frame
top_parts = data[data['cum_percent'] < limit_percent]
# Draw the plots
fig = plt.figure(figsize=(10,7))
fig.subplots_adjust(bottom=0.4, left=0.15)
ax = fig.add_subplot(1,1,1)
top_parts['cum_percent'].plot(ax=ax, color="k", drawstyle="steps-post")
top_parts['percent'].plot(ax=ax, kind="bar", color="k", alpha=0.5)
ax.set_ylim(bottom=0, top=1)
tick_nums = [x/float(100) for x in range(0,101,20)]
tot_cost = top_parts['cost'].sum()
ax.set_yticklabels([combined_label(x, tot_cost) for x in tick_nums])
ax.set_title("Top %s%% of Cost Split By %s" % (int(limit_percent * 100), focus))
return ax
accumulated = cost_cum(data, 'Part', 'Cost')
chart = cost_pareto(accumulated, 'Part', 0.9)

view raw
hosted with ❤ by GitHub

Using these pareto charts, plus a variety of histograms and scatter plots, I was able to provide the team with an initial window into the data which we used to identify an avenue that was worthy of further investigation.


With a more clear destination in mind, my goal became creating a visualization that would reveal the opportunity within the data. The tool for this is D3.js. D3 is a little bit confusing to get ones head around at first, but it is well worth figuring out because the things that you can do with it are amazing.

In our case, I wanted to let our team explore the impact of various interventions to curtail types of damage or to protect various parts of the containers. While the pareto chart (above) provides a insight about the cost of various damage types or container parts, it falls short when the two dimensions need to be considered together.

My solution is at this interactive visualization (view full size) . With it our team has been able to explore the data set without having to write more code. They are no longer dependent on me to “run the numbers”. And, it didn’t even take too long to make.


I highly recommend adding data analysis and visualization tools to your toolkit. They aren’t hard to learn and they are amazingly powerful.

  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s