This is the start of a post I’ve added to my Pivotal Labs blog.
What does an agile software development team look like?
At their core, software engineers turn ideas into results. Of course, we are not the only ones with this job description; we share it with many other creative professions. Writers, for example, turn ideas into words that inspire, educate and inform. That’s pretty much what we do too: we turn ideas into words that instruct a computer system to perform a desired behaviour.
Focusing on writers for a moment… there is a wide range of writing environments and styles. On one side of the spectrum are novelists who secrete themselves away in a quiet room, where they can find the time and space to breathe life into their intricate vision. On the other are journalists on the newsroom floor, sampling from an avalanche of information, responding quickly to what’s new and what’s important.
At Pivotal Labs, we work with many fast-moving companies, helping them to fashion the software engineering capability they need to succeed. Our approach is to create an environment that resembles a newsroom more than a writer’s hideaway.
Like most engineers, I do a lot of optimizing, often just for fun. When walking to work I seek the shortest route. When folding my laundry I minimize the number of moves. And at work, of course, I optimize all day long alongside all my engineering colleagues.
Optimization, by definition, requires an objective function as the basis for measuring improvement or regression. How short is the route? How many moves does it take to fold a shirt? But what is the objective function at work around which my team and I should optimize?
I’ve worked in many software engineering organizations where the objective function is an unstated confusion that evolved on its own over time. It’s often a bit of “don’t break things” mixed with a dose of “conform to the standards”. Sometimes more destructive objectives find their way into the culture: “get yourself seen,” or worse, “don’t get noticed.” And my least favorite, “hoard your knowledge.”
Recently, while working with a client, I had to state my views on a good objective function for a software engineering team. It’s to predictably deliver a continuous flow of quality results while minimizing dark time — the time between when a feature is locked down for development and when a customer starts using it.
Predictable: Your process has to be stable and sustainable. It’s not about sprinting to collapse; nor is it about quick wins followed by a slog through technical debt. It’s about a steady pace over a long time. Hence the volatility measure in Pivotal Tracker; a good team has low volatility, and therefore their rate of delivery is highly predictable.
Delivery: Code is not worth anything until it is in use by customers. Calling anything else “delivery” often leads to spinning wheels and wasted effort.
Continuous flow: Beware of activities that disrupt the flow; they are better dealt with in the moment, in the normal run of things. For example, I find mandatory code reviews disruptive and demoralizing. Gatekeeping steps like these, by definition, stop the normal flow and send things back for rework. In contrast, pair programming often achieves the same quality and consistency objectives in real time and without disrupting the flow.
Quality: This is a relative measure. The work needs to be done sufficiently to avoid rework (i.e. bugs) and to prevent the accumulation of technical debt. Spending more time trying to achieve “quality” beyond these measures is just waste.
Results: What it’s all about.
Minimizing dark time: Many software engineering organizations miss this one because it’s driven by the business rather than the needs and craftsmanship of the engineers themselves. And yet, minimizing dark time is perhaps the most critical contribution that an engineering team can make to a business.
Dark time is what the business experiences between when the engineers remove the business’s ability to re-spec a piece of work and when they hand back a working result. During this dark time the business can no longer refine their decision, nor observe and learn from the results. They’ve ordered (and paid for) the new car, but are waiting for the keys. It’s dark because during this stage there is nothing for the business to do, with respect to that feature, but wait.
While coding I experience the same dark time when working on a slow code base or, worse yet, working with a slow test suite. My TDD cycle grinds to a crawl as I wait a minute or more (the horror!) between when I hand my code to the compiler/interpreter and when it spits back the red-green results.
If you hate twiddling your thumbs waiting for slow tests to run, think how frustrating it is for the business folks when their engineering team throws them into the dark for days, perhaps even weeks. Of course they pipeline their work and find ways to be useful, but the dark time still sucks.
When a software engineering team chops dark time down from a month to a week, business folks cheer. When the engineers chop dark time down to a day or less, the business folks do what we coders do when working with a fast test suite… they fly.
This post also appears on my blog at Pivotallabs.com.
Every once in a while I spend a bit of time reviewing and streamlining my GTD process. This time I hit on a pretty big win — automating a connection between Gmail and Remember the Milk so that collecting actions from across all of my Gmail accounts is one-click easy. This automation makes it a breeze to process email on my phone. Woohoo – GTD on the can!
Remember the Milk has been at the core of my GTD stack for several years now. I’ve looked into other systems, even trying Astrid for a month, but RTM still wins for my requirements set. It’s got a scriptable API, a command line interface, a solid Android app, and a nice web interface that’s keyboard friendly. Of course it has its quirks and annoyances (why can’t I move tasks between lists using the keyboard?), but it’s the best that I’ve found.
For email, Gmail has wormed its way into being my primary mechanism. It too has its quirks and annoyances, but the network effects are strong and the price is right (well, was right for business accounts). I still use Thunderbird too, but now primarily as a local Gmail client.
For this round of streamlining, the challenge I set was to enable keyboard-based creation of an RTM task from a Gmail message, including a link back to the original thread for quick action. This is now possible using a Google Apps Script. Here’s the code:
Please feel free to fork the gist and improve it!
I’ve recently been consulting as a lean startup expert at the retail side of a large bank in the UK which is exploring how to increase their rate of innovation. The project has been challenging, inspiring and filled with lessons learned.
The bank hired me to help them create a new venture, and in doing so it became clear that the bank is struggling with a larger underlying challenge: how to drive innovation in and around their organization.
I went into the project expecting to find bank managers who had no interest in innovation – why would a bank that is “too big to fail” have much interest in shaking things up? To my surprise the situation was quite the opposite. Almost everyone I met, which spanned a range of management levels, shared three views. First, they wanted to innovate. Second, they were frustrated at the inability to innovate within their organization. And third, they were proud of the ideas that the bank had managed to nurture and launch.
The bank, it turns out, has the potential to be what Steve Blank would call an earlyvangelist customer.
Earlyvangelists are a special breed of customers willing to take a risk on your startup’s product or service. They can actually envision its potential to solve a critical and immediate problem—and they have the budget to purchase it. Unfortunately, most customers don’t fit this profile. (source)
In this case:
- The bank management have a problem: lack of innovation.
- They are aware of the problem.
- They are actively looking for a solution.
- The problem is so painful that they have cobbled together interim solutions.
- And they have even allocated budget to continue to tackle the problem.
With this in mind I used the project as an opportunity to iterate towards a business model for delivering an effective intervention for driving innovation in large banks. As you’ll see in the remainder of this post, my team and I have learned many lessons. We’ve also made considerable progress towards finding a model that is likely to work.
The rest of this post details the business models we tested and the resulting lessons we learned. It ends with a proposal for a new model, the Inside-Out Incubator, that’s centered around seeding an ecosystem of innovation in and around the bank. It uses an indirect approach that is more likely to succeed than trying to directly change the bank’s deeply ingrained culture.
So, without further ado…
With the pandas library, if you have read in a csv file with a date field that is sometimes empty and are getting the error:
TypeError: can't compare offset-naive and offset-aware datetimes
It may be caused by dateutil.parser.parse, which is the default function for parsing dates when the CSV file is read. This function returns the current, non-timezone-aware date when given an empty string as input. According to the dateutil documentation, “the default value is the current date, at 00:00:00am.” This causes confusion in the context of pandas for three reasons:
- the data in the DataFrame is not derived from the source CSV file.
- the expected value of an empty field is numpy.NaN.
- the returned datetime object is not timezone-aware.
Fortunately pandas.read_csv has the date_parser argument, which allows you to supply your own parsing function. One might think that the following function is the right fix…
```python
def date_or_nan_parse(s):
    if not s:
        return numpy.NaN
    return dateutil.parser.parse(s)
```
BUT NO. If you then try to compare a date against the pandas.Series of dates, the same “TypeError: can’t compare offset-naive and offset-aware datetimes” may get thrown. In fact, both of these will fail:
```python
df['datefield'] > dateutil.parser.parse("January 1 1901 00:00 UTC")
df['datefield'] > dateutil.parser.parse("January 1 1901 00:00")
```
This time the problem is that the comparison fails on the numpy.NaN values.
My fix is:
```python
OLD_DATE = dateutil.parser.parse("January 1, 1901 00:00 UTC")

def date_or_nan_parse(s):
    if not s:
        return OLD_DATE
    return dateutil.parser.parse(s)
```
In this case I introduce a new value rather than (the more correct) numpy.NaN. The new value is a date so it doesn’t fail during comparison operations (assuming that OLD_DATE has the same time zone awareness as the rest of your data). And at least it is a date that I’ve explicitly chosen, rather than just the current date.
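Putting it together, here is a minimal end-to-end sketch. The column names and sample data are invented for illustration, and I use the converters argument (which hands each raw field, including the empty string, to your function) as an equivalent way to plug in the parser:

```python
import io

import dateutil.parser
import pandas as pd

# OLD_DATE is timezone-aware, matching the aware dates in the data.
OLD_DATE = dateutil.parser.parse("January 1, 1901 00:00 UTC")

def date_or_nan_parse(s):
    if not s:
        return OLD_DATE
    return dateutil.parser.parse(s)

# A tiny CSV with one empty datefield (illustrative data).
csv = io.StringIO("id,datefield\n1,2012-03-01 10:00 UTC\n2,\n")
df = pd.read_csv(csv, converters={"datefield": date_or_nan_parse})

# Comparisons now work: only the row with a real date is after 2000.
recent = df[df["datefield"] > dateutil.parser.parse("January 1, 2000 00:00 UTC")]
```

The empty field becomes OLD_DATE rather than today’s date or numpy.NaN, so the comparison succeeds row by row.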
I hope this saves you some confusion.
Going from data to action is a recurring challenge in a startup. And the process has never been easier, thanks to a wealth of amazing open source tools including Python (pandas, numpy, matplotlib), the IPython Notebook, and D3.js.
I’ve recently worked on a project in the container shipping industry where we had a large database of information about repairs to shipping containers. The challenge was to find actionable opportunities based on insights gleaned from the data. Here’s how I went about the data analysis.
Munging and Probing
I started the project by flexing the data this way and that using pandas and the IPython Notebook (both amazing tools you should get to know). This took a few passes. First I got the data loaded into a DataFrame. Then I altered the structure to make it easier to understand, such as replacing coded names with full text. With that out of the way it was time to explore. The most helpful chart I made was a Pareto chart, which reveals the relative significance of various drivers in the data. Below is the code to generate the chart for any data series.
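The core of a Pareto chart is just sorting and a cumulative sum, which pandas handles directly. Here is a small sketch; the category names and costs are made up, and the plotting calls are only indicated in comments:

```python
import pandas as pd

def pareto_table(series):
    """Sort category totals in descending order and add their cumulative share."""
    s = series.sort_values(ascending=False)
    cumulative = s.cumsum() / s.sum() * 100.0
    return pd.DataFrame({"value": s, "cumulative_pct": cumulative})

# Illustrative repair costs per container part.
costs = pd.Series({"floor": 120, "door": 300, "roof": 80, "side panel": 500})
table = pareto_table(costs)

# Plotting sketch: bars for the values, a line on a secondary axis for the
# cumulative percentage gives the classic Pareto chart, e.g. with matplotlib:
# table["value"].plot(kind="bar")
# table["cumulative_pct"].plot(secondary_y=True, style="o-")
```

Reading the cumulative column off the sorted table shows immediately which few categories account for most of the cost.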
Using these pareto charts, plus a variety of histograms and scatter plots, I was able to provide the team with an initial window into the data which we used to identify an avenue that was worthy of further investigation.
With a clearer destination in mind, my goal became creating a visualization that would reveal the opportunity within the data. The tool for this is D3.js. D3 is a little confusing to get one’s head around at first, but it is well worth figuring out because the things you can do with it are amazing.
In our case, I wanted to let our team explore the impact of various interventions to curtail types of damage or to protect various parts of the containers. While the Pareto chart (above) provides insight about the cost of various damage types or container parts, it falls short when the two dimensions need to be considered together.
My solution is this interactive visualization (view full size). With it, our team has been able to explore the data set without having to write more code. They are no longer dependent on me to “run the numbers”. And it didn’t even take too long to make.
I highly recommend adding data analysis and visualization tools to your toolkit. They aren’t hard to learn and they are amazingly powerful.
Big Data about people = stereotyping and prejudice
Recently in my work for LevelBusiness I’ve been learning about big data. It’s powerful, amazing and fun tech, and it’s all about stereotyping. As we all learned in primary school, stereotyping, and its flip side prejudice, are generally bad things that often lead to bad ends.
Big data involves boiling down vast data sets into actionable conclusions. It gets dicey when the topic at hand is people rather than things. The conclusions about people are of the form:
- persons a, b, c… are likely to be pregnant.
- this set of people is often left-leaning (this TED talk on filter bubbles is worth watching).
- the people in this neighborhood claim more frequently on their insurance.
Then the actions are respectively:
- promote baby products to this group.
- only show certain search results to this group.
- decline to insure this group.
The critical but often overlooked point is that these groupings are always just probabilities based on the underlying data set. For example, Amazon infers your religion based on the wrapping paper you buy. They don’t know for sure that you are Christian, Jewish, Muslim or Sikh, but they think they have enough evidence to make it worth their while to treat you as if their assumption is true. This is the definition of prejudice:
Prejudice (or foredeeming) is making a judgment or assumption about someone or something before having enough knowledge to be able to do so with guaranteed accuracy, or “judging a book by its cover”.
Prejudice + Hearsay = No Good
Every company needs to get to know its customers. But when every internet-using person is your customer you have to take care to be responsible about what conclusions you draw from your vast data sets and what actions you take based on them. Google may or may not be up to this task, I think this remains an open question. What bothers me though is the following clause in the new Google Terms and Conditions:
We have a good faith belief that access, use, preservation or disclosure of such information is reasonably necessary to (a) satisfy any applicable law, regulation, legal process or enforceable governmental request…
Adam Levin of the Huffington Post clarifies the risk in his analysis:
Hold on, Bucky.
What exactly constitutes an “enforceable governmental request?” This sentence should read: “We will share information with a Governmental entity only when presented with a valid search warrant issued by a court of competent jurisdiction.” Such a provision would make it obvious that by giving information to Google, you do not intend to waive your constitutional rights, and it would make it clear that despite the fact that your information was shared willingly with a private sector entity, you reasonably retained an expectation of privacy against Government intrusion.
In other words, Google is stereotyping you, and then not only are they acting on that prejudice but they are saying that if a government comes calling then they will happily share what they think they know about you. If you have even the slightest distrust of government, your own or any other in the world, then this should worry you.
I know, this data-driven stereotyping and prejudice is happening all over the place, but that does not mean it’s good or safe. And, it certainly doesn’t mean that you have to be a willing sheep in the process. That’s why I’m switching away from Google as my default search service. I don’t want to feed more of my data into their prejudice machine.
Here are instructions if you want to do the same.
How to make DuckDuckGo your Default
Instructions from http://seodesk.org/address-bar-awesome-hacks/
- Chrome: Right-click the Chrome Omnibox » in the last entry fill the search engine name and keyword and copy/paste the URL http://duckduckgo.com/?q=%s » click on ‘make default’. (Alternatively you may add DuckDuckGo with suggestions to your search engines’ list and make it your default engine).
- IE: Enter the DuckDuckGo homepage » click on the left arrow and select DDG from the sub-menu ‘Add Search Provider’ » check ‘Make this my default search provider’ » click on ‘Add’. (As in the previous example, you may set DDG with suggestions as your default engine).
- Firefox: Type ‘about:config’ in the awesome bar and press ‘enter’ » confirm the declaration » type ‘Keyword.URL’ in the filter box » copy and paste the following URL https://duckduckgo.com/?q= » click ‘OK’ and close the tab or window. You can also install the DuckDuckGo search plugin here.
- Opera: Right-click the DDG search box » Select ‘Create Search’ » type your preferred keyword » check ‘Use as default’ » click ‘OK’. (Note: Adding DDG and suggestions is more complicated and described here).
- Safari: Enter the DDG homepage » click on ‘Add to Safari’ » follow the instructions.
At Power of Two everyone on our team has a coach. The idea started because our core business is offering online coaching for couples in challenging relationships. We’ve applied the idea to ourselves and found it to be hugely valuable.
An effective coach answers questions that you may not have even thought to ask. Here is what we’ve found makes the relationship work:
- Review your work product. The purpose of a coach is to advance your understanding beyond what you can do on your own. This only works if they have information beyond what you tell them. A coach should review a developer’s code, a designer’s designs, a writer’s words, a customer developer’s iteration plans and results, etc. A person who gives advice without reviewing your work product is simply a mentor. Mentors are helpful, and good for one’s morale, but they are not a coach.
- Have deep respect. The amount you learn from your coach depends on how much expertise they bring to the table and whether or not you value their suggestions enough to act on them. If you don’t act on your coach’s advice then it’s all just a waste of time.
- Pay for the time. When you give your coach work to review you are asking to spend their time for your benefit. This relationship is much simpler and more likely to succeed if there is a balanced exchange of value.
- Ask stupid questions. Your coach works for you. They are there to help you with both complex questions and ones that you might think are stupid. Often it is the questions that initially seem stupid that point to a gap in your knowledge base or skill set.
- Be a bit scared. Your coach’s job is to tear into your work, expose the weaknesses and then help you address them. This is ego-busting stuff. If you aren’t a bit scared about sending work to your coach then it’s time to find a new coach. At the same time, you should feel empowered after addressing the shortcomings that your coach has identified.
Engaging with a good coach will accelerate your learning curve and get you to the top of your game. As an essential tool for success, it is worth trying to ensure that everyone on your team has their own coach.
Django promotes the Model-View-Controller pattern (some call it MVT, for template, in the Django world) but encapsulates the pattern within each distinct app. It’s a great approach because it means that reusable apps can deliver functionality at any or all of the MVC layers. But MVC applied to Django says nothing about how to structure the relationships between apps.
At Power of Two we’ve started to get hit by a lack of app-level structure as our site grows in complexity. With each new app comes a potential bird’s nest of dependencies which, if left unchecked, would reduce the agility of our site and business.
I brought this challenge up with Carl Meyer as part of an ongoing conversation we have about best practices in Django. Carl is a core developer of Django and a maintainer of pip. He has tremendous experience and deep insight into creating solid, maintainable web apps in Django.
The question I posed was where to put functionality that spans apps. For example, Power of Two has an activity stream, a mailer, a reporting system and staff management pages. I asked when an app should import another app’s API to push data through to it, and when we should instead use signals to decouple apps from each other.
Here’s Carl’s clarifying response:
What you mostly need to keep in mind is your dependency graph. Draw out an actual import graph between your apps if it helps you visualize. Mentally classify your apps into “layers”: “core/utility” apps at the bottom bedrock layer, user-facing apps that import and use the utility/core apps above that… however many layers you have.
What you don’t want is an app in a lower layer importing and using an app from a higher layer. Unidirectional coupling is much preferable, maintenance-wise, to bidirectional coupling.
It’s ok to have a utility app that almost every other app in your system imports and uses, but you don’t want your overall dependency graph to just be a mess with import dependencies in all directions and no structure to anything. Ideally your module dependency graph has no cycles in it.
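Carl’s “no cycles” rule is easy to check mechanically. Here is a small sketch using a depth-first search; the app names and the dependency maps are invented for illustration, not drawn from a real project:

```python
def find_cycle(deps):
    """Depth-first search for a cycle in an app dependency graph.

    deps maps each app to the list of apps it imports.
    Returns one cycle as a list of apps, or None if the graph is acyclic.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / finished
    color = {app: WHITE for app in deps}
    stack = []

    def visit(app):
        color[app] = GRAY
        stack.append(app)
        for dep in deps.get(app, ()):
            # Unknown apps (e.g. external packages) are treated as finished.
            if color.get(dep, BLACK) == GRAY:
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, BLACK) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[app] = BLACK
        return None

    for app in deps:
        if color[app] == WHITE:
            cycle = visit(app)
            if cycle:
                return cycle
    return None

# Layered correctly: every import points to a lower layer.
layered = {
    "reporting": ["activity", "utils"],
    "staff": ["activity", "utils"],
    "activity": ["utils"],
    "utils": [],
}

# Bidirectional coupling: activity and mailer import each other.
tangled = {
    "activity": ["mailer", "utils"],
    "mailer": ["activity"],
    "utils": [],
}
```

Running find_cycle over a hand-maintained map like these (or one generated from your imports) catches the tangles before they become structural.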
Applying this advice to our code base we concluded that we have four layers.
- Foundation and utility apps – this layer includes almost all of the external installed apps that we use, plus a number of self-built tools for things like managing A/B tests. None of these modules import any modules from the higher layers.
- User facing apps – this layer is the bulk of our custom code. These apps often import the foundation and utility apps and try to minimize the imports of each other. In order to achieve this we’ve had to split apart some over-reaching apps to drop the utility functionality down a layer. In doing this we have realized the bonus of more reusable utility modules.
- Staff facing apps – our internal staff have to oversee what’s happening on our site. To achieve this we have built some admin-like apps which, by definition, import from all the user-facing apps. What’s helpful is to keep a clear division between staff facing apps, which do import user facing apps, and user facing apps, which should not import each other.
- Reporting apps – our final layer is made up of reporting tools. These are drop-in systems driven by signals or asynchronous events. None of the lower layers import these apps, so we are free to rapidly evolve them to meet the ever-changing need for metrics. A future goal is to entirely decouple this layer (which basically means swapping synchronous signals for asynchronous ones) so that there is no risk of this layer introducing bugs that impact the user or staff facing apps.
After completing this refactor we took the final step of ordering our settings file so that INSTALLED_APPS is split into these layers. It’s just a small extra reminder to think about where an app fits into the larger scheme.
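For illustration, the resulting settings file looks something like this (the app names here are invented, not our actual apps):

```python
# settings.py -- INSTALLED_APPS grouped by layer, lowest layer first.
INSTALLED_APPS = (
    # Layer 1: foundation and utility apps (import nothing of ours).
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "abtesting",

    # Layer 2: user-facing apps (import layer 1 only, not each other).
    "activity",
    "mailer",

    # Layer 3: staff-facing apps (may import layers 1 and 2).
    "staffdash",

    # Layer 4: reporting apps (signal-driven; nothing imports these).
    "metrics",
)
```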
The conclusion here is that it is worth thinking about the principles behind the structure of a code base. When functionality gets splattered all over the place and dependencies become circular it makes maintaining and extending much more difficult. While the Model-View-Controller pattern helps within an app, I find it helpful to understand the underlying principles of well structured code so that I can apply them appropriately to each unique situation.
We use Geckoboard to gather and display our key metrics. It looks great and is easy enough to configure that everyone on our team can adjust the dashboards as they see fit.
The whole point of a Geckoboard is that it should be displayed, full screen, in a place where you glance at it every now and again. I finally came up with a solution that achieves this despite the fact that our small team sits in four cities on two continents: set the dashboard as a screensaver.
Using a web page as a screen saver is easy to do on Linux and Windows. On a Mac you can use Websaver. I recommend enabling a hot corner (or equivalent on a non-Mac) so that you can instantly pull up the metrics.
If you find this helpful please tweet it around!