Measuring and evaluating Scrum in complex environments


Measure too much in a Scrum environment, and people will focus on data instead of improving their habits. When I visit a company, I always try to measure a couple of variables as a good starting point for a change. It helps people secure some important alignment, make their ideas more mature, and keep goals clear. The metrics also ensure that individuals get more insight into their day-to-day jobs.

Measuring the state of Scrum at the beginning of each engagement is a necessary step. I always clarify with clients that measurements are good as long as they are an excuse to have good conversations, not a way to undermine people's work. I recommend that all Agile professionals and consultants collect some initial metrics before jumping into a solution.

Complex environments require different metrics

You can find many different approaches to measuring Scrum Teams on the Internet, but most of them focus on team dynamics and on how closely the teams follow the Scrum framework. Making software is a complex process that requires more than just optimizing the Scrum Team. Additionally, some companies are complex environments. As Mel Conway said, “Organizations which design systems . . . are constrained to produce designs which are copies of the communication structures of these organizations.” It means that the more complex the enterprise is, the more complicated its products will become. For me, a complex environment includes (among other things):

  • Places where employees do not feel safe
  • Companies where there is high pressure and technical debt and/or teams do not take ownership of the code
  • Enterprises where there are different definitions of business value
  • Teams with a mix of contractors and employees and lack of code excellence
  • Organizations with a lot of bureaucracy and/or rules
  • Places with a huge number of people working on a product that has no proper vision or road map, or the road map is not realistic or changes frequently

I have been asked several times to help companies improve their software teams. Unfortunately, enhancing just one part of the “system” does not produce a real improvement for the whole enterprise or even the product. If you want to get better at what you do, you need to improve the whole value network.

Optimizing the whole system, not just the Scrum Teams

Just in case you have never heard of this concept, a value network refers to all the associated activities that materialize your service or produce a specific good. The value network process flow starts when a client has a great idea and ends when that idea is realized. In practice, you generally want to get feedback after your service is delivered, so the flow can extend even further.

I have many theories about why companies don’t really understand the concept of optimizing the whole system instead of only the Scrum Teams. One of the main reasons is the old idea that every enterprise is a software company. It would be easy to assume here that optimizing only the IT department would lead to great outcomes. The second reason is related to the idea that Agile is only for software teams. For many years, I have seen management pushing ceremonies, practices, values, and metrics onto teams without themselves being able to follow any of them when required.

If you focus on measuring and improving only IT teams instead of the whole value network, you will end up refining one area of your enterprise but degrading the others; this is what we call a local optimization. This obviously creates a much more expensive product that will not solve any real problem.

Measuring the Scrum Teams in complex organizations

In complex companies with complicated products, if you want to create a sustainable ecosystem, it is important to initially consider four actions:

  1. Optimize the whole value network instead of improving only your Scrum Teams.
  2. Reduce organizational complexity (bureaucracy and structures) across the company.
  3. Have the right leadership style in place that constantly reinforces a clear message.
  4. Make sure that Scrum operates in a healthy environment with high visibility, clear rules, and high-quality outcomes.

Based on these actions, I have created a way to measure Scrum Teams in complex organizations with complicated products by using the following eight indexes:

  1. Scrum values and principles in action
  2. Use of Scrum artifacts
  3. Scrum ceremonies
  4. Technical excellence
  5. Simplicity in the software codebase
  6. Product owner skills and capability
  7. Business value delivered
  8. Lack of cultural debt and simplicity in processes

Each of these indexes can help you address some specific problems I generally see in companies.

The whole assessment includes 44 questions and takes approximately 10 minutes per team. You can download the Questions.pdf document listed at the end of this article, with the list of questions, and the Categories.pdf document, which is organized by categories. I generally run an informal interview to get the answers. Some of the questions in the Questions.pdf document can be deduced from the context during the interview, while others need to be asked explicitly.

Once you have all the answers, a weighted system is used to evaluate the questionnaire and show the results in a chart.

Weighted system

Most answer options score between -3 and +3. A negative value indicates an action or behavior that is counterproductive or would harm Scrum, the organization, or outcomes. The highest value (+3) indicates great behaviors and discipline.


Example of a weighted system

Some options have a lower maximum score, which means they have a smaller impact on the system. Let’s take a look at a couple of examples so that you understand how the weighting works.


It is expected that a retrospective is held on the last day of the sprint, so any other option will score -3.

The following question uses a range and asks how empowered the product owner is in making decisions. The last two options will get the highest score, whereas the first few options will get a negative value.


Let’s look at one more example. In the following question, we try to find out how long it takes from the time someone asks for specific training until they acquire the skills. In complex organizations with dozens of rules, this kind of request generally takes a long time.


As you can see, each question can be easily weighted as long as you have an answer. I recommend that you remove the values when you interview the teams.
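To make the weighting concrete, here is a minimal Python sketch of one weighted question. The `Question` class, the option labels, and the +3 given to the expected answer are assumptions made for illustration; only the -3 for the other options comes from the retrospective example above. The `interview_sheet` method shows one way to hide the values during an interview.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """One assessment question with a weight per answer option (hypothetical structure)."""
    text: str
    weights: dict  # option label -> score, normally between -3 and +3

    def score(self, answer: str) -> int:
        """Return the weight attached to the chosen option."""
        return self.weights[answer]

    def interview_sheet(self) -> str:
        """Render the question and its options without showing the values."""
        options = [f"  [ ] {option}" for option in self.weights]
        return "\n".join([self.text, *options])


# Option labels are invented; the +3 for the expected answer is an assumption.
retrospective = Question(
    text="When is the retrospective held?",
    weights={
        "On the last day of the sprint": 3,
        "On some other day": -3,
        "We do not hold retrospectives": -3,
    },
)

print(retrospective.interview_sheet())
print(retrospective.score("On some other day"))
```

Keeping the weights in the data and out of the printed sheet lets the interviewer score the answers afterwards without influencing the team.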

How the indexes work

I mentioned earlier that there are eight different indexes, each containing four to seven related questions. For example, the “Product Owner Skills and Capability” index contains seven questions related to the product owner role, habits, and good practices. After completing all the answers, you will get a chart something like this:


I filled in the data manually until my colleague Su Young created a great spreadsheet, Automated_Scrum_Chart.xlsx, that automates all the work for you. You only need to download it and place the numbers in the right boxes!

The great thing about a radar chart is that it can easily show you the progress in an area and whether any other area is affecting the first.


Su Young Kim and the author, preparing for a team’s interview

Let me show you how it works, even though the spreadsheet will calculate it for you. Imagine that you asked all the questions and got the following scores for one index:

Product Owner Skills and Capability
(Highest score for this index = 21 points)

  • Question 1: +3 (max. +3)
  • Question 2: -3 (max. +3)
  • Question 3: +1 (max. +3)
  • Question 4: +2 (max. +3)
  • Question 5: +2 (max. +3)
  • Question 6: +2 (max. +3)
  • Question 7: +2 (max. +3)

Total: +9

The point range of each area on the radar chart is from 1 to 10, with 10 being a great score and 1 a poor one. Because the maximum value for this group is 21 and you got 9 points (42%, rounding down), the index is 4.2 out of 10.
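The arithmetic for this index can be sketched in a few lines of Python. The function name `index_score` is hypothetical and this is not the formula taken from the spreadsheet itself, only the total-over-maximum scaling described in the text. The unrounded result is about 4.29, which the article reports as 4.2.

```python
def index_score(scores, max_scores):
    """Scale an index total to the 10-point radar axis: total / maximum * 10."""
    maximum = sum(max_scores)
    if maximum <= 0:
        raise ValueError("maximum score must be positive")
    return sum(scores) / maximum * 10


# Scores from the Product Owner Skills and Capability example.
product_owner_scores = [3, -3, 1, 2, 2, 2, 2]
product_owner_max = [3] * 7  # every question in this index has a maximum of +3

value = index_score(product_owner_scores, product_owner_max)
print(f"{value:.2f}")
```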

Let’s see another example:

Scrum Values and Principles
(Highest score for this index = 11 points)

  • Question 1: +3 (max. +3)
  • Question 2: 0 (multiple choice +3 and +1)
  • Question 3: +3 (max. +3)
  • Question 4: +1 (max. +1)

Total: +7

In this case, the result is 7 out of 11, which is 63% (rounding down). You would mark this area on the chart as 6.3 out of 10.
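The maximum of 11 only adds up if the multiple-choice question contributes both of its options (3 + 1 = 4), so that 3 + 4 + 3 + 1 = 11. The sketch below encodes that reading as an assumption: a single-choice question is capped by its best option, while a multiple-choice question is capped by the sum of its positive options. The helper name `question_max` is hypothetical.

```python
def question_max(option_weights, multiple_choice=False):
    """Best achievable score for one question.

    Assumption: multiple-choice options can all be selected together, so
    their positive weights add up; single-choice takes the best option.
    """
    if multiple_choice:
        return sum(w for w in option_weights if w > 0)
    return max(option_weights)


# (score obtained, positive option weights, is multiple choice)
scrum_values = [
    (3, [3], False),
    (0, [3, 1], True),   # nothing positive was selected
    (3, [3], False),
    (1, [1], False),
]

total = sum(score for score, _, _ in scrum_values)
maximum = sum(question_max(weights, multi) for _, weights, multi in scrum_values)
index = total / maximum * 10

print(total, maximum, f"{index:.2f}")
```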

Imagine that behaviors in the index are mostly detrimental, and you get a score below zero. On this chart, you would be able to clearly see the outcome because there is a red line indicating the zero value. Keep in mind that the forces of the negative parts generally counteract the benefits of the good practices.
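As a small illustration of the zero line, the following sketch flags any index whose scaled score falls below zero. The totals and maximums are invented for the example and do not come from a real assessment; `index_score` is the same hypothetical total-over-maximum scaling used in the worked examples.

```python
def index_score(total, maximum):
    """Scale an index total to the radar axis: total / maximum * 10."""
    return total / maximum * 10


# Hypothetical (total, maximum) pairs, for illustration only.
results = {
    "Scrum ceremonies": (9, 15),
    "Technical excellence": (-4, 15),
}

# Indexes that would be plotted inside the red zero line on the chart.
below_zero = [
    name
    for name, (total, maximum) in results.items()
    if index_score(total, maximum) < 0
]

print(below_zero)
```

An index in this list is a signal to start a conversation about that area first, since its detrimental behaviors are outweighing the good practices.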

As you can see, the eight indexes connect different areas that need to be addressed to produce great outcomes. These also help people understand and create a strategy to connect Scrum, skills, complexity, delivery, behaviors, and code excellence.




If you want to learn more tactics and techniques, why don’t you check out my book Leading Exponential Change?


Thanks for listening,
