Colin’s ALM Corner

Ingredients for scaling GitHub Copilot

2024-01-11T01:22:01+00:00

Speed for the Individual
Ingredients for Scaling
Conclusion

Photo by Tim van der Kuip on Unsplash

I work with a number of enterprises with large development communities: 5,000 - 25,000 developers. Managing DevSecOps at this scale is challenging, and keeping up with the pace of innovation in today’s AI-eaten world only adds complexity. While most organizations have dipped their toes into the generative AI waters, many are struggling to realize broad organizational benefits.

Speed for the Individual

Most customers I work with would agree (even if only intuitively) that GitHub Copilot is a productivity booster for developers. However, executives are often skeptical when they see claims such as “developers who use GitHub Copilot complete tasks ~55% faster than developers without GitHub Copilot”. This is not just marketing from GitHub - customers are also reporting large productivity gains:

Duolingo is seeing a 25% increase in developer speed with GitHub Copilot
Coyote Logistics is reporting 50% decrease in time to write Terraform config files
Mercado Libre reports a 50% reduction in time spent writing code with GitHub Copilot

These are just a few examples - there are more stories reporting similar numbers.

We may argue over exactly how much improvement developers get (and for what use-cases) - but there is enough evidence to assert that GitHub Copilot makes individuals faster. But how do we scale this individual productivity gain to the organization?

Ingredients for Scaling

There are some patterns that we see when analyzing customers that are successful:

Executive mandate
Systematic approach
Allowing time for developers to learn how to code with GitHub Copilot
Super simple onboarding
Establishment of Communities of Practice and identification of Champions
Tying GitHub Copilot to initiatives
Pragmatic measurement and measuring the right things

Executive Mandate

It is imperative that there is an executive mandate to use GitHub Copilot. Given the evidence of how effective GitHub Copilot is, executives should be tasking their teams with using GitHub Copilot and learning how to benefit from it - if for nothing else than to stay ahead of competitors!

Systematic Approach

“Just turn it on” is not a good rollout strategy. This applies to any tool - not just GitHub Copilot. Organizations must consider how they are going to scale out. Team-by-team is a common strategy. Other strategies include “lighthouse teams first” or “language by language” or some other means of starting small and expanding out. Starting with developers that are hungry for GitHub Copilot is crucial - these folks are more likely to spend the time it takes to become good at GitHub Copilot and to iron out networking challenges and other onboarding road bumps. Once these teams have gained some experience, they become key to scaling out GitHub Copilot skills to other developers and teams.

Allowing Time

GitHub Copilot can feel magical - but it is certainly not infallible. It takes time for developers to learn how to craft prompts so that they are useful, what the limits of GitHub Copilot are and how to change the way they code to fit GitHub Copilot in. This would be the same for developers adopting Test Driven Development (TDD) or eXtreme Programming (XP) - new ways of coding take time to learn and to adapt to.

Many developers try a couple of (not so good) prompts and conclude that GitHub Copilot “isn’t that useful.” However, if given time and examples, most developers start to learn how to better craft prompts and to learn the boundaries of GitHub Copilot’s capabilities. Giving up too soon (or piloting too quickly) will prevent successful scale out.

Super Simple Onboarding

GitHub Copilot seats are “pay as you use”. This is different to GitHub Enterprise or GitHub Advanced Security licenses that are purchased up-front. This gives customers much more flexibility in how and when seats are assigned. While giving every developer access to GitHub Copilot from day 1 may be easy, it is not optimal. If customers are not going to give everyone access, they have to think about how they are going to manage how and when developers get seats. Making this process super simple is key. A few of my customers require developers to fill out a form in their internal ticketing system which in turn calls an API to allocate a GitHub Copilot seat without the need for an approval. They have effectively made seat allocation self-serve.
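The self-serve step can be sketched roughly as follows, assuming the ticketing system passes along an organization name, a GitHub username and a token that is allowed to manage Copilot seats. The sketch targets GitHub's REST endpoint for adding users to an organization's Copilot subscription (`POST /orgs/{org}/copilot/billing/selected_users`). The function names and the API version header value are my own assumptions, so check them against the current GitHub REST documentation before relying on this.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_seat_request(org: str, usernames: list[str], token: str) -> urllib.request.Request:
    """Build (but do not send) the request that assigns Copilot seats."""
    body = json.dumps({"selected_usernames": usernames}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_ROOT}/orgs/{org}/copilot/billing/selected_users",
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            # Assumed API version; confirm against the GitHub docs.
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def assign_seats(org: str, usernames: list[str], token: str) -> dict:
    """Send the request; a ticketing-system webhook (hypothetical) would call this."""
    request = build_seat_request(org, usernames, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes the integration easy to test without a real token, which matters when the ticketing workflow itself has no approval step.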

Along with self-serve onboarding, customers must create a centralized knowledge base with onboarding docs, starter docs and demos. Many enterprises have proxies or other networking and firewall rules that prevent GitHub Copilot from working out of the box. Having documentation about how to configure proxies and how to authenticate GitHub Copilot is very important. Along with that, some docs that show how to get started (sample prompts, sample use-cases etc.) and demo videos are critical for success.

Establishment of Communities of Practice and identification of Champions

GitHub Copilot is a tool that requires continuous investment - since it is an art as well as a science, developers need to continue to develop their prompt crafting skills. Additionally, GitHub Copilot is continuously improving and new features are being added frequently. The best way to support a skill that needs continuous investment is a Community of Practice (CoP) (or Center of Excellence or Guild or whatever you call this cross-cutting construct within your organization). This CoP needs to meet frequently and continuously evangelize tips and tricks and wins to keep momentum high.

Along with the CoPs, scaling requires identifying Champions - these are super-users, tech leaders and influencers within your organization’s development community. These folks need to be recognized and empowered to become GitHub Copilot Advocates internally. The more of these you build, the faster you will scale. The Champions are going to be those that are excited about GitHub Copilot, but also those that make the most GitHub Copilot requests (and have the highest acceptance rate). Identifying Champions by language is also helpful.
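One way to shortlist Champion candidates from usage data can be sketched as follows. The record shape (`user`, `language`, `shown`, `accepted`), the sample names and the minimum-volume threshold are all hypothetical. They are not a GitHub export format, and enthusiasm still has to be judged by people rather than by a script.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    user: str
    language: str
    shown: int      # suggestions shown to the developer
    accepted: int   # suggestions the developer accepted

    @property
    def acceptance_rate(self) -> float:
        return self.accepted / self.shown if self.shown else 0.0


def champion_candidates(records, min_shown=200, top_n=1):
    """Return the top_n users per language, ranked by acceptance rate, then volume."""
    by_language = {}
    for record in records:
        if record.shown >= min_shown:  # skip users with too little activity to judge
            by_language.setdefault(record.language, []).append(record)
    return {
        language: [
            r.user
            for r in sorted(
                group, key=lambda r: (r.acceptance_rate, r.shown), reverse=True
            )[:top_n]
        ]
        for language, group in by_language.items()
    }


records = [
    Usage("ana", "python", 900, 315),  # 35% acceptance
    Usage("ben", "python", 400, 160),  # 40% acceptance
    Usage("cho", "python", 50, 40),    # high rate, but below the volume threshold
    Usage("dev", "java", 700, 210),    # 30% acceptance
]
print(champion_candidates(records))  # {'python': ['ben'], 'java': ['dev']}
```

Grouping by language reflects the point above that Champions are most useful when other developers working in the same stack can learn from them.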

While GitHub does provide expert services and there are many GitHub Partners that can assist organizations to scale out GitHub Copilot, organizations must develop their own internal competency and programs in order to create sustainability.

Tying GitHub Copilot to initiatives

Most developers don’t use tools for the sake of tools - they tend to look for the best tool for the job. Most organizations have existing initiatives for their development teams - improving velocity, app modernization, reducing technical or security debt and increasing test coverage are examples. When developers have something to tie learning GitHub Copilot to they are more willing to invest time and effort. This is going to accelerate and widen adoption.

Pragmatic measurement and measuring the right things

Along with tying GitHub Copilot to initiatives comes pragmatic measurement, as well as measuring the right things. If you tie GitHub Copilot to an initiative to improve test coverage, then you probably won’t (initially) see an improvement in velocity. Being pragmatic (and targeted) with your metrics will lead to faster realization of value - not simply because of gamification, but because you will be measuring the right things. Improving maturity in what is measured (and how those measurements are interpreted and applied) is a requirement for success at scale.

Conclusion

Enterprises must be systematic about their approach to GitHub Copilot. Enterprises that don’t invest in mastering AI-assisted pair programming and generative AI in DevSecOps are going to fall behind. While these tools are novel today, they are rapidly becoming table stakes. Enterprises must be intentional about this technology - just as they should be intentional about adopting any technology. By applying the ingredients I’ve outlined above, enterprises can confidently scale GitHub Copilot - and realize organizational improvement faster and more sustainably.

Measuring the impact of Developer Experience and GitHub Copilot

2023-09-11T01:22:01+00:00

Leading and lagging indicators
Applying indicators to developer productivity
Measuring the right things
Assessing the value of GitHub Copilot
The metrics challenge
Perceptual vs workflow metrics
Conclusion

Photo by Andreas Klassen on Unsplash

GitHub Copilot is radically transforming the software industry and highlighting the importance of Developer Experience (DevEx) as a key enabler to business success.

GitHub has published studies showing that developers are 55% faster with GitHub Copilot than without. Customers using GitHub Copilot are reporting numbers in line with those studies: Mercado Libre reports a 50% reduction in time spent writing code, and Duolingo is seeing a 25% increase in developer speed.

Accurately measuring the return on investment (ROI) for DevEx in dollar terms is nuanced, difficult and complex. For Copilot, it is more difficult. This is despite the fact that there have been many attempts to measure productivity - from counting lines of code to logging hours that developers spend in their IDEs to measuring velocity. Many of these methods are insufficient or subject to gamification.

GitHub Copilot is really a productivity tool. Productivity is so inextricably intertwined with DevEx, they can be spoken of synonymously. Any attempt to measure the value of GitHub Copilot must be tied to measuring DevEx in general.

DORA Metrics have long been used to measure DevOps: lead time, deployment frequency, mean time to recovery and change failure rate. When coupled with flow metrics as defined by Daniel S. Vacanti in Actionable Agile Metrics for Predictability - cycle time, work in progress, throughput and work item age - organizations have a powerful set of metrics that can track how well they produce software. The SPACE framework is an excellent framework for understanding developer productivity.
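To make the four flow metrics concrete, here is a minimal sketch of how they could be computed from work item start and finish dates. The `WorkItem` shape and the reporting window are hypothetical, and I am assuming the convention of counting both the start day and the finish day in cycle time and age. Your tracking tool may count differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkItem:
    started: date
    finished: Optional[date] = None  # None means the item is still in progress


def flow_metrics(items, window_start, window_end):
    """Compute throughput, cycle time, WIP and work item age for an inclusive window."""
    done = [
        i for i in items
        if i.finished is not None and window_start <= i.finished <= window_end
    ]
    in_progress = [
        i for i in items
        if i.started <= window_end and (i.finished is None or i.finished > window_end)
    ]
    cycle_times = [(i.finished - i.started).days + 1 for i in done]
    ages = [(window_end - i.started).days + 1 for i in in_progress]
    return {
        "throughput": len(done),
        "avg_cycle_time_days": sum(cycle_times) / len(cycle_times) if cycle_times else None,
        "wip": len(in_progress),
        "oldest_item_age_days": max(ages) if ages else None,
    }


items = [
    WorkItem(date(2024, 1, 2), date(2024, 1, 5)),    # cycle time: 4 days
    WorkItem(date(2024, 1, 3), date(2024, 1, 10)),   # cycle time: 8 days
    WorkItem(date(2024, 1, 8)),                      # still open at window end
    WorkItem(date(2024, 1, 11), date(2024, 1, 20)),  # finishes after the window
]
metrics = flow_metrics(items, date(2024, 1, 1), date(2024, 1, 12))
print(metrics)
```

For this sample data the window has a throughput of 2, an average cycle time of 6.0 days, a WIP of 2 and an oldest item age of 5 days. Capturing numbers like these before a rollout gives you the baseline that later comparisons depend on.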

Why is it so hard to measure developer productivity? Firstly, it’s hard to define DevEx. There are many different opinions about what developer productivity is. Furthermore, both perceptual (qualitative) as well as workflow (quantitative) metrics should be considered. Measuring developer satisfaction is just as important as measuring how fast they work: happy developers are productive developers, since they spend more time coding and shipping great products, and are more likely to stay with your company. DevEx is multi-dimensional, so no single metric is going to tell the whole story.

You can read a much more detailed analysis of perceptual and workflow metrics and the dimensions of DevEx in this paper.

To fully understand how to measure developer productivity, we have to understand how leading and lagging indicators work. Let’s unpack these concepts.

Leading and lagging indicators

Leading indicators are measures of inputs into a system. They help us to predict how the system will perform in the future. Typically, these are fairly easy to measure and can be influenced in a short period.

A good example of a leading indicator for a development team is the count of work items on the backlog, or committed to a sprint. This is easy to measure (just check the backlog) and easy to influence - we can immediately remove (or add) items committed to a sprint.

Lagging indicators are measures of the outputs of a system. They help us understand how something happened in the system in the past: they are retrospective in nature. Typically, these require longer time periods to measure. Lagging indicators are also the result of aggregated leading indicators, so you can’t directly affect them.

A good example of a lagging indicator for a development team is how many items are delivered in a sprint. Measuring this requires us to wait until the end of the sprint, so it takes a while to measure. This count can’t be directly changed - you can try to add more committed items in the next sprint, but that may result in more bottlenecks or contention for testing environments or any number of other issues that don’t actually increase the number of items completed.

Applying indicators to developer productivity

Let’s apply these concepts to the problem of measuring developer productivity. Remember developer productivity isn’t an end in itself - it’s a means to an end. To what end? Ultimately, it’s to make our business successful! A business may have productive teams and not do well in the market. So what are we trying to achieve? And how would we know that we’ve been successful?

We may want to ask questions like:

How can we develop faster?
How can we reduce risk?
How can we improve quality?
How can we innovate more?

But how would we measure those? How would we know we’ve been successful? We could measure some of these:

Cycle times - how fast can we complete work?
Frequency of deployments - how frequently can we deploy?
Bugs - how many do we have in a release?
Vulnerabilities - how many do we have in a release?
How many code reviews do we do (and how fast do we do them)?
How much burnout do we see?
How easy is it to attract (and retain) talented people?
How much are we innovating vs maintaining?

These metrics give you insight into how well your team is performing - but even these must be analyzed in the context of the business. Are you attracting and retaining more customers? How delighted are your customers with your products and services? How competitive are you in your market? Delivering faster won’t help the business if you’re delivering the wrong things.

Let’s imagine that we measure the number of bugs in a release. Release A had 3 bugs, and Release B had 5 bugs. This tells us that there is a problem somewhere, since the number of bugs increased. But what? This is where we see the challenge of metrics - how do we interpret what happened? Perhaps we added a lot of code and didn’t add enough tests. Perhaps our senior developers were too busy to do proper code reviews, so they missed some bad code. Perhaps a developer was burning out and just pushed code without taking care to test it properly. Multiple inputs may have affected an output that we’re not happy with.

Measuring the right things

What does this mean for measuring developer productivity and the value of GitHub Copilot? The number of lines of code that Copilot produced and the number of suggestions that were accepted are leading indicators that should have an impact on lagging indicators down the line. In other words, the immediate improvement (which is easier to measure) should drive the future impact (which is harder to measure). However, the dollar value impact (ROI) is typically tied to the lagging indicators.

What does that mean? Here’s the critical concept: measuring flow and other life cycle metrics is the best way to measure the dollar value of GitHub Copilot. This is the challenge to organizations: to mature in tracking these metrics so that they can really see the impact of developer productivity on business outcomes.

There is a caveat here: GitHub Copilot is a tool meant primarily to increase individual productivity at the task level. While making developers faster at task completion will certainly impact team performance metrics like cycle times, task completion is not the only factor affecting team performance. For example, team performance involves synchronization (code review must be scheduled into the reviewer’s calendar), meetings, design sessions and many other processes and ceremonies.

Assessing the value of GitHub Copilot

The hypothesis is that by utilizing GitHub Copilot we can affect leading indicators like speed of coding, quality of code, test coverage and speed of code review. Improving these indicators will affect the lagging indicators like velocity and deployment frequency, quality, mean time to resolution (MTTR) and risk.

Unsurprisingly, the lagging indicators are typical DevSecOps metrics! These typically require longer periods of time to measure. Furthermore, when they change, it’s not always easy to analyze why they changed.

If you look at the above list, you’ll see that the leading indicators are fairly easy to affect, and don’t require long time periods to measure. For a sprint (typically 2 - 4 weeks) we can easily measure how many items we delivered, or how many bugs we found or how long code reviews took. If we found few bugs and completed code reviews quickly, that should allow us to deploy more frequently. We can also improve these measures directly. For example, if we want to improve code review times, we can add automated quality gates that need to pass before code review. This can help ensure that code has higher quality by the time a reviewer opens it, leading to faster review times.

To tie this back to GitHub Copilot - if you really want to measure its impact on the team, you have to look beyond how many suggestions were accepted (a leading indicator) and measure lagging indicators. If you use GitHub Copilot, you should see improvements in the following areas:

More frequent deployments/reduced cycle times: Developers are spending less time hand-coding boilerplate code and searching for answers outside the IDE and so can complete tasks faster. GitHub Copilot is generating unit tests and documentation - both tedious, labor-intensive tasks that GitHub Copilot can do in milliseconds. This will lead to improved cycle times - and improved DevEx.
Fewer build failures: Developers can use Copilot Chat to explain code, meaning they can understand code more deeply. They can understand the impact of changes more clearly, which should lead to better code. As GitHub Copilot generates unit tests, buggy code is fixed before it’s even pushed to the repository. Copilot Chat can help developers debug and fix problems as the code is being written. When coupled with branch protection rules, status checks, and custom deployment rules, this should all translate into fewer build failures.
Improved code quality and higher test coverage: GitHub Copilot can be used to generate test cases and test data faster, which should lead to more code coverage, which in turn will improve quality.
Faster code review times: Since GitHub Copilot is like having a second developer with you all the time, developers can generate good code, understand existing code, debug code and generate tests for code all before the code review. This means that by the time the code reaches review, it’s higher quality, which should reduce the time needed to review it. Reviewers can use Copilot Chat to understand the impact of a proposed change by asking it to “explain this code”.
Fewer security vulnerabilities and improved MTTR: Copilot Chat is an excellent way to scale AppSec since it can guide developers in fixing security vulnerabilities without the need to involve a security professional. Furthermore, with AI filters on code suggestions, it is less likely to generate code suggestions with security vulnerabilities. This means that MTTR should improve and risk should be lowered. Recent research suggests developers intend to spend their newfound time in code review and vulnerability remediation.
Better flow metrics: Cycle times should be improved, and Work in Progress (WIP) should be lowered. When developers are faster at their tasks, they work on fewer things at the same time, reducing the overhead of context switching, allowing them to spend more time “in the zone” as well as reducing cognitive load. Furthermore, work item age should decrease (since work items will be completed faster). All of this works to improve throughput.
Accelerated developer growth: The Collaborative Software Process study shows that pair programming speeds development, improves quality and improves developer experience. GitHub Copilot allows every developer to have a pair programmer, even when remote. Furthermore, Copilot Chat acts like a “just in time” coach that can help developers grow their expertise.
Better talent acquisition and retention: Happy developers are typically productive developers, but the converse holds too: productive developers are typically happy developers. This has the dual benefit of attracting talent (developers love to work for high performing teams) as well as being good for the business, since developer churn costs time and loses “tribal knowledge”. Furthermore, because of the improvements in quality and speed, developers are less likely to burn out, which is good for both talent acquisition and retention.

The metrics challenge

The challenge with these metrics is that they take time to measure. And many organizations don’t even have a baseline for some of these metrics. If organizations are going to be able to show the value of GitHub Copilot and improved DevEx, they are going to have to get to grips with these DevSecOps metrics, such as those from DORA, ActionableAgile and SPACE.

To further complicate things, many of these metrics have interdependencies. Optimizing one part of the development life cycle may highlight bottlenecks and inefficiencies in other parts of the development life cycle that could prevent the lagging indicators from improving. For example, let’s say that you give your developers GitHub Copilot and they start coding faster and completing tasks faster. Now you have more code reviews than before - and you could end up overwhelming senior developers that perform the code reviews, and they become a bottleneck that prevents you from deploying more frequently. So we see that the lagging indicators are related to an aggregation of the leading indicators, and we must take this into account when doing any analysis.
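The code review bottleneck can be illustrated with a small simulation. This is a deliberately simple sketch: the weekly numbers are invented for illustration, and it assumes reviewer capacity stays fixed while developers open more pull requests.

```python
def review_backlog(prs_opened_per_week, reviews_per_week, weeks):
    """Return the number of pull requests still waiting for review after each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        backlog += prs_opened_per_week
        backlog -= min(backlog, reviews_per_week)  # reviewers can only clear so many
        history.append(backlog)
    return history


# Hypothetical team: reviewers can handle 22 pull requests a week.
before = review_backlog(prs_opened_per_week=20, reviews_per_week=22, weeks=4)
after = review_backlog(prs_opened_per_week=28, reviews_per_week=22, weeks=4)
print(before)  # [0, 0, 0, 0]
print(after)   # [6, 12, 18, 24]
```

In the second run developers are faster (a leading indicator improved), yet the review queue grows every week, so deployment frequency (a lagging indicator) would not improve until review capacity is addressed.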

You cannot get Copilot Chat to help you fix a vulnerability if you can’t find the vulnerability, so you need good Application Security (AppSec) tools. You cannot attain more frequent deployments by improving developer speed alone - you have to invest in automation to build, test, scan, package and deploy your code. Improving cycle times won’t help if you’re not truly transforming the software delivery life cycle to be agile. And team performance improvements require streamlining processes and removing red tape, not just making individuals faster.

Perceptual vs workflow metrics

Most of the above discussion has focused on workflow (system) metrics. Even if the effect of these is understood, organizations must not forget the value of perceptual metrics. These are informed by how developers feel about GitHub Copilot and DevEx in general. Just as leading indicators interact with each other in complex ways to affect lagging indicators, perceptual metrics play an important role in DevEx. Any program to measure DevEx and the value of GitHub Copilot must include perceptual metrics such as how developers feel about the development process and their tools. More perceptual metrics are defined in this paper.

Perceptual metrics are best measured by surveys and self-assessments. They must be carefully designed to take into account bias and avoid survey fatigue. Organizations without expertise in these areas should consider outsourcing this kind of study to experienced partners.

Once the perceptual metrics have been obtained, organizations should analyze the perceptual and workflow metrics together with business key performance indicators (KPIs) in order to attain a clear, accurate picture of DevEx and value.

Conclusion

When organizations look at the return on investment (ROI) for investing in DevEx (including deploying GitHub Copilot) multiple dimensions must be considered. Analyzing which metrics will be impacted by improvements is a complex activity with many nuances. Organizations should start to analyze both input (leading) and output (lagging) metrics so that they can develop a fuller understanding of how productive their developers are individually, as well as how productive teams are. Ultimately the goal of such measurement is to help improve productivity and DevEx to accelerate achieving business outcomes.

What can organizations do today to improve developer productivity? First, start by asking developers what their view of DevEx and productivity is. Then start measuring both input and output metrics as defined above with a view to discovering where to most effectively invest to improve.

Mission Control - and what it means for DevSecOps

2023-06-12T01:22:01+00:00

Roots of Process Debt
Army Mission Control
Applying Mission Control to DevSecOps
Conclusion

Photo by Filip Andrejevic on Unsplash

Today’s markets move fast. Organizations that don’t keep pace are being left behind. DevSecOps is fairly easy to grasp conceptually, but is not easily implemented. Most organizations that struggle to implement DevSecOps effectively are hampered not by tooling, but by old ways of thinking.

DevSecOps requires a cultural shift - as well as a platform to support this shift. A reminder of Donovan Brown’s definition of DevOps is warranted:

DevOps is the union of people, process and products to enable continuous delivery of value to our end users.

We used to say it this way when I was a DevOps consultant:

You can’t buy DevOps in a box.

There is no “silver bullet” or product that will “make you DevOps”. Finding the right tools and platforms is important, but culture is more so. Many teams talk about “technical debt” but I don’t hear a lot of teams talk about process debt.

Roots of Process Debt

There are probably many roots of process debt, but I think that many of them come from the Waterfall mindset. In Waterfall, the idea was to work out all the possible scenarios and outcomes up-front so that we could minimize risk. Ironically, this extreme “analysis paralysis” almost always led to building the wrong things which was the exact thing it was trying to prevent!

A second factor was the desire to find economies of scale. For example, it was common to have database administrators (DBAs) and security professionals that took care of all the database and security work respectively. “Developers don’t know how to optimize database work, so we’ll centralize that work to let the developers code faster.” Again, the irony is that DBAs became a bottleneck. The same is true of security teams - the desire to “offload” security from App Teams ends up slowing teams down!

As I was thinking about process debt, I came across a philosophy from the US Department of the Army called Mission Control that seemed to offer some insights into how to build a good DevSecOps culture.

Army Mission Control

Mission Control is the Army’s approach to command and control that empowers subordinate decision-making and decentralized execution appropriate to the situation.

In war, events are too chaotic and communication too fragmentary to rely on centralized control. Commanders need to rely on the innovation and decisive action of subordinates to meet intent in a complex operating environment. Sounds like this applies to DevSecOps, doesn’t it?

The seven principles of Mission Command are:

Competence - developed continually through training and self-development of soldiers
Mutual trust - shared confidence between soldiers and commanders that they can be relied upon and are competent to perform assigned tasks
Shared understanding - creating common language and culture and clear visions and values
Commander’s intent - commanders must clearly communicate intent to everyone, articulating purpose and desired end state
Mission orders - describing the situation, commander’s intent, desired results and required tasks, without specifying how tasks are to be accomplished
Disciplined initiative - whether the benefits of a localized decision outweigh the risk of desynchronizing the overall operation, and whether the action furthers the commander’s intent
Risk acceptance - commanders must assess risk to mission while mitigating risks with control measures, trusting that their intent has been relayed and subordinate decisions will be made based on that intent

Applying Mission Control to DevSecOps

We can apply these principles to our thinking about culture for DevSecOps.

Competence

Investing in people and their skills is a critical part of a successful DevSecOps culture. Developers need to be empowered to learn about new technologies, stacks and trends. Similarly, cross-functional teams need to have training available for the breadth of their responsibilities. These responsibilities go beyond just coding and include testing, automating, monitoring, security, hyper-scale, infrastructure as code, cloud operations, live-site culture and more. By investing in training and opportunities for learning, companies build competence.

Mutual trust

Trust is critical - but it must be mutual. App Dev teams must trust that their commanders (executives) are investing in them, and executives must trust their teams to do the right thing. This trust is earned and built over time, and can only be built on a culture that values innovation and won’t punish people for initiative.

Shared understanding

Executives must clearly communicate the vision and values of the organization so that it is well understood by everyone. Organizations should also spend time thinking about a common language as well as communication lines and types (see the three key Interaction Modes from Team Topologies). Organizations that are clear about how they communicate can take advantage of the homomorphic force of Conway’s Law to ensure that their architectures and culture are aligned, rather than opposing.

Commander’s intent

Beyond the values and vision, executives must clearly communicate purpose and desired end state. Clearly articulating what success looks like and what the key objectives are at the executive level keeps everyone aligned.

In my previous post I spoke about the balance between Team autonomy and Enterprise alignment. When executives are crystal clear on the purpose of an organization as well as desired end state, this gives teams strong enterprise alignment. Strong enterprise alignment at the strategic level promotes a culture where Teams feel empowered to innovate within the boundaries that the organization really cares about.

Mission orders

This is where most organizations get it wrong - mission orders are about distilling the commander’s intent, in language built from Shared understanding, and specifying what needs to be done, not how it should be done.

This requires the vision, purpose and values of the organization to be clearly understood. It requires good shared understanding, but it is also built on mutual trust. Will leaders trust that their teams have the competency to do what needs to be done? Do developers trust that the leaders are investing in them?

This also ties back well to Enterprise alignment - which emphasizes a core set of non-negotiables (the values) and lets teams innovate within these parameters to meet the Commander’s intent (Team autonomy).

Disciplined initiative

If organizations have clear Mission orders, know the Commander’s intent, and what they are tasked to achieve, then they can innovate to fulfil the goals. Rather than specifying how they should do things, which signals a lack of trust, executives show trust by letting teams apply initiative. This benefits the teams (they gain mastery and autonomy) and the company, since the company is now building a culture of innovation. This then feeds to better trust, which leads to more autonomy - and the virtuous cycle continues.

Team autonomy is what is being expressed here - within the boundaries of clear, concise Enterprise alignment.

Risk acceptance

This is a tough one for most organizations. However, if the other principles are in place, then this becomes the natural progression. Organizations that have low-trust environments tend to be highly risk-averse.

This is not to say that risks should not be evaluated, weighed and mitigated when appropriate. However, teams that default to zero risk also stifle innovation and experimentation. When organizations build competent teams in a high-trust environment and are clear about their purpose and vision, then they can accept the risk of letting teams fail. If teams are never allowed to fail, they will never innovate. Once again, settling on a small core of non-negotiables (Enterprise alignment) and then giving teams room (Team autonomy) to innovate, experiment and (at times) fail, shows trust.

Conclusion

The principles of the Army’s Mission Control philosophy apply well to the culture of DevSecOps. Organizations that want to succeed need to develop a culture that builds mutual trust and empowers innovation, rather than stifling it.

Happy missioning!

Who needs GitHub Copilot?

2023-06-12T01:22:01+00:00

1. You prefer writing code in Notepad
2. You like writing boilerplate code
3. You know every regex expression.
4. You know every API
5. You like copying and pasting from StackOverflow.
6. You don’t need unit tests.
7. Comments? What for?
8. You’d rather leak IP by pasting code into ChatGPT.
Conclusion

Photo by Aideal Hwa on Unsplash

Generative AI, Copilot, blah blah blah - who needs it? You’re the Ultimate Programmer, so why would you want some pretentious “large” language model helping you? You’re a lone wolf that don’t need nobody (or no thing) to “help” you - it will only get in the way of your staggering intellect.

Well, this post is just for you - top reasons why you DON’T need GitHub Copilot.

1. You prefer writing code in Notepad

GitHub Copilot only supports four IDEs: Visual Studio Code, Visual Studio, IntelliJ and NeoVim. But you prefer to code in Notepad. Or vim. Or emacs. All those plugins and breakpoints and live debugging - it’s overrated. You can debug in your head just by looking at the perfect code you wrote. And you can quit vim whenever you want to.

So what if Copilot integrates seamlessly and fades into the background as you code?

2. You like writing boilerplate code

Constructors. Getters. Setters. Who needs to think about business problems when you can write real code. I mean, you learned how to do it when you did your Intro to Programming course, so you want to make sure you get your money’s worth.

So what if Copilot is really good at writing repetitive, boilerplate code, thereby keeping you focused on solving business problems?

3. You know every regex expression.

Only losers need to test their regex expressions using regex101. You don’t need Copilot’s help to validate obscure string formats - you just do it in your head.

So what if Copilot can generate regex and easily dump out obscure formats and formulas so that you don’t have to remember them or search for them?

4. You know every API

Who needs to look up how to invoke common APIs? Once you’ve seen a Swagger doc you can call any and every method in that API forever.

So what if Copilot knows how to call APIs that millions of developers use daily?

5. You like copying and pasting from StackOverflow.

Speaking of searching for stuff - you love StackOverflow! What’s better than googling a question and then inevitably landing on StackOverflow where there are a bunch of random answers that may or may not be correct that you can copy from? And who doesn’t love renaming all the variables and fixing all the formatting errors (tabs vs spaces anyone)? Not that you need to search for stuff anyway - your infallible memory is a giant library of endless code examples to draw from.

So what if Copilot can get answers for you without you having to leave the IDE… er, file… and follows your naming conventions and styles?

6. You don’t need unit tests.

Unit tests - that assumes your code could be wrong. And why spend time programming code that tries to break the code you just coded so perfectly? If you did write unit tests, they would be the ultimate tests.

So what if Copilot can quickly generate tests, mocks and find multiple test cases just by analyzing the code you’re testing?

7. Comments? What for?

You don’t have to document your code. Your code is so perfect that people can tell what it’s doing just by seeing your code. Besides, no-one else will ever need to look at your code unless it’s to learn how to program perfectly. Da Vinci didn’t have to “comment” the Mona Lisa, did he?

So what if GitHub Copilot can generate code based on your comments, and that the comments stay to help document your code?

8. You’d rather leak IP by pasting code into ChatGPT.

You’re not like those Samsung developers that leaked sensitive information while copying code into ChatGPT, right? I mean, you’d never be asking an AI for help anyway.

So what if Copilot encrypts data, uses HTTPS and (at least for Copilot for Business) never keeps any of the data or solutions?

Conclusion

This GitHub Copilot thing is totally overrated. It’s not going to change the way you work or the perfection of the code that you crank out as you consume coffee and cold pizza. No way.

Happy (not) Copiloting!

Team Autonomy vs Enterprise Alignment

2023-06-07T01:22:01+00:00

Team Autonomy vs Enterprise Alignment
1. Team Autonomy
2. Enterprise Alignment
Tying in to DevSecOps
Considering Builds
Considering AppSec
DevSecOps At Scale
Conclusion

Image by Matteo Vistocco on Unsplash

I work for GitHub - so naturally I have a lot of conversations about tooling and products. However, let’s take a step back and remember Donovan Brown’s seminal definition of DevOps:

DevOps is the union of people, process and products to enable continuous delivery of value to our end users.

You’ve also probably heard Peter Drucker’s quote:

Culture eats strategy for breakfast.

Culture is the people and process part of the DevOps equation, and is arguably more important than the product or platform your teams are working with.

That’s all well and good in a theoretical, high-level way. But how do we apply these principles in practice?

Team Autonomy vs Enterprise Alignment

Many years ago, I heard Aaron Bjork and Buck Hodges from the Azure DevOps team talk about how Microsoft transformed their teams from a 2-year delivery cycle to a 3-week delivery cycle. This excellent video by my late friend and colleague Abel Wang talks about this transformation and I highly recommend it.

One concept has always stood out to me when Microsoft spoke about this transformation: team autonomy vs enterprise alignment. You can imagine these as two ends of a spectrum, with total team autonomy on one side and complete enterprise alignment on the other side.

To visualize these extreme ends of the spectrum, picture 300 rowboats vs the Titanic:

the 300 rowboats can each turn very quickly
each rowboat can travel fast or slow, according to how well the rowers gel together
getting all 300 rowboats pointed in the same direction is a challenge
communicating to all 300 rowboats is a challenge
the Titanic only has a single direction
the Titanic takes a long time to change direction
communication on the Titanic is easier

Most organizations fall somewhere on the spectrum between team autonomy and enterprise alignment, and various points along the spectrum have advantages and disadvantages.

Team Autonomy

Team autonomy means that teams are able to make decisions without filling in forms and logging tickets. To make this practical for software development, it means allowing teams to decide which programming languages and stacks they want to work with, what IDEs they want to use, and how they will build, test, scan, deploy and monitor their apps.

Enterprise Alignment

Enterprise alignment is the vision and goal of the company and how that is worked out day-to-day. It defines how individuals and teams communicate, what their standards are, and what future direction is. It also defines the “non-negotiables”.

In practice, successful organizations have a small, well-defined “core” of Enterprise Alignment, and then allow teams to have a large level of autonomy. Enterprise alignment defines the what and lets teams define the minutiae of the how.

Tying in to DevSecOps

How does this tie into DevSecOps? Many organizations I work with have a centralized, command and control model. In other words, they lie much closer to the Enterprise Alignment side of the spectrum. Let’s look at two examples: builds and security. We’ll analyze each on both extremes: enterprise alignment and team autonomy.

Considering Builds

Extreme Enterprise Alignment

Many organizations have a “DevOps team”. I really despise this language, since it makes DevOps the responsibility of some other team - after all, if I’m not on the DevOps team, then why should I care about DevOps? I think what most organizations mean is that they have a team that is responsible for build and deployment automation.

The idea behind this team is to enable developers to code, and not have to worry about how to package, test, scan and deploy their apps. This leads to app developers not caring about operational issues, not building sufficient telemetry into their apps, not caring about security or scale or performance. After all, that all falls onto the “DevOps” team.

The supposed value-add is that there is a standardized build, test and deploy process, controlled by the DevOps team.

Extreme Team Autonomy

When there is no enterprise alignment, it can look and feel like the wild west. Teams are deploying whenever and however they want, there is little or no code sharing and there are a plethora of tools since each team is using its own preferred tools and stacks.

While this allows agility in the “local”, it ends up being a blocker at the “global” level. Teams optimizing for themselves end up being blocked by other teams (or blocking other teams) since there is no set contract for sharing code or apps and no set way to communicate.

Well Balanced

A more balanced approach would be to have a small, well defined set of goals at the enterprise level that can guide teams and set a few non-negotiables. Thereafter, teams should be free to innovate within those boundaries.

How would we do this with builds? One way would be to standardize on a single build platform (say, GitHub) and then require teams to test, secure and monitor their own apps. This can be achieved by setting up branch policies to ensure that teams place these gates into their processes and making developers responsible for run-time operations of their apps. How teams test can be left up to them, as long as they test. If teams don’t want to add telemetry, they are going to have a hard time running apps in production - so they will likely end up adding telemetry to make operations easier.

Considering AppSec

Extreme Enterprise Alignment

Most organizations I work with have a Cyber security team. These teams are typically involved late in the development lifecycle and are the official gate-keepers to “going to prod”. The idea is that this centralized team is the enterprise alignment for securing applications.

There are many problems with this extreme - poor developer experience, slowing release cycles and friction. When you add that security engineering skills are rare (1 security pro for every 800 developers is the current industry measure) then you get the additional problem that this does not scale.

The value-add for this would be a central place where security and risk are surfaced and managed. Unfortunately, the bottleneck and friction this model creates negates any benefits.

Extreme Team Autonomy

On the other extreme, teams are not bound to any security standards at all, leading to risk for the company. If teams are scanning their code, dependencies and secrets, they’re using disparate tools and processes and it is nearly impossible to manage risk at scale.

Well Balanced

How can we balance these requirements - centralized risk management and good developer experience? We standardize on a single platform/tool and mandate that teams scan their code and dependencies and scan for secrets. We can enforce branch protection rules to ensure that these scans complete before deployment. These are the non-negotiables.

We then let teams figure out how to treat remediation in their backlogs. We may have to set some sort of SLA on remediation. As long as we have visibility into which teams are in compliance, we can let the teams decide when/how to remediate. This gives the teams autonomy within some good boundaries.
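
As a minimal sketch of what "visibility into which teams are in compliance" could look like, assume each open alert records a team, a severity and the date it was opened, and that the organization sets a remediation SLA in days per severity. The field names, SLA values and team names below are all hypothetical - they are not taken from any GitHub API.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical SLA: maximum days an alert of each severity may stay open.
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90}


@dataclass
class Alert:
    team: str
    severity: str
    opened: date


def teams_out_of_compliance(alerts, today):
    """Return the sorted names of teams with at least one alert past its SLA."""
    late = set()
    for alert in alerts:
        age_days = (today - alert.opened).days
        if age_days > SLA_DAYS[alert.severity]:
            late.add(alert.team)
    return sorted(late)


alerts = [
    Alert("payments", "critical", date(2023, 6, 1)),
    Alert("payments", "medium", date(2023, 5, 1)),
    Alert("search", "high", date(2023, 4, 1)),
]
print(teams_out_of_compliance(alerts, today=date(2023, 6, 5)))  # ['search']
```

The point of the sketch is that the SLA is the non-negotiable, while the order in which a team works through its own alerts is left to the team.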

DevSecOps At Scale

There is no effective way to scale DevSecOps if your culture is either too centralized (enterprise alignment) or too decentralized (team autonomy). Organizations must find a good set of non-negotiables and then extend trust to the team for everything else.

For this to work, however, you must have a platform that can support this culture. I believe that GitHub is the platform for this. Here are a few recommendations that will allow you to scale DevSecOps:

Treat the PR as the center of quality and security

Enabling branch protection for your main branch forces teams to use Pull Requests (PRs) to flow code changes to your stable code.
Require peer code review for your PRs. This ensures that you get more eyes onto code changes, and encourages teams to work in smaller batches (there’s nothing worse than doing a code review for a large number of changes).
Require passing builds that include unit tests. This ensures that code at least compiles and that it passes some level of unit testing. Code that can’t pass these basic gates should not be deployed to production!
Require code scanning (SAST). This ensures that security issues for your code are picked up early and fixed immediately. This also removes the burden on the (scarce) security professionals in your organization.
Require dependency scanning and Dependency Review. This ensures that you are not introducing vulnerable dependencies with your code changes.

Enable secret scanning and push protection

There are too many breaches because of secrets checked into source control. Turning this on to remediate existing secrets (get clean) and turning on push protection (stay clean) dramatically reduces this risk.
The ease of switching this on at the org level should not be underestimated. There are no IDE plugins to configure or build steps to configure - it’s just switching a button. There is no other secret scanning tool that can be scaled as easily.

Treat security vulnerabilities as “work”

This removes the “scare” factor from security issues.
This lets teams prioritize remediation along with other feature requests. Teams look at bugs and determine if they need to be fixed immediately or not - they should treat vulnerabilities in the same manner.

Let teams build/test/package/scan/deploy their apps

A centralized build team may work at a small scale, but at larger scales (> 50 devs) this can become a bottleneck.
Reuse small jobs rather than large pipelines. Large, generic pipelines that try to deploy every app become unwieldy and fragile. Rather create small reusable jobs that are like Lego bricks to encapsulate common parts of a workflow, and let teams compose these in their own pipelines. This gives a good balance of reusability without bloating.
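
To illustrate the Lego brick idea in a language-neutral way, here is a small sketch that models jobs as plain functions and pipelines as lists of jobs. The job names and the two team pipelines are hypothetical, and this is not how any particular CI system defines workflows - it only shows the composition pattern.

```python
def build(steps):
    """Reusable job: compile or package the app."""
    return steps + ["build"]


def unit_test(steps):
    """Reusable job: run the unit tests."""
    return steps + ["unit-test"]


def code_scan(steps):
    """Reusable job: run code scanning."""
    return steps + ["code-scan"]


def deploy_container(steps):
    """Reusable job: deploy a container image."""
    return steps + ["deploy-container"]


def publish_package(steps):
    """Reusable job: publish a library package."""
    return steps + ["publish-package"]


def run_pipeline(jobs):
    """Run the jobs in order and return the names of the steps that ran."""
    steps = []
    for job in jobs:
        steps = job(steps)
    return steps


# Each team composes its own pipeline from the shared bricks.
web_team_pipeline = [build, unit_test, code_scan, deploy_container]
library_team_pipeline = [build, unit_test, code_scan, publish_package]

print(run_pipeline(web_team_pipeline))
print(run_pipeline(library_team_pipeline))
```

Both teams reuse the same build, test and scan jobs, but neither is forced into a single generic pipeline that has to know about every kind of app.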

Manage by exception

“Trust, but verify.” Assume that teams will do the right thing, and then check for cases where they do not. For example, monitor bypasses of push protection. If a team does this repeatedly, it could be an indication that they are doing something wrong. This is better than “hard gating” and blocking developers.
Teams must own their apps - and that includes failing. If you can fail fast, then you can recover fast too. Once teams see that good quality makes their lives better, they will be more motivated to produce quality code without the need for heavy handed processes! This means that you should be prepared for them to fail from time to time - and to trust them to recover quickly.
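
As a rough sketch of managing by exception, assume you can export push protection bypass events as a list of records with a team name and a reason. The record shape and the threshold below are hypothetical; the idea is simply to surface the teams that bypass repeatedly rather than to block everyone.

```python
from collections import Counter

# Hypothetical threshold: more bypasses than this in a period warrants a conversation.
BYPASS_THRESHOLD = 2


def teams_to_review(bypass_events, threshold=BYPASS_THRESHOLD):
    """Return teams whose bypass count exceeds the threshold, busiest first."""
    counts = Counter(event["team"] for event in bypass_events)
    return [team for team, n in counts.most_common() if n > threshold]


events = [
    {"team": "payments", "reason": "used in tests"},
    {"team": "search", "reason": "false positive"},
    {"team": "payments", "reason": "will fix later"},
    {"team": "payments", "reason": "will fix later"},
]
print(teams_to_review(events))  # ['payments']
```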

Conclusion

Scaling DevSecOps effectively requires organizations to think about their culture. Finding a good spot on the spectrum of Team Autonomy and Enterprise Alignment is critical to success. Organizations must find a small set of core non-negotiables and give teams choice for everything else. The GitHub platform enables organizations to configure these “non-negotiables” in a transparent way, allowing teams to move quickly without compromising quality and security.

Happy scaling!

Spicy Takes 🌶️🌶️🌶️ on RSA 2023

2023-05-01T01:22:01+00:00

Spicy Takes
Conclusion

Image by Pickled Stardust on Unsplash

I was at RSA last week in San Francisco. The highlight of the week was a talk by Shannon Lietz, who I met briefly at GitHub HQ during the week. More on this later.

I visited the expo area and I had great conversations with GitHub customers as well as GitHub technology and services partners. I was looking for overall trends and trying to get a pulse on the industry - and coming from a developer background, the security world is both fascinating and foreign to me!

Spicy Takes

There are a couple of key themes that I took away from the week, and I present them here in order of spiciness:

🌶️ Culture eats application security for breakfast
🌶️🌶️ Organizations that don’t invest in developers are not serious about security
🌶️🌶️🌶️ Security tools are a dime a dozen

🌶️ Culture eats application security for breakfast

I am used to the phrase “culture eats tooling for breakfast” in the context of DevOps. You can have the most amazing tools, but if you have a dysfunctional culture, tools will not help you succeed. Many of the conversations I had this week presented echoes of this sentiment, but in the context of security. So it is easy to turn the phrase into culture eats application security for breakfast.

But what does this really mean? I was struck by how little emphasis was placed on culture as a foundation and pillar for application security. A culture that isolates and separates developers and security professionals will struggle to be effective at AppSec.

Conway’s Law teaches us that the communication structures of organizations are invariably reflected in the application architectures of those organizations. It’s no surprise when we look at the popularity of n-tier applications in the late 90’s and early 2000’s - these mirror the top-down, hierarchical management structures that were prevalent in those times. As Agile gained popularity and management changed to smaller, more autonomous teams, we saw the proliferation of microservices.

This is why we must consider the impact of how our developers and security teams communicate and collaborate if we want to succeed at AppSec. We cannot get away from Conway’s Law. If we continue to bolt security teams onto developer teams late in the development life cycle as a mess of bureaucratic red tape, then AppSec will continue to fail.

You’ve heard the mantra “shift left”, and today every security pro worth their salt talks about this concept. But simply deploying another tool in an automated build has limited efficacy - we must “shift the culture left”.

Teams with good tools and bad culture are less effective than teams with good culture and bad tools. Ultimately, we need to progress to teams that have both good culture and good tools.

🌶️🌶️ Developers, developers, developers!

Following on from the culture discussion above, we have to pivot to the key to effective AppSec: the developer. Changing culture is going to require renewed investment in developers as well as a shift in the roles and responsibilities of security professionals.

One highlight of the week was the DevOps Connect talk by Shannon Lietz. I particularly remember her saying, “To be effective in security, we must translate security into developer.”

I realize that I was at a security conference, but it struck me that there are very few companies today that are looking to solve security by investing in developers. And I will go even further: companies that do not look to solve AppSec by investing in developers are doomed to fail at AppSec.

AppSec is a fascinating intersection between developers and security professionals. These two groups typically speak different languages and have different lenses through which they view the world. This is why I resonated with Shannon’s statement - companies that fail to translate security into language, processes and tooling that developers understand are not serious about security. And as part of that, they must transform how security professionals work too!

Security professionals as Enabling Teams

Team Topologies does a great job in creating language around how to design teams within an organization. This is another area that suffers a severe lack of investment - companies don’t typically think about how they design their teams or how their teams communicate. Without going into the four types of Teams, at a high level, developers should be Stream Aligned Teams and security professionals should become Enabling Teams.

In short, the security teams should work on enabling developers to write secure code, fix vulnerabilities and become the first line for security. If your security professionals are doing all the security work, they will always be a bottleneck. Organizations can scale AppSec and scarce security skills by taking this approach. This is what I think true “shift culture left” means in the context of AppSec.

🌶️🌶️🌶️ Security tools are a dime a dozen

Most of the vendors at the expo seemed cookie-cutter, using oft-repeated catch phrases (like the ubiquitous “shift left” and “go faster”) but didn’t seem to bring anything new or fresh to AppSec.

There are some critical dimensions that companies must consider when evaluating and rolling out security tools:

Developer productivity
Reduced friction
Visibility
Scalability

I was disappointed to see that very few tools in the AppSec space addressed these dimensions. Slapping another tool into the mix isn’t going to be effective - you must address these dimensions.

Developer productivity

Moving fast isn’t just about new features: your security response velocity will never be faster than your developer velocity. It’s simple to illustrate this point: let’s say that your commit-to-production lead time is 3 days; in that case it stands to reason that your time to remediate cannot be faster than 3 days. Speed is a critical component of staying secure.

Reduced friction

Another great quote from Shannon Lietz is: “Developers don’t talk about security tools unless they make security folks go away.” When I was a developer, security were the people that blocked your deployments. Security tools only slowed me down. It wasn’t until I saw a developer-focused security tool that I realized that security doesn’t have to be a blocker! Shannon’s sentiment is spot-on.

Developers are smart - and hate process when it adds no value to what they do. They tend to find workarounds for any process that introduces more friction. Therefore, any tool that adds friction is doomed to fail. Tools must reduce friction for developers to be successful.

Visibility

One of my customers has a security tool that performs formal method analysis. They use this tool heavily - but they struggle to collate results and see status over multiple projects. They can switch to the tool UI, but this adds to friction. The lack of visibility in the developer workflow is limiting the effectiveness of this tool.

Another part of visibility is metrics. Most teams will talk about Mean Time to Detect (MTTD) and Mean Time to Repair (MTTR), but do not define these or track them effectively. There doesn’t seem to be a consensus on what AppSec metrics are the most important or how to track them.
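
To make the "define these" point concrete, here is one possible way to pin the two metrics down: MTTD as the average time from a flaw being introduced to it being detected, and MTTR as the average time from detection to the fix. These definitions and the sample records are assumptions for illustration - your organization may reasonably choose different start and end points.

```python
from datetime import datetime
from statistics import mean

# Hypothetical vulnerability records: when the flaw was introduced, detected and fixed.
records = [
    {"introduced": datetime(2023, 4, 1), "detected": datetime(2023, 4, 3), "fixed": datetime(2023, 4, 6)},
    {"introduced": datetime(2023, 4, 10), "detected": datetime(2023, 4, 14), "fixed": datetime(2023, 4, 15)},
]


def mean_days(records, start_key, end_key):
    """Average elapsed days between two timestamps across all records."""
    return mean((r[end_key] - r[start_key]).total_seconds() / 86400 for r in records)


mttd = mean_days(records, "introduced", "detected")  # one possible definition of MTTD
mttr = mean_days(records, "detected", "fixed")       # one possible definition of MTTR
print(f"MTTD: {mttd:.1f} days, MTTR: {mttr:.1f} days")  # MTTD: 3.0 days, MTTR: 2.0 days
```

Whatever definitions you pick, writing them down as code like this forces the team to agree on which timestamps count.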

Scalability

The industry standard ratio for security professionals to developers is 1:800. This is why shifting your security professionals to enabling teams (see above) is so critical - it is the only way to scale AppSec effectively. But you will struggle to do this if your security tools cannot support this shift.

GitHub Advanced Security

I often ask the question, “Why do you think GitHub got into AppSec at all?” The answer is fairly simple: even though security tools and practices have been around for two decades, AppSec is still failing. And the major reason is that it is not developer centric. GitHub is uniquely positioned to bring security to developers in a way that reduces friction, empowers developers and scales AppSec teams. Since it is the heart of the developer workflow, this is a powerful way to really shift both tooling and culture left.

Conclusion

We still have a lot of work to do. AppSec in the industry isn’t as successful as it should be, and organizations must consider both tools and culture in combination in order to improve. Organizations must invest in developers, shift security pros to enabling teams and ensure that they deploy tools that support these shifts instead of hinder them. I was again reminded of how fortunate I am to be at GitHub, where we are moving AppSec forward.

Happy securing!

Using GitHub Copilot Effectively

2023-04-17T01:22:01+00:00

How Autopilot works on Commercial Flights
GitHub Copilot
Conclusion

Image by Rayyu Maldives on Unsplash

GitHub Copilot is aptly named. While some have feared that generative models will replace developers, I do not believe we are there yet: Copilot is an assistant, not a replacement. However, developers will need to adjust their skills, both to stay effective as well as to stay marketable through the disruption that the AI age is bringing.

I have a friend who is a commercial airline pilot, and I asked him about how autopilot works on commercial airplanes. I think the analogy of how autopilot works is useful in framing how developers should approach learning how to use GitHub Copilot.

How Autopilot works on Commercial Flights

Most of us have flown in a commercial airplane. We all know that there are two human pilots in the cockpit, and we even know that they engage autopilot to fly the aircraft. However, even though we are all comfortable with the idea of planes flying themselves, we would be a little nervous if there were no humans in the cockpit before we take off!

Here is my understanding of how autopilot works during a commercial flight:

The pilot taxis the plane and takes off - the autopilot cannot take off automatically
Once the plane reaches around 15,000 ft altitude, the pilot engages the autopilot system. Some pilots will fly manually until they are at cruising altitude.
Once engaged, the autopilot is programmed to fly the plane along the current flight plan. The autopilot can navigate through bad weather and turbulence.
The pilots man the radios and watch for weather and wind conditions. At times, pilots will tell the autopilot to fly around weather, or change altitude to get better wind conditions.
The autopilot lands the plane.
The pilot takes over to taxi the plane.

Note: Even if the above is not 100% correct, it’s good enough to make an analogy for GitHub Copilot! Errors and omissions are my own.

GitHub Copilot

Understanding how autopilot works, we can make a useful analogy when we consider GitHub Copilot:

Developers must “take off” since Copilot can’t take off by itself (context)
Developers can use Copilot “mid-stream” but will need to make adjustments for “turbulence” (work in small chunks)
Developers must “man the radio” to monitor the code that they are writing with Copilot (good DevSecOps)
Copilot can “land the plane” but getting to the final destination is up to the developer (remember to solve the right problems)
Quality control is beyond the purview of Copilot

Let’s dig into these a little deeper.

Taking off - providing context

Just as autopilot can’t take off automatically, in the same way a blank project or file isn’t a good way to get going with Copilot. Even before that, developers need a “flight plan” - some idea of what they are going to be coding. Spending a little time to analyze requirements and think about how code is going to be written, tested, scanned, packaged and deployed will go a long way to better productivity and efficiency.

When using Copilot, you get the best results when supplying good context - think of this as the flight plan. Context is the file that you’re currently editing as well as other tabs open in the solution. If you have other files already, open a couple of them to assist Copilot. Open test files to help Copilot with tests and examples of how your methods are being called.

Where none of this exists, take time to think about what you want the code to do and write the intent in comments at the top of the file. The more context you supply, the better your results will be.
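
As an illustration, a file that starts with its intent in comments might look like the sketch below. The puzzle, the function name and the body are hypothetical - the body is the kind of code you might accept or write after such a header, not output captured from Copilot.

```python
# Intent: parse a puzzle input where each line is a comma-separated list of
# integers, and return the sum of the largest value on each line.
# Blank lines are ignored.
# Example: "1,5,3\n\n7,2" -> 5 + 7 = 12


def sum_of_line_maxima(text: str) -> int:
    """Sum the largest integer found on each non-blank line of text."""
    total = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, as stated in the header comment
        total += max(int(value) for value in line.split(","))
    return total


print(sum_of_line_maxima("1,5,3\n\n7,2"))  # 12
```

Notice that the header states the input format, the edge case and a worked example - that is the "flight plan" in miniature.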

I love doing the Advent of Code in December - and using Copilot while solving the puzzles has been great. However, I think one of the main advantages of using Copilot was that it subtly changed how I develop: rather than simply diving into code, I take a few moments to think about how I can best prompt Copilot to give me what I want. This makes a big difference and I found myself spending more time thinking and less time thrashing code - which is a more fulfilling experience as well as a more productive way to code!

Prompt engineering is a phrase that is being bandied about - I think there is something to this. Successful engineers will be those that can successfully guide AI to do the right thing.

Cruising Altitude - working in small chunks

Once you have a little bit of code, you’re at “cruising altitude”. This is where Copilot feels like magic - there is enough context for it to generate the code that you were thinking of. Keep working in small chunks (like inside a method body or inside a loop). The narrowed context produces far better results.

Remember, Copilot is a probability engine and there is some level of randomness inherent in how it works (this is true of all large language models). Broad, vague requests (low context) tend to produce results that show much more randomness (and less meaning and utility). Narrowing the context reduces the compounding effect of the randomness and is more likely to produce meaningful code.

Man the Radio - fast feedback

While you’re having fun coding with Copilot at your side, don’t forget to “man the radio”. Remember, code in and of itself isn’t the goal - solving business problems is! Moving faster isn’t an end - it’s a means to an end.

Why do we want developers to be more productive and efficient? The value of going faster is that we get feedback faster. The faster we get feedback from our end-users, the faster we’re able to adjust course. Scrum and Agile didn’t succeed because of daily stand-ups and retros - Agile succeeded because it focused on flow and shortening the feedback loop. Copilot, by making developers more productive, is wasted unless you’re shortening the feedback loop. Listen to the feedback from end-users, and adjust accordingly. This will give Copilot purpose and value beyond just developer happiness.

Land the Plane - Good DevSecOps

Landing the plane is crucial - after all, if your plane doesn’t land, you can’t get to your destination! Again, the landing of the plane is a means to an end - you have to get off the plane to reach your destination!

Copilot will help you land, but you’ll have to taxi in yourself. Copilot is designed to speed the “inner loop” of development - but you’ll have to make sure you have an efficient “outer loop” too - peer code review, build automation, linting, unit and integration testing, scanning and automated deployment are critical if you’re going to get the most out of Copilot.

Having said that, some Copilot X features are bringing AI to the “outer loop” such as Copilot for PRs, which can suggest missing test cases for code changes in a PR.

Autopilot is only for flying

Copilot allows developers to move faster - which means you need to match that speed when it comes to quality gates and deployment - otherwise you’ll get an impedance mismatch, which, if you know your electronics, is a Bad Thing. Copilot, by making developers faster, requires your quality gates and processes to be faster.

The autopilot on planes does not check the fuel levels or the ailerons or do any of the preflight checks itself - quality control is still up to the pilots and ground crews. Copilot is not meant to do everything for you - it’s meant to augment your developers and make them faster. You must have good DevSecOps practices in place to maximize your usage of Copilot.

Conclusion

GitHub Copilot is a powerful tool, but to get the most out of it developers should understand how to feed it context, work in small chunks, and ensure the rest of the DevSecOps pipeline is running smoothly.

Happy co-piloting!

Allowing Bypass of Secret Scanning Push Detections is a Good Thing

2023-03-06T01:22:01+00:00

Secret Scanning Locations
Allowing bypassing is a good idea
Effective management of bypasses
Alerts are still created after bypassing push protection
Management by exception
Conclusion

Image by Tim Hüfner on Unsplash

GitHub Advanced Security includes secret scanning. While there are other secret scanning solutions in the market such as TruffleHog, no SaaS solution can offer push protection.

Secret Scanning Locations

Secret Scanning could be implemented in 3 locations:

The local developer environment - either in the IDE or in the CLI
In a build after commits are pushed
At the time of the push

Let’s examine the pros and cons of each of these approaches.

Local Environment

Performing secret detection in the local environment only works as long as developers remember to run the tool. And if their favorite IDE doesn’t support the tool, it’s unlikely that they’ll run it. Furthermore, even if developers remembered to run these detections every time before they pushed, how would organizations manage custom secret patterns or other configurations? Centralized configuration is essential for managing security at scale - so organizations can’t just think of the scanning, they have to think about how they would manage custom configurations too.

Post-push in a build

If the local environment is too heterogeneous and relies too much on the developer, then surely adding a scanning tool in the build makes sense. That way, teams can guarantee that the scan is being performed and could manage configuration using reusable workflows.

However, this is too late in the life cycle - the secret has to be in the repo for the build to perform the scan. While this option adds more consistency, it cannot prevent the secret from getting to the repo in the first place.

At push time

The best place to scan for secrets is at the moment of the push. Teams could do this using pre-receive hooks on GitHub Enterprise Server. This would allow teams to run some validation on the push and allow or block it - say, if it contained a secret. Unfortunately, GitHub Enterprise Cloud does not support pre-receive hooks (yet).

However, GitHub Advanced Security does include the option to enable push protection. This prevents pushes if secrets are detected.

This push protection feature is unique in the market for several reasons. Some tools have some of the features listed below, but only secret scanning push protection in GitHub Advanced Security has all of the following:

It is embedded into the repos and can be enabled instantly at enterprise, org or repo level
It does not require build customization or IDE plugins or anything else - it simply works
It allows admins to create custom patterns that are managed centrally
It allows admins to perform dry-runs of their custom patterns so that they can refine them before they roll them out, preventing noise and loss of developer trust
Alerts trigger webhooks for additional automation and alerts are also visible in the audit log

However, it is important to note that push protections can be bypassed. But why? Wouldn’t you want to hard-block any detected secrets?

Allowing bypassing is a good idea

This seems counter-intuitive. However, let’s think about why this actually makes more sense than preventing bypasses.

False positives

There are rare cases when secret scanning will detect what it thinks is a secret - but it’s not in fact a secret. In these cases, a bypass is crucial since you need to get the code into the repo. This becomes even more critical as admins roll out custom patterns, especially for “generic” secrets (like database connection strings) which have no governing pattern (unlike tokens which tend to have much more predictable patterns). The less predictable a pattern is, the more noise (more false positives) it is going to generate.

Maintaining trust with developers

Whenever there is a gate, control or roadblock in the development life cycle, there must be some real value in the gate. Too many controls are vestiges of old processes or created by people who are no longer at the company, but are not challenged. This leads to friction and causes developers to lose trust in the security teams (or IT teams) and vice versa. Developers will also start to lose trust in the platform.

Totally preventing bypasses of push detections is effectively a statement that you do not trust your developers. Most developers are not malicious and secrets in pushes will most commonly be mistakes: a dev is testing and puts a credential to a test platform or database in their configuration file, only to forget to remove it before pushing. In this case, the push protection helps remind the dev that they have a secret that should not be committed to the repo. So allowing bypasses for false positives while preventing accidental leaks is a good combination.

Workarounds

Let’s imagine a scenario where push protections can never be bypassed. Developers who experience false positives will be frustrated since they have no way around the incorrect detection. This may lead them to become creative and find workarounds.

For example, developers could simply base64 encode the secret. This results in a high-entropy string. High entropy strings could be added to push detection, but by nature will produce a lot of noise (lots of false positives). So in all likelihood, these base64 encoded strings would end up being pushed to the repo. This is a leak, since you can simply base64 decode the string to get to the secret.

Or a developer may take a credential and split it in half, and simply concatenate the halves at run time. Again, an extremely difficult scenario to detect, but easy for a human to exfiltrate.

In short, workarounds make detection harder, and so increase risk.
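To make the two workarounds concrete, here is a minimal Python sketch. The detector is a hypothetical single-regex check that I made up for illustration (it is not how GitHub secret scanning works), and the token is a fake string that just has a token-like shape. It shows that the plain value is caught, while the base64-encoded and split versions slip past, even though the original value is trivially recoverable.

```python
import base64
import re

# Hypothetical detector for illustration only: one regex for a token-like format.
TOKEN_PATTERN = re.compile(r"ghp_[A-Za-z0-9]{36}")


def contains_token(text):
    """Return True if the text contains something matching the token pattern."""
    return TOKEN_PATTERN.search(text) is not None


# A fake, made-up value with a token-like shape (4-char prefix + 36 characters).
fake_token = "ghp_" + "A1b2C3d4E5f6" * 3

# 1. The secret committed as-is.
plain_line = f'token = "{fake_token}"'

# 2. Workaround: base64 encode the secret before committing it.
encoded = base64.b64encode(fake_token.encode("ascii")).decode("ascii")
encoded_line = f'token = b64decode("{encoded}")'

# 3. Workaround: split the secret in half and concatenate at run time.
half = len(fake_token) // 2
split_line = f'token = "{fake_token[:half]}" + "{fake_token[half:]}"'

for label, line in [("plain", plain_line), ("base64", encoded_line), ("split", split_line)]:
    print(label, "detected" if contains_token(line) else "missed")

# Anyone with read access can still recover the original value.
recovered = base64.b64decode(encoded).decode("ascii")
```

The point is not the specific regex: any pattern-based detector has the same blind spot once developers are motivated to hide things from it.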

Note: I have heard stories from customers who have created their own secret scanning tools that cannot be bypassed. The results were disastrous, and the tool is either turned off or bypasses have been allowed.

Effective management of bypasses

This doesn’t mean that allowing bypasses is insecure! With some simple steps, organizations can implement effective controls for bypasses, allowing them to retain developer trust as well as prevent secrets from leaking.

There are two primary methods to track bypasses of push protections:

The secret_scanning_alert webhook which is fired every time a protection is bypassed (the push_protection_bypassed property is set to true)
The secret_scanning_push_protection category of audit logs

You can use either of these to send automated emails or notify admins when bypasses occur. This allows you to maintain visibility without losing developer trust, since the bypass can be inspected and, if valid for cases like false positives, ignored. For cases where the bypass was not valid, admins can have conversations with the developer who bypassed the protection.
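As a rough sketch of the webhook option, the Python function below turns a parsed secret_scanning_alert payload into a notification message when push_protection_bypassed is true. The event name and the push_protection_bypassed property come from the description above; the other field names (push_protection_bypassed_by, secret_type, html_url, repository.full_name) and the sample payload are my assumptions about the payload shape, so check them against a real delivery before relying on this.

```python
def describe_bypass(event_name, payload):
    """Return a notification message for a push protection bypass, or None.

    Field names other than `push_protection_bypassed` are assumptions about
    the webhook payload shape and should be verified against a real delivery.
    """
    if event_name != "secret_scanning_alert":
        return None

    alert = payload.get("alert", {})
    if not alert.get("push_protection_bypassed"):
        return None

    repo = payload.get("repository", {}).get("full_name", "unknown repository")
    actor = (alert.get("push_protection_bypassed_by") or {}).get("login", "unknown user")
    secret_type = alert.get("secret_type", "unknown secret type")
    url = alert.get("html_url", "")
    return f"Push protection bypassed in {repo} by {actor} ({secret_type}): {url}"


# Hypothetical payload, trimmed to the fields used above.
sample_payload = {
    "action": "created",
    "alert": {
        "push_protection_bypassed": True,
        "push_protection_bypassed_by": {"login": "monalisa"},
        "secret_type": "example_token",
        "html_url": "https://example.com/alerts/1",
    },
    "repository": {"full_name": "octo-org/demo"},
}

message = describe_bypass("secret_scanning_alert", sample_payload)
if message:
    print(message)  # hand this to email, chat, or a ticketing system
```

Returning None for everything that is not a bypass keeps the "management by exception" idea intact: admins only hear about the cases that need a look.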

Alerts are still created after bypassing push protection

Furthermore, even if push protection is bypassed during a push, GitHub will create a secret scanning alert, enabling admins to manage the bypassed secret appropriately. For example, automated token revocation can be enabled so that when secrets are detected in the repo post-push, automation can revoke the secret immediately for known token formats, or admins can be notified to check the bypass.

Management by exception

This allows organizations to “manage by exception” rather than “throttle by prevention”. Ultimately this is a cultural problem and not really a technical problem. Organizations that demonstrate a “trust but verify” culture using the management techniques above will generally foster better developer experience and arguably end up being more secure than companies that promote a low-trust, hard gate.

Let’s all remember to be good humans. Developers should sympathize with the IT and security teams - leaked credentials are a serious matter that could have large and far reaching negative consequences to companies. Developers need to be careful and thoughtful about preventing leaks. IT and security teams should in turn sympathize with developers, who are constantly under pressure to deliver more, faster - so anything that adds friction is going to be counterproductive. They should be careful and thoughtful of how they can partner with, rather than fight against, developers.

Conclusion

Using GitHub Advanced Security secret scanning push protection is the best way for teams to effectively reduce the risk of credential leaks. While users can bypass push protections, there are valid reasons for this, and bypasses can be managed to ensure they are valid, while invalid bypasses can be mitigated quickly.

Happy push protecting!

Fine Tuning CodeQL Scans using Query Filters

2022-08-30T01:22:01+00:00

Query Organization
Why filter?
Standard Selectors
Filtering by Security Severity
1. Security Severity Levels
Query Filters
Precision
Widening the Filter
Testing the Configurations
1. Adding Debug to the init Action
2. Executing the Scans
Conclusion

Image by Mauro Gigli on Unsplash

CodeQL scanning involves four phases:

Initialize - where an empty database is created and hooks are configured into the compiler for compiled languages
Build - where the database is populated from the code-base
Query - where queries are executed against the database - results are output to a SARIF file
Upload - where the SARIF file is uploaded to the GitHub repo

Note: The default analyze Action will query and upload in a single step.

In the initialize phase, you specify which of the supported languages you want to analyze. You can also (optionally) specify the set of queries you want to run.

Query Organization

Queries are the lowest level artifact in CodeQL scans. These are T-SQL like in syntax (with from, where and select clauses), but also have very powerful abstractions like predicate, class and override.

Queries are typically grouped into suites. CodeQL packs can contain queries and suites. Additionally, you can filter queries - which we’ll get to shortly!

Before we move on, one more concept we need to understand is that queries have metadata associated with them. The metadata are more than just a way to describe the query - they are also critical for filtering.

Let’s look at the metadata from a query in the CodeQL repo to examine some of the metadata:

/**
 * @name Exposure of private information
 * @description If private information is written to an external location, it may be accessible by
 *              unauthorized persons.
 * @kind path-problem
 * @problem.severity error
 * @security-severity 6.5
 * @precision high
 * @id cs/exposure-of-sensitive-information
 * @tags security
 *       external/cwe/cwe-359
 */

A typical CodeQL metadata example.

We’ll use some of these metadata properties to filter - notably the kind, security-severity, precision and tags.
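If it helps to think of the metadata as plain key/value data, here is a small Python sketch that parses the comment above into a dictionary. This is my own illustrative parser, not the one CodeQL uses; it assumes each property starts with @key and that unprefixed lines continue the previous property.

```python
import re

QUERY_HEADER = """/**
 * @name Exposure of private information
 * @description If private information is written to an external location, it may be accessible by
 *              unauthorized persons.
 * @kind path-problem
 * @problem.severity error
 * @security-severity 6.5
 * @precision high
 * @id cs/exposure-of-sensitive-information
 * @tags security
 *       external/cwe/cwe-359
 */"""


def parse_query_metadata(comment):
    """Parse `@key value` properties from a query header comment into a dict."""
    metadata = {}
    current_key = None
    for raw_line in comment.splitlines():
        line = raw_line.strip()
        if line in ("/**", "*/"):
            continue
        # Drop the leading comment asterisk and surrounding whitespace.
        line = line.lstrip("*").strip()
        if not line:
            continue
        match = re.match(r"@(\S+)\s+(.*)", line)
        if match:
            current_key, value = match.groups()
            metadata[current_key] = value.strip()
        elif current_key is not None:
            # Continuation line: append to the previous property.
            metadata[current_key] += " " + line
    return metadata


metadata = parse_query_metadata(QUERY_HEADER)
print(metadata["kind"], metadata["security-severity"], metadata["precision"])
print(metadata["tags"].split())
```

Note that tags ends up as a space-separated list, which matters later when we filter on whether the tags contain a given value.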

Why filter?

If you do not specify a suite in the CodeQL Action, then you’ll get a default set of queries for the language you’re scanning. However, the default set is a subset of all the queries. There are some queries that have higher or lower severity or different levels of “precision” (we’ll discuss what that is later). Rather than give you all the queries, the default setting filters out some queries. This file contains the default set of filters.

The default set of queries is called the code-scanning suite. Each language has a .qls (query suite) file that specifies the list of queries and applies the code-scanning-selectors.yml selector. For example, this file is the default code scanning suite for csharp.

You can also customize the query suite by specifying other “standard” selectors: either security-extended or security-and-quality, which change the filter criteria by adding in additional queries that are excluded in the default selection.

Let’s examine a couple of selectors and how they are specified, and then a couple of use-cases where we use selectors to specify a different set of queries to execute during the Analyse phase.

Standard Selectors

If you look at the includes from the standard selectors you’ll see that security-extended-selectors.yml selects queries that contain the security tag:

- description: Selectors for selecting the security-extended queries for a language
- include:
    kind:
    - problem
    - path-problem
    precision:
    - high
    - very-high
    tags contain:
    - security
    ...

Selectors in the security-extended-selectors.yml file.

By contrast, the security-and-quality-selectors.yml file does not filter by that tag:

- description: Selectors for selecting the security-and-quality queries for a language
- include:
    kind:
    - problem
    - path-problem
    precision:
    - high
    - very-high
    ...

Selectors in the security-and-quality-selectors.yml file.

This means that the security-extended suite will only include queries that have security in their tags metadata, while the security-and-quality suite will include additional queries that do not contain this tag.
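Here is a simplified Python model of how an include block selects queries, using only the first include block from each selector file shown above (the elided parts of those files are not modeled). The matching rules are my approximation: every key must match, list values mean "any of these", and tags contain means at least one listed tag appears in the query's space-separated tags. The second query's metadata is a hypothetical example.

```python
def matches_include(metadata, include):
    """Simplified model of an include block: every listed key must match."""
    for key, allowed in include.items():
        if key == "tags contain":
            tags = metadata.get("tags", "").split()
            if not any(tag in tags for tag in allowed):
                return False
        elif metadata.get(key) not in allowed:
            return False
    return True


# First include block from each selector file shown above.
security_extended = {
    "kind": ["problem", "path-problem"],
    "precision": ["high", "very-high"],
    "tags contain": ["security"],
}
security_and_quality = {
    "kind": ["problem", "path-problem"],
    "precision": ["high", "very-high"],
}

# Metadata from the query shown earlier, plus a hypothetical quality-only query.
security_query = {
    "kind": "path-problem",
    "precision": "high",
    "tags": "security external/cwe/cwe-359",
}
quality_query = {
    "kind": "problem",
    "precision": "high",
    "tags": "maintainability",
}

for name, query in [("security query", security_query), ("quality query", quality_query)]:
    print(
        name,
        "| security-extended:", matches_include(query, security_extended),
        "| security-and-quality:", matches_include(query, security_and_quality),
    )
```

The quality-only query is selected by the block without the tag condition and rejected by the one with it, which is exactly the difference between the two suites described above.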

However, we can also filter on other properties - such as kind, security-severity or precision.

Filtering by Security Severity

Last week I heard of a company using CodeQL that were hitting upper limits on the upload size of the SARIF file. They are scanning a large mono-repo and are getting a large number of results in the scan. Arguably, there are other issues at play here, but the team did not want to refactor their build or their codebase.

In this case, neither of the default suites works. Perhaps we need to focus just on the most critical alerts first - so we are going to want to filter by security-severity.

Security Severity Levels

When you see a CodeQL alert, it is marked with low, medium, high or critical severity:

CodeQL Alerts showing security severity.

However, if you look at the query metadata, these levels don’t appear. That’s because there is a table that shows how GitHub calculates the level based on the security-severity number:

Severity	Score Range
None	0.0
Low	0.1 - 3.9
Medium	4.0 - 6.9
High	7.0 - 8.9
Critical	9.0 - 10.0

The mapping of severity to security-severity score.
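The table translates directly into a small lookup. This Python sketch is just my restatement of the ranges above (the function name is made up); it is handy if you want to label security-severity values from query metadata yourself.

```python
def severity_level(score):
    """Map a security-severity score (0.0 - 10.0) to its level per the table above."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


# The example query earlier has @security-severity 6.5.
for score in (0.0, 3.9, 6.5, 7.0, 8.9, 9.0, 10.0):
    print(score, severity_level(score))
```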

So how do we filter on security level?

Query Filters

You can filter queries using query filters in a configuration file. Then you just point the init action to the config file, and you’re done! I’ll use code from this repo for the examples.

Here’s an example of an init action that specifies a custom config:

# file: '.github/workflows/codeql-high-severity.yml'
- name: Initialize CodeQL
  uses: github/codeql-action/init@v2
  with:
    languages: csharp
    config-file: ./.github/codeql/high-severity.yml

Specifying a custom config file for CodeQL.

Let’s then look at the custom config file:

# file: '.github/codeql/high-severity.yml'
name: "Custom CodeQL Config for high/very high severity only"
disable-default-queries: true
queries:
  - uses: security-extended
query-filters:
  - include:
      precision:
      - high
      - very-high
      tags contain: security
      security-severity: /([7-9]|10)\.(\d)+/

A custom configuration to only include queries with security-severity >= 7.

Notes:

First we specify a name.
We then disable the default queries.
We bring in the default security-extended queries.
We then apply a query-filter
The filter selects only queries that have a high or very-high precision and a security tag.
Finally, we use regex to include only queries whose security-severity value is >= 7
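To sanity-check the regex, here is a quick Python sketch that runs the same pattern against a handful of sample scores. I am using a full-string match here, which is an assumption; this sketch does not claim to reproduce exactly how CodeQL applies the pattern. The sample scores are made up.

```python
import re

# Same pattern as in the config file, without the surrounding slashes.
SEVERITY_FILTER = re.compile(r"([7-9]|10)\.(\d)+")

# Made-up sample values, written the way they appear in query metadata.
scores = ["3.9", "6.5", "7", "7.0", "8.8", "9.8", "10.0"]

selected = [score for score in scores if SEVERITY_FILTER.fullmatch(score)]
rejected = [score for score in scores if score not in selected]

print("selected:", selected)
print("rejected:", rejected)
```

One thing this surfaces: the pattern requires a decimal point, so a value written as a bare 7 would not be selected. That is fine as long as the metadata always uses the decimal form, as in the 6.5 example earlier.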

Precision

Before we go on, what exactly is precision? This is a measure of how many false positives are likely to be returned by the query. Queries with higher precision will return fewer false positives, while queries with lower precision tend to yield more false positives.

When security professionals are analyzing code-bases or writing queries, they may want to dial down precision. However, teams that want to make security remediation actionable should default to higher precision queries. The default setting for the out-of-the-box suites is high and very-high precision to ensure very few false positives.

Note: Who decides on the precision? While the CodeQL repo is open-source and accepts community contributions, it is maintained by GitHub. Queries are rigorously tested and vetted, so the precision metadata is accurate.

Widening the Filter

The filter above narrowed the number of queries that will be executed in the analysis phase. But we can go the other way too! Here’s a snippet from the configuration for a set of lower precision queries that teams can use if they understand that they are going to get more false positives with this setting:

# file: '.github/codeql/low-precision.yml'
name: "Custom CodeQL Config for lower precision"
disable-default-queries: true
queries:
  - uses: security-extended
  - uses: security-and-quality
query-filters:
- include:
    kind:
    - problem
    - path-problem
    - alert
    - path-alert
    precision:
    - low
    - medium
    - high
    - very-high
    tags contain:
    - security
    - correctness
    - maintainability
    - readability
- include:
    kind:
    - problem
    - path-problem
    precision:
    - medium
    problem.severity:
    - error
    - warning
    - recommendation
    tags contain:
    - security
...

A custom configuration to include more queries.

Notes:

First we specify a name.
We then disable the default queries.
We bring in both the default security-extended and security-and-quality queries.
We then apply a couple of query-filters
The first filter includes every type of kind, precision and tag
The next filter includes queries with a security tag and all types of problem.severity (different from security-severity).
The remainder of the file is the same as the default selectors from the CodeQL repo

Testing the Configurations

We can compare and contrast three scenarios:

Name	Description	Branch	Actions File	Config file
Default	A default scan (no custom config)	`main`	`.github/workflows/codeql-analysis.yml`	None
High Severity	A high-severity config to only include high and critical security queries	`high-severity`	`.github/workflows/codeql-high-severity.yml`	`.github/codeql/high-severity.yml`
Low Precision	A “low-precision” config to include more queries with lower precision and severity	`low-precision`	`.github/workflows/codeql-low-precision.yml`	`.github/codeql/low-precision.yml`

Three scenarios for CodeQL configuration.

The code on all 3 branches is identical - the only reason I created them was for filtering the results in the Security tab.

Adding Debug to the `init` Action

For the purposes of our exploration, I wanted to be able to analyze the SARIF results file after each scan run. To do this, I just added debug: true to the init action just below the config-file. This will zip up the scanning database and the results file as artifacts that can be downloaded - I am really only interested in the results file since we can compare results, but also because the results file includes the list of the queries that are executed during a scan!

Executing the Scans

I’ve added a workflow_dispatch trigger to the workflow files - so you have to navigate to the Actions tab of the repo and queue a run. After queueing a run for each scenario (and selecting the corresponding branch) I downloaded the SARIF results files for comparison.

To count the number of results in the SARIF, I crafted a quick jq query:


cat default-results.sarif | jq '.runs[0].results | length'

We can also figure out the count of queries. The language for this repo is csharp so we look for the codeql/csharp-queries tools extension in the file for the list of all the queries (rules) that were included in the analysis:

cat default-results.sarif | jq '.runs[0].tool.extensions[] | select(.name == "codeql/csharp-queries") | .rules | length'
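If you would rather script the comparison, the two jq queries above can be mirrored in Python. This is a minimal sketch: the function names are mine, and the in-memory SARIF document is a tiny hypothetical sample (with one made-up rule id) that only contains the fields the queries touch.

```python
import json


def count_results(sarif):
    """Equivalent of: jq '.runs[0].results | length'"""
    return len(sarif["runs"][0]["results"])


def count_rules(sarif, extension_name):
    """Count the rules in the named tool extension of the first run."""
    for extension in sarif["runs"][0]["tool"].get("extensions", []):
        if extension.get("name") == extension_name:
            return len(extension.get("rules", []))
    return 0


# Tiny hypothetical SARIF document for illustration.
sample_sarif = json.loads("""
{
  "runs": [
    {
      "tool": {
        "extensions": [
          {
            "name": "codeql/csharp-queries",
            "rules": [
              {"id": "cs/exposure-of-sensitive-information"},
              {"id": "cs/hypothetical-rule"}
            ]
          }
        ]
      },
      "results": [
        {"ruleId": "cs/exposure-of-sensitive-information"}
      ]
    }
  ]
}
""")

print("results:", count_results(sample_sarif))
print("rules:", count_rules(sample_sarif, "codeql/csharp-queries"))
```

For the real files, load each one with json.load on an open file handle (for example default-results.sarif) and pass the result to the same two functions.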

When we do the comparison, we get the following results:

Scenario	Rule Count	Result Count
Default	47	6
High Severity	35	5
Low Precision	159	74

The result and rule count for each scan.

We can also see the counts in the Code Scanning tab in the repo. Just change the branch filter to see the different result counts:

CodeQL Alert counts for each scenario.

Conclusion

CodeQL is incredibly powerful - but there are times when you want to fine-tune the set of queries for analysis. Using Query Filters we can easily tweak exactly what we want to scan.

Happy scanning!

Shift Left - How far is too far?

2022-08-04T01:22:01+00:00

How Far Left is Too Far?
The sweet spot
Conclusion

Image by Nick Fewings on Unsplash

I have a developer background, so App Security (AppSec) was always anathema to me. However, I had an epiphany about GitHub Advanced Security and how it is unique in its approach - it is security for developers. I wrote some thoughts about that in a previous post.

GitHub Advanced Security (GHAS) allows you to reduce risk without impeding velocity. This is a big deal in today’s fast-paced world. The way that GHAS does this is by centering AppSec on the developer, while still meeting requirements of security professionals. Integrating AppSec into the developers’ daily workflow with very low friction is the secret to securing your software effectively.

GHAS centers itself around the repo and the Pull Request. I have had a number of customers ask why GHAS does not have an IDE plugin. If shifting left is the Holy Grail of AppSec, and GHAS is built to be developer-centric, then why isn’t GHAS in the IDE? Isn’t that the furthest left we can shift?

Or would that be too far left?

How Far Left is Too Far?

Let’s take a moment to consider where in the life cycle various GHAS features work:

Feature	Phase
Secret Scanning	After pushes to the repo. If you have Push Protection enabled, secrets are scanned before the push.
Dependency Scanning (SCA)	After pushes to the repo and in PRs via Dependency Review.
Code Scanning (CodeQL)	During builds and surfaced in PRs.

It seems that Push Protection is the only feature that occurs before a push to the repo. Dependency scanning and code scanning are centered around the repo or PR. Why is the PR the center of GHAS, rather than the IDE? Wouldn’t it be even faster if the IDE could surface vulnerable dependencies and vulnerable code before developers push changes to the repo?

IDEs

Developers can be picky about their IDEs. While many modern IDEs are extensible, there is no standard for IDE extensibility. This means that any policy enforcement at the IDE is near impossible, since you’d have to implement that policy for all IDEs. You could mandate a single IDE, but that doesn’t always work.

Additionally, there’s no simple way to force developers to turn certain tools and plugins on in the IDE. Any process that relies on the IDE is relying on the developer to remember to turn on the tool. And what about shared configuration? Relying on configuration files may work - but many IDEs store preferences on the workstation in personal folders rather than in repos, so sharing common config can also be a challenge.

IDEs are great for “simple” analysis - linters that enforce coding standards work really well in IDEs, assuming you can effectively share the linting rules. Most linters are built this way, storing configuration dotfiles alongside the code. Most linters are fast because they typically require very little compute, so running them in the IDE doesn’t distract the developer.

However, most security analysis tools (worth their salt) tend to require heavier compute and take longer to scan because of the more complex problem domain. Putting code scanning into an IDE becomes a resource hog for developers (have you ever seen a developer waiting for an IDE to compile their code - it’s not pretty!). Furthermore, inundating developers with tons of results can be distracting and end up reducing the remediation effort of the developer since they get fatigued by noisy alerts.

Baked in or optional?

Security testing that isn’t built into the inner sanctum of your code is effectively optional. External tools require someone to build them, install and configure and maintain them, integrate them and automate them. Even if you buy a 3rd party tool rather than build it yourself, you still have to operate, configure, integrate and automate it yourself. This friction and extra overhead tends to cause developers to avoid these tools - and you lose any value they offer if just one person “forgets” to run the scan.

Background analysis

What about running the code scanning in the background on the developer laptop? This can get problematic because of compute constraints, and may end up with the situation where code is changed before the scans complete, so you get alerts for code that has already changed or been removed - way too much friction and frustration.

CLI Tools before pushing code

You could require developers to run CLI tools before pushing code - but this is now outside of the IDE anyway. Developers will invariably forget to run the tool, or just avoid running it since it is disruptive to their coding workflow.

Pre-commit hooks

What about pre-commit hooks - what about running code scanning there? Once again, typical code scanning takes in the order of minutes - far too long for a pre-commit hook. Developers would have a fit if it took 10 minutes to scan the code before a successful push! Heck, even 1 minute is too long to wait for a push to succeed.

Data for dependency scans

Dependency scanning (SCA) is performed on the repo with GHAS. While the dependency graph could be built in the IDE, how would the IDE compare the dependency graph to CVE/CWE databases to determine if any package contains a vulnerability? Either the IDE would have to download the databases or make API calls, which could be too slow and disrupt the daily developer workflow.

The sweet spot

Taking the above considerations into account, it becomes clear that placing security scanning at the repo/PR is as far left as you should go. Not only does this make security remediation a team sport, since team members can collaborate around the alerts and remediation process, but it also causes very little disruption to the daily workflow of a developer. For complex codebases where scanning takes longer than 10 minutes and could potentially slow CI/CD, scheduled jobs or parallel workflows (a CI workflow and a scanning workflow) are perfectly acceptable workarounds.

Developers are already used to collaborating around the PR. The PR is already the rallying point for code review, automated unit testing, linting and other quality gates. GHAS allows teams to add security testing into this pivot point smoothly. This means developers can keep using whatever IDEs they want - but still gain all the benefits of security scanning early and often in the software life cycle.

Dependabot runs post-push (and on a schedule) on the repo and is able to then compare the dependency graph to the vulnerability databases. Automated PRs to bump to patched versions further help developers quickly and easily remediate vulnerable packages with very low friction and interruption.

Secret scanning is the one exception - that you want to shift as far left as possible to prevent secrets from ever making their way into the shared repo. Secret Scanning in GHAS scans a repo’s entire history when you enable it for the first time, but you can also turn on Push Protection to ensure that secrets are kept out of the repo in the first place! Under the hood this is achieved conceptually by a pre-receive hook - but the computation time for secret scanning is far smaller than that required to perform code analysis. Secret scanning tends to complete well within seconds, allowing it to be shifted “more left” to the push.

Conclusion

Shifting left is critical for AppSec in today’s world - but you can actually shift too far left. GitHub Advanced Security shifts as far left as possible, but not into the IDE. This decision is deliberate and considered, since IDEs are not ideal for code and dependency scanning. Push protection ensures that secrets don’t enter the repo, but Dependency scanning and Code scanning are centered on the repo and PR where there is little friction for the development inner loop and encouragement of collaboration to remediate security alerts.

Happy securing!