Fine Tuning CodeQL Scans using Query Filters
CodeQL is a fantastic Static Application Security Testing (SAST) tool. It can be enabled quickly using Actions, but it can be hard to figure out how to fine-tune which queries are run. In this post I’ll cover using Query Filters to fine-tune your CodeQL scans.
- Query Organization
- Why filter?
- Standard Selectors
- Filtering by Security Severity
- Security Severity Levels
- Query Filters
- Widening the Filter
- Testing the Configurations
- Adding Debug to the init Action
- Executing the Scans
Image by Mauro Gigli on Unsplash
CodeQL scanning involves four phases:
- Initialize - where an empty database is created and hooks are configured into the compiler for compiled languages
- Build - where the database is populated from the code-base
- Query - where queries are executed against the database - results are output to a SARIF file
- Upload - where the SARIF file is uploaded to the GitHub repo
Note: The default analyze Action will query and upload in a single step.
In the initialize phase, you specify which of the supported languages you want to analyze. You can also (optionally) specify the set of queries you want to run.
Queries are the lowest level artifact in CodeQL scans. These are T-SQL-like in syntax (with select clauses), but also have very powerful abstractions.
Queries are typically grouped into suites. CodeQL packs can contain queries and suites. Additionally, you can filter queries - which we’ll get to shortly!
Before we move on, one more concept we need to understand is that queries have metadata associated with them. The metadata are more than just a way to describe the query - they are also critical for filtering.
Let’s look at the metadata from a query in the CodeQL repo:
/**
 * @name Exposure of private information
 * @description If private information is written to an external location, it may be accessible by
 *              unauthorized persons.
 * @kind path-problem
 * @problem.severity error
 * @security-severity 6.5
 * @precision high
 * @id cs/exposure-of-sensitive-information
 * @tags security
 *       external/cwe/cwe-359
 */
We’ll use some of these metadata properties to filter queries later on.
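To make the idea of "metadata as filterable properties" concrete, here is a minimal Python sketch that pulls the `@key value` pairs out of a query header comment. This is not how CodeQL itself reads metadata; `parse_metadata` is a hypothetical helper written only for illustration, and it assumes the simple header layout shown above.

```python
import re

# The header comment of the query discussed above.
QUERY_HEADER = """\
/**
 * @name Exposure of private information
 * @description If private information is written to an external location, it may be accessible by
 *              unauthorized persons.
 * @kind path-problem
 * @problem.severity error
 * @security-severity 6.5
 * @precision high
 * @id cs/exposure-of-sensitive-information
 * @tags security
 *       external/cwe/cwe-359
 */
"""


def parse_metadata(header: str) -> dict:
    """Collect `@key value` pairs; continuation lines extend the previous key."""
    metadata = {}
    current = None
    for raw in header.splitlines():
        # Drop the comment decoration (`/**`, ` * `, ` */`) around each line.
        line = raw.strip().lstrip("/*").strip()
        if not line:
            continue
        match = re.match(r"@([\w.-]+)\s+(.*)", line)
        if match:
            current = match.group(1)
            metadata[current] = match.group(2).strip()
        elif current is not None:
            # A line without `@` continues the previous property (e.g. extra tags).
            metadata[current] += " " + line
    return metadata


metadata = parse_metadata(QUERY_HEADER)
print(metadata["security-severity"])  # 6.5
print(metadata["tags"].split())       # ['security', 'external/cwe/cwe-359']
```

Each property ends up as a plain string, which is roughly the shape a filter needs: a key to look up and a value to compare against.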
If you do not specify a suite in the CodeQL Action, then you’ll get a default set of queries for the language you’re scanning. However, the default set is a subset of all the queries. There are some queries that have higher or lower severity or different levels of “precision” (we’ll discuss what that is later). Rather than give you all the queries, the default setting filters out some queries. This file contains the default set of filters.
The default set of queries is called the code-scanning suite. Each language has a .qls (query suite) file that specifies the list of queries and applies the code-scanning-selectors.yml selector. For example, this file is the default code scanning suite for
You can also customize the query suite by specifying other “standard” selectors: either security-extended or security-and-quality, which change the filter criteria by adding queries that are excluded from the default selection.
Let’s examine a couple of selectors and how they are specified, and then a couple of use-cases where we use selectors to specify a different set of queries to execute during the Analyse phase.
If you look at the includes from the standard selectors you’ll see that security-extended-selectors.yml selects queries that contain the security tag:
- description: Selectors for selecting the security-extended queries for a language
- include:
    kind:
    - problem
    - path-problem
    precision:
    - high
    - very-high
    tags contain:
    - security
...
By contrast, the security-and-quality-selectors.yml file does not filter by that tag:
- description: Selectors for selecting the security-and-quality queries for a language
- include:
    kind:
    - problem
    - path-problem
    precision:
    - high
    - very-high
...
This means that the security-extended suite will only include queries that have security in their tags metadata, while the security-and-quality suite will include additional queries that do not contain this tag.
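Here is a simplified Python model of how a single include block picks queries, just to illustrate the difference between the two selectors. It is a sketch, not CodeQL’s implementation: `matches_include` is a hypothetical helper, only the first include block of each selector file is modeled, and the second query in the list is made up for illustration. It assumes that every key in the block must match, and that `tags contain` is satisfied when any listed tag appears on the query.

```python
def matches_include(metadata: dict, include: dict) -> bool:
    """Return True when the query metadata satisfies every key of an include block."""
    for key, allowed in include.items():
        allowed = allowed if isinstance(allowed, list) else [allowed]
        if key == "tags contain":
            # Assumption: one of the listed tags must appear among the query's tags.
            if not set(allowed) & set(metadata.get("tags", [])):
                return False
        elif metadata.get(key) not in allowed:
            return False
    return True


# First include block of each selector file, as shown above.
SECURITY_EXTENDED = {
    "kind": ["problem", "path-problem"],
    "precision": ["high", "very-high"],
    "tags contain": ["security"],
}
SECURITY_AND_QUALITY = {
    "kind": ["problem", "path-problem"],
    "precision": ["high", "very-high"],
}

queries = [
    {"id": "cs/exposure-of-sensitive-information", "kind": "path-problem",
     "precision": "high", "tags": ["security", "external/cwe/cwe-359"]},
    # Hypothetical quality query, for illustration only.
    {"id": "example/quality-query", "kind": "problem",
     "precision": "high", "tags": ["maintainability"]},
]

extended = [q["id"] for q in queries if matches_include(q, SECURITY_EXTENDED)]
quality = [q["id"] for q in queries if matches_include(q, SECURITY_AND_QUALITY)]
print(extended)  # ['cs/exposure-of-sensitive-information']
print(quality)   # ['cs/exposure-of-sensitive-information', 'example/quality-query']
```

The made-up quality query passes the second block only because nothing in that block looks at tags.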
However, we can also filter on other properties - such as security severity.
Filtering by Security Severity
Last week I heard of a company using CodeQL that were hitting upper limits on the upload size of the SARIF file. They are scanning a large mono-repo and are getting a large number of results in the scan. Arguably, there are other issues at play here, but the team did not want to refactor their build or their codebase.
In this case, neither of the default suites works. Perhaps we need to focus just on the most critical alerts first - so we are going to want to filter by security severity.
Security Severity Levels
When you see a CodeQL alert, it is marked with a security severity level.
However, if you look at the query metadata, these levels don’t appear. That’s because GitHub calculates the level from the security-severity score in the metadata, using the following table:
|Severity level||security-severity score|
|Low||0.1 - 3.9|
|Medium||4.0 - 6.9|
|High||7.0 - 8.9|
|Critical||9.0 - 10.0|
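The mapping in the table is easy to express in code. Here is a small Python sketch; `severity_level` is a hypothetical helper name of my own, and the thresholds are taken straight from the ranges above.

```python
def severity_level(score: float) -> str:
    """Map a security-severity score to the level shown on an alert."""
    if not 0.1 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"


# The example query from earlier has `@security-severity 6.5`.
print(severity_level(6.5))  # Medium
print(severity_level(9.8))  # Critical
```

So a filter for "High and Critical only" is really a filter for scores of 7.0 and above.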
So how do we filter on security level?
You can filter queries using query filters in a configuration file. Then you just point the init action to the config file, and you’re done! I’ll use code from this repo for the examples.
Here’s an example of an init action that specifies a custom config:
# file: '.github/workflows/codeql-high-severity.yml'
- name: Initialize CodeQL
  uses: github/codeql-action/init@v2
  with:
    languages: csharp
    config-file: ./.github/codeql/high-severity.yml
Let’s then look at the custom config file:
# file: '.github/codeql/high-severity.yml'
name: "Custom CodeQL Config for high/very high severity only"

disable-default-queries: true

queries:
  - uses: security-extended

query-filters:
- include:
    precision:
    - high
    - very-high
    tags contain: security
    security-severity: /([7-9]|10)\.(\d)+/
- First we specify a name.
- We then disable the default queries.
- We bring in the standard security-extended suite.
- We then apply a query filter.
- The filter selects only queries that have a high or very-high precision and a security tag.
- Finally, we use a regex to include only queries whose security-severity is a numeric value >= 7
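It is worth sanity-checking a regex like this before relying on it. The Python sketch below runs the same pattern against a few sample scores. One assumption to flag: I am treating the pattern as having to match the whole value (`fullmatch`), and I have not verified how CodeQL anchors it. `is_selected` is a hypothetical helper for illustration.

```python
import re

# Same pattern as the security-severity filter, without the surrounding slashes.
SEVERITY_FILTER = re.compile(r"([7-9]|10)\.(\d)+")


def is_selected(value: str) -> bool:
    """Assumption: the pattern must match the entire metadata value."""
    return SEVERITY_FILTER.fullmatch(value) is not None


for value in ["6.5", "7.0", "8.8", "9.8", "10.0", "7"]:
    print(value, is_selected(value))
# 6.5 False
# 7.0 True
# 8.8 True
# 9.8 True
# 10.0 True
# 7 False
```

Note the last case: the pattern requires a decimal point, so a score written as a bare `7` would not be selected.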
Before we go on, what exactly is precision? This is a measure of how many false positives are likely to be returned by the query. Queries with higher precision will return fewer false positives, while queries with lower precision tend to yield more false positives.
When security professionals are analyzing code-bases or writing queries, they may want to dial down precision. However, teams that want to make security remediation actionable should default to higher precision queries. The default setting for the out-of-the-box suites is high or very-high precision to ensure very few false positives.
Note: Who decides on the precision? While the CodeQL repo is open-source and accepts community contributions, it is maintained by GitHub. Queries are rigorously tested and vetted, so the precision metadata is accurate.
Widening the Filter
The filter above narrowed the number of queries that will be executed in the analysis phase. But we can go the other way too! Here’s a snippet from the configuration for a set of lower precision queries that teams can use if they understand that they are going to get more false positives with this setting:
# file: '.github/codeql/high-severity.yml'
name: "Custom CodeQL Config for lower precision"

disable-default-queries: true

queries:
  - uses: security-extended
  - uses: security-and-quality

query-filters:
- include:
    kind:
    - problem
    - path-problem
    - alert
    - path-alert
    precision:
    - low
    - medium
    - high
    - very-high
    tags contain:
    - security
    - correctness
    - maintainability
    - readability
- include:
    kind:
    - problem
    - path-problem
    precision:
    - medium
    problem.severity:
    - error
    - warning
    - recommendation
    tags contain:
    - security
...
- First we specify a name.
- We then disable the default queries.
- We bring in both the security-extended and security-and-quality suites.
- We then apply a couple of query filters.
- The first filter includes every level of precision, from low to very-high.
- The next filter includes queries with a security tag and all types of problem.severity.
- The remainder of the file is the same as the default selectors from the CodeQL repo
Testing the Configurations
We can compare and contrast three scenarios:
|Name||Description||Branch||Actions File||Config file|
|Default||A default scan (no custom config)||None|
|High Severity||A high-severity config to only include high and critical security queries|
|Low Precision||A “low-precision” config to include more queries with lower precision and severity|
The code on all 3 branches is identical - the only reason I created them was for filtering the results in the Security tab.
Adding Debug to the init Action
For the purposes of our exploration, I wanted to be able to analyze the SARIF results file after each scan run. To do this, I just added debug: true to the init action just below the config-file. This will zip up the scanning database and the results file as artifacts that can be downloaded - I am really only interested in the results file since we can compare results, but also because the results file includes the list of the queries that are executed during a scan!
Executing the Scans
I’ve added a workflow_dispatch trigger to the workflow files - so you have to navigate to the Actions tab of the repo and queue a run. After queueing a run for each scenario (and selecting the corresponding branch) I downloaded the SARIF results files for comparison.
To count the number of results in the SARIF, I crafted a quick jq query (runs is an array in SARIF, so it has to be iterated with []):
cat default-results.sarif | jq '.runs[].results | length'
We can also figure out the count of queries. The language for this repo is csharp so we look for the codeql/csharp-queries tools extension in the file for the list of all the queries (rules) that were included in the analysis:
cat default-results.sarif | jq '.runs[].tool.extensions[] | select(.name == "codeql/csharp-queries") | .rules | length'
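If jq is not available, the same two counts can be computed with a few lines of Python. This is a sketch under assumptions: `count_results` and `count_rules` are hypothetical helper names, and `SAMPLE_SARIF` is a tiny made-up document that only mimics the parts of the SARIF structure used here (runs, results, and tool extensions with rules), not real scan output.

```python
import json

# Tiny hypothetical SARIF-shaped document, for illustration only.
SAMPLE_SARIF = {
    "runs": [
        {
            "tool": {
                "extensions": [
                    {"name": "codeql/csharp-queries",
                     "rules": [{"id": "rule-a"}, {"id": "rule-b"}, {"id": "rule-c"}]},
                    {"name": "some/other-pack", "rules": [{"id": "rule-x"}]},
                ]
            },
            "results": [{"ruleId": "rule-a"}, {"ruleId": "rule-b"}],
        }
    ]
}


def count_results(sarif: dict) -> int:
    """Total number of results across all runs."""
    return sum(len(run.get("results", [])) for run in sarif.get("runs", []))


def count_rules(sarif: dict, pack_name: str) -> int:
    """Number of rules listed by the named tool extension across all runs."""
    total = 0
    for run in sarif.get("runs", []):
        for extension in run.get("tool", {}).get("extensions", []):
            if extension.get("name") == pack_name:
                total += len(extension.get("rules", []))
    return total


print(count_results(SAMPLE_SARIF))                         # 2
print(count_rules(SAMPLE_SARIF, "codeql/csharp-queries"))  # 3

# For a real file, load it first, e.g.:
# with open("default-results.sarif") as handle:
#     sarif = json.load(handle)
```

Summing across runs keeps the helpers usable if a file ever contains more than one run.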
When we do the comparison, we get the following results:
|Scenario||Rule Count||Result Count|
We can also see the counts in the Code Scanning tab in the repo. Just change the branch filter to see the different result counts:
CodeQL is incredibly powerful - but there are times when you want to fine-tune the set of queries for analysis. Using Query Filters we can easily tweak exactly what we want to scan.