public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [RFC] Adoption of CodeQL in edk2
@ 2022-09-23 22:18 Michael Kubacki
  2022-09-30  0:05 ` [edk2-devel] " Michael Kubacki
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Kubacki @ 2022-09-23 22:18 UTC (permalink / raw)
  To: devel@edk2.groups.io

This is a copy of the markdown from the edk2 discussion where it has 
been posted as a draft:

https://github.com/tianocore/edk2/discussions/3258#discussioncomment-3682099

Please go to that link to see a rendered version of the text.

---

#  Adoption of CodeQL in edk2

This RFC proposes adoption of CodeQL as a static analysis tool used in 
the TianoCore edk2 project.

## Introduction

CodeQL is open source and free for open-source projects. It is 
maintained by GitHub and naturally has excellent integration with GitHub 
projects. CodeQL generates a "database" during the firmware build 
process that enables queries to run against that database. Many 
open-source queries are officially supported and comprise the 
vulnerability analysis performed against the database. These queries are 
maintained here - https://github.com/github/codeql.

Queries are written in an object-oriented query language called QL. 
CodeQL provides:
1. A [command-line (CLI) 
interface](https://codeql.github.com/docs/codeql-cli/#codeql-cli)
2. A [VS Code 
extension](https://codeql.github.com/docs/codeql-for-visual-studio-code/#codeql-for-visual-studio-code) 
to help write queries and run queries
3. [GitHub action](https://github.com/github/codeql-action) support for 
repo integration via [code 
scanning](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning)
4. In addition to other features described in the [CodeQL 
overview](https://codeql.github.com/docs/codeql-overview/)

[LGTM](https://lgtm.com/) is often paired with CodeQL so you may read 
about that in documentation. LGTM was acquired by GitHub a few years ago 
and all of the functionality has been moved to GitHub code scanning. You 
can read more about this process on the [GitHub blog 
post](https://github.blog/2022-08-15-the-next-step-for-lgtm-com-github-code-scanning/).

CodeQL is an actively maintained project. Here is a comparison of edk2 
commit activity versus CodeQL for reference:

- [CodeQL Commit 
Activity](https://github.com/github/codeql/graphs/commit-activity)
- [edk2 Commit 
Activity](https://github.com/github/codeql/graphs/commit-activity)

Because CodeQL does maintain a strong open-source presence, the 
TianoCore community should be able to file 
[issues](https://github.com/github/codeql/issues) and [pull 
requests](https://github.com/github/codeql/pulls) into the project.

## CodeQL Usage in edk2

CodeQL provides the capability to debug the actual queries and for our 
(Tianocore) community to write our own queries and even contribute back 
to the upstream repo when appropriate. In other cases, we might choose 
to keep our own queries in a separate TianoCore repo or within a 
directory in the edk2 code tree.

This is all part of CodeQL Scanning and [this 
page](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning) 
has information concerning how to configure CodeQL scanning within a 
GitHub project such as edk2. Information on the particular topic of 
running additional custom queries is documented 
[here](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#running-additional-queries) 
in that page.

In addition, CodeQL presents the flexibility to:
- Build databases locally
- Retrieve databases from server builds
- Relatively quickly test queries locally against a database for a fast 
feedback loop
- Suppress false positives
- Customize the files and queries used in the edk2 project and quickly 
keep this list in sync between the server and local execution

### Dismissing CodeQL Alerts

The following documentation describes how to dismiss alerts:
[Dismissing 
Alerts](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/managing-code-scanning-alerts-for-your-repository#dismissing--alerts)

 > _Note:_ If query has a false positive a GitHub Issue can be submitted 
in the [CodeQL repo issues 
page](https://github.com/github/codeql/issues) with the `false-positive` 
tag to help improve the query.

### CodeQL in Pull Requests

The proposal is to enable CodeQL in a step-by-step fashion. The goal 
with this approach is to make steady progress enabling CodeQL to become 
more comprehensive and useful while not impacting day-to-day code 
contributions.

Throughout the process described in this section, CodeQL Code Scanning 
will be a mandatory status check for edk2 pull requests.

It is proposed to make no changes to the PR evaluation process already 
in place that only builds packages with changes in the pull request.

#### Query Target List

The first step is to define a target list of queries for edk2. While 
CodeQL has the ability run against several languages ([including 
Python](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#changing-the-languages-that-are-analyzed)), 
I propose we enable C/C++ first and then return to evaluate Python analysis.

I propose the following as an initial candidate list:

- 
[cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql)
- 
[cpp/infinite-loop-with-unsatisfiable-exit-condition](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-835/InfiniteLoopWithUnsatisfiableExitCondition.ql)
- 
[cpp/overflow-buffer](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-119/OverflowBuffer.ql)
- 
[cpp/pointer-overflow-check](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/PointerOverflow.ql)
- 
[cpp/potential-buffer-overflow](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/PotentialBufferOverflow.ql)
- 
[cpp/toctou-race-condition](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-367/TOCTOUFilesystemRace.ql)
- 
[cpp/unclear-array-index-validation](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-129/ImproperArrayIndexValidation.ql)
- 
[cpp/unsafe-strncat](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Memory%20Management/SuspiciousCallToStrncat.ql)
- 
[cpp/use-after-free](https://github.com/github/codeql/blob/main/cpp/ql/src/Critical/UseAfterFree.ql)
- 
[cpp/user-controlled-null-termination-tainted](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-170/ImproperNullTerminationTainted.ql)
- 
[cpp/wrong-number-format-arguments](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Format/WrongNumberOfFormatArguments.ql)
- 
[cpp/wrong-type-format-argument](https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Format/WrongTypeFormatArguments.ql)

CodeQL query files (.ql files) contain metadata about the query. For 
example, 
[cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql) 
states the following about the query:

```plaintext
/**
  * @name Conditionally uninitialized variable
  * @description An initialization function is used to initialize a 
local variable, but the
  *              returned status code is not checked. The variable may 
be left in an uninitialized
  *              state, and reading the variable may result in undefined 
behavior.
  * @kind problem
  * @problem.severity warning
  * @security-severity 7.8
  * @id cpp/conditionally-uninitialized-variable
  * @tags security
  *       external/cwe/cwe-457
  */
```

We can automatically include queries against these criteria using "query 
filters". For example, this could include any `problem` query above a 
certain `security-severity` level. Or all queries with `security` in `tags`.

That would mean new queries checked into the CodeQL repo could cause 
unexpected build breaks. Since edk2 favors consistency in CI results, I 
propose we start with the fixed query set proposed at the top of this 
section.

 > _Note:_ Additional queries can be found here as well - 
https://lgtm.com/search?q=cpp&t=rules

##### Suggesting New Queries

It is proposed new queries be enabled by sending an RFC to the TianoCore 
development mailing list (devel@edk2.groups.io) with the query link and 
justification for adopting the query in edk2. Anyone is welcome to 
suggest new queries.

#### Enable One Query at a Time

Enabling the candidate list of queries immediately will trigger hundreds 
of alerts.

Therefore, I recommend we enable one query at a time. The PR to enable 
each query can be accompanied by the code fixes for the query to pass. 
If a query is deemed fruitless during enabling testing, it can simply be 
rejected. The goal is to enable an effective set of queries that improve 
the codebase. As the list of enabled queries builds, the total CodeQL 
coverage will increase against active PRs.

I recommend we start with a query that has a very low alert frequency to 
enable the initial GitHub Action and build momentum for enabling 
additional queries.

It is proposed that each query be enabled in a "query staging branch" 
where the query is added to the edk2 query suite and each package 
contributes their patches to the branch to fix any issues necessary for 
the query to be successful. Once that branch is ready, it can be 
converted into a patch series that enables the query for the codebase in 
a single pull request.

That being said, we can [configure the severities that cause pull 
request 
failure](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#defining-the-severities-causing-pull-request-check-failure). 
If we find a useful informational query, that could be enabled without 
impacting the pull request status result. As of this RFC, those queries 
are not proposed.

#### edk2 Pull Request Example

I put together a [PR 3179: Test 
CodeQL](https://github.com/tianocore/edk2/pull/3179) that demonstrates a 
basic CodeQL GitHub action.

You can see the CodeQL link in the PR Checks area ([direct link to 
results](https://github.com/tianocore/edk2/pull/3179/checks?check_run_id=7817354780)):
![image](https://user-images.githubusercontent.com/21320094/187554321-6fb230f8-a870-468f-911e-861b046a1396.png)

You can also see the run on the 
[edk2/actions](https://github.com/tianocore/edk2/actions) page:
![image](https://user-images.githubusercontent.com/21320094/187554464-651313ea-8fad-4881-9e4c-fba0b102d444.png)

This initial test was mainly to ensure permissions were set up to allow 
the action to run and to settle other logistics in getting the action 
(first GitHub action in edk2) setup and working properly.

#### edk2 Pull Request Time

We can roughly expect CodeQL to take an hour to complete against a pull 
request. Since this would be the longest running task in a pull request 
in parallel to other jobs, pull requests should be expected to take 
around an hour to have all status checks finished. This is not 
considered an issue since:
1. Results can be checked locally
2. The TianoCore contribution process requires patches to be on the 
mailing list for a minimum of 24 hours
3. Pull requests can be created at any time to check results

CodeQL has the [ability to ignore certain file 
paths](https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#avoiding-unnecessary-scans-of-pull-requests) 
and I recommend we leverage that to avoid running CodeQL on pull 
requests that only change the maintainers.txt file for example.

#### CodeQL Scheduled Builds

While pull requests will only build packages deemed necessary based on 
the PR evaluation heuristics used today, the plan is to have a 24 
scheduled build that will build all of the packages and place the 
results in GitHub Code Scanning.

We will use this process to understand if there any differences 
discovered. Based on the results, we might choose to adjust our strategy 
for running CodeQL in pull requests to prevent future discrepancies. If 
differences are present, it is proposed maintainers of the affected 
package resolve those issues.

### CodeQL Locally

The [CodeQL CLI](https://codeql.github.com/docs/codeql-cli/) can be used 
as follows to wrap around the edk2 build process (`MdeModulePkg` in this 
case) to generate a database in the directory `cpp-database`. Example is 
shown using 
[stuart](https://github.com/tianocore/edk2-pytool-extensions) build command.

```cmd
codeql database create cpp-database --language=cpp 
--command="stuart_ci_build -c .pytool/CISettings.py -p MdeModulePkg -a 
IA32,X64 TOOL_CHAIN_TAG=VS2019 Target=DEBUG --clean --skip-post-build" 
--overwrite
```

The following command can be used to generate a [SARIF 
file](https://codeql.github.com/docs/codeql-cli/sarif-output/) (called 
`query-results.sarif`) from that database with the results of the 
[cpp/conditionally-uninitialized-variable](https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-457/ConditionallyUninitializedVariable.ql) 
query:

```cmd
codeql database analyze cpp-database 
codeql\cpp\ql\src\Security\CWE\CWE-457\ConditionallyUninitializedVariable.ql 
--format=sarifv2.1.0 --output=query-results.sarif
```

SARIF logs can be read by log viewers such as the [Sarif 
Viewer](https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer) 
extension for [VS Code](https://code.visualstudio.com/).

---

Once an edk2 query suite (.qls) file is created, that same file can be 
used for pull requests and by developers locally using the CodeQL CLI to 
run the same set of queries with the `analyze` command shown above. 
Developers can add new queries to the file and confirm results before 
attempting server builds.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-10-08  1:26 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-23 22:18 [RFC] Adoption of CodeQL in edk2 Michael Kubacki
2022-09-30  0:05 ` [edk2-devel] " Michael Kubacki
2022-09-30  1:02   ` Michael D Kinney
2022-09-30  2:53     ` Ni, Ray
2022-10-03 14:19       ` Michael Kubacki
2022-09-30  6:33     ` 回复: " gaoliming
2022-10-03 14:29       ` Michael Kubacki
2022-10-08  1:26         ` 回复: " gaoliming

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox