Maybe I jump to an implementation too soon. I should have shared more about my thought at first. I am sorry about that.
When I first saw the DisableVarPolicy() API, I felt it is a backdoor. This is a brand-new feature that does not exist today.
(I don’t treat VarPolicy as something new, it is an enhancement of VarCheck or VarLock. I have no concern at all.)
Also disabling UEFI secure boot requires either authenticated variable or physical user presence before.
With this feature, the UEFI secure boot can be disabled by a simple function call. (Maybe that can be considered as a feature)
It also reminds me that when UEFI secure boot was introduced. Almost all attacks are just to figure out a way to disable UEFI secure boot via different ways, such as other variable or jumper.
As such, my first reaction is that I want to have a way to eliminate the DisableVarPolicy() and always enforce policy. I believe it should be a *platform choice* to support DisableVarPolicy() or not.
Maybe in MSFT platform, this is required in MFG mode. Not a problem. As long as you have SDL process to review the MFG code and make you can do the right thing when the var policy and secure boot is disabled.
Maybe for my platform, I have no confidence on that. What I want to do is just to remove this DisableVarPolicy() feature at all. Instead of review all the MFG code to make sure they are not calling DisableVarPolicy, I can just do a simple
check - This function does not exist. Done! That is the simplest way for me. Many EDKII feature uses the similar way – use a PCD to control a feature on/off.
Yes, I fully agree with you that “Our recommendation would be to make sure that "lock" is called prior to leaving TCB, similar to SMM or other hardware locks (which the platform already has to coordinate).”
That is good practice everyone should follow. However, if there is a call DisableVarPolicy() in the code, it must be before Lock. (Otherwise it does not work)
What if an attack can find a path to trigger DisableVarPolicy() before lock? This recommendation does not help.
If you don’t like splitting the protocol, that is fine.
I can offer another implementation choice. Defining a PCD – PcdSupportDisableVarPolicy. Pseudo code below:
EFI_STATUS
DisableVarPolicy()
{
If (!FeaturePcdGet(PcdSupportDisableVarPolicy)) {
Return UNSUPPORTED
}
// Follow existing way
}
With those 3 lines of code, you may configure TRUE for MSFT platform, with detail SDL review in MFG mode.
I may configure FALSE for my platform with confident that no one can violate the policy in any boot mode. Simple for me.
J
Now let’s talk about Test.
I think the unit tests and system test MSFT did are very good. We learn a lot from that.
I do appreciate your great work.
On the other hand, the test only shows the *final system state*. The attacker may find another path to bypass the protection by triggering DisableVarPolicy() to unlock the system.
(I still remember the stupid mistake I made years ago that caused a RO variable not locked in S4 resume path. Also another mistake that a system state is unlocked in S3 resume path.
L)
It is hard to design test to catch *all possible special path*. That is my headache. If there is such function existing in my firmware, I have to keep my eye on every possible path.
That is why I would like to have a way to eliminate this function in my firmware at all (keep it simple), but leave the flexibility to keep it enabled in other firmware.
Anyway, I like your idea to extend the security state to PCR. Just in case there is violation it can be recorded.
Thank you
Yao Jiewen
From: bret.barkelew via [] <bret.barkelew=microsoft.com@[]>
Sent: Friday, February 7, 2020 2:00 PM
To: Yao; Yao, Jiewen <jiewen.yao@intel.com>; devel@edk2.groups.io
Subject: Re: [edk2-devel] [edk2-rfc] [edk2-devel] [RFC] VariablePolicy - Protocol, Libraries, and Implementation for VariableLock Alternative
I agree with you that this function is extremely sensitive and important, and I kinda like your idea of splitting the protocol into a "policy" and "policy control", but I don't think it make practical sense to split them this way. At the
very least, you end up with a chicken and egg problem where we would need to read mfg state to decide whether we need to publish the other protocol, but mfg state would also determine whether we used the protocol to disable the protection.
Furthermore, there may be architectural (e.g. off-Soc) implementations of this that would still require a "lock" signal to disable their ability to receive the disable command. These solutions wouldn't have a concept of building with a "I want more security"
PCD.
Our recommendation would be to make sure that "lock" is called prior to leaving TCB, similar to SMM or other hardware locks (which the platform already has to coordinate).
Two points to help manage this:
1) We would be willing to discuss adding a HSTI bit or DeviceState flag to indicate to an OS or attestation authority that the protections are disabled for the current boot.
2) We would recommend running the VarLockAudit test that we've provided with the release to make sure that not only are protections enabled, but the parameters of your policies are configured the way the platform wants them. It was very useful to us when trying
to understand the overall system state we wanted.
https://github.com/corthon/edk2/tree/personal/brbarkel/var_policy_rfc_extra/UefiTestingPkg/AuditTests/UefiVarLockAudit
Looking forward to your thoughts.
Thanks!
- Bret