Administrators have update lessons to learn from the CrowdStrike outage

If administrators have learned anything from the CrowdStrike chaos, it’s to understand exactly what delayed updates mean – or don’t mean – in the anti-malware world.

One of the reasons the CrowdStrike update caused so many problems was that administrators assumed the faulty update would have been pulled and fixed long before it troubled their systems. Many were cheerfully running on N-2 or N-1, meaning they were set to use a release two versions or one version behind the latest, respectively.

On Friday, July 19, a security expert fielding an influx of calls from alarmed customers told us: “Every managed CrowdStrike customer is N-2. It’s even in their support documents as a recommendation.”

Yet Windows systems around the world began experiencing Blue Screen of Death boot loops as a CrowdStrike update rolled out globally.

The problem for many users was understanding that the version policy only applied to part of the CrowdStrike system. One posted: “We learned the N-1 policy we had in place only applies to agent updates, and not signature files.”

“As far as we can tell there is not a good way to delay what signature files get pushed, hence everybody receiving the 7/18 23:09 (central time) signature file that blew up the world over the next hour.”

Others complained: “Crowdstrike overrode client settings on N-1/N-2 deployments (i.e. staged 1 or 2 releases back from current) for this release.”

Another user, having noted that they had CrowdStrike set to be one version behind on non-critical infrastructure and two versions behind on critical infrastructure, glumly said: “We got hit anyway because it was a ‘content file’ and so ignored our auto update restrictions.”

While phrases such as “betrayal of trust” have been thrown around, it seems that not all administrators understood that the update cadence applied to the agent software and not to the channel used for signatures. There are, after all, good reasons why a customer would want the most up-to-date signature files, considering the speed at which malware evolves.
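To make the distinction concrete, here is a minimal sketch of how an N-1 or N-2 policy can govern one update channel while leaving another untouched. It is an illustration only and not CrowdStrike's implementation or API: `UpdatePolicy`, `select_agent_version`, `select_content_file`, and the release names used with them are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdatePolicy:
    """Hypothetical policy: how many releases behind the latest to stay."""
    versions_behind: int  # 0 = latest (N), 1 = N-1, 2 = N-2


def select_agent_version(releases: list[str], policy: UpdatePolicy) -> str:
    """Pick the agent build the policy allows.

    `releases` is ordered oldest to newest. If fewer releases exist than
    the policy asks for, fall back to the oldest one available.
    """
    if not releases:
        raise ValueError("no agent releases available")
    index = max(len(releases) - 1 - policy.versions_behind, 0)
    return releases[index]


def select_content_file(content_files: list[str], policy: UpdatePolicy) -> str:
    """Pick the content (signature) file to load.

    In this sketch the policy is accepted but deliberately ignored:
    the newest file always wins, mirroring what affected administrators
    described.
    """
    if not content_files:
        raise ValueError("no content files available")
    return content_files[-1]
```

With made-up inputs such as `["agent-1", "agent-2", "agent-3"]` and `UpdatePolicy(versions_behind=2)`, the agent selection returns `agent-1`, while the content selection still returns the newest content file in its list whatever the policy says.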

Sharon Martin, CEO of Managed Nerds, was in the process of spinning up a CrowdStrike evaluation, to determine if it was something to offer customers, just as the Blue Screen of Death wave began. As it became clear that critical systems were at risk from the wayward signature file regardless of update cadence, Martin said: “If CrowdStrike was the only EDR solution left in the world, I would choose ransomware over it.

“It’s a tool in the security stack that completely broke the affected systems with no warning. There was no indication in the beginning of how or when recovery might be obtained. The information was so slow flowing, some were unsure if they should wait for a fix or just go on and reimage the machine.

“Most of the information that did flow was either to their largest partners or behind their login-walled documentation portal which is not publicly searchable. That’s why some organizations didn’t know whether to send their people home for the day or not on Friday.”

Jamil Ahmed, Distinguished Engineer at Solace, noted the dilemma facing administrators. “n-2 for software means you are taking a conservative and cautious view to not run the most latest version.

“However … it is in your interest to get the latest definitions of new threats and viruses as soon as you can without delay. This is the heart of the struggle here. It is the processing of that new definition that had the bug and caused the BSOD.”

Planning for disaster and staging updates are critical. Using deployment rings to apply updates to a few devices first is commonplace. Analysts including Directions on Microsoft’s Mary Jo Foley have listed the practice as a lesson to be learned from the CrowdStrike meltdown.

However, staging will only go so far when there is a separate channel that pushes signature updates on a different cadence.
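As a rough sketch of what deployment rings do, and of why they cannot help with updates that arrive by another route, consider the following. It assumes a hypothetical `apply_update` callback that returns `True` when a host comes back healthy; the function and host names are invented for illustration and do not correspond to any vendor's tooling.

```python
from typing import Callable


def staged_rollout(
    rings: list[list[str]],
    apply_update: Callable[[str], bool],
) -> list[str]:
    """Apply an update ring by ring, halting after the first failing ring.

    Every host in the current ring is attempted. If any host in that ring
    fails, later rings are never touched. Returns the hosts that updated
    successfully, in the order they were attempted.
    """
    updated: list[str] = []
    for ring in rings:
        results = {host: apply_update(host) for host in ring}
        updated.extend(host for host, ok in results.items() if ok)
        if not all(results.values()):
            break  # stop here so the failure stays inside this ring
    return updated
```

The gate only protects hosts from updates that actually pass through this loop. Anything delivered on a separate channel, as the content file was, reaches every ring at once.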

Just ask any of the administrators who have had to deal with the consequences. ®
