Meltdown and Spectre created something of a meltdown in the world of cloud computing. By extension, the flaws found in the processors at the heart of much of the world's IT infrastructure have had a direct or indirect effect on the interconnected services that drive the Internet today. This is especially true of one variant of the Spectre vulnerability, revealed abruptly by Google on January 3, because it could allow malware running in one user's virtual machine or other sandboxed environment to read data from another, or from the host server itself.
In June 2017, Intel learned of these threats from researchers who kept the information secret so that hardware and operating system vendors could work furiously on fixes. But while companies such as Amazon, Google, and Microsoft were brought in early because of their "Tier 1" nature, most smaller infrastructure companies and data center operators remained in the dark until the news broke on January 3. This sent many organizations scrambling: there was no warning of the exploits before proof-of-concept code for them was already public.
Tory Kulick, director of operations and security at the hosting company Linode, described the situation as chaos. "How could something this big be disclosed like this without adequate warning? We were feeling out of the loop, like, 'What are we missing? Which of the POCs [proofs of concept of the vulnerabilities] are out there now?' All of that was going through my mind."
"When this broke, nobody had heard a peep from Intel or anyone else directly," Zachary Smith, chief executive of the hosting service Packet, told Ars. "All we could see was what was on Google's blog about how to exploit this, so we were all scrambling. The big ones, Google, Amazon, and Microsoft, have had at least 60 days of preparation time, and we've had negative preparation time."
Even the teams behind some operating system distributions, including the developers of BSD distributions, were not aware of the flaws until Google published the Project Zero blog post. "Only Tier 1 companies received advance information, and that is not responsible disclosure, it's selective disclosure," said Theo de Raadt, OpenBSD project leader, speaking with ITWire. "Everyone below Tier 1 has just gotten screwed."
The nature and timing of Google's disclosure, driven at least in part by independent discovery of the vulnerabilities, has made the response even more chaotic and painful for cloud hosts and their users. Microcode fixes for processors and firmware have been pushed out incomplete and, in some cases, later pulled back. Some applications have taken big performance hits. And no one is really sure how all the permutations of software and firmware patches will affect cloud services as they are deployed.
So, to overcome the chaos, these companies did something new: they decided to collaborate. A group of second-tier service providers came together to share formal information about patches from various vendors, metrics about their impact, and best practices for implementing them. Over the past week, this ad hoc war council, a group of at least 25 companies operating in a simple shared Slack, has attracted several high-profile members, including Netflix and Amazon Web Services. And this improvised centralization has even allowed the researchers originally behind the Spectre/Meltdown discovery to interact directly with the affected companies.
"Probably one of the best things that came out of this ordeal was this cloud hosting collaboration," said Linode's Kulick. "Sharing links and things like that was absolutely critical."
And Kulick, like others in the group, expects this episode to lead to more permanent collaboration across the industry, giving smaller organizations and large cloud customers a seat at the table for future security issues of this magnitude.
"Our industry has grown up," Smith said. "We are not a ragtag bunch of people running small racks of hosting gear and putting a few websites online. We are running a lot of people's lives on our infrastructure for them, and it would be a problem if we did not find a way to coordinate."
"Thank God that this was not a state actor," added Smith.
The fire starts in the dumpster
While the world was shaking off its New Year's Eve hangovers, another kind of headache was taking shape amid the chatter on Slack channels at Packet, a "bare metal" hosting company based in New York.
"On Monday night and Tuesday, some of the commits and comments from AMD to Kernel.org made their way into our internal Slack channels," said Smith. (Kernel.org is where contributors submit the latest updates to Linux kernel releases.) "We host Kernel.org, so we watch it closely. Everyone was like, 'Something is happening.'"
There was a long discussion in Kernel.org changelogs dating back to May 2017 about a new feature called KAISER ("Kernel Address Isolation to have Side-channels Efficiently Removed"). This feature was prompted by long-standing concerns about the possibility of the types of attacks on which the Meltdown and Spectre proofs of concept are based. Work on KAISER began about a month before Meltdown and Spectre were disclosed to Intel, so an effort was already under way to mitigate the potential threat of these kinds of attacks. As the year progressed, and by the time Packet and others started monitoring this, KAISER-related kernel updates were coming more and more frequently and with more subtle references to a potential exploit.
"I think people were seeing things come through in the commits, and we started putting it together," Kulick said.
A comment accompanying a Linux kernel commit by AMD's Tom Lendacky on December 27 really fueled speculation, infuriating executives at several of the companies that were aware of the vulnerabilities. The commit comment essentially laid out AMD's position at the time on the embargoed bugs: the company believed that its processors were not subject to the types of attacks that the kernel page table isolation feature protects against. AMD also stated that its microarchitecture does not allow memory references, including speculative references, that access privileged data when running in a less privileged mode when that access would result in a page fault.
Of course, AMD's architecture later proved not to be as immune to side-channel attacks as Lendacky claimed.
"AMD did not help with its kind of snarky kernel commit," said Smith, who suggested that the comment might have played a role in Google's early publication of information about Spectre and Meltdown. However, even if it did, other researchers were beginning to independently discover the flaws at the core of Spectre and Meltdown: researcher Anders Fogh had publicly written about what would later be defined as Meltdown in late July 2017.
Whatever triggered the early release, Jann Horn of Google's Project Zero security research team published details of Meltdown and Spectre on January 3, a week before the initial embargo on disclosing the vulnerabilities was due to lift. At that point, according to Smith, "you know, all kinds of hell broke loose."
Kulick said he believed Google's disclosure caused problems, but "even if it had been revealed on the ninth as planned, we would all have been in a world of pain." It would have been different, he said, if there had been some lead time.
Given the dependence of all kinds of applications on cloud services, it is telling that no one at Intel, Red Hat, AMD, or Google notified anyone outside of the major hardware and operating system providers.
"The Tier 2 providers that are represented in this small working group we formed control hundreds of thousands, if not millions, of servers," Smith said. "But individually we're too small… Google never thought about calling Packet, Intel did not think about calling Packet, and they certainly did not call OVH or DigitalOcean. And yet we're so important from a customer point of view, because our customers need a lot more help."
Once the details were out, communications from Intel, AMD, and other hardware vendors about Spectre and Meltdown were (and have remained) spotty. Even today, there is no central communications channel for all those affected. "My impression is that [Intel's communications with customers] was going through different teams depending on the region," Kulick said. "They are getting hammered pretty hard, so there have been delays in communication."
"Intel was just behind the eight ball," said Smith. He suggested that Intel was too consumed by the public relations problem and did not focus on talking to customers like him. "I've encouraged [Intel]… I'm asking their data center group to do a kind of online fireside chat to answer questions. We need to have some open conversations, which will not all be positive, but we have to work together. People have to be heard, and I generally believe that our community wants to help. We just need to have a more open dialogue."
Of course, the communication problem has not been helped by the absence of any established channel for communication. "Quite frankly, this is exposing how immature the public cloud industry is," said Smith. "We really do not have any good working groups. So where do you go, if you're Red Hat or Intel or Supermicro, to have some kind of common code of conduct for working with everyone around a security problem? Nowhere."