Last fall, I reported two critical-rated, pre-authentication remote code execution vulnerabilities in the VMware ESXi platform. Both of them reside within the same component, the Service Location Protocol (SLP) service. In October, VMware released a patch to address one of the vulnerabilities, but it was incomplete and could be bypassed. VMware released a second patch in November completely addressing the use-after-free (UAF) portion of these bugs. The UAF vulnerability was assigned CVE-2020-3992. After that, VMware released a third patch in February completely addressing the heap overflow portion of these bugs. The heap overflow was assigned CVE-2021-21974.
This blog takes a look at both bugs and how the heap overflow could be used for code execution.
Service Location Protocol (SLP) is a network service that listens on TCP and UDP port 427 on default installations of VMware ESXi. The implementation VMware uses is based on OpenSLP 1.0.1. VMware maintains its own version and has added some hardening to it.
The service parses network input without authentication and runs as root, so a vulnerability in the ESXi SLP service may lead to pre-auth remote code execution as root. This vector could also be used as a virtual machine escape, since by default a guest can access the SLP service on the host.
The Use-After-Free Bug (CVE-2020-3992)
This bug exists only in VMware’s implementation of SLP. Here is the simplified pseudocode:
At (3), if a SLP_FUNCT_SRVREG request is handled correctly, it will save the allocated SLPMessage into the database. However, at (4), the SLPMessage is freed even though the handled request returns without error. This leaves a dangling pointer in the database. It is possible that the free at (4) was added in the course of fixing some older bugs.
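Since the pseudocode is not reproduced here, the following is a minimal Python sketch of the pattern just described, not VMware's code. The `ToyHeap` class, the `handle_request` function, and the `clear_after_store` switch are all hypothetical names introduced for illustration. The switch models the idea of clearing the local pointer once the database has taken ownership, which is how the second patch is described later in this post.

```python
class ToyHeap:
    """Handle-based toy allocator, so stale handles can be detected."""

    def __init__(self):
        self._live = set()
        self._next_handle = 1

    def alloc(self):
        handle = self._next_handle
        self._next_handle += 1
        self._live.add(handle)
        return handle

    def free(self, handle):
        self._live.remove(handle)

    def is_live(self, handle):
        return handle in self._live


def handle_request(heap, database, is_registration, clear_after_store):
    """Hypothetical handler: store the message at (3), free it at (4)."""
    message = heap.alloc()
    if is_registration:
        database.append(message)      # (3) the database keeps the handle
        if clear_after_store:
            message = None            # ownership moved to the database
    if message is not None:
        heap.free(message)            # (4) free whatever is still held locally


def dangling_entries(heap, database):
    """Return database entries that refer to freed handles."""
    return [h for h in database if not heap.is_live(h)]


buggy_heap, buggy_db = ToyHeap(), []
handle_request(buggy_heap, buggy_db, True, clear_after_store=False)

fixed_heap, fixed_db = ToyHeap(), []
handle_request(fixed_heap, fixed_db, True, clear_after_store=True)

print(dangling_entries(buggy_heap, buggy_db))   # [1]
print(dangling_entries(fixed_heap, fixed_db))   # []
```

In the buggy variant the database ends up holding a handle that the allocator has already released, which is the toy equivalent of the dangling pointer.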
Bypassing the First Patch for CVE-2020-3992
The first patch (build-16850804) by VMware was interesting. VMware didn’t make any changes to the vulnerable code shown above. Instead, they added logic to check the source IP address before handling the request. The logic, which is in IsAddrLocal(), allows requests from a source IP address of localhost only.
After a few seconds, you might notice that the service can still be reached from the LAN via an IPv6 link-local address.
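The body of IsAddrLocal() is not shown here, so the sketch below does not reproduce VMware's logic. It only illustrates, using Python's standard `ipaddress` module, the distinction a localhost-only filter has to draw: a link-local address is "local" to the network segment, but it is not loopback. The `classify_source` helper is a hypothetical name.

```python
import ipaddress


def classify_source(addr: str) -> str:
    """Classify a source address as loopback, link-local, or other."""
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    return "other"


for candidate in ("127.0.0.1", "::1", "fe80::1", "192.168.1.10"):
    print(candidate, classify_source(candidate))
```

A filter meant to admit only localhost traffic would accept the "loopback" class alone. Any neighbor on the same LAN segment can originate traffic from an `fe80::/10` address.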
The Second Patch for CVE-2020-3992
Just over two weeks later, the second patch (build-17119627) was released. This time, they improved the IP source address check logic.
This change does eliminate the IPv6 vector. Additionally, they patched the root cause of the UAF bug by clearing the pointer to the SLPMessage after adding it to the database.
The Heap Overflow Bug (CVE-2021-21974)
Like the previous bug, this bug exists only in VMware’s implementation of SLP. Here is the simplified pseudocode:
srvurl comes from network input, but the function does not terminate srvurl with a NULL byte before using strstr(). The out-of-bounds string search leads to a heap overflow at (6). This happened because VMware did not merge an update from the original OpenSLP project.
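To make the mechanism concrete, here is a toy Python model of an unterminated string search feeding a copy length. It is a sketch under stated assumptions: the `":/"` delimiter, the heap contents, and the sizing of the destination buffer are hypothetical, and `c_strstr` and `c_memcpy` only mimic the C functions on a flat byte array.

```python
def c_strstr(memory: bytes, start: int, needle: bytes) -> int:
    """Mimic C strstr(): scan until the first NUL byte, ignoring any declared length."""
    end = memory.index(b"\x00", start)
    return memory.find(needle, start, end)   # -1 when not found


def c_memcpy(memory: bytearray, dst: int, src: bytes) -> None:
    """Mimic memcpy(): write len(src) bytes with no bounds check."""
    memory[dst:dst + len(src)] = src


# Source side: the request field is NOT NUL-terminated, so adjacent heap
# bytes (hypothetical) become part of the "string" that strstr() scans.
srvurl = b"service:demo"
src_heap = srvurl + b"AAAA:/tail\x00"
declared_len = len(srvurl)

found = c_strstr(src_heap, 0, b":/")
copy_len = found                      # length derived from the search result

# Destination side: a buffer sized from the declared length, followed by
# four bytes standing in for the next chunk's header.
dst_heap = bytearray(declared_len) + bytearray(b"HDR!")
c_memcpy(dst_heap, 0, src_heap[:copy_len])

print(declared_len, copy_len)         # 12 16
print(bytes(dst_heap[declared_len:])) # b'AAAA'
```

The search runs past the declared end of the field, so the derived copy length exceeds the destination's capacity and the bytes after the buffer are clobbered.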
The Patch for CVE-2021-21974
Six weeks later, the third patch (build-17325551) was released. It addressed the root cause of the heap overflow bug by checking the length before the memcpy at (6).
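The patched code is not shown, so the following is only a sketch of the general defensive pattern: keep the delimiter search inside the declared length and validate the result against the destination capacity before copying. The helper names, the `":/"` delimiter, and the error message are assumptions for illustration.

```python
def bounded_find(buf: bytes, length: int, needle: bytes) -> int:
    """Search for needle only within the first `length` bytes; -1 if absent."""
    return buf.find(needle, 0, length)


def safe_prefix(buf: bytes, length: int, needle: bytes, capacity: int) -> bytes:
    """Return the bytes before needle, refusing anything that would not fit."""
    pos = bounded_find(buf, length, needle)
    if pos < 0 or pos > capacity:
        raise ValueError("malformed or oversized service URL")
    return buf[:pos]


# The adjacent heap bytes are no longer visible to the search.
print(bounded_find(b"service:demo" + b"AAAA:/tail\x00", 12, b":/"))  # -1

# A well-formed field is handled as before.
print(safe_prefix(b"service:x://host", 16, b":/", 16))               # b'service:x'
```

Either check alone would stop the toy overflow. Doing both guards against a length that is in bounds for the source but still too large for the destination.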
All Linux exploit mitigations are enabled for /bin/slpd, most notably Position Independent Executable (PIE). This makes it difficult to achieve code execution without first disclosing some addresses from memory. At first, I considered using the UAF, but I could not figure out an effective method to get a memory disclosure. Therefore, I moved my focus to the heap overflow bug instead.
Upgrading the Overflow
The service uses struct SLPBuffer to handle events that it sends and receives. One SLPBuffer* sendbuf and one SLPBuffer* recvbuf are allocated for each connection.
The plan is to partially overwrite the curpos pointer in SLPBuffer and leak some memory on the next message reply. However, the sendbuf is emptied and updated before each reply. Fortunately, there is a timeslot during which sendbuf can survive due to the select-based socket model:
- Fill a socket send buffer without receiving until the send buffer is full.
- Partially overwrite sendbuf->curpos for that socket.
- Start to receive from the socket. The leaked memory will be appended at the end.
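The timing window in the three steps above can be modeled with a toy simulation. This is a sketch only: the flat `memory` array, the four-byte socket capacity, and the `ToyConnection` class are hypothetical stand-ins for the heap, the kernel send buffer, and a connection's sendbuf. The corruption step is modeled by assigning `curpos` directly rather than by an actual overflow.

```python
from collections import deque


class ToyConnection:
    """Toy model: `memory` is a flat heap, the deque is the kernel send buffer."""

    def __init__(self, memory, start, end, sock_capacity):
        self.memory = memory
        self.curpos = start
        self.end = end
        self.sock = deque()
        self.sock_capacity = sock_capacity

    def try_send(self):
        """Like a non-blocking send(): queue as much of [curpos, end) as fits."""
        room = self.sock_capacity - len(self.sock)
        count = min(room, self.end - self.curpos)
        self.sock.extend(self.memory[self.curpos:self.curpos + count])
        self.curpos += count
        return count

    def client_recv(self, n):
        """The remote peer reads up to n bytes from the socket."""
        count = min(n, len(self.sock))
        return bytes(self.sock.popleft() for _ in range(count))


def drain(conn):
    """Alternate server sends and client reads until nothing is left."""
    received = b""
    while True:
        conn.try_send()
        data = conn.client_recv(conn.sock_capacity)
        if not data:
            return received
        received += data


memory = bytearray(b"SECRETS!REPLYMSG")   # unrelated heap data, then the reply
conn = ToyConnection(memory, start=8, end=16, sock_capacity=4)

conn.sock.extend(b"old!")     # step 1: the peer has not read, the socket is full
stalled = conn.try_send()     # the reply cannot be sent, so sendbuf stays pending
conn.curpos = 0               # step 2: curpos is corrupted while it is pending
received = drain(conn)        # step 3: the peer finally starts receiving

print(stalled)                # 0
print(received)               # b'old!SECRETS!REPLYMSG'
```

The bytes queued earlier arrive first, and the memory exposed by the corrupted `curpos` follows them, which matches the observation that the leak is appended at the end.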
There are some additional challenges, though:
- Due to the use of strstr(), you cannot overflow with a NULL byte.
- The overflowed buffer (obuf) will be automatically freed very soon after the function that overflowed it returns.
Together, this means that the overwrite can only extend partway through the next chunk header. Otherwise, the size of the next free chunk will be set to a very large value (four non-NULL bytes), and shortly after obuf is freed, the process will abort.
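The arithmetic behind this constraint is easy to check. The sketch below assumes a 4-byte little-endian size field, which is what the "four non-NULL bytes" remark suggests. The starting size `0x51` and the helper name are hypothetical.

```python
import struct


def overwrite_prefix(field: bytes, payload: bytes) -> bytes:
    """Overwrite the first len(payload) bytes of a little-endian field."""
    if b"\x00" in payload:
        raise ValueError("a strstr()-driven copy cannot carry NUL bytes")
    return payload + field[len(payload):]


size_field = struct.pack("<I", 0x51)          # hypothetical original chunk size

# Touching only the lowest byte keeps the upper three bytes at zero.
partial = struct.unpack("<I", overwrite_prefix(size_field, b"\xf1"))[0]

# Covering the whole field forces every byte to be non-zero, so the
# smallest reachable value is 0x01010101.
smallest_full = struct.unpack("<I", overwrite_prefix(size_field, b"\x01" * 4))[0]

print(hex(partial))        # 0xf1
print(smallest_full)       # 16843009
```

Even the smallest full overwrite yields a chunk size of roughly 16 MB, which is why the overwrite has to stop partway through the header.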
The following layout overcomes these challenges:
Assume that the target is sendbuf. In (F1), each chunk marked “IN USE” can be either a SLPBuffer or a SLPDSocket. A hole is prepared for obuf in (F2). After triggering the overflow in (F4), the next freed chunk is enlarged and overlaps the target. Next, obuf is freed in (F5). Now, you can allocate a new recvbuf from a new connection to overwrite the target in (F6). This time the overwrite can include NULL bytes.
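The layout figure is not reproduced here, but the key property of (F4) is simple interval arithmetic: once the freed chunk's size is enlarged, its range covers the target. The offsets and sizes below are hypothetical and chosen only to illustrate that check.

```python
def overlaps(chunk, target):
    """Return True if two (offset, size) ranges share at least one byte."""
    c_off, c_size = chunk
    t_off, t_size = target
    return c_off < t_off + t_size and t_off < c_off + c_size


free_chunk = (0x100, 0x50)            # hypothetical freed chunk right after obuf
target = (0x1a0, 0x30)                # hypothetical sendbuf further along the heap
enlarged = (free_chunk[0], 0xf0)      # same chunk after its size field is enlarged

print(overlaps(free_chunk, target))   # False
print(overlaps(enlarged, target))     # True
```

An allocation later served from the enlarged chunk can therefore land on top of the target, which is what (F6) relies on.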
There is an additional problem: malloc() functions from OpenSLP are replaced with calloc() by VMware. The recvbuf in (F6) is also allocated from calloc(), which zero-initializes memory. This means that partial pointer overwrites are not possible when recvbuf overlaps the target. There is a trick to get around that, though: You can first overwrite the IS_MMAPPED flag on the freed chunk in (F4). This causes calloc() to skip the zero initialization on the next allocation. This is a general method that is useful in many situations where you want to perform an overwrite on a target.
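The effect of the flag can be shown with a simplified model of the relevant calloc() behavior in glibc, where a chunk flagged as mmapped is assumed to be backed by fresh, already-zeroed pages and is therefore not cleared again. The `toy_calloc` function and the memory contents are hypothetical. The flag values are the low bits glibc keeps in the chunk size field.

```python
PREV_INUSE = 0x1   # low bits of the chunk size field in glibc's malloc
IS_MMAPPED = 0x2


def toy_calloc(memory: bytearray, user_off: int, size_field: int, nbytes: int) -> int:
    """Simplified calloc(): clear the region unless the chunk is flagged mmapped."""
    if not size_field & IS_MMAPPED:
        memory[user_off:user_off + nbytes] = bytes(nbytes)
    return user_off


normal = bytearray(b"\xaa" * 16)
toy_calloc(normal, 0, 0x50 | PREV_INUSE, 8)

flagged = bytearray(b"\xaa" * 16)
toy_calloc(flagged, 0, 0x50 | PREV_INUSE | IS_MMAPPED, 8)

print(normal[:8].hex())    # 0000000000000000
print(flagged[:8].hex())   # aaaaaaaaaaaaaaaa
```

With the flag set, the previous contents of the region survive the allocation, so a later partial overwrite can build on the pointer bytes that are already there.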
Putting It All Together
- Overwrite a connection state to STREAM_WRITE_FIRST. This is necessary so that sendbuf->curpos will get reset to sendbuf->start in preparation for the memory disclosure.
- Partially overwrite sendbuf->start with 2 NULL bytes, where sendbuf belongs to the connection mentioned in step 1. Start receiving from the connection. You can then get a memory disclosure.
- Overwrite sendbuf->curpos from a new connection to leak the address of a recvbuf, which is allocated from mmap(). Once you have an mmapped address, it becomes possible to infer the address of free_hook.
- Overwrite recvbuf->curpos from a new connection, setting it to the address of free_hook. Start sending on the connection. You can then overwrite free_hook.
- Close a connection, invoking free_hook to start the ROP chain.
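The first two steps interact in a way that is easier to see with numbers. The sketch below assumes 4-byte little-endian pointers and uses hypothetical addresses. The state names come from the text, but their representation as strings, the existence of a follow-on STREAM_WRITE state, and the `begin_write` helper are assumptions for illustration.

```python
import struct

STREAM_WRITE_FIRST = "STREAM_WRITE_FIRST"   # state named in the text
STREAM_WRITE = "STREAM_WRITE"               # assumed follow-on state


def zero_low_bytes(pointer: int, count: int) -> int:
    """Zero the `count` least significant bytes of a 32-bit little-endian pointer."""
    raw = bytearray(struct.pack("<I", pointer))
    raw[:count] = bytes(count)
    return struct.unpack("<I", bytes(raw))[0]


def begin_write(conn: dict) -> None:
    """On the first write of a reply, rewind curpos to the start of the buffer."""
    if conn["state"] == STREAM_WRITE_FIRST:
        conn["curpos"] = conn["start"]
        conn["state"] = STREAM_WRITE


conn = {
    "state": STREAM_WRITE,
    "start": 0x0812A4C0,     # hypothetical heap addresses
    "curpos": 0x0812A500,
    "end": 0x0812A500,
}
original_window = conn["end"] - conn["start"]

conn["state"] = STREAM_WRITE_FIRST                 # step 1
conn["start"] = zero_low_bytes(conn["start"], 2)   # step 2
begin_write(conn)

print(hex(conn["curpos"]))                         # 0x8120000
print(original_window, conn["end"] - conn["curpos"])   # 64 42240
```

Zeroing the two low bytes moves the start of the buffer down to a 64 KB boundary, and the state reset makes the next write begin from there, so far more than the original reply is sent.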
These steps may not be optimal.
Privilege Level Obtained
If everything goes well, you can execute arbitrary code with root privileges on the target ESXi system. In ESXi 7, a new feature called DaemonSandboxing was prepared for SLP. It uses an AppArmor-like sandbox to isolate the SLP daemon. However, I found that this is disabled by default in my environment.
This suggests that a sandbox escape stage will be required in the future.
VMware ESXi is a popular infrastructure for cloud service providers and many others. Because of its popularity, these bugs may be exploited in the wild at some point. To defend against this vulnerability, you can either apply the relevant patches or implement the workaround. You should consider applying both to ensure your systems are adequately protected. Additionally, VMware now recommends disabling the OpenSLP service in ESXi if it is not used.
We look forward to seeing other methods to exploit these bugs as well as other ESXi vulnerabilities in general. Until then, you can find me on Twitter @_wmliang_, and follow the team for the latest in exploit techniques and security patches.
CVE-2020-3992 & CVE-2021-21974: Pre-Auth Remote Code Execution in VMware ESXi