which is a serial interface that connects the audio codec
to the host PC. Typical messages to the audio codec are
structured as [NID][Verb][Payload], where NID
is the node identifier (e.g., the widget to operate on),
Verb is the type of operation (e.g., set configuration),
and Payload contains the parameters for the operation
(e.g., the configuration parameters).
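The following minimal Python sketch makes the [NID][Verb][Payload] layout concrete by packing and unpacking a 32-bit HDA command word. It assumes the HDA specification's form with a 4-bit codec address in bits 31:28, an 8-bit NID in bits 27:20, a 12-bit verb in bits 19:8, and an 8-bit payload in bits 7:0. The specification also defines 4-bit verbs with 16-bit payloads, which this sketch does not handle. The function names and the example NID 0x1B are illustrative assumptions, not values from the text.

```python
def encode_command(codec_addr: int, nid: int, verb: int, payload: int) -> int:
    """Pack a 12-bit-verb HDA command: [CAd:4][NID:8][Verb:12][Payload:8]."""
    for name, value, bits in (("codec_addr", codec_addr, 4), ("nid", nid, 8),
                              ("verb", verb, 12), ("payload", payload, 8)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value:#x} does not fit in {bits} bits")
    return (codec_addr << 28) | (nid << 20) | (verb << 8) | payload


def decode_command(cmd: int) -> tuple[int, int, int, int]:
    """Split a 32-bit command word back into (codec_addr, nid, verb, payload)."""
    return (cmd >> 28) & 0xF, (cmd >> 20) & 0xFF, (cmd >> 8) & 0xFFF, cmd & 0xFF


# Set Pin Widget Control (verb 0x707) with In Enable (0x20) on a hypothetical NID 0x1B.
cmd = encode_command(0, 0x1B, 0x707, 0x20)
print(f"{cmd:#010x}")  # 0x01b70720
```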
The HD audio codec defines a number of pin widgets. Each pin, including the audio jack ports, has its own configuration, which includes the jack color, location (rear, front, top, etc.), direction (in or out), and other properties. For example, in the Realtek ALC892 chip, pins 14-17 (LINE2-L, LINE2-R, MIC2-L, MIC2-R), pins 21-24 (MIC1-L, MIC1-R, LINE1-L, LINE1-R), and pins 35-36 (FRONT-L, FRONT-R) are the analog input and output pins. On retaskable pins (e.g., 14-17), the default configuration, and with it the pin's function, can be changed from output (e.g., headphone or speaker) to input (microphone) and vice versa. The HDA specification defines the complete codec architecture that allows a software driver to control various types of operations [10].
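The per-pin configuration is stored in each pin's 32-bit Configuration Default register. The sketch below is a minimal illustration that assumes the field layout of the HDA specification: connectivity in bits 31:30, location in 29:24, default device in 23:20, connection type in 19:16, color in 15:12, misc in 11:8, association in 7:4, and sequence in 3:0. It decodes such a value, rewrites only the default-device field (headphone out to microphone in), and splits the result into the four bytes written by the Set Configuration Default verbs 0x71C to 0x71F. The sample value 0x0221401F is hypothetical and was not read from an ALC892.

```python
DEVICES = {0x0: "Line Out", 0x1: "Speaker", 0x2: "HP Out", 0x3: "CD",
           0x4: "SPDIF Out", 0x5: "Digital Other Out", 0x6: "Modem Line Side",
           0x7: "Modem Handset Side", 0x8: "Line In", 0x9: "AUX",
           0xA: "Mic In", 0xB: "Telephony", 0xC: "SPDIF In",
           0xD: "Digital Other In", 0xF: "Other"}
COLORS = {0x0: "Unknown", 0x1: "Black", 0x2: "Grey", 0x3: "Blue",
          0x4: "Green", 0x5: "Red", 0x6: "Orange", 0x7: "Yellow",
          0x8: "Purple", 0x9: "Pink", 0xE: "White", 0xF: "Other"}
# Geometric location, bits 27:24 (the gross-location bits 29:28 are ignored here).
LOCATIONS = {0x0: "N/A", 0x1: "Rear", 0x2: "Front", 0x3: "Left",
             0x4: "Right", 0x5: "Top", 0x6: "Bottom"}

HP_OUT, MIC_IN = 0x2, 0xA
SET_CONFIG_DEFAULT_BYTE0 = 0x71C  # bytes 1-3 use verbs 0x71D, 0x71E, 0x71F


def decode_pin_config(cfg: int) -> dict:
    """Decode the main fields of a Configuration Default register value."""
    return {
        "connectivity": (cfg >> 30) & 0x3,       # 0 = jack, 1 = no physical connection
        "location": LOCATIONS.get((cfg >> 24) & 0xF, "Special/Reserved"),
        "device": DEVICES.get((cfg >> 20) & 0xF, "Reserved"),
        "connection_type": (cfg >> 16) & 0xF,    # e.g., 1 = 1/8" jack
        "color": COLORS.get((cfg >> 12) & 0xF, "Reserved"),
        "association": (cfg >> 4) & 0xF,
        "sequence": cfg & 0xF,
    }


def retask_device(cfg: int, device: int) -> int:
    """Return cfg with only the Default Device field (bits 23:20) replaced."""
    return (cfg & ~(0xF << 20)) | ((device & 0xF) << 20)


def config_default_verbs(cfg: int) -> list[tuple[int, int]]:
    """(verb, payload) pairs that write cfg one byte at a time, low byte first."""
    return [(SET_CONFIG_DEFAULT_BYTE0 + i, (cfg >> (8 * i)) & 0xFF) for i in range(4)]


cfg = 0x0221401F  # hypothetical: green front jack configured as headphone out
print(decode_pin_config(cfg)["device"])  # HP Out
new_cfg = retask_device(cfg, MIC_IN)
print(f"{new_cfg:#010x}")                # 0x02a1401f
```

The Configuration Default register is read by software to decide how to treat a pin. The electrical direction of a retaskable pin is switched with the Pin Widget Control verb (In Enable / Out Enable bits), and only pins whose Pin Capabilities report both input and output can be retasked this way.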
3.1 Kernel interface
The vendors of audio codec chips, such as Realtek and Conexant, provide kernel drivers that implement the codec's functionality, including retasking, and expose it to user-mode programs. For example, the Realtek driver for Microsoft Windows allows remapping the audio jacks via specific values in the Windows Registry (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E96C-E325-11CE-BFC1-08002BE10318}\0000\Settings\). A guide to remapping Realtek onboard jacks in Microsoft Windows can be found in [14]. In Linux, the kernel's Advanced Linux Sound Architecture (ALSA) subsystem exposes an interface for configuring the jacks; the hda-jack-retask tool is a user-mode program for Linux that lets the user reconfigure the HD audio pins via a GUI [15].
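On Linux, the kernel's HD-audio documentation describes a "patch" file that the snd-hda-intel driver can load to override pin default configurations at probe time. To our understanding, this is also the mechanism behind hda-jack-retask's boot-override option. The hedged Python sketch below only renders the text of such a file. The [codec] line holds the vendor ID, subsystem ID, and codec address, and each [pincfg] line holds a NID and its new 32-bit configuration. The vendor ID 0x10ec0892 is assumed to be the ALC892's. The subsystem ID, NID, and configuration value are hypothetical placeholders, and build_hda_patch is an illustrative name.

```python
def build_hda_patch(vendor_id: int, subsystem_id: int, codec_addr: int,
                    pincfgs: dict[int, int]) -> str:
    """Render an HDA patch file with [codec] and [pincfg] sections."""
    lines = ["[codec]",
             f"0x{vendor_id:08x} 0x{subsystem_id:08x} {codec_addr}",
             "",
             "[pincfg]"]
    # One "NID config" pair per overridden pin, sorted by NID for stable output.
    lines += [f"0x{nid:02x} 0x{cfg:08x}" for nid, cfg in sorted(pincfgs.items())]
    return "\n".join(lines) + "\n"


# Hypothetical values: retask NID 0x1b to a front green "Mic In" jack.
print(build_hda_patch(0x10EC0892, 0x1234ABCD, 0, {0x1B: 0x02A1401F}), end="")
```

Such a file would be placed under /lib/firmware and loaded with the snd-hda-intel module option patch=<file name> (requires CONFIG_SND_HDA_PATCH_LOADER). For quick experiments, the same NID/value pairs can instead be written to the codec's sysfs user_pin_configs entry (e.g., /sys/class/sound/hwC0D0/user_pin_configs), followed by a write to reconfig, when the kernel is built with CONFIG_SND_HDA_RECONFIG.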
3.2 Architecture
The architecture of the SPEAKE(a)R malware is shown in Figure 4. The SPEAKE(a)R malware consists of a user-level process and a kernel-level driver. The user-level process manages the set of malware tasks, such as communication, command and control (C&C), persistence, keylogging, and so on. The kernel-level driver is in charge of retasking the jacks from output to input and vice versa. The main operational scenario involves a PC that is not equipped with a microphone (or whose microphone is muted or turned off) but has connected headphones, earphones, or passive speakers.
Figure 4. SPEAKE(a)R architecture and components.
In this scenario, a malware process running in user mode (Figure 4, a) reconfigures the headphone jack into a microphone jack by sending a retasking request to the SPEAKE(a)R kernel driver (Figure 4, b). The kernel driver passes the configuration verb to the HD audio interface (Figure 4, c), which sends it to the audio codec over the HD audio bus.
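The a-b-c flow can be modeled as a toy Python simulation under stated assumptions. It is not the SPEAKE(a)R driver. RetaskRequest, FakeCodec, and handle_retask are hypothetical names, and the request format is invented for illustration. The driver is assumed to retask the jack with the HDA Set Pin Widget Control verb (0x707), using In Enable (0x20) for microphone mode and Out Enable plus Headphone Enable (0xC0) for headphone mode.

```python
from dataclasses import dataclass, field

SET_PIN_WIDGET_CONTROL = 0x707  # HDA "Set Pin Widget Control" verb
PIN_IN = 0x20                   # bit 5: In Enable
PIN_HP = 0xC0                   # bit 6 Out Enable | bit 7 Headphone Enable


@dataclass
class RetaskRequest:
    """Hypothetical user-to-kernel retasking request (Figure 4, a -> b)."""
    nid: int
    to_input: bool


@dataclass
class FakeCodec:
    """Hypothetical stand-in for the HD audio interface, bus, and codec (Figure 4, c)."""
    address: int = 0
    sent: list[int] = field(default_factory=list)
    pin_ctl: dict[int, int] = field(default_factory=dict)

    def send(self, cmd: int) -> None:
        self.sent.append(cmd)
        nid, verb, payload = (cmd >> 20) & 0xFF, (cmd >> 8) & 0xFFF, cmd & 0xFF
        if verb == SET_PIN_WIDGET_CONTROL:
            self.pin_ctl[nid] = payload  # the simulated codec latches the new pin mode


def handle_retask(codec: FakeCodec, req: RetaskRequest) -> int:
    """Driver side: turn a retasking request into one configuration verb."""
    payload = PIN_IN if req.to_input else PIN_HP
    cmd = (codec.address << 28) | (req.nid << 20) | (SET_PIN_WIDGET_CONTROL << 8) | payload
    codec.send(cmd)
    return cmd


codec = FakeCodec()
handle_retask(codec, RetaskRequest(nid=0x1B, to_input=True))  # headphone -> mic
print(hex(codec.pin_ctl[0x1B]))  # 0x20
```

In a real Linux kernel module, the equivalent write would go through the HD-audio core, for example snd_hda_codec_write(codec, nid, 0, AC_VERB_SET_PIN_WIDGET_CONTROL, val), rather than through a hand-packed command word.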
Kernel level vs. user level. Communicating with the HD audio codec chip via the kernel module provides the highest level of stealth, since the malware operations are not exposed to user-level monitoring (e.g., anti-virus). Installing a kernel-level driver requires root or administrator privileges, which can be acquired by stealing credentials or exploiting a privilege escalation vulnerability in the system. However, a kernel-level component is not necessary for implementing the SPEAKE(a)R malware. Malware can communicate with the HD audio hardware from user level via command-line tools and system APIs in Linux and Windows [16] [17]. These tools and APIs send the configuration verbs through the standard audio driver already installed in the system. The drawback of retasking the audio jacks from user level is reduced stealth: anti-virus software, intrusion detection systems (IDS), and intrusion prevention systems (IPS) can detect the malicious activity, block it, and raise an alert.
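One example of the user-level route is the hda-verb utility from alsa-tools. It sends a single verb to a codec through the ALSA hwdep device (/dev/snd/hwC<card>D<codec>), which requires a kernel built with HD-audio hwdep support and typically root privileges. The sketch below only builds and optionally runs the command line. The helper names are hypothetical, and card 0, codec 0, and NID 0x1B are placeholder values. Verb and payload are passed as hex numbers (0x707 is Set Pin Widget Control).

```python
import subprocess

SET_PIN_WIDGET_CONTROL = "0x707"
PIN_IN, PIN_HP = 0x20, 0xC0  # In Enable; Out Enable | Headphone Enable


def hda_verb_argv(card: int, codec: int, nid: int, to_input: bool) -> list[str]:
    """Hypothetical helper: build an hda-verb command that switches a pin's mode."""
    device = f"/dev/snd/hwC{card}D{codec}"
    value = PIN_IN if to_input else PIN_HP
    return ["hda-verb", device, f"0x{nid:02x}", SET_PIN_WIDGET_CONTROL, f"0x{value:02x}"]


def run_retask(card: int, codec: int, nid: int, to_input: bool) -> None:
    """Execute the command (needs hda-verb installed and sufficient privileges)."""
    subprocess.run(hda_verb_argv(card, codec, nid, to_input), check=True)


print(" ".join(hda_verb_argv(0, 0, 0x1B, to_input=True)))
# hda-verb /dev/snd/hwC0D0 0x1b 0x707 0x20
```

Each retask here is a separate process launch with a recognizable command line, which is likely what the user-level monitoring mentioned above would observe.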
Stealth. The output-to-input reconfiguration takes place only while the user is not using the headphones. To avoid detection, the SPEAKE(a)R kernel module detects when audio output is triggered (e.g., the user starts playing music) and immediately reconfigures the microphone jack back into a headphone jack.
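The revert-on-playback rule can be expressed as a small state machine. In this hedged sketch, StealthRetasker and its methods are hypothetical. How the module learns that playback has started is not specified in the text and is abstracted into on_playback(). The retask callback stands for whatever path actually sends the configuration verb.

```python
from enum import Enum
from typing import Callable


class JackMode(Enum):
    HEADPHONE = "headphone"
    MICROPHONE = "microphone"


class StealthRetasker:
    """Hypothetical: keep the jack in mic mode only while recording is wanted and nothing plays."""

    def __init__(self, retask: Callable[[bool], None]) -> None:
        self._retask = retask            # retask(True) -> mic, retask(False) -> headphone
        self.mode = JackMode.HEADPHONE
        self.playing = False
        self.want_recording = False

    def request_recording(self, on: bool) -> None:
        self.want_recording = on
        self._update()

    def on_playback(self, playing: bool) -> None:
        self.playing = playing
        self._update()                   # playback start reverts the jack immediately

    def _update(self) -> None:
        target = (JackMode.MICROPHONE if self.want_recording and not self.playing
                  else JackMode.HEADPHONE)
        if target is not self.mode:      # send a verb only when the mode changes
            self._retask(target is JackMode.MICROPHONE)
            self.mode = target


calls: list[bool] = []
s = StealthRetasker(calls.append)
s.request_recording(True)   # jack becomes a microphone
s.on_playback(True)         # user starts music: back to headphone
s.on_playback(False)        # playback stops: recording resumes
print(calls)                # [True, False, True]
```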
Enhancing quality. The infected computer may be equipped with both a microphone and headphones, but the headphones are better positioned for the desired recording, e.g., the headphones are closer to the voice
[Figure 4 diagram: Malware process (user level, a) → retasking requests → SPEAKE(a)R module (kernel level, b) → configuration verbs → HD Audio (ALSA) (c) → HD Audio Codec Chip (hardware).]