In late October, security researchers published details of proof-of-concept exploits affecting smart home devices such as the Amazon Echo (which runs Amazon Alexa) and Google Home. These techniques allow an attacker to eavesdrop on conversations and to obtain passwords from users.
Why should these proofs of concept be considered significant?
The proof-of-concept apps used by the researchers passed both Amazon’s and Google’s app validation processes and were briefly available to the public. Later modifications to the apps did not require re-validation by either vendor.
The researchers demonstrated how their app can mislead a user into believing the smart device is no longer listening (and recording) when in fact it is.
On an Amazon Echo, the device was made to keep listening by changing the app’s de-activation intent (an intent is a phrase that can contain values, i.e. words, which trigger custom actions). With this change, the de-activation routine no longer stops the device from recording you. The owner of the Amazon Echo would not know anything was wrong, since they still hear the device speak its “Goodbye” message. This was achieved by adding an unpronounceable Unicode character sequence (U+D801, dot, space) to the end of the intent’s response. Since these characters cannot be pronounced (and therefore cannot be heard), the speaker stays silent while the app remains active and can eavesdrop on the conversation. Adding more of these characters easily extends the listening time.
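To illustrate why this sequence is silent, here is a minimal Python sketch. It is not the researchers' code: the names `SILENT_UNIT` and `build_padded_response` are hypothetical, and the sketch only models the response text, not a real Alexa Skill. U+D801 is a lone surrogate code point (Unicode category "Cs"), so it does not represent a character that a speech engine could read aloud.

```python
import unicodedata

# The sequence quoted by the researchers: U+D801, dot, space.
# SILENT_UNIT is a hypothetical name used only for this illustration.
SILENT_UNIT = "\ud801. "


def build_padded_response(spoken: str, repeats: int) -> str:
    """Return the audible text followed by `repeats` copies of the silent unit."""
    return spoken + SILENT_UNIT * repeats


response = build_padded_response("Goodbye", 3)

# U+D801 belongs to the surrogate category, which has no pronunciation.
print(unicodedata.category(SILENT_UNIT[0]))

# A lone surrogate is not even valid UTF-8 text on its own.
try:
    SILENT_UNIT.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

Each extra copy of the silent unit lengthens the response without adding anything audible, which matches the researchers' observation that more characters extend the listening time.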
Eavesdropping using the Amazon Echo is demonstrated in the following video from the SRLabs researchers:
Phishing a Password
To phish a password, the researchers replaced some of the unpronounceable characters with an audible message that tells the user a security update for the app is available and asks them to supply their password to install it. The researchers demonstrated the ability to convert the spoken sentence into text and send it to their proof-of-concept server. This is demonstrated in the following video:
To perform the same actions with Google Home, the researchers put the user into a loop and were able to capture recognised speech as text without alerting the user of the Google Home. This time, the researchers used multiple “noInputPrompts” containing SSML elements or, again, the unpronounceable Unicode characters to capture whatever was being spoken.
This is demonstrated in the following video:
Phishing a Password
This was carried out using the same technique as for the Amazon Echo above. This is demonstrated in the following video:
How can I protect my smart speaker / virtual assistant from these vulnerabilities?
Unfortunately, as the purchaser of these devices, there is no action you can take to prevent these techniques being used against you. Instead, the responsibility lies with Amazon and Google. They need to improve their app validation processes, as per the researchers’ findings:
“To prevent ‘Smart Spies’ attacks, Amazon and Google need to implement better protection, starting with a more thorough review process of third-party Skills and Actions made available in their voice app stores. The voice app review needs to check explicitly for copies of built-in intents. Unpronounceable characters like “U+D801, dot, space. “ and silent SSML messages should be removed to prevent arbitrary long pauses in the speakers’ output. Suspicious output texts including “password“ deserve particular attention or should be disallowed completely.”
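The checks the researchers recommend could be sketched roughly as follows. This is a hypothetical review helper, not Amazon's or Google's actual validation logic: the function name `review_output_text`, the word list, and the choice to flag every SSML `<break>` element are assumptions made for illustration.

```python
import re
import unicodedata
from typing import List

# Assumed word list; a real review process would likely use a longer one.
SUSPICIOUS_WORDS = ("password",)

# Assumption: treat any SSML <break> element as a possible silent pause.
SSML_BREAK = re.compile(r"<break\b[^>]*>", re.IGNORECASE)


def review_output_text(text: str) -> List[str]:
    """Return a list of findings for one piece of voice app output text."""
    findings = []
    # Lone surrogates such as U+D801 have Unicode category "Cs".
    if any(unicodedata.category(ch) == "Cs" for ch in text):
        findings.append("unpronounceable surrogate code point")
    if SSML_BREAK.search(text):
        findings.append("SSML break element")
    lowered = text.lower()
    for word in SUSPICIOUS_WORDS:
        if word in lowered:
            findings.append("suspicious word: " + word)
    return findings


print(review_output_text("Goodbye"))
print(review_output_text("Please say your password to install the update"))
```

An empty list means none of the three checks fired. A vendor would still need the other step the researchers mention, checking for copies of built-in intents, which this sketch does not attempt.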
My thanks to the SRLabs researchers who explain what needs to be done by the vendors to remediate these issues.
Proof of concept attacks using laser beams
Smart speakers use a specific type of microphone, known as a micro-electro-mechanical systems (MEMS) microphone, to convert the voices they hear into electrical signals they can understand and process. Such microphones, however, also respond to light shone on them, as proven by academic researchers who used lasers to have the devices call out the time, order a laser pointer online, set the device’s volume to zero and open a garage door (or potentially the front door of a house).
What are the limitations of this technique?
The aiming of the laser can be imprecise, which limits the distance at which the technique works, and the beam may inadvertently hit other smart speaker devices. The researchers used a telescope, a telephoto lens and a tripod to focus the beam and to provide accurate timing.
Further limitations are detailed in this BleepingComputer article. My thanks to them for this detail and for the descriptions of this technique.
They also detail methods by which the owner of the smart speaker could be alerted to this technique being used to exploit it: “the victim may be alerted by the visibility of the light beam, unless infrared is used – but additional gear is necessary in this case, and the audio response from the target device confirm execution of the command”.
Both Amazon and Google provided statements that they are analysing the results of this research and are working with the researchers to improve security.