About the Trojan Source vulnerability

Date: 2021-11-16

Researchers at the University of Cambridge recently disclosed the Trojan Source vulnerability, which can affect codebases in virtually any programming language. By embedding Unicode bidirectional control characters in source files, an attacker can make code render differently from the way a compiler or interpreter actually parses it. This lets attackers hide trojans in code that looks harmless to a reviewer, creating a weakness to exploit later.

For example, the following code snippet might appear to be safe, but the hidden Unicode control characters in it cause compilers to parse it differently from the way it is displayed.

/* begin sensitive block */ if (properlySanitized(user_input)==true) {
    sensitive_api_call(user_input);
/* end sensitive block */ }

The compiler parses the above as:

/* begin sensitive block */ if (properlySanitized(user_input)==true) {sensitive_api_call(user_input);/* end sensitive block */ }
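
To see what the compiler sees, it can help to make the hidden characters visible. The sketch below is a minimal illustration, not a reconstruction of the snippet above: the `sample` line is hypothetical, modelled on the publicly described "commenting-out" pattern, and the helper name `make_visible` is an assumption introduced here. It replaces each bidirectional control character with a readable `<U+XXXX>` marker so the stored (logical) order of the line becomes obvious.

```python
# Bidirectional control characters commonly associated with Trojan Source:
# LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def make_visible(line: str) -> str:
    """Return the line with each bidi control shown as a <U+XXXX> marker."""
    return "".join(
        f"<U+{ord(ch):04X}>" if ch in BIDI_CONTROLS else ch
        for ch in line
    )


# Hypothetical line (an assumption for illustration): in stored order the
# `if` sits between "/*" and "*/", so the compiler treats it as a comment,
# even though an editor may draw it as if it were live code.
sample = "/*\u202e } \u2066if (isAdmin)\u2069 \u2066 begin admins only */"

print(make_visible(sample))
# prints: /*<U+202E> } <U+2066>if (isAdmin)<U+2069> <U+2066> begin admins only */
```

With the markers in place, everything between `/*` and `*/` is plainly a single comment, which is how a compiler reads the line regardless of how an editor draws it.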

Organizations need a defense against this problem because the vulnerability is so easy to inject into a codebase. For example, a developer searching the web for a way to implement an algorithm or use an API might copy and paste a code snippet from the search results. If the copied snippet contains this attack, the trojan is planted. This kind of vulnerability is also difficult to catch in manual code review because most people aren’t looking for hidden characters when they review code.

This vulnerability can also enter a codebase through the supply chain of third-party components. A popular dependency could include the malicious code, and it might not be caught during code review because the bidirectional (bidi) control characters behind the attack are invisible to human reviewers in most editors and review tools.
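
Because these characters are invisible to reviewers, an automated check is one way to catch them in pasted snippets or vendored dependencies. The following is a minimal sketch under stated assumptions, not a complete scanner: the function name `find_bidi_controls` and the choice of nine control characters are illustrative, and a real tool would also need to handle file encodings and legitimate right-to-left text in strings and comments.

```python
import unicodedata

# Assumed watch list: the embedding, override, and isolate controls
# (LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI).
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def find_bidi_controls(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for each bidi control in text.

    Line and column numbers are 1-based; columns count code points.
    """
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, unicodedata.name(ch)))
    return findings


# Hypothetical source text with one hidden override on the second line.
source = "int a = 1;\n/*\u202e hidden */\n"

for line_no, col, name in find_bidi_controls(source):
    print(f"line {line_no}, column {col}: {name}")
# prints: line 2, column 3: RIGHT-TO-LEFT OVERRIDE
```

A check like this could run as a pre-commit hook or a CI step over first-party and third-party source, failing the build whenever the returned list is non-empty.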