Date: 2019-01-29

Dependency strategies

Now that we understand why code dependencies are important, it’s time to look at the various approaches that static analyzers can take.

Ignore dependencies

This is the easiest strategy to implement. The analyzer ignores dependency information and simply guesses at what the dependency code does. The results are terribly inaccurate, but the vendor doesn’t have to write any dependency-handling code.
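To make the idea of guesswork concrete, here is a minimal sketch of an analyzer that infers behavior from a method name alone. The class `NameBasedGuess`, its `Guess` enum, and the `com.example` method names are hypothetical, invented purely for illustration; no real analyzer is claimed to work exactly this way.

```java
import java.util.Locale;

/** Hypothetical illustration: guess a method's behavior from its name only. */
final class NameBasedGuess {

    enum Guess { CLOSES_RESOURCE, UNKNOWN }

    /** Guesses what a method does without ever inspecting the dependency's code. */
    static Guess guess(String qualifiedMethodName) {
        // Keep only the simple method name after the last '.' (or the whole string if there is none).
        String simple = qualifiedMethodName.substring(qualifiedMethodName.lastIndexOf('.') + 1);
        // The name is the only evidence available to this strategy.
        if (simple.toLowerCase(Locale.ROOT).startsWith("close")) {
            return Guess.CLOSES_RESOURCE;
        }
        return Guess.UNKNOWN;
    }

    public static void main(String[] args) {
        // Both method names are made up for this example.
        System.out.println(guess("com.example.Pool.closeAll"));
        System.out.println(guess("com.example.Pool.size"));
    }
}
```

A method that closes a resource under a different name is missed, and a method that merely starts with "close" is misclassified, which is why this strategy is so inaccurate.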

The analyzer vendor typically optimizes the guessing to either maximize true positives or minimize false positives, often with the goal of getting the maximum possible score on some well-known benchmark. Usually, this involves assuming all dependencies are at the latest version. In an enterprise deployment, this works worst when needed most, as the analyzer will miss large swaths of defects in the most poorly maintained and vulnerable applications.

Since finding dependencies can be technically challenging, a vendor may make the dubious claim that the analyzer implements this strategy for “ease of use.”

Use a built-in library of knowledge

This is the brute-force approach to handling code dependencies. The analyzer is simply pre-configured to “know” about certain things. Again, by way of illustration, what does the following method do?



org.apache.commons.io.IOUtils.closeQuietly(…)



A good guess (based on the name) is that it closes a resource. The developer of the static analysis solution could go out and download this class, confirm what it does, and then provide this information to the analyzer as built-in knowledge. This works but isn’t a scalable solution owing to the large (and growing) number of libraries out there.
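Built-in knowledge can be pictured as a lookup table that ships with the analyzer. The sketch below is an assumption about how such a table might look: the class `BuiltInKnowledge`, the `Effect` enum, and the choice of entries are hypothetical, and a real analyzer would record far richer summaries than a single label.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration: behavior summaries that ship with the analyzer. */
final class BuiltInKnowledge {

    enum Effect { CLOSES_RESOURCE }

    // Each entry was confirmed by a person ahead of time and then hard-coded.
    private static final Map<String, Effect> SUMMARIES = Map.of(
            "org.apache.commons.io.IOUtils.closeQuietly", Effect.CLOSES_RESOURCE,
            "java.io.Closeable.close", Effect.CLOSES_RESOURCE);

    /** Returns the pre-computed summary, or empty if the method was never catalogued. */
    static Optional<Effect> lookup(String qualifiedMethodName) {
        return Optional.ofNullable(SUMMARIES.get(qualifiedMethodName));
    }

    public static void main(String[] args) {
        System.out.println(lookup("org.apache.commons.io.IOUtils.closeQuietly"));
        // A method nobody catalogued (the name is made up) yields no answer at all.
        System.out.println(lookup("com.example.Pool.release"));
    }
}
```

The lookup is fast because nothing is computed at analysis time, but every library that is not in the table is invisible to the analyzer.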

If an analyzer were to employ only this strategy, it wouldn’t be very good. It would have a very poor defect detection rate and/or a uselessly high false-positive rate. That doesn’t mean, however, that this technique is without merit.

It’s an excellent mechanism for the most common and stable libraries, such as the standard Java runtime libraries. It has the advantage of being fast since the information is pre-calculated. And it also allows the analyzer to enforce an API contract even if the code is more relaxed. But it’s not a good general-purpose solution.

Analyze dependencies

If a good static analyzer wants to truly understand a dependency, there’s one surefire way to do so: since the analyzer is already good at analyzing code, it can apply its own analysis techniques to the dependency. This doesn’t work so well for languages like C/C++, where it’s hard to extract useful information from the compiled binary (entire products exist solely to attempt this). But it works great for languages like Java, where the dependencies tend to be in easily analyzed JARs, or JavaScript, where they come as source code that is indistinguishable from the first-party code except for its location (such as residing in the node_modules directory).
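For Java, the first step of analyzing a dependency is simply reading the classes out of its JAR. The following is a minimal sketch of that step using the standard `java.util.jar.JarFile` API; the class `JarClassLister` is a hypothetical name, and the actual analysis of each class is out of scope here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Hypothetical illustration: enumerate the classes an analyzer would need to examine. */
final class JarClassLister {

    /** Returns the sorted, dot-separated names of all .class entries in the given JAR. */
    static List<String> classNames(Path jarPath) {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return jar.stream()
                    .map(JarEntry::getName)
                    // Skip resources, manifests, and directories.
                    .filter(name -> name.endsWith(".class"))
                    // "com/example/Foo.class" becomes "com.example.Foo".
                    .map(name -> name.substring(0, name.length() - ".class".length())
                            .replace('/', '.'))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass one or more JAR paths on the command line.
        for (String arg : args) {
            classNames(Paths.get(arg)).forEach(System.out::println);
        }
    }
}
```

Each listed class would then be handed to the same analysis that runs on first-party code.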

As analysis algorithms go, this is clearly the superior option. But it relies on locating the dependencies, which are not always available to security teams. Also, this technique is valuable only in proportion to the strength of the underlying analyzer.

Take a hybrid approach

The two active techniques described above are not mutually exclusive; they can be applied together. In this scenario, the analyzer uses built-in knowledge when it’s available. Otherwise, it examines the actual dependencies to understand their behavior. This is the ideal solution, but it leads back to the original problem of obtaining the dependencies.
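The lookup order of the hybrid approach can be sketched in a few lines. Everything here is an assumption made for illustration: the class `HybridResolver`, the `Effect` enum, and the `dependencyAnalysis` function, which stands in for the analyzer examining a located dependency and returns empty when the dependency could not be obtained.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical illustration: built-in knowledge first, dependency analysis second. */
final class HybridResolver {

    enum Effect { CLOSES_RESOURCE, NO_EFFECT, UNKNOWN }

    private final Map<String, Effect> builtIn;
    // Stand-in for real analysis of a dependency; empty means the dependency was not found.
    private final Function<String, Optional<Effect>> dependencyAnalysis;

    HybridResolver(Map<String, Effect> builtIn,
                   Function<String, Optional<Effect>> dependencyAnalysis) {
        this.builtIn = builtIn;
        this.dependencyAnalysis = dependencyAnalysis;
    }

    Effect resolve(String qualifiedMethodName) {
        // 1. Prefer the fast, pre-computed answer when one exists.
        Effect known = builtIn.get(qualifiedMethodName);
        if (known != null) {
            return known;
        }
        // 2. Otherwise examine the dependency itself, if it can be obtained.
        return dependencyAnalysis.apply(qualifiedMethodName).orElse(Effect.UNKNOWN);
    }

    public static void main(String[] args) {
        HybridResolver resolver = new HybridResolver(
                Map.of("org.apache.commons.io.IOUtils.closeQuietly", Effect.CLOSES_RESOURCE),
                method -> Optional.empty()); // simulates dependencies that were never located
        System.out.println(resolver.resolve("org.apache.commons.io.IOUtils.closeQuietly"));
        System.out.println(resolver.resolve("com.example.Pool.release")); // made-up method name
    }
}
```

The `UNKNOWN` result in the fallback path is where the original problem resurfaces: without the dependency, the analyzer has nothing to examine.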