Questions about fuzz testing

Date: 2019-09-19

How can you lower the false-positive / false-negative rate of a fuzzing tool?

Fuzzing typically doesn’t have a false-positive rate. The confusion arises because a test case can produce a result in your system under test (SUT), such as unexpected behavior or a crash, that you don’t see again when you execute only that one test case.

If you keep your system and input the same (in other words, if you start from the same target state and use the same test case), there should always be the same result in your SUT. However, if you’re running 20 test cases in a row, for example, you might overflow a buffer in test case 20 and cause something to crash, because the earlier 19 test cases left the target in a different state. That crash might not happen if you restarted your target system and executed only test case 20.
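
The following is a minimal sketch of that effect, not a model of any real target. ToyTarget, its leak size, and its capacity are all hypothetical values chosen so that the failure shows up at test case 20 in a sequence but not when the same test case runs against a freshly restarted target.

```python
class ToyTarget:
    """Hypothetical SUT that leaks 50 bytes of a fixed buffer on every request."""
    CAPACITY = 1000
    LEAK_PER_REQUEST = 50

    def __init__(self):
        self.used = 0  # state that survives between test cases

    def handle(self, payload: bytes) -> None:
        self.used += self.LEAK_PER_REQUEST  # simulated leak, never released
        if self.used + len(payload) > self.CAPACITY:
            raise MemoryError("buffer exhausted")


def run_sequence(cases):
    """Run all cases against one target; return the 1-based index of the first failure."""
    target = ToyTarget()
    for index, payload in enumerate(cases, start=1):
        try:
            target.handle(payload)
        except MemoryError:
            return index
    return None


def run_isolated(payload):
    """Restart the target (fresh instance) and run one case; True if it fails."""
    try:
        ToyTarget().handle(payload)
    except MemoryError:
        return True
    return False


cases = [b"A" * 16] * 20
first_failure = run_sequence(cases)
fails_alone = run_isolated(cases[19])
print(first_failure, fails_alone)
```

In this sketch the sequence fails on the twentieth test case, while the same payload passes against a fresh target, which is why a finding can look like a false positive when it is really a state-dependent failure.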

Using external instrumentation or the Agent Instrumentation Framework, you can check the target after each individual test case, so you immediately see the impact of just that one test case, for example on the memory used by a process, and can detect the failure at the point where it occurs.
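
As an illustration of checking the target on a per-test-case basis, here is a small sketch. It does not use the Agent Instrumentation Framework or any Defensics API; ToyTarget, memory_in_use, and run_instrumented are hypothetical names, and the "memory" is simulated so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTarget:
    """Hypothetical SUT that retains oversized payloads forever (a simulated leak)."""
    retained: list = field(default_factory=list)

    def handle(self, payload: bytes) -> None:
        if len(payload) > 256:
            self.retained.append(payload)  # bug: never freed

    def memory_in_use(self) -> int:
        return sum(len(p) for p in self.retained)


def run_instrumented(target, cases):
    """Measure memory before and after every test case and record the growth."""
    report = []
    for index, payload in enumerate(cases, start=1):
        before = target.memory_in_use()
        target.handle(payload)
        growth = target.memory_in_use() - before
        report.append((index, growth))
    return report


cases = [b"A" * 8, b"A" * 300, b"A" * 64, b"A" * 1024]
report = run_instrumented(ToyTarget(), cases)
suspects = [index for index, growth in report if growth > 0]
print(suspects)
```

Because the measurement is taken around each test case, the growth is attributed to the exact test case that caused it rather than to whichever test case happened to be running when the target finally failed.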

How do you know when to stop fuzzing?

It comes down to metrics. Fuzzing has an infinite space issue in the sense that it’s negative testing. That means you can have an arbitrary number of test cases. For example, if you take one field in a packet and insert a single character, say ‘A,’ that’s a test case. You can insert two A’s in the field as a test case. And you can continue like that until you have one billion A’s in the field. Everything in between is a test case. It looks very fancy on marketing slides to have one billion test cases for this specific functional field. But it doesn’t make much sense when it comes to executing them against the target system.
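
To make the point concrete, the sketch below contrasts the one billion possible lengths with a small set of boundary lengths. Picking lengths around powers of two is an assumption made for illustration (buffer sizes are often powers of two); it is not a description of how any particular fuzzer selects its anomalies, and the function names are hypothetical.

```python
def boundary_lengths(max_length):
    """Lengths just below, at, and just above each power of two, plus the maximum."""
    lengths = set()
    power = 1
    while power <= max_length:
        for candidate in (power - 1, power, power + 1):
            if 1 <= candidate <= max_length:
                lengths.add(candidate)
        power *= 2
    lengths.add(max_length)  # always include the upper bound itself
    return sorted(lengths)


def make_payload(length, char=b"A"):
    """Build the overflow payload for one test case."""
    return char * length


# Only the lengths are computed here; payloads are built on demand.
lengths = boundary_lengths(1_000_000_000)
print(len(lengths), "test cases instead of 1,000,000,000")

# Small demonstration of the payloads themselves.
small_cases = [make_payload(n) for n in boundary_lengths(16)]
```

Under these assumptions the field is covered by fewer than a hundred test cases, each aimed at a length where an off-by-one or fixed-size buffer error is plausible.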

There are various metrics that define what a good fuzzer should have. The first one is high-quality anomalies. Your normalization database should contain anomalies that are likely to trigger issues when inserted into various parts of the code.

Second, you should set up the instrumentation correctly so that you learn when a specific function crashes and don’t execute more test cases that trigger the same bug. For example, if you’re testing a specific field and you trigger an overflow, you don’t want to waste your time executing a hundred more test cases against that field.
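
A minimal sketch of that idea follows. The send callback stands in for "deliver the test case and ask the instrumentation whether the target is still healthy"; it, the field names, and the 64-byte limit in the toy target are all hypothetical.

```python
def fuzz_field(send, field, anomalies):
    """Send anomalies for one field and stop at the first failure the check reports."""
    executed = 0
    for anomaly in anomalies:
        executed += 1
        if not send(field, anomaly):  # send returns False when the target failed
            return {"field": field, "failed_on": len(anomaly), "executed": executed}
    return {"field": field, "failed_on": None, "executed": executed}


def toy_send(field, value):
    """Hypothetical target: the 'username' field overflows above 64 bytes."""
    return not (field == "username" and len(value) > 64)


anomalies = [b"A" * n for n in (1, 63, 64, 65, 127, 128, 255, 256)]
username_result = fuzz_field(toy_send, "username", anomalies)
port_result = fuzz_field(toy_send, "port", anomalies)
print(username_result)
print(port_result)
```

Here the username field stops after the fourth test case, the first one that overflows, while the port field runs all eight because nothing fails. The remaining overflow cases for username would only rediscover the same bug.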

It’s also about coverage. The fuzzer should understand the protocol it’s testing against and create test cases that cover the entire breadth of the protocol. It should also be able to go very deep into the protocol’s stages while executing test cases.

Defensics is a good example of this. We have the full implementation of the protocol, so all the building blocks of the protocol are mirrored in our tool. As a result, we can take the model, multiply it by the anomalies we have using our normalization engine, and then dynamically create test cases.
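
The "model multiplied by anomalies" idea can be sketched in a few lines. This is not how Defensics is implemented; the three-field request-line model, the anomaly list, and generate_cases are hypothetical and far simpler than a full protocol model.

```python
# Hypothetical, heavily simplified model: field names with their valid values.
MODEL = [("method", b"GET"), ("path", b"/index.html"), ("version", b"HTTP/1.1")]

# Hypothetical anomaly library: empty value, overflow, format string, null byte.
ANOMALIES = [b"", b"A" * 1024, b"%n%n%n%n", b"\x00"]


def generate_cases(model, anomalies):
    """Yield one test case per (field, anomaly) pair, keeping other fields valid."""
    for position, (name, _valid) in enumerate(model):
        for anomaly in anomalies:
            fields = [value for _, value in model]
            fields[position] = anomaly  # replace exactly one field
            yield name, b" ".join(fields) + b"\r\n"


cases = list(generate_cases(MODEL, ANOMALIES))
print(len(cases))
print(cases[0])
```

Three fields and four anomalies give twelve test cases, each valid everywhere except in the one field under test, so the target still parses far enough to reach the code that handles that field.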

Can you use fuzzing for performance testing?

That’s not really the intention behind how we approach fuzzing. Fuzzing is designed to look at the implementation of protocols running on the target.

Of course, there are various ways of instrumenting your target—via SNMP, via external instrumentation, and so on. So you can very easily get performance-related data from your target. For example, we often find when we execute test cases that we cause a spike in CPU and memory use. Defensics offers multiple ways of visualizing that, such as a graph that correlates all the different inputs. You can use that information to cross-correlate a test case causing high CPU spikes. Then you can loop that specific test case and see how it affects your target. But again, Defensics isn’t meant to be a load tester or any such tool. There are other tools with that dedicated functionality, and it’s not our core intent for the Defensics platform.
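
The correlation step described above can be approximated with a short sketch. The sample data, the threshold of three times the median, and the function names are assumptions for illustration; they do not reflect how Defensics collects or visualizes these measurements.

```python
from statistics import median


def find_spikes(samples, factor=3.0):
    """Return the test case ids whose CPU reading exceeds factor times the median."""
    baseline = median(cpu for _, cpu in samples)
    return [case_id for case_id, cpu in samples if cpu > factor * baseline]


def loop_case(run_case, case_id, repetitions):
    """Re-run one test case several times and collect the CPU readings."""
    return [run_case(case_id) for _ in range(repetitions)]


# Hypothetical (test case id, CPU percent) pairs gathered during a run.
samples = [(1, 4.0), (2, 5.0), (3, 6.0), (4, 92.0), (5, 5.0), (6, 4.5)]
spikes = find_spikes(samples)
print(spikes)


def toy_run_case(case_id):
    """Hypothetical stand-in for replaying a test case and sampling CPU use."""
    return dict(samples)[case_id]


readings = loop_case(toy_run_case, spikes[0], repetitions=5)
print(readings)
```

Using the median as the baseline keeps a single extreme reading from hiding itself by inflating the average. Looping the flagged test case then shows whether the spike is repeatable or was a one-off.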