Monday, July 26, 2010

Design Detection Heuristics


Benford's law provides a useful heuristic for detecting data that has been produced by a person. That makes it a handy aid when looking for fraud, tampering, vote rigging and other activities where one needs a little help. It appears, though, that applying Benford's law is more of an art than a science: rather than being the smoking gun one would like, it serves as a starting point for an investigation or a trigger for caution.
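For reference, Benford's law says that the leading digit d (1 through 9) of many naturally occurring data sets appears with probability log10(1 + 1/d), so 1 leads about 30% of the time and 9 less than 5%. A minimal Python sketch of those expected values follows; the function name is only illustrative and is not part of the Splunk App.

```python
import math

def benford_first_digit_probabilities():
    """Return {digit: expected probability} for leading digits 1 through 9."""
    # Benford's law: P(d) = log10(1 + 1/d)
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
```

Calling `benford_first_digit_probabilities()[1]` gives roughly 0.301, and the nine probabilities sum to 1.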

I've developed a Splunk App that adds a new command to the Splunk search language. The command calculates the first-digit distribution of a field, which can then be graphed.

* | benford field=price | table digit price benford

Other digits can be selected as follows:

* | benford field=price digit=2 | table digit price benford


Here are some sample transactions I generated:

The benford command will calculate the distribution of the first digit and produce a table, which can be graphed.
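To make the calculation concrete, here is a rough Python sketch of how a digit distribution like this could be computed. It is not the App's actual implementation: the names `significant_digit` and `digit_distribution` are hypothetical, and the choice to skip zeros, non-numeric values and values with too few digits is an assumption.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation

def significant_digit(value, position=1):
    """Return the significant digit at `position` (1 = leading), or None."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    try:
        # Decimal drops the sign, the decimal point and leading zeros,
        # so 0.0123 yields the digits (1, 2, 3).
        digits = Decimal(str(value)).as_tuple().digits
    except InvalidOperation:
        return None  # not a number
    if not digits or digits[0] == 0 or position > len(digits):
        return None  # zero, NaN, infinity, or too few digits (assumption: skip)
    return digits[position - 1]

def digit_distribution(values, position=1):
    """Return {digit: share of values} for the chosen digit position."""
    counts = Counter()
    for value in values:
        digit = significant_digit(value, position)
        if digit is not None:
            counts[digit] += 1
    total = sum(counts.values())
    return {d: counts[d] / total for d in sorted(counts)} if total else {}
```

With `position=2` this mirrors the idea behind `digit=2` in the search above, although the second-digit distribution has its own expected values that differ from the first-digit ones.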











The following graph illustrates the digit distribution compared to the Benford distribution.
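Beyond eyeballing the graph, one common way to put a number on the gap between the observed and expected distributions is Pearson's chi-square statistic. The sketch below is an illustration only, not something the benford command is known to do; `benford_chi_square` is a made-up name, and the input is assumed to be a dictionary of first-digit counts.

```python
import math

def benford_chi_square(observed_counts):
    """Chi-square statistic of first-digit counts against Benford's law.

    observed_counts: {digit: count} for leading digits 1 through 9.
    """
    total = sum(observed_counts.get(d, 0) for d in range(1, 10))
    if total == 0:
        raise ValueError("no observations")
    chi2 = 0.0
    for d in range(1, 10):
        # Expected count under Benford's law: total * log10(1 + 1/d)
        expected = total * math.log10(1 + 1 / d)
        chi2 += (observed_counts.get(d, 0) - expected) ** 2 / expected
    return chi2
```

With nine digits there are 8 degrees of freedom, so a value above about 15.5 would be unusual at the 5% level for data that really follows Benford's law. In keeping with the caution above, a large value is best read as a prompt to look closer rather than as proof of tampering.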













The following graph was created using real transactional data.