Data Mining Techniques & Tools for Fraud Detection

    Data mining offers a wide variety of techniques for extracting useful information from large data sets.

    Because it can surface useful knowledge from data, it is a potent way to identify abnormal patterns and the unwanted activity behind them.

    Industries such as insurance, banking, credit cards, and telecom handle large volumes of data and are among the most vulnerable to financial fraud.

    Before we delve into data mining techniques for fraud detection, let’s look at some of the already developed fraud detection systems.

    1. A fuzzy logic system identifies fraudulent cases using optimum threshold values.
    2. A credit fraud model uses a classification technique for fraud/legal values, and then clustering followed by a second classification for no-fraud/legal values.
    3. Kohonen’s Self-Organizing Feature Map was used to rate auto injury claims by their degree of fraud suspicion.

    Now let’s look at some of the data mining techniques that are helpful for fraud detection.

    Two Most Prominent Data Mining Techniques that Help with Fraud Detection

    1. Bayesian Belief Networks

    A Bayesian belief network models the causal relationships among variables. Probabilities are predicted from that model, and each instance is then classified as legal or illegal.

    To detect fraud in auto insurance, two Bayesian networks are constructed to model behavior.

    The approach rests on two hypotheses: one that the driver is fraudulent, and the other that the driver is legitimate.

    Accordingly, two nets are set up: a fraud net, and a user net that models a genuine user.

    In operation, the user net is adapted to a specific user based on the incoming data, and the user’s behavior is then monitored for deviations.
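    The two-net comparison can be illustrated with a minimal Python sketch. It is a simplification, not a full Bayesian network: it assumes the features are independent within each net, and it does not adapt the user net over time. The feature names, the probabilities, and the prior below are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical probabilities that each feature is present under each net.
# A real system would learn these from historical claims.
FRAUD_NET = {"night_claim": 0.7, "new_policy": 0.6, "high_amount": 0.8}
USER_NET = {"night_claim": 0.2, "new_policy": 0.3, "high_amount": 0.1}


def log_likelihood(observation, net):
    """Log-probability of an observation, assuming independent features."""
    total = 0.0
    for feature, present in observation.items():
        p = net[feature]
        total += math.log(p if present else 1.0 - p)
    return total


def fraud_posterior(observation, prior_fraud=0.01):
    """Combine both nets with Bayes' rule to get P(fraud | observation)."""
    log_fraud = math.log(prior_fraud) + log_likelihood(observation, FRAUD_NET)
    log_user = math.log(1.0 - prior_fraud) + log_likelihood(observation, USER_NET)
    # Normalise in log space to avoid numerical underflow.
    top = max(log_fraud, log_user)
    fraud = math.exp(log_fraud - top)
    user = math.exp(log_user - top)
    return fraud / (fraud + user)


typical = {"night_claim": False, "new_policy": False, "high_amount": False}
suspicious = {"night_claim": True, "new_policy": True, "high_amount": True}
print(fraud_posterior(typical))
print(fraud_posterior(suspicious))
```

    With these made-up numbers, the suspicious observation receives a much higher fraud probability than the typical one. A deployed system would flag a case once that probability crosses a threshold tuned to the business.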

    2. Decision Trees

    Decision trees are a family of machine learning techniques that predict a dependent attribute (the class) from a set of independent attributes. The basic algorithm for building a decision tree is outlined below:

    We begin by assuming that there are two classes: legal and illegal. The tree starts as a single node containing all of the training samples.

    If all the samples belong to the same class, for example fraud, the node becomes a leaf and is labeled with that class.

    Otherwise, the algorithm uses an entropy-based measure to choose the attribute that best separates the samples into individual classes.
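    The leaf test and the entropy-based choice can be sketched in Python as follows. The source does not name a specific measure, so this sketch assumes information gain, the measure used by ID3-style trees. The helper names and the tiny data set are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the samples on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Return None for a pure node (a leaf), else the highest-gain attribute."""
    if len(set(labels)) == 1:
        return None
    return max(rows[0], key=lambda a: information_gain(rows, labels, a))


# Hypothetical training samples: each row holds the independent attributes.
rows = [
    {"amount": "high", "channel": "online"},
    {"amount": "high", "channel": "branch"},
    {"amount": "low", "channel": "online"},
    {"amount": "low", "channel": "branch"},
]
labels = ["illegal", "illegal", "legal", "legal"]
print(best_split(rows, labels))
```

    Here the sketch selects "amount", because splitting on it yields two pure groups, while splitting on "channel" leaves the classes evenly mixed. A full tree builder would apply the same step recursively to each resulting group.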

    What are the Best Data Mining Tools for Fraud Detection?

    Some of the best data mining tools for fraud detection include:

    1. Clementine 4.0 from Integral Solutions Ltd.
    2. Darwin 3.0.1 from Thinking Machines Corp.
    3. Enterprise Miner from SAS Institute
    4. Intelligent Miner for Data from IBM
    5. Pattern Recognition Workbench from Unica Technologies Inc.
