Insurance fraud is an economic problem that threatens the financial strength of insurers as well as their survival. Of particular interest is the automobile insurance class, which contributes 35.8% of non-life insurance business and has in the past five years grown by 38.17% in motor private and 3.96% in motor commercial. Fraud in this class may take the form of reporting and claiming fictitious damage or loss, exaggerating losses covered under the policy, and misrepresenting facts.
According to the Association of Kenya Insurers (AKI), local market players in the underwriting industry find automobile insurance to be one of the most challenging products because of its large technical loss, with claims accounting for 68.92% of premium in motor private and 60.72% in motor commercial. This means that for every Kes 100 earned by the insurer in premium, Kes 68.92 and Kes 60.72 respectively go out to settle insureds’ claims. The situation is aggravated by large expenses, partly attributable to the investigations carried out to verify that claims are genuine, which account for 44.16% in each of the classes. As a result, the insurer makes a loss of Kes 13.08 and Kes 4.88 respectively for every Kes 100 earned in net premium.
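The loss figures quoted above follow from adding the claims and expense percentages and comparing the total with the Kes 100 of premium. A minimal sketch of that arithmetic, where the function name `underwriting_result` is illustrative and not from the source:

```python
def underwriting_result(claims_pct, expense_pct):
    """Underwriting result per Kes 100 of net premium (negative means a loss)."""
    combined = claims_pct + expense_pct  # total outgo per Kes 100 earned
    return round(100.0 - combined, 2)

# Percentages quoted in the text for the two motor classes.
motor_private = underwriting_result(68.92, 44.16)
motor_commercial = underwriting_result(60.72, 44.16)

print(motor_private, motor_commercial)
```

A negative result is the loss per Kes 100 of premium, which reproduces the Kes 13.08 and Kes 4.88 figures.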
A significant proportion of losses can be attributed to fraudulent insurance claims. In response, AKI rolled out the Integrated Motor Insurance Data System (IMIDS), which provides a statistical hub through which insurance companies share information to help counter the incidents of fraud that have hit the sector. The system receives underwriting and claims information in real time, which enables the AKI secretariat to support its members with information on stolen motor vehicles, written-off motor vehicles, uninsured motor vehicles, accident claims and underwriting data, among others.
While this information is the new oil for the industry, analytics form the combustion engine. Such a system can be coupled with a mathematical model for fraud detection. The aim is not to make a definite decision about whether a claim is fraudulent at the time it is reported, but to determine whether there is statistically significant evidence that a claim is likely to be fraudulent, so that claims can be better prioritized for investigation. Artificial Neural Networks (ANNs) are the best-suited candidate for this purpose. This is because ANNs exhibit mapping capabilities: they can map input patterns to their associated output patterns, learn by example through training, and generalize, so they are able to predict new outcomes from past trends. ANNs are also robust and fault tolerant and can therefore recall full patterns from incomplete or partial ones.
Univariate analysis using logistic regression is carried out to identify the key indicators of automobile insurance fraud. The claim characteristics that are significant in predicting the claim status are identified from their p-values at the 95% confidence level, and the results are displayed in the table below.
From the p-values in the table above, we can conclude that all claim characteristics are significant in predicting the claim status at the 95% confidence level except for gender, type of cover, existence of inconsistencies between the medical form and police report and number of injuries. The insignificance of gender could be attributed to the data used, as it records the gender of the policyholder, who may not necessarily be the person driving at the time of the accident. For the medical form inconsistency variable, it should be noted that all claims with such inconsistencies were fraudulent. This implies that the variable is important in flagging the claim status, and its large p-value could be attributed to the few records observed to have such inconsistencies.
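To illustrate how a p-value for a single claim characteristic arises, here is a minimal sketch of a univariate logistic regression fitted by Newton-Raphson with a two-sided Wald test on the slope. The fitting method, the function name `fit_univariate_logit` and the counts are assumptions made for illustration; they are hypothetical and are not the study's data or its implementation.

```python
import math

def fit_univariate_logit(x, y, iters=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, se_b1, p_value), where p_value is the two-sided
    Wald p-value for the hypothesis b1 = 0.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            v = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += v                # information matrix entries
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the information matrix of the last iteration,
    # by which point the estimates have converged for data like this.
    se_b1 = math.sqrt(h00 / det)
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return b0, b1, se_b1, p_value

# Hypothetical binary indicator: 40 claims without it (4 fraudulent),
# 20 claims with it (10 fraudulent).
x = [0] * 40 + [1] * 20
y = [1] * 4 + [0] * 36 + [1] * 10 + [0] * 10
b0, b1, se_b1, p_value = fit_univariate_logit(x, y)
significant = p_value < 0.05  # significant at the 95% confidence level
```

For a binary indicator, `math.exp(b1)` equals the odds ratio of the underlying 2x2 table, which is 9 for these hypothetical counts.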
An odds ratio compares the odds of an event (here, a claim being fraudulent) occurring in one group with the odds of it occurring in another group, where the groups are defined by the levels of an input variable.
- Claims tied to a high number of claimants are far more likely (38 times) to be fraudulent than claims with fewer claimants, as fraudsters are more likely to exploit the opportunity of inserting themselves into accidents involving large groups of people, where the chances of detection are perceived to be minimal.
- Claims filed when the prevailing lending interest rate is high are at least three times more likely to be fraudulent than claims filed when the interest rate is low. In Kenya, most individuals purchase motor vehicles using car loans, and an increase in interest rates may therefore cause financial strain on policyholders. This may cause some to feel justified in engaging in insurance fraud to obtain money to pay off their loans. The situation may be made worse when the insured is in a bad financial situation or in an unusual or difficult occupational situation, for instance if the insured is employed in an industry or company that is experiencing lay-offs or downsizing.
- Claims filed shortly after cover was taken are more likely (13 times) to be fraudulent than claims where a longer time passed before the accident. This indicates that fraudsters are highly likely to take up automobile cover with the sole purpose of exploiting the insurer by staging accidents as soon as the cover takes effect, so that they will not have paid much in premiums.
- Claims reported to the insurer long after an accident occurred are at least five times more likely to be fraudulent than claims reported immediately after the accident.
- Claims reported to the insurer in the absence of a police report are more likely to be fraudulent than claims for which a police report about the accident was recorded.
- The interaction between number of claimants and interest rate has a significant association with the claim status at the 95% confidence level. There is a positive correlation of 0.49 between these two variables, indicating that higher lending interest rates are associated with a higher number of claimants.
- The interaction between claims being filed shortly after cover was taken and interest rate has a significant association with the claim status. These two variables are negatively correlated, with a correlation coefficient of -0.44, indicating that higher lending interest rates are associated with fewer days between when the policy cover was taken and when the claim was filed. This implies that people are more likely to file claims shortly after taking up cover when the prevailing lending interest rates are high.
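The multipliers quoted in the list above (for example "38 times" or "13 times") are odds ratios. As a sketch of how such a figure is obtained from a 2x2 table of counts, consider the following; the function name and the counts are hypothetical and are not taken from the study.

```python
def odds_ratio(group_a_fraud, group_a_genuine, group_b_fraud, group_b_genuine):
    """Odds of fraud in group A divided by the odds of fraud in group B."""
    odds_a = group_a_fraud / group_a_genuine
    odds_b = group_b_fraud / group_b_genuine
    return odds_a / odds_b

# Hypothetical counts: group A = claims filed shortly after cover was taken,
# group B = claims filed later.
ratio = odds_ratio(30, 20, 10, 40)
print(ratio)
```

A ratio above 1 means fraud is more likely in group A than in group B; a ratio of 1 means the characteristic carries no information about the claim status.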
The ANN works in two steps: the input variables are multiplied by weights and summed, and a threshold operation is then applied to the result. Let $x = (x_1, \dots, x_n)$ and $w = (w_1, \dots, w_n)$ represent the input vector and weight vector respectively and, for a given threshold $b$, let $\psi_b(t) = 1$ if $t > b$ and $\psi_b(t) = 0$ otherwise be the corresponding threshold function. The output variable $Y = \psi_b\left(\sum_{i=1}^{n} w_i x_i\right)$ of the network is 1 when the sum of weighted input signals lies above the threshold and 0 otherwise. Thus the effect of the network depends on the weights $w$ and the threshold $b$.
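The two steps just described, a weighted sum followed by a threshold, can be sketched as a single function. This is a minimal illustration, not the study's implementation: the function name, the weights and the inputs are hypothetical, and treating a sum exactly equal to the threshold as output 0 is an assumption.

```python
def threshold_unit(x, w, b):
    """Return 1 if the weighted sum of the inputs lies above threshold b, else 0."""
    if len(x) != len(w):
        raise ValueError("input and weight vectors must have the same length")
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))  # step 1: multiply and add
    return 1 if weighted_sum > b else 0                  # step 2: threshold

# Hypothetical weights and inputs for three claim characteristics.
weights = [0.8, 0.5, -0.3]
flagged = threshold_unit([1, 1, 0], weights, b=1.0)      # weighted sum 1.3
not_flagged = threshold_unit([0, 1, 1], weights, b=1.0)  # weighted sum 0.2
```

Changing either the weights or the threshold changes which inputs are mapped to 1, which is why both are the quantities adjusted during training.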
Based on this concept, the network is trained to solve the automobile insurance fraud classification problem based on the input variables, which represent the claim characteristics and belong to either of two classes $C_0$ and $C_1$, representing the non-fraudulent and fraudulent classes respectively. The network characterized by weights $w$ classifies an object as belonging to $C_0$ or $C_1$ when the output variable $Y$ is 0 or 1 respectively. To solve this classification problem, the weights $w$ are ‘learned’. To achieve this, a training set $T$ built from 7 input vectors $X_1, \dots, X_7$ whose correct classification, also referred to as the target vectors, $Z$ is known is used. This can graphically be represented as:
The sum of squared errors method is used to train the network: the weights are adjusted so that the sum of squared errors between the output variable Y and the target variable Z is minimized. A Quasi-Newton learning algorithm is used to minimize this error function, as it is numerically stable and has very effective self-correcting properties, which account for its superior performance in practice.
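Written out, the training objective and a generic Quasi-Newton step look as follows. The source does not say which Quasi-Newton variant was used, so the BFGS update shown for the inverse-Hessian approximation $H_k$ is an assumption, as are the index $j$ over the $N$ training claims and the step length $\alpha_k$. A gradient-based method also presupposes a differentiable activation during training (for instance a logistic sigmoid in place of the hard threshold), which is likewise assumed here.

```latex
\begin{align}
E(w) &= \sum_{j=1}^{N} \bigl( Y_j(w) - Z_j \bigr)^2 \\
w_{k+1} &= w_k - \alpha_k H_k \nabla E(w_k) \\
s_k &= w_{k+1} - w_k, \qquad
g_k = \nabla E(w_{k+1}) - \nabla E(w_k), \qquad
\rho_k = \frac{1}{g_k^{\top} s_k} \\
H_{k+1} &= \bigl( I - \rho_k s_k g_k^{\top} \bigr) H_k \bigl( I - \rho_k g_k s_k^{\top} \bigr) + \rho_k s_k s_k^{\top}
\end{align}
```

Because $H_k$ is built from successive gradient differences, the method needs no explicit second derivatives of $E$.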
The performance of the ANN in modeling automobile insurance fraud is evaluated by examining the model’s sensitivity and specificity. Here, the non-fraudulent class is treated as the positive class: sensitivity describes the ability of the model to correctly identify non-fraudulent claims, while specificity describes the ability of the model to flag fraudulent claims.
The confusion matrices above indicate that the fitted ANN model achieves a true positive rate of 94% and a true negative rate of 77%. This implies that if the model were adopted by an insurer during claim processing, it would be expected to correctly identify about 94% of non-fraudulent claims and about 77% of fraudulent claims.
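Both rates are simple ratios of confusion-matrix cells. The sketch below uses hypothetical counts chosen only so that the ratios match the rates quoted above; they are not the study's confusion matrix. In line with the text, a non-fraudulent claim is the positive class.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: non-fraudulent claims predicted non-fraudulent
    fn: non-fraudulent claims predicted fraudulent
    tn: fraudulent claims predicted fraudulent
    fp: fraudulent claims predicted non-fraudulent
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity

# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=94, fn=6, tn=77, fp=23)
```

Neither ratio depends on how common fraud is in the data, which is why the two are reported separately rather than as a single accuracy figure.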
To affirm this, the corresponding Receiver Operating Characteristic (ROC) curve was examined, giving a 95% confidence interval of 0.6310 to 0.7422 for the Area Under the Curve (AUC).
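The AUC can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise computation, with hypothetical scores and an illustrative function name (this is not how the study computed its interval):

```python
def auc_pairwise(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model scores for three positive and three negative cases.
auc = auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.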
General insurers operate in a market where fraudulent activity has seen the collapse of insurers that mainly underwrite automobile insurance, and they face the challenge of balancing fraud risk against reputation risk. On one hand, they need to assess claims carefully to avoid exposure to fraudsters, which results in an increase in the cost of premiums for good customers. On the other hand, they need to settle legitimate claims quickly in order to maintain an excellent reputation for customer service. Statistical modeling provides a way to treat each case based on the information acquired and to deliver a better service in terms of both speed and safety, by providing different channels for handling a claim based on its level of risk.
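Handling claims through different channels according to risk could look like the sketch below. The channel names, the cut-off values and the assumption that the model outputs a fraud score between 0 and 1 are all hypothetical; an insurer would set them from its own investigation capacity and risk appetite.

```python
def route_claim(fraud_score, low=0.3, high=0.7):
    """Map a model fraud score in [0, 1] to a handling channel (hypothetical cut-offs)."""
    if not 0.0 <= fraud_score <= 1.0:
        raise ValueError("fraud_score must lie between 0 and 1")
    if fraud_score < low:
        return "fast_track"        # settle quickly
    if fraud_score < high:
        return "standard_review"   # routine checks
    return "investigation"         # prioritized for detailed investigation

channels = [route_claim(s) for s in (0.05, 0.5, 0.92)]
```

The model output only prioritizes claims for review; the decision on whether a claim is fraudulent still rests with the investigation.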
The tools required to support this exercise include the R statistical software and databases. The existing claims database suffices for this modeling, but it may be reinforced by integrating the underwriting database so that the database schema provides a three-dimensional view of the customer; that is, data belonging to one customer should be consistent across all databases. The proposed linking of IMIDS to other key stakeholders in motor insurance, such as KRA and NTSA, will also allow for federated learning.