Machines Can Identify Income Fraud Without Verification

Income fraud is one of the most common types of fraud that impacts lenders. People lie about their income in order to qualify for a loan.

As a case in point, 1 in 5 borrowers admitted in a survey that they misrepresented their income on their application for a car loan.

In Australia, 1 in 6 home buyers admitted to lying about how much money they made on their mortgage application.

And in a recent study, the online lender The Lending Club said it believes about 11% of borrowers misstate their incomes on online lending applications.

Yes, it is common. But it is also really hard to tell if someone is lying about their income. Think about it. You would probably have a hard time guessing the income of a member of your own family! And you have known them your whole life!

How can a lender know someone is lying when they have no idea who the person is? Most applications have limited information on the consumer, which puts lenders in a difficult situation.

Income Fraud Impacts Loan Performance

The mortgage industry learned this the hard way.  Income Fraud matters.  If a borrower lies about their income – which they often do – it impacts the performance of the loan.

Fraud rates in the mortgage industry had soared to over 100 basis points by 2007. Lying on mortgage applications was very common, and it ultimately led to the collapse of the mortgage industry in epic fashion.

In 2007, BasePoint Analytics determined that borrowers who lied on their applications about income, employment or occupancy were much more likely to default. In fact, up to 70% of loans that defaulted within the first year had lies on their application.

And the mortgage industry isn’t the only industry that has found that income fraud impacts performance. Auto, online and personal lenders have found the same phenomenon to be true.

The Current Approach – Verify Everything with the IRS

The mortgage industry responded to income fraud by requiring a 4506T verification on every single mortgage application.  A 4506T verification involves the borrower signing a permission slip to let the lender get their tax returns for the last 2 years.

After the lender gets the signed form, they transmit it to the IRS and wait 3 days to get the results back.

How one service processes a 4506T request

When they get the tax return back, they analyze the income to make sure it matches the application.

The problem with this approach?  It’s costly.  It’s time-consuming.  It’s invasive.  It’s horribly slow.  Oh, and one big thing – people are notorious for lying about their income to the IRS!

Even with all these issues, 4506T has helped to crack down on income fraud. Other industries have followed suit and begun adopting this “straight to the IRS” approach of verifying income.

Other Approaches to Income Verification – Paystubs, Bank Statements, Tax Letters

In addition to verifying income with tax returns, lenders are increasingly turning to other verification approaches:

Bank Statements – Lenders request up to 2 years of bank statements from borrowers to verify deposits, withdrawals, assets and average daily balances.

Paystubs – Lenders can request the last 2 paystubs to verify how much borrowers recently got paid by their employer.

W2’s – Lenders request copies of year-end W2’s from borrowers to see how much they got paid the year before.

Tax Preparer Letter – Self Employed borrowers are often asked to provide a tax preparer letter which explains how their income was sourced and how much it was.

The only problem with these approaches? They are rife with fraud and forgery. Paystubs, bank statements and W2’s are ridiculously easy to forge using tools found on the internet, and you can find quite a few shady tax preparers who are happy to write anything in a letter to a lender.

Here is a fraudulent paystub I created in 2 minutes off the internet.  It’s probably good enough to get a loan with. But it is fraud.  It is a forgery.

Face it.  Income verification is no silver bullet.  But it is effective in many cases and that is why lenders rely on it.

Where Statistical Analysis Comes into Play

Have you read the book Freakonomics? There is an interesting chapter where they show how easy it is to catch cheaters by analyzing data. You see, cheaters and liars have a very hard time making their data follow the distribution you would expect to see. Cheating tends to show itself in the data if you look closely enough.


In this example, from a high school maturity exam in Poland (courtesy of reader Artur Janc), the histogram shows the distribution of scores for the Polish language test, which is the only subject that all students are required to take and pass.

This is not a normal distribution as you can tell. The dip and spike that occurs at around 21 points just happens to coincide with the cut-off score for passing the exam.   You can tell by looking at the chart that there is some sort of bias being introduced by the teacher, or the exam scorer or the students themselves to give some students just enough to pass the test.

This same principle of normal distribution and outlier analysis can come into play for analyzing income fraud at the application stage. By looking for abnormal distributions or income that deviates from normal expectations, you can quickly find the income cheaters.
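To make the outlier idea concrete, here is a minimal sketch in Python. It is not the method any particular lender uses. It assumes you already have a list of stated incomes for a comparable group of borrowers, and it scores each one by how far it sits from the group median, using the median absolute deviation (MAD) so that a few inflated incomes do not distort the baseline. The function names, the sample incomes and the 3.5 threshold are all illustrative assumptions.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD.

    0.6745 is the usual constant that makes the score comparable to a
    standard z-score when the data is roughly normal.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be scored
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return the positions of values whose score exceeds the threshold.

    The 3.5 threshold is a common rule of thumb, used here as an assumption.
    """
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


# Made-up stated incomes for one peer group; the last one looks inflated
stated_incomes = [42_000, 45_000, 47_000, 50_000, 52_000, 55_000, 250_000]
suspects = flag_outliers(stated_incomes)
```

A flag like this is only a reason to take a closer look. A genuinely high earner in a modest peer group will be flagged too.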

How Machines Can Spot The Dimensions of Income Fraud

There are several ways machines can help identify the patterns of income fraud in applications.

The key to spotting income fraud is not to look at a single dimension of income, but rather 5 or more dimensions.   By looking at multiple dimensions, machines can improve the accuracy of finding what is out of pattern for a borrower.

Here are a few examples of dimensions that can be analyzed through machine learning and pattern recognition to find income fraud.

  1. Income by Geographic Area – Income analysis by 3-digit, 5-digit or 9-digit zip code can help identify normal patterns.  Income ranges can generally be predicted based on the zip code a borrower lives in.  By looking for borrowers whose income significantly exceeds the typical range for their zip code, you can begin to narrow in on the fraudsters.
  2. Income by Occupation – Every occupation has a normal distribution of average incomes. By analyzing the occupation that a borrower reports and matching it to income ranges, you can quickly tell which borrower might be lying.
  3. Income by Employer and Job Title –  Large employers have defined income averages based on the job title.  By analyzing the employer and reported income, you can detect which borrowers are out of range.
  4. Income by Age – Income generally increases as borrowers age. By comparing each borrower's income to the average for their age range, the outliers will become evident.
  5. Income by Property Value –  Property value averages are a good indicator of income ranges. Typically, areas with higher property values will also have higher income ranges.  By analyzing the ratio of property value to income you can find expected income ranges.
  6. Income by Credit Profile – The credit score, number of tradelines and time in file can be profiled for expected income ranges to find which borrowers are out of range.
  7. Income by Reported Assets –   Many applications require borrowers to report their assets.  Higher incomes should be matched with higher assets. When income and asset ratios significantly deviate, it may indicate either income or asset fraud.
  8. Income to Key Ratios – High incomes coupled with high key ratios (DTI, PTI, LTV) are often indicators of borrowers, brokers or dealers inflating the income just enough to help the borrower qualify for a loan.
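
As a rough illustration of looking at several dimensions at once, the sketch below checks a stated income against an expected upper bound for three of the dimensions above and reports which ones it exceeds. Everything in it is hypothetical: the `Application` fields, the `EXPECTED_UPPER` lookup tables and the dollar figures are made up for the example. A real system would derive these ranges from historical data and would probably feed them into a model rather than a simple count.

```python
from dataclasses import dataclass


@dataclass
class Application:
    income: float
    zip3: str        # first 3 digits of the zip code
    occupation: str
    age_band: str


# Hypothetical upper bounds on plausible income per dimension (made-up numbers)
EXPECTED_UPPER = {
    "zip3": {"945": 180_000, "723": 90_000},
    "occupation": {"cashier": 45_000, "software engineer": 220_000},
    "age_band": {"18-24": 70_000, "35-44": 200_000},
}


def out_of_range_dimensions(app):
    """List the dimensions where the stated income exceeds the expected bound."""
    flagged = []
    for dimension, table in EXPECTED_UPPER.items():
        upper = table.get(getattr(app, dimension))
        # Skip dimensions with no reference data for this borrower
        if upper is not None and app.income > upper:
            flagged.append(dimension)
    return flagged


suspicious = Application(income=120_000, zip3="723",
                         occupation="cashier", age_band="18-24")
plausible = Application(income=60_000, zip3="945",
                        occupation="software engineer", age_band="35-44")

suspicious_flags = out_of_range_dimensions(suspicious)
plausible_flags = out_of_range_dimensions(plausible)
```

An income that is out of range on one dimension is often innocent. An income that is out of range on three or more at the same time is much harder to explain away.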

Can Machines Effectively Identify Income Fraud?

The answer is easy.  Absolutely they can, and they do.  We use sophisticated income analysis to help spot income fraud in auto, mortgage and retail applications for credit.

The key to success is to evaluate the income level risk in the context of the overall risk of an application.

Income fraud is typically a means to an end; it is not the end.  Evaluating a misrepresented income in isolation from the overall risk of an application might be meaningless.  For example, a loan with an LTV of 15% collateralized against a home may have almost zero chance of loss, so how important is the income in calculating the risk of that application?

The Lending Club Case Study Shows Promise

Croudify presented some interesting analysis that proved that machines can help identify income fraud that matters without necessarily verifying income with documentation on every single loan.

The Lending Club was being criticized for not verifying income on 100% of applications.  Croudify's analysis suggested that the Lending Club was, in fact, doing a good job of choosing which applications needed income verification and which did not.

The first finding was that the Lending Club was, in fact, verifying more incomes, not fewer, over time.

The second was that default rates were lower for those applications where the income was not verified.

How to Use Machine Learning Models

Machine learning models are meant to be used in combination with hard verification techniques. I would never recommend using models alone.  The models are only used to identify those applications where hard verification is recommended.

When the models indicate that the risk of income fraud is not material to the performance of the loan, the income verification can be deprioritized.

It’s a risk-based approach, which tends to work best.
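
Here is a small sketch of what that triage could look like. It assumes a model that produces an income fraud score between 0 and 1, and it uses LTV as a stand-in for whether a misstated income would actually change the loan's performance. The function name, the score and both cutoffs are hypothetical and would need to be tuned against real loan outcomes.

```python
def verification_priority(income_fraud_score, ltv,
                          score_cutoff=0.7, low_ltv_cutoff=0.30):
    """Suggest whether to send an application for hard income verification.

    income_fraud_score: hypothetical model output between 0 and 1.
    ltv: loan-to-value ratio as a fraction (0.15 means 15%).
    Both cutoffs are illustrative assumptions, not recommended values.
    """
    # Heavily collateralized loan: a misstated income changes little
    if ltv <= low_ltv_cutoff:
        return "deprioritize"
    # Otherwise let the model decide where documents are worth requesting
    if income_fraud_score >= score_cutoff:
        return "verify"
    return "deprioritize"


low_ltv_case = verification_priority(income_fraud_score=0.9, ltv=0.15)
risky_case = verification_priority(income_fraud_score=0.9, ltv=0.95)
clean_case = verification_priority(income_fraud_score=0.2, ltv=0.95)
```

The models only decide where the paystub, bank statement or 4506T request goes. They do not replace it.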

I hope you enjoyed the article. Thanks for reading!