The Problem With Lending
Lending
Lending is a cornerstone of any economy, and especially of the American economy, because it provides the basis for growth. Not everyone qualifies for a loan through the traditional banking system, and so a secondary lending market exists. To get a sense of how big this market is, consider LendingClub’s historical Total Loan Issuance: since 2007, LendingClub has issued over 20 billion dollars in loans, which shows how large and important this sector is. Moreover, LendingClub is just one of a handful of online lenders that emerged about a decade ago to fill the gap that traditional banks were unable to fill.
The Risk
The main risk in issuing a loan is that the borrower will not pay it back. In online lending, then, the main challenge is building a model that is accurate enough to stay profitable: the losses caused by the model’s errors must be lower than the interest collected from borrowers who do pay back. Two things are needed to build a strong model: (1) as much information as possible about the borrower and (2) an intelligent model that accurately predicts the probability that a potential borrower will default.
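To make the profit condition concrete, here is a minimal sketch of the arithmetic under deliberately simplified assumptions: a single repayment period, a total loss of principal on default, and no fees or recoveries. The function names and the 12% rate in the usage note are illustrative choices, not anything a lender has published.

```python
def expected_profit(principal, interest_rate, default_rate):
    """Expected one-period profit on one loan.

    Simplifying assumptions (ours, for illustration only):
    - a borrower who repays returns principal plus interest in one period;
    - a borrower who defaults returns nothing (zero recovery).
    """
    gain_if_repaid = principal * interest_rate
    loss_if_default = principal
    return (1 - default_rate) * gain_if_repaid - default_rate * loss_if_default


def break_even_default_rate(interest_rate):
    """Default rate at which expected profit is exactly zero.

    Solving (1 - d) * r - d = 0 for d gives d = r / (1 + r).
    """
    return interest_rate / (1 + interest_rate)


if __name__ == "__main__":
    rate = 0.12  # illustrative interest rate
    d_star = break_even_default_rate(rate)
    print(f"break-even default rate at {rate:.0%} interest: {d_star:.2%}")
    print("profit per $1,000 at 5% defaults:", expected_profit(1000, rate, 0.05))
```

Under these assumptions, a lender charging 12% breaks even when roughly 10.7% of borrowers default (0.12 / 1.12). Anything the model gets wrong that pushes the realized default rate above that threshold turns the portfolio into a loss, which is why prediction accuracy matters so much.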
The Default
The article “Default Rates at Lending Club & Prosper: When Loans Go Bad” by LendingMemo offers an interesting analysis of how lending companies tune their two main parameters: the minimum FICO score required to lend, and the interest rate to charge. It is interesting to see the experiments these companies have run, shifting the minimum FICO score up and down and observing how default rates changed in response. It seems that, at the moment, LendingClub has fine-tuned its minimum FICO score to 703 and its interest rate to 12%.
The Data Science
Oleh Dubno, in his paper “Predicting Defaults of Loans using Lending Club’s Loan Data”, lays out a great example of how data exploration and model-building should happen for the Lending Club data. He analyzes the data set (in a nice Python notebook) and shows that annual income only predicts funded amount when the income is less than or equal to $75,000; above that, there is no correlation. Dubno then runs logistic regressions and arrives at largely expected conclusions. If the loan’s grade is good, if it is a small loan, or if the borrower was employed, has a large income, or didn’t mark “OTHER” in response to “home ownership”, then the loan is more likely to be paid back. Surprisingly, the length of time that the borrower has been employed doesn’t matter: as long as the borrower was employed at the time of the loan, the loan is more likely to be repaid. The frequency of loan repayment also varies by state: Nebraska, Missouri, and Oregon have particularly high default rates. Dubno then normalizes his data, builds a more specific model, and finds that Mississippi, in fact, has the highest rate of default.
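For readers who have not seen a logistic regression on loan data, the following is a rough, dependency-free sketch of the general technique. It is not Dubno’s notebook and does not use Lending Club data: the borrowers are synthetic, the “true” relationship used to generate them is made up, and the feature names (employed, income, loan_amount) are hypothetical stand-ins for the kinds of variables discussed above.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic borrowers (NOT Lending Club data); all numbers are invented.
rng = random.Random(0)
rows, repaid = [], []
for _ in range(500):
    employed = 1.0 if rng.random() < 0.8 else 0.0
    income = rng.uniform(20, 150)  # annual income, thousands of dollars
    loan = rng.uniform(1, 35)      # loan amount, thousands of dollars
    # Made-up relationship, used only to generate the labels.
    p_repay = sigmoid(0.5 + 1.5 * employed + 0.01 * income - 0.08 * loan)
    rows.append([employed, income, loan])
    repaid.append(1 if rng.random() < p_repay else 0)

weights, bias = fit_logistic(standardize(rows), repaid)
for name, w in zip(["employed", "income", "loan_amount"], weights):
    print(f"{name:12s} {w:+.3f}")
```

Because the features are standardized, the sign of each fitted weight shows the direction of the effect: a positive weight means the feature raises the predicted probability of repayment, and a negative weight lowers it. On real data one would more likely reach for an established library and check the coefficients’ statistical significance before drawing conclusions of the kind summarized above.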