Can Alternative Data Be Used for Creditworthiness Assessment? A Banker, a Data Scientist & a Lender Give the Only Right Answer
September 27, 2018
Approximately 25% of US consumers are considered thin-file because they have fewer than five items in their traditional credit histories. 7% are no-hits (their records fall outside the five items of the traditional framework), and 9% are completely invisible: they have no record of credit at all. Looking for a resistance-free entry into the financial services industry to cash in on the opportunity, technology and internet companies are particularly interested in those edge cases. With the abundance of digitized behavioral data available to those companies today (social media, search, mobile money, bills, shopping history, rent, etc.), anyone can lend. Facebook alone targets ads based on 98 data points on every user. The number of indicators used by FinTech startups in the alternative credit scoring space varies from several hundred to hundreds of thousands.
However, not all alternative data is born equal.
Alternative data sources vary significantly in their ability to accurately assess creditworthiness or predict the likelihood of default. Moreover, the relevance of a source depends on the goal: one would look at very different pieces of data, with varying levels of trust, when assessing someone previously unknown to the formal financial system than when marginally improving the accuracy of a credit score for someone with limited data.
A day ago, I discussed this very matter on a panel on alternative credit scoring at FinovateFall in NYC with Dipanjan Das (Capital One), Aristotle Socrates (Juvo), and Houman Motaharian (LendingPoint), where refreshing honesty about the use of alternative data in creditworthiness assessment led us to three important considerations.
What is the definition?
Alternative sources of data constitute a fairly long list that can be grouped roughly into two types — soft data and hard data.
Soft data sources include the hallmarks of social behavior for individuals. For businesses, the closest resemblance would be corporate culture, the way a business owner cares for the property, inventory management, etc.
Hard data sources always come down to finances and how one (whether a business or an individual) behaves with money.
For financial institutions deciding whether to trust someone with money, hard data carries far more weight than soft data such as social media data. Experian, for one, talks about alternative data in the following terms:
Mobile phone payments
Cable TV payments
Bank account information, such as deposits, withdrawals or transfers
Small dollar loans
Every one of those alternative sources has to do with money management. Sharing the results of its first-ever report on lender and borrower perceptions about using alternative data for credit decisions, Experian revealed that 80% of lenders rely on a credit report plus additional information when making a credit decision. But here is what’s more interesting — more than 50% of consumers believe that including items like their utility or mobile phone payment history would have a positive effect on their credit score.
Basically, more than 50% of consumers would want to expand the hard data points that lenders are considering. This is surprisingly aligned with how an institution thinks about alternative data — in terms of ongoing, consistent financial behavior, and responsibility. The agency found that if given a choice, many consumers would prefer that alternative credit data sources, such as utility bill payment history (48%), savings/checking account transactions (39%), and mobile phone payment history (38%) be evaluated in their credit history. Every one of the most preferred sources is financial data.
Some professionals single out the following main characteristics of a good source of alternative data:
Coverage: a new data source will ideally have broad and consistent coverage (e.g., over 90% of US adults use a cell phone, and the market is concentrated so data collection would be easy to achieve; ~40% of US adults pay rent, but this is a low concentration market and so the data are expensive to collect).
Specificity: a data source should ideally contain detailed data elements about an individual — data elements that provide part of a full picture of the borrower (e.g., on time and late payments over a significant time series, or specific asset or income data); some data sources are based on ‘segment data’ or ‘modeled data’ and are typically less predictive than consumer-specific sources.
Accuracy and timeliness: data should be accurate and frequently updated; a data source should have a system for ongoing data verification and management.
Predictive power (‘signal’): most important, data should contain information relevant to the behavior that you’re trying to predict.
Orthogonality: ideally, the data source should be additive to traditional bureau data; this means that using it will improve the predictive accuracy of any new score by improving the signal-to-noise ratio.
Regulatory compliance: data sources must comply with existing regulations for consumer credit (e.g., the Fair Credit Reporting Act, the Equal Credit Opportunity Act, the Gramm-Leach-Bliley Act).
Those characteristics are more likely to be found in hard data.
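To make the checklist above concrete, here is a minimal sketch of how a lender might screen a candidate data source against the six characteristics. Everything in it is an assumption for illustration: the criterion names, the `DataSource` and `evaluate` helpers, the 0.5 threshold, the decision to treat regulatory compliance as a hard gate, and above all the ratings, which are made up and not taken from Experian or any other study.

```python
from dataclasses import dataclass

# Hypothetical criterion names mirroring the six characteristics listed above.
CRITERIA = (
    "coverage",
    "specificity",
    "accuracy_timeliness",
    "predictive_power",
    "orthogonality",
    "regulatory_compliance",
)


@dataclass
class DataSource:
    name: str
    ratings: dict  # criterion -> rating between 0.0 and 1.0 (made-up values)


def evaluate(source, min_rating=0.5):
    """Return (passes, weak_criteria) for a candidate data source."""
    missing = [c for c in CRITERIA if c not in source.ratings]
    if missing:
        raise ValueError(f"{source.name}: missing ratings for {missing}")
    # Assumption: compliance is a hard gate, not one factor among many.
    if source.ratings["regulatory_compliance"] < 1.0:
        return False, ["regulatory_compliance"]
    weak = [c for c in CRITERIA if source.ratings[c] < min_rating]
    return not weak, weak


# Illustrative ratings only; they are not measurements from any study.
utility_payments = DataSource(
    "utility bill payments",
    {
        "coverage": 0.8,
        "specificity": 0.8,
        "accuracy_timeliness": 0.7,
        "predictive_power": 0.7,
        "orthogonality": 0.6,
        "regulatory_compliance": 1.0,
    },
)
social_profiles = DataSource(
    "consumer social media profiles",
    {
        "coverage": 0.7,
        "specificity": 0.4,
        "accuracy_timeliness": 0.3,
        "predictive_power": 0.3,
        "orthogonality": 0.6,
        "regulatory_compliance": 0.0,
    },
)

for src in (utility_payments, social_profiles):
    passes, weak = evaluate(src)
    print(src.name, "passes" if passes else "fails", weak)
```

In this sketch a source that fails compliance is rejected outright no matter how well it rates elsewhere. That is one possible design choice, not an industry rule.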
What of soft data sources? This leads us to the second important consideration.
What is the target market and the goal?
There is ONLY ONE RIGHT ANSWER to the question of whether alternative data can be used for creditworthiness assessment or not — IT DEPENDS.
Let’s take social media.
Experian asks: Can banks, credit unions, and online lenders look at social media profiles when making a loan decision and garner intel to help them make a credit decision?
Experian answers: In the case of business credit, YES. On the consumer side, NO.
To address the consumer NO first, there are simple reasons that close down any discussion about using the famed social media data in alternative credit scoring frameworks for consumer lending:
The Equal Credit Opportunity Act states that credit must be extended to all creditworthy applicants regardless of race, religion, gender, marital status, age, and other personal characteristics. Social media profiles can check every one of those boxes, making this data unusable.
Social media data can be manipulated.
The FCRA requires credit data to be displayable and disputable. Social media data can't meet those requirements.
The situation is much different for business lending because the state of a business's social media channels indicates customer-engagement performance, which, ultimately, can point to (potential) financial performance. Consumers increasingly engage with businesses through social media messaging, even more so since chatbots became ubiquitous. The regulatory standards are also different for businesses: the FCRA does not apply to business lending.
Alternative data could be useful in providing a clearer picture of a small business's relationship with its community of customers and potential customers. Online rankings on major platforms like TripAdvisor, Yelp, etc., translate into sales, making social media performance an important and relevant indicator of where the business stands with its customers.
There is another side to a conversation about the goal. The use of alternative data in creditworthiness assessment has varying relevance in edge cases — with invisible individuals and businesses, and with those with limited records.
The greatest opportunity is in creating identities for those who are deemed invisible. For those with a full record, alternative data cannot play a role any more meaningful than simply sharpening the image in a non-deterministic way. That category can be described accurately enough for institutions using current frameworks.
The catch-22 of credit is that to borrow, you need a score, but to generate a score, you need to have borrowed before. Financial alternative data could be seen as a way out of this conundrum for edge cases. For those with records or limited records, traditional scoring frameworks are seen as powerful enough by financial institutions.
What is next?
Any breakthroughs in the use of alternative data to improve the accuracy of existing frameworks, or to redefine the very framework, are heavily skewed towards enhancing only one function: the loan origination process. Financial institutions, technology and internet companies, and startups are focused on ensuring the fastest, cheapest, most efficient and accurate decision-making process and delivery of funds to the borrower.
Loan origination, however, is only one piece of the puzzle, with the other two, more important ones being repayment and repeat (up)sell.
Traditionally, industry-defining players like Experian, TransUnion, and the now much-conflicted Equifax calculate credit scores based on a person's historical financial and repayment data. And there is a reason for it: with the abundance of available options, repayment (or default rate) is the key measurement of a successful framework. More important still is repayment that does not harm the borrower in the long term. Past financial behavior has always been, and will remain, a very powerful and accurate basis for creditworthiness assessment.
Any model that is focused on leveraging alternative data to evaluate the likelihood of default becomes useless when the evaluation turns out to be incorrect. Once the borrower defaults, all that will matter in the future is the fact of default, not the shining score built from 40,000 pieces of data.
In conversations about the use of alternative data sources, the focus has to shift towards responsible lending and sustainable recovery of funds. Consumers in the US, for example, are heavy users of credit. Consumer debt, including personal loans, real-estate-secured loans, auto loans, credit cards, and student loans, totals over $12 trillion. That is not the number lending startups should be using as a highlight to justify their existence. That number represents an opportunity for companies operating in the lending space to focus on practices ensuring sustainable and responsible recovery of funds.
For ~4.5 billion people globally (a majority of them from low- and middle-income emerging countries) with no credit or repayment data available, alternative frameworks may become a way into the formal financial system. But those frameworks will need to evolve continuously to take into account the most important stage: repayment history.