Big Data, Digital Footprints, and Credit: Looking To The Future

People seem to view their credit score as sacred; they’ve cultivated the number for years, check its progress every few months, and pray it doesn’t dip too low. If you want to own a home, qualify for a good home insurance rate, finance a new car, or even sign an apartment lease, there’s a good chance the score serves as a gatekeeper to these experiences.

This is because companies need to quantify client risk, and credit scores have served as a solid predictive model for decades.

But a new study published by the National Bureau of Economic Research has found that credit scores aren’t the only indicator of credit risk; your digital footprint might work as well. In the paper, titled “On the Rise of FinTechs – Credit Scoring using Digital Footprints,” Tobias Berg, Valentin Burg, Ana Gombovic, and Manju Puri tested a different approach: instead of using credit bureau information to predict consumer credit risk, they used digital footprints.

Digital footprint variables are highly valuable for default prediction

Researchers studied a sample of over 270,000 purchases from a German e-commerce company that lets its customers buy furniture and pay for it at a later date. These transactions work a bit differently from your average furniture sale, however: the company analyzes each customer’s digital footprint, along with a traditional credit score, to determine creditworthiness.

The researchers looked closely at several facets of a consumer’s digital footprint. These variables draw on information stored in cookies, the browser, and the online registration form. The computer/browser data include device type, operating system, and browser type, while the cookies record the channel through which the customer reached the web page, the time spent on the page, and the checkout time. From the registration footprint, the researchers used the email address, the name of the email provider, and information about the customer’s writing.
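To make the list of variables above more concrete, here is a minimal Python sketch of how a single customer’s footprint might be stored. The class name, the field names, and the `email_all_lowercase` property (a stand-in for “information about the customer’s writing”) are all hypothetical; the study’s actual variable coding may differ.

```python
from dataclasses import dataclass


@dataclass
class DigitalFootprint:
    """Hypothetical record of one customer's digital footprint variables."""

    # Computer/browser data
    device_type: str        # e.g. "mobile", "tablet", "desktop"
    operating_system: str   # e.g. "iOS", "Android", "Windows"
    browser: str            # e.g. "Safari", "Chrome"
    # Cookie data
    channel: str            # how the customer reached the site, e.g. "price_comparison"
    seconds_on_page: float  # time spent on the page
    checkout_hour: int      # hour of day (0-23) at checkout
    # Registration data
    email: str

    @property
    def email_provider(self) -> str:
        """Domain part of the email address, lower-cased."""
        return self.email.rsplit("@", 1)[-1].lower()

    @property
    def email_all_lowercase(self) -> bool:
        """Hypothetical writing-style feature: was the email typed entirely in lowercase?"""
        return self.email == self.email.lower()
```

For example, a customer registering as `Jane.Doe@Example.de` would get the provider `example.de`, and `email_all_lowercase` would be `False`.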

Besides the digital footprint variables, the dataset contained data from a private credit bureau that compiles a score similar to a FICO score in the United States.

The results were significant: the authors report that digital footprint variables are highly valuable for predicting defaults.

They found that the difference in default rates between customers using iOS and Android is equivalent to the difference in default rates between a median FICO score and the 80th percentile of the FICO score.
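A comparison like this starts from a simple calculation: the share of customers in each group who default. The sketch below computes that share per group and the gap between two groups. The function name `default_rates` is hypothetical and the data are invented toy values, not figures from the study; the same function could be applied to credit score bands to put the two gaps side by side.

```python
from collections import defaultdict


def default_rates(observations):
    """Return the share of defaults for each group.

    observations: iterable of (group, defaulted) pairs, where defaulted is a bool.
    """
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for group, defaulted in observations:
        totals[group] += 1
        defaults[group] += int(defaulted)
    return {group: defaults[group] / totals[group] for group in totals}


# Toy data invented for illustration only (not from the study).
sample = [
    ("iOS", False), ("iOS", False), ("iOS", False), ("iOS", True),
    ("Android", False), ("Android", True), ("Android", False), ("Android", True),
]
rates = default_rates(sample)
gap = rates["Android"] - rates["iOS"]  # difference in default rates between the groups
```

With this toy sample, the iOS group defaults at 25% and the Android group at 50%, a gap of 25 percentage points.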

Also, customers who use internet and phone service providers that cater to affluent clients are about half as likely to default as the average customer; this is equivalent to the difference between a mean FICO score and a FICO score at the 10th percentile.

Email addresses are also useful for predicting defaults: customers whose email address contains their name are 20% less likely to default. This echoes the finding of Belenzon, Chatterji, and Daley (2017) that companies carrying their founder’s name perform better.
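A name-in-email variable could be derived roughly as follows. This is a minimal sketch under the assumption that a match means the customer’s first or last name appears in the part of the address before the “@” once separators and digits are removed; the function names and the matching rule are hypothetical, since the exact rule the study used is not described here.

```python
import re


def _normalize(text: str) -> str:
    # Lower-case and drop whitespace, dots, hyphens, underscores, plus signs and digits,
    # e.g. "jan.mueller_84" -> "janmueller".
    return re.sub(r"[\s.\-_+\d]", "", text.lower())


def name_in_email(first_name: str, last_name: str, email: str) -> bool:
    """Hypothetical feature: does the email's local part contain the customer's name?"""
    local_part = _normalize(email.split("@", 1)[0])
    names = [_normalize(name) for name in (first_name, last_name)]
    # Empty names are skipped so that "" never counts as a match.
    return any(name and name in local_part for name in names)
```

This naive substring rule has obvious limits: a short first name such as “Jan” would also match an address like `january99@…`, so a real feature would likely need more careful matching.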

The customer’s behavior during the registration process and the type of website that brought them to the registration page are also significantly related to defaults. Customers who arrived through targeted ads were more likely to default than those who came from price comparison websites.

A model using all digital footprint variables outperforms the credit bureau score

Combining all of the digital footprint variables, the model reaches an area under the ROC curve (AUC) of 69.6%, higher than the 68.3% AUC of the credit bureau score.

The credit bureau score is not available for unscorable customers, and 15,591 observations in the study’s dataset were unscorable. The digital footprint turned out to be even more useful for these customers: the AUC for unscorable customers is 74.4%, versus 69.7% for scorable customers.
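AUC measures how well a score ranks borrowers: 50% is no better than a coin flip, and 100% means every defaulter was ranked as riskier than every non-defaulter. The sketch below uses the standard pairwise (Mann-Whitney) interpretation of AUC. It assumes a higher score means higher predicted risk; for a credit score where higher means safer, you would negate the score first. The function name and the toy numbers are illustrative only.

```python
def auc(scores, defaulted):
    """Area under the ROC curve via its pairwise interpretation.

    Returns the probability that a randomly chosen defaulter receives a higher
    risk score than a randomly chosen non-defaulter, counting ties as one half.
    """
    positives = [s for s, d in zip(scores, defaulted) if d]
    negatives = [s for s, d in zip(scores, defaulted) if not d]
    if not positives or not negatives:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: two defaulters (True) and two non-defaulters (False).
example = auc([0.9, 0.8, 0.3, 0.2], [True, False, True, False])
```

Here three of the four defaulter/non-defaulter pairs are ranked correctly, so `example` is 0.75. The double loop is quadratic and only meant to show the definition; libraries typically compute the same quantity from ranks.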

The authors consider their results remarkable because they rely on simple, easily available information, and because the results hold up in out-of-sample tests, which means they are not driven by overfitting within the sample. They are also robust to alternative default definitions and to different sample splits.
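One common way to run an out-of-sample test is to fit a model on part of the data and measure its AUC only on observations it never saw. The sketch below does this with a deliberately simple stand-in model, a lookup table of default rates per footprint category; it is not the model the authors used, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def fit_rate_table(rows):
    """'Train' a very simple model: the default rate for each footprint category.

    rows: list of (category, defaulted) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [defaults, total]
    for category, defaulted in rows:
        counts[category][0] += int(defaulted)
        counts[category][1] += 1
    overall = sum(c[0] for c in counts.values()) / sum(c[1] for c in counts.values())
    table = {category: d / n for category, (d, n) in counts.items()}
    return table, overall


def predict(model, category):
    table, overall = model
    # Categories unseen in training fall back to the overall training default rate.
    return table.get(category, overall)


def auc(scores, labels):
    """Pairwise AUC: share of (defaulter, non-defaulter) pairs ranked correctly, ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("test set needs both defaulters and non-defaulters")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def out_of_sample_auc(rows, test_share=0.3, seed=0):
    """Fit on a random training split and report AUC on the held-out split only."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_share))
    train, test = shuffled[:cut], shuffled[cut:]
    model = fit_rate_table(train)
    scores = [predict(model, category) for category, _ in test]
    labels = [defaulted for _, defaulted in test]
    return auc(scores, labels)
```

Re-running `out_of_sample_auc` with different `seed` values gives different sample splits, which is one simple way to check that a result does not hinge on a single split.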

What’s next

In a way, the results are completely unsurprising. A person’s digital footprint exposes their level of affluence through their browser and mobile device type. The richer a person is, the easier it will be for them to make monthly payments. So what?

But to me, it comes down to two questions:

1. What will e-commerce and FinTech companies decide to do with this data?

Beyond making credit risk decisions for their own companies, there’s no question that this data is incredibly valuable to other businesses. It could allow other e-commerce companies to bypass credit report pulls and other bureaucratic processes in order to sell or finance their products.

2. These findings are clearly a win for business, but are they a win for the consumer? I would say that it depends on which consumer you ask.

The credit score determination process is already fraught with socioeconomic and racial bias. And because a digital footprint is tied to superficial indicators of affluence, I don’t see how using it would overcome this bias for people who already have low credit scores.

The authors argue that this approach may help the 2 billion adults worldwide who lack access to credit or any formal financial services, fostering financial inclusion and lowering inequality. This seems like a bit of a reach; people without access to credit may not have the means to optimize their digital footprints.

But with this information, people with subprime credit scores or no credit history may be able to change their digital behavior to overcome credit denials. It’s possible that simply switching from an Android device to an iOS device leads to more approvals and lower interest rates. That’s a pretty simple switch. However, over time, this could turn into a rat race between consumers and lenders, with no clear winner in sight.