Big Data: Friend or Foe?

Big data has become a core part of nearly every sophisticated decision-making tool used by financial institutions. The importance of consolidated, structured records of customers' financial (and non-financial) behavior is hard to overstate: such records give companies an opportunity to make accurate business choices and stay relevant in the market.

While the advantages of big data are well known and widely anticipated, there are also downsides. Big data carries hidden pitfalls for both customers and organizations, and they are just as important to acknowledge as the advantages.

Earlier this year, the US Federal Trade Commission (FTC) published a report on big data that addresses concerns the government and professionals have about the downsides of using big data for decision-making.

One of the concerns is the quality of the data: its accuracy, completeness, and representativeness. Accurate decisions can't be made on inaccurate or poor-quality data.

Another concern professionals express is the effect of uncorrected biases in the underlying consumer data. While gathering and interpreting data, collectors and analysts may unintentionally inject personal judgments. The insights derived from that data will then be biased and carry considerable risk for business decisions. The biases may favor certain groups of customers and put other groups at a disadvantage. Such mistakes can lead to inaccurate modeling and predictions about consumer behavior, pricing, target markets, and so on. Human error is not a new phenomenon, and data analysts have biases of their own. An analyst who already has an answer in mind may unintentionally manipulate the data to arrive at it.

The end result of biased data handling goes beyond any particular organization. It creates disadvantaged consumer groups and a discriminatory marketing and business focus.

The following are some of the concerns the professional community and the government have expressed regarding the power of big data and the business decisions it drives:

One of the most common mistakes in big data analysis is the misinterpretation of correlations. Although it is well known that correlation is not causation, some analysts still want to see causation where only correlation can be established. It is crucial to understand what actually drives a correlation before making any assumption about causation.
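The point about correlation can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not anything taken from the FTC report: the variable names (income, premium_store_visits, on_time_repayment) and the way they are generated are assumptions made purely for illustration. A hidden factor drives two behaviors that do not influence each other, yet the two behaviors still look related until the hidden factor is accounted for.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing its linear dependence on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical hidden factor (assumption): standardized household income.
income = [rng.gauss(0, 1) for _ in range(5000)]

# Both behaviors depend on income plus independent noise,
# but neither one has any effect on the other.
premium_store_visits = [i + rng.gauss(0, 1) for i in income]
on_time_repayment = [i + rng.gauss(0, 1) for i in income]

# Raw correlation: clearly positive, although there is no causal link.
raw_corr = pearson(premium_store_visits, on_time_repayment)

# Correlation after controlling for income: close to zero.
partial_corr = pearson(
    residuals(premium_store_visits, income),
    residuals(on_time_repayment, income),
)

print(f"raw correlation:         {raw_corr:.2f}")
print(f"controlling for income:  {partial_corr:.2f}")
```

An analyst who saw only the first number might conclude that shopping at certain stores makes customers repay on time. In this toy setup the relationship disappears once the shared driver is taken into account.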

The ultimate goal of acknowledging and addressing these risks is to prevent wrongful categorization of consumer data, which can lead to the exclusion of certain groups across industries. If a range of organizations make the same mistake, the result is large-scale discrimination against particular groups, which are unintentionally dropped from the focus of interest. Those groups are believed to be mostly low-income and underserved populations.

As the FTC warns, interpreting inaccurate and biased raw data may lead to more individuals mistakenly being denied opportunities based on the actions of others. According to the report, big data can lead to decisions about consumers that are based on the actions of other people with whom they share some characteristics. One notable example involved credit card companies lowering a customer's credit limit based not on that customer's payment history, but on an analysis of other customers with poor repayment histories who had shopped at the same establishments. This homogenization makes the business less efficient: trustworthy customers are denied credit at a low rate because they share some shopping attributes with a pool of non-creditworthy customers.
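The credit-limit example can be made concrete with a toy comparison. This is a hedged sketch only: the customers, store names, the three-missed-payments cutoff, and the 0.5 threshold are all invented for illustration and do not describe how any real issuer scores customers. It contrasts a rule that judges a customer by the repayment record of fellow shoppers with a rule that looks at the customer's own history.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    name: str
    missed_payments: int  # the customer's own repayment record
    stores: frozenset     # establishments where the customer shops

# Entirely made-up data for illustration.
customers = [
    Customer("A", 0, frozenset({"discount_mart"})),
    Customer("B", 4, frozenset({"discount_mart"})),
    Customer("C", 5, frozenset({"discount_mart"})),
    Customer("D", 0, frozenset({"uptown_grocer"})),
    Customer("E", 3, frozenset({"uptown_grocer"})),
]

def is_poor_payer(customer):
    """Assumed cutoff: three or more missed payments."""
    return customer.missed_payments >= 3

def store_risk(store):
    """Share of a store's shoppers who are poor payers."""
    shoppers = [c for c in customers if store in c.stores]
    return sum(is_poor_payer(c) for c in shoppers) / len(shoppers)

def flag_by_association(customer, threshold=0.5):
    """Flag a customer because of where they shop, ignoring their own record."""
    return max(store_risk(s) for s in customer.stores) > threshold

def flag_by_own_history(customer):
    """Flag a customer only on their own repayment behavior."""
    return is_poor_payer(customer)

# Reliable payers penalized because of other shoppers at the same store.
wrongly_flagged = [
    c.name for c in customers
    if flag_by_association(c) and not flag_by_own_history(c)
]

# Poor payers the association rule fails to catch.
missed = [
    c.name for c in customers
    if flag_by_own_history(c) and not flag_by_association(c)
]

print("penalized for others' behavior:", wrongly_flagged)
print("poor payers the rule misses:   ", missed)
```

In this made-up data the association rule errs in both directions: it penalizes a customer with a clean record and overlooks one with a poor record, which is the inefficiency described above.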

Low-quality and biased data may be a source of discriminatory targeting in the advertising of financial services. Some low-income customers who are eligible for a better offer may never learn of it, or may be denied automatically. The cause could be incorrect targeting based on a skewed representation of the whole group, resulting from biases against underprivileged groups.

Exposure of private and sensitive information will always be a concern when it comes to data collection. With data breaches occurring frequently, customers' concern over the exposure of data they consider very private will only grow. The Facebook example in the report is quite frightening in what it shows about the power of the tools organizations use to infer extremely private information. As stated by the FTC, one study combined data on Facebook likes with limited survey information and found that researchers could accurately predict a male user's sexual orientation 88% of the time, a user's ethnic origin 95% of the time, religion (Christian or Muslim) 82% of the time, political party preference 85% of the time, and so on.

Big data in the hands of criminals and unscrupulous organizations may help them target the most vulnerable groups of the population in order to scam them. At a time when lists of customers, complete with personal information and even medical records, can be bought and sold quite easily by interested parties, vulnerable customers (those with mental health issues or disabilities, those who respond most readily to enticements, etc.) are in particular danger of being scammed and drawn into financial fraud.

Carefully and knowledgeably manipulated big data is a powerful tool for justifying any discriminatory decision. As mentioned above, an interested and highly skilled analyst can get an anticipated, desired answer from almost any data set. It takes only a little manipulation and imagination: answering the wrong question or choosing the wrong values is enough to produce the desired result. In the bigger picture, this gives an institution the power to make a discriminatory decision and rest assured of its legality because "the data said so."

Discriminatory pricing across industries is another possible outcome of mishandling big data. Low-income communities are at a particular disadvantage, and zip code-based pricing in online stores is one example. As the FTC reports, consumers in poorer neighborhoods may have to pay more for online products than consumers in wealthy communities, where there is more competition from brick-and-mortar stores. In that case, poorer communities would not realize the full competitive benefit of online shopping.

Last but not least is an issue professionals describe as the weakened effectiveness of consumer choice. Sometimes consumers refuse to share personal data. In those cases, companies have little choice but to infer the missing characteristics from data collected from seemingly similar consumers that have also restricted access to their personal data. The accuracy of such inferences is certainly not of the highest quality. Sophisticated algorithms homogenize customer characteristics when they hit this wall of refusal, which lowers the overall quality of the data.

Big data can be both a powerful positive tool and a destructive mechanism. Whether it facilitates inclusion or exclusion depends on the organization. A wide range of decisions in marketing, pricing, choice of target audience, and so on is based on data. The way that data is collected and handled determines the outcome for consumers, communities, and businesses.