Can Data Catalogs Help Deliver Greater Algorithm Transparency?

A very interesting article appeared in Fast Company last month on how watchdog groups are trying to lift the veil on AI-driven algorithms that may be deployed with good intentions but could threaten civil rights. (Can data catalogs help? Read on and see…)

The article discusses a case in which the New Orleans police department deployed a “predictive policing system” that could mine citizens’ data and flag potential criminal activity. Because nobody outside the department was told, the legality of the system could never be challenged.

As I discussed in my 2019 Predictions for Big Data in the Enterprise, the more businesses and governments rely on AI and ML algorithms to make automated decisions that impact citizens and the public at large, the more there will be a need for greater transparency and explainability.

The Long Arm of the Algorithm

For instance, why was a mortgage denied? Can the bank prove that no legally protected characteristics (like race or gender) were used to make the decision or to train the model behind it? What about the algorithms, mentioned in the Fast Company article, that help universities determine without any human oversight which of many thousands of student applications deserve consideration and which don’t? Or those that help determine government benefits?

Most people don’t even know these systems exist. And while watchdog groups are trying to bring greater transparency and accountability to these systems, as the article explains, non-disclosure agreements, trade secrets, intellectual property protections and other laws (like 1984’s ironically timed Computer Fraud & Abuse Act) make it difficult to figure out where and how these technologies are being used.

Help is on the Way

Some are calling for the creation of an FDA-like agency that would require AI vendors to disclose any algorithms they create that impact people’s “health, safety and liberty.” Since state and federal laws on the matter are probably far off, cities like New York and Berkeley are starting to pass algorithm transparency laws themselves, while watchdog groups like the ACLU of Idaho have started taking organizations to court, arguing that hidden code tasked with determining the fates of citizens violates due process.

While I believe AI-driven algorithms are generally being put in place to serve the public, the public still needs to know whether these algorithms are doing a good job. Some have been shown to make arbitrary decisions despite being fed voluminous data. Others have been shown to reflect bias. (Our recent Women in Data Dinner & Forum, which we’ll be blogging about later, addressed this.) Still others may be completing a task in a way that some agree with and others don’t, warranting public debate and full disclosure of the interests being served. Blocking algorithm accountability harms not only the public, civil rights and due process, but potentially the organizations and vendors themselves, as employees feel increasingly emboldened to report their employers over unethical use of code.

Can Companies Police Themselves?

To be sure, one Berkeley professor quoted in the Fast Company piece says transparency must come from inside the companies themselves, and it behooves them to provide it. (Employees at several companies have already proven that corporations can find guidance in a people-driven conscience.) I’m proud to say that Waterline Data can help support this effort. Because a data catalog helps find appropriate data sets and document their lineage and quality, it provides a first step toward greater transparency and explainability. After all, if a company can’t show where its data came from or what it means, how can it explain the model or ensure that it’s legal?
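
To make that idea concrete, here is a minimal sketch in Python of the kind of lineage record a catalog makes possible: which data sets fed a model, where they came from, and which protected fields were deliberately excluded. The names and structure here are hypothetical illustrations, not Waterline’s product API.

```python
# Minimal sketch (hypothetical names, not Waterline's API): record which data
# sets fed a model, where they came from, and which protected attributes were
# excluded, so a decision can later be explained and audited.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class DatasetLineage:
    name: str                   # catalog name of the data set
    source: str                 # upstream system it was extracted from
    excluded_fields: List[str]  # protected attributes deliberately left out
    quality_checks: List[str]   # quality checks recorded as passing


@dataclass
class ModelProvenance:
    model_name: str
    trained_at: datetime
    training_data: List[DatasetLineage] = field(default_factory=list)

    def report(self) -> str:
        """Produce a human-readable summary an auditor could review."""
        lines = [f"Model: {self.model_name} (trained {self.trained_at:%Y-%m-%d})"]
        for ds in self.training_data:
            lines.append(
                f"  - {ds.name} from {ds.source}; "
                f"excluded: {', '.join(ds.excluded_fields) or 'none'}; "
                f"checks: {', '.join(ds.quality_checks) or 'none'}"
            )
        return "\n".join(lines)


# Hypothetical usage: a mortgage-approval model and its documented inputs.
provenance = ModelProvenance(
    model_name="mortgage_approval_v2",
    trained_at=datetime(2019, 1, 15, tzinfo=timezone.utc),
    training_data=[
        DatasetLineage(
            name="loan_applications_2018",
            source="core_banking.applications",
            excluded_fields=["race", "gender"],
            quality_checks=["no_nulls_in_income", "valid_zip_codes"],
        )
    ],
)
print(provenance.report())
```

Even a simple record like this answers the auditor’s first questions: what data was used, where it originated, and whether protected attributes were kept out of the training set.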

Be sure to check out the article and keep an eye on this issue in general, as it’s bound to get more attention and spark some interesting debate.