why this article?
A journalist recently asked me to comment on the feasibility of a conspiracy theory involving one of Facebook’s AI algorithms. He wanted to know whether it was likely, or even possible, that Facebook was using its existing algorithm for suicide video detection to screen and censor conservative media sources. To answer the question meaningfully, I found I needed to spend an hour educating the journalist about AI in general, just to give him enough background to understand my assessment of the theory. In doing so I equipped him for future AI-related investigations, which he and his colleagues will encounter with increasing frequency as AI expands its interaction with our daily lives.
From that encounter, I realized that journalists will soon face a growing need to explain and editorialize about AI-related social concerns of all kinds, and I concluded that most journalists are currently ill-equipped for the task. To remedy the situation, I composed this primer to give journalists enough background on the subject to support what they do best: Guiding the public discourse regarding pertinent issues of the day.
And I’m not the only one saying journalists need information on the state of AI technology and how it interacts with daily life; famed AI pioneer Fei-Fei Li recently offered the following quotation to WIRED:
“I also have really high hopes that AI literacy is more prevalent—starting with journalists but also with policy makers, teachers, civic society. This is not a professor wanting everybody to know how to code; it’s about more people participating in the guidance of AI”  (Emphasis mine).
The fact is, as citizens and as economic actors in the Global North, we now consume the output of AI algorithms daily. While (non-AI) algorithms and computer-moderated experiences have been with us for quite some time, the practical, commercial use of AI at today’s scale is extremely new—emerging only in the last seven or eight years. To gain a handle on the resulting social consequences, we need laypeople who understand AI enough to make well-informed decisions about it. Journalists find themselves in the position to reach laypeople en masse concerning these matters, and I hope this article assists them with the task.
Here are the key matters to remember from this article, each described in more detail in the text that follows:
- We are only about ten years into the full-blown AI revolution, the widespread use of AI to make commercial and governmental decisions impacting large numbers of individuals. We therefore need great journalists to guide us through this transition!
- Serious social harms, including death and inappropriately long prison terms, have already resulted from AI’s use.
- AI and machine learning technology strives to mimic intelligence, whether human intelligence or a kind of intelligence that is very distinct from that of humans.
- An AI process only creates an approximate model of the real-world problem domain it seeks to make decisions about. The limits of approximation therefore attenuate its accuracy.
- An AI solution may perform well for a very specific problem—the one it is trained to solve. However, it will not generalize effectively to other tasks.
- Bias in the data sets used to train AI algorithms leads to bias in their output.
- AI algorithms prove difficult to audit.
- We don’t know the consequences AI use might impose on society, e.g., possible technological unemployment.
AI failures abound
The immediate issue for journalists is that the use of AI has at times created substantial problems, stories that journalists will report to the public and editorialize about. A few examples:
- When AI-based facial recognition technology is used to fight crime, mistaken identification leads to arrests and/or public shaming of the innocent [18, 19].
- An Uber self-driving car recently killed a pedestrian [18, 20].
- A recent AI-driven Amazon employee recruiting algorithm produced measurable gender bias [18, 21].
Certainly, a human can make such errors too, but most societies have established protocols in place for dealing with human error. We do not have such protocols established for AI-related error—a worldwide conversation is required about this matter. Journalists will prove critical to that conversation.
what is artificial intelligence and machine learning?
Artificial intelligence (AI) broadly refers to the research toward and the engineering practice of making computers mimic intelligence. Here I leave “intelligence” loosely defined; it can mean mimicking human information-processing and decision-making abilities, or it can refer to the development of intelligence highly divergent from human thinking. But these definitions start with the assumption that computers are not, in and of themselves, intelligent. By that I mean that to be useful they must be “programmed”, i.e., instructed in some way by a human.
Digging deeper into this matter of instruction: For most of computing history, humans wrote “programs” that directed the data-processing and decision-making of computers. Software engineers applied programming languages such as C++ or Python to detail every action a computer running a given program would take. These instructions proved very explicit (e.g., IF user types “Hello” and then presses the enter key, THEN PRINT “Hello back to you!” on the screen).
However, this activity does not scale well. Writing explicit instructions to account for every possible input and every possible decision is intractable, even for the best software engineering teams. So, AI researchers devised two basic frameworks in response: “Expert systems” and “machine learning” (ML). Both emerged about the same time in academia, but the latter only became commercially practical in recent years. This article will focus primarily on machine learning, but here I briefly explain expert systems as a starting point:
Expert systems attempt to address the challenge of explicitly scripting every decision made from given inputs to a program. They proliferated in the 1980s but have now largely fallen by the wayside. A “knowledge engineer” would enter data into an expert system after interviewing an expert, say a highly-specialized medical doctor. A large body of facts would be collected manually (thus the problem of accounting for every possible input remains), and then a “reasoning engine” (itself an algorithm) would process the facts to draw conclusions without the need to encode the specific reasoning steps impacted by each datum. In this manner, expert systems were said to mimic the intelligence of an expert. They still remain in use (I’ve created two of them in the last year alone, one for medical reasoning and one for fashion recommendation), but AI research and practice has largely shifted to the second framework: Machine learning.
Machine learning attempts to address the problem of accounting for each possible input or combination of inputs, so that software engineers no longer have to. Essentially, engineers “train” an algorithm to produce desired outputs from a large set of known inputs, where a known outcome usually accompanies each provided known input to assist the training. For example, an ML algorithm intended to detect the presence of a skateboard in a video will be shown tens to hundreds of thousands of videos, some with and some without skateboards. In the most common case, each of these videos will be “labeled” as either having a skateboard in it or not having a skateboard in it. The ML training procedure will process these inputs and “learn” how to detect skateboards in future unseen videos. In this scenario, engineers become more like “teachers” than designers of explicit rules (more on this later).
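To make the training idea concrete, here is a deliberately tiny sketch in Python. Real video classifiers learn millions of parameters from pixel data; this toy stands in for them by learning a single decision threshold from labeled examples. All the numbers are hypothetical, invented purely for illustration:

```python
# A deliberately tiny "trainer": each example is (feature, label), where
# label 1 means "skateboard present". Real systems learn millions of
# parameters from pixel data; this toy learns a single threshold on one
# made-up feature.

def train_threshold(examples):
    """Pick the decision threshold that misclassifies the fewest examples."""
    candidates = sorted(f for f, _ in examples)
    best_t, best_errors = None, len(examples) + 1
    for t in candidates:
        errors = sum(1 for f, label in examples if (f >= t) != (label == 1))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def predict(threshold, feature):
    return 1 if feature >= threshold else 0

# Hypothetical labeled training set.
training = [(0.2, 0), (0.3, 0), (0.4, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
t = train_threshold(training)
print(predict(t, 0.85))  # classify a new, unseen input -> 1
```

The point is the division of labor: the engineer supplies labeled examples and a training procedure, and the “rule” (here, the threshold) is discovered rather than hand-written.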
To put it mildly, machine learning has rapidly taken the world by storm. ML algorithms drive Google’s search results, Netflix’s movie recommendations, and Facebook’s filters. Large corporations use ML to filter resumes and monitor employees. Some jurisdictions use ML in criminal sentencing procedures. China applies ML-based facial recognition to oppress the Uyghur minority. For my part, I’ve used ML for molecule design, currency trading, political bias detection, and to provide real-time feedback to musicians regarding audience enthusiasm.
The rest of this article will specifically focus on machine learning and use the term interchangeably with artificial intelligence, as is common practice today.
what is a “model”?
AI practitioners often refer to the algorithms they create as “models”. Here is why:
A model is an approximation of a real-world object, made to assist the study of or decision-making about that object. The modeled object might be physical (e.g., a model airplane being tested in a wind tunnel), a process (e.g., a map of a supply chain’s dynamics), or even an idea (e.g., a “back of the envelope” drawing of a business model on a napkin during a business lunch). Models may also be mathematical, as in a set of one or more equations. For example, E = mc² is a mathematical model describing the physical relationship between energy and mass.
More to the point, machine learning algorithms are mathematical models: an equation (or set of equations) approximating the real-world relationships between the inputs and outputs the algorithm was trained upon. The training process “discovers” these relationships and encodes them into the equations’ parameters. Running the trained ML model on new inputs then assists or even replaces real-world decision-making by humans.
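A one-variable example makes this concrete. Below, the “model” is nothing more than the parameter pair (a, b) in the line y = a·x + b, and “training” is the classic least-squares calculation that discovers those parameters from made-up data points:

```python
# The "model" here is just the parameter pair (a, b) in y = a*x + b.
# "Training" is the standard least-squares calculation that discovers
# those parameters from data. The data points are made up.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]  # points lying on y = 2x + 1
a, b = fit_line(xs, ys)
prediction = a * 5 + b  # run the trained model on a new input
print(a, b, prediction)  # -> 2.0 1.0 11.0
```

Modern ML models differ from this in scale (millions of parameters rather than two) but not in kind: training discovers parameter values, and prediction evaluates the resulting equations on new inputs.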
and here we reach the first problem…
I used the term “approximating” intentionally in the last paragraph. Models are only approximations of the objects they reflect, not the real thing. Some detail is therefore lost, and conclusions based on a model may not adequately reflect reality. For example, a model airplane in a wind tunnel might not have every rivet and seam of the real airplane it mirrors carved into it. The microfluidic effects of these unmodeled rivets and seams (which can add up to significant drag in real life) therefore remain undiscovered.
Similarly, my ML models of the currency exchange market do not take into account every possible factor that impacts price fluctuation. To illustrate, I do not (yet) include the impact of central bank statements. Nor can I include political events; my models lost money at the initiation of the recent trade war between the United States and China. The predictive consequence is that the algorithm sometimes forecasts one price direction when reality turns the other way.
So fine, I lost some money. Not a big deal. But consider that similar models decide socially important matters such as how long a given criminal’s prison sentence lasts. Or whether you (and a whole group of people—similar to you in some way that you may or may not know about) get a job you are qualified for. Or assess the state of your mother’s cancer. Serious decisions are made using these “approximate” models.
It follows that an imperative question faced by journalists, policy makers, and philosophers regarding the use of AI is “How often is being wrong acceptable?” Certainly, it depends on the application: If my currency predictions are correct only 60% of the time, I still make money over a large number of trades. But how much accuracy is socially tolerable for a self-driving car’s pedestrian detection algorithm? 99.9%? 99.99999%? In the case of an AI-based cancer diagnosis, you can get a second opinion. But a self-driving car might kill someone!
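The trading arithmetic is worth spelling out, since it shows why modest accuracy can still pay. The dollar amounts below are hypothetical, chosen only to illustrate the expected-value calculation:

```python
# Expected value per trade at 60% accuracy. The dollar amounts are
# hypothetical; the point is that accuracy well below 100% can still
# be profitable when wins and losses are roughly symmetric.
p_correct = 0.60
gain, loss = 100, -100  # dollars per winning / losing trade
per_trade = p_correct * gain + (1 - p_correct) * loss
print(per_trade)         # dollars per trade, on average
print(per_trade * 1000)  # over a thousand trades
```

No comparable arithmetic rescues a pedestrian-detection algorithm, which is exactly why the acceptable error rate varies so sharply by application.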
And even if self-driving cars do occasionally run over pedestrians, one must evaluate how their performance in pedestrian avoidance compares to human drivers, with an eye toward deciding whether human or artificial drivers are a better idea overall. Questions like these, and their policy consequences, require social discussion. Journalists serve a critical role in these discussions.
false positives and false negatives
More formally, a procedure of any kind (whether a court case, a medical test, or a machine learning algorithm) that selects between two possibilities (e.g., guilty vs. not guilty, infected vs. not infected, skateboard present in video vs. skateboard not present in video) is called a “classifier”. Technically, classifiers can involve more than two outcomes, but to define the following key terms, we’ll restrict the conversation to two-way classifiers:
When a classifier says a situation is true (guilty, infected, skateboard present), and that situation is actually false, we call the conclusion a “false positive”. Similarly, a true situation declared as false is denoted a “false negative”. Designers of AI-based classification algorithms seek to reduce the number of these false positives and false negatives, just like the criminal justice system seeks to reduce the number of inaccurate verdicts. But, as discussed above, inaccurate conclusions remain and we as a society must decide what false conclusion rates we’ll accept for given applications.
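Tallying these error types is mechanical. A minimal sketch, using hypothetical predictions and ground-truth labels:

```python
# Tally false positives and false negatives for a two-way classifier.
def confusion_counts(predictions, truths):
    fp = sum(1 for p, t in zip(predictions, truths) if p and not t)  # said yes, was no
    fn = sum(1 for p, t in zip(predictions, truths) if t and not p)  # said no, was yes
    return fp, fn

# Hypothetical run: True = "skateboard present".
preds  = [True, False, True,  True, False]
truths = [True, False, False, True, True]
print(confusion_counts(preds, truths))  # -> (1, 1)
```

Practitioners report these counts separately (often repackaged as “precision” and “recall”) precisely because the two kinds of error carry different costs.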
To illustrate the social consequences in a real-world situation: a 2015 Google image classifier designed to recognize gorillas in photos delivered false positives declaring that black people in images were gorillas. In another well-known situation, AI is used in some jurisdictions to predict whether a convicted criminal will reoffend; false positives delivered by such an algorithm might lead to longer jail sentences than appropriate. In the case that opened this article, responding to a conspiracy theory that Facebook was using its suicide video detection algorithm to censor right-wing videos, I concluded that the most likely scenario is that the suicide video detection algorithm generated a false positive for the particular censored right-wing video that spawned the theory.
AI is “brittle”
Continuing the discussion of Facebook’s suicide video detection algorithm:
AI is far less capable than the public generally thinks it is. Most of the confusion concerns AI’s perceived generalizability: folks tend to think an AI algorithm trained to perform one task can easily adapt to a related task. This is simply not the case, and we therefore call AI algorithms “brittle”, meaning they cannot adapt to changing conditions. (Increasing the generalizability of AI is an area of active research.)
Regarding Facebook’s suicide video detection algorithm, due to the technology’s lack of generalizability it simply cannot be expected to accurately decide whether videos show far-right extremist content. If Facebook wanted such an algorithm, and they likely have one, they’d have to train it from scratch using a completely different set of training videos. And then that algorithm, once trained, could not be used to detect ISIS propaganda—they’d need to start from scratch yet again.
you can only predict based on what you’ve seen before
When I first applied for credit, long before modern AI entered the scene, I was denied simply for not having any credit beforehand. The card issuer’s algorithm—likely based on traditional statistical methods—knew only how to process applicants with previous credit histories, because that is most likely what the programmers considered when creating it.
The same issue remains in today’s far more effective machine learning techniques: They can only approximate the relationships between data and outcomes seen during the training process. When a given input lies far enough outside the training domain, the algorithm is helpless. Brittle.
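My credit-card anecdote can be caricatured in a few lines. The scoring function and its numbers are invented for illustration; the point is only that a model trained exclusively on applicants with credit histories has no defined answer for an applicant without one:

```python
# A caricature of a credit model trained only on applicants who already
# had credit histories. The scoring function and its numbers are invented;
# the point is that the model is undefined outside its training domain.

def score_applicant(years_of_history):
    if years_of_history is None:
        # An input the model never saw during training: it has nothing to say.
        raise ValueError("no credit history: outside the training domain")
    return 300 + 40 * years_of_history  # made-up "learned" parameters

print(score_applicant(5))  # -> 500, within the domain the model was trained on
# score_applicant(None) raises an error, much like my card application did
```

Real systems rarely fail this loudly; more often they silently extrapolate, producing confident scores that mean nothing.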
and here we reach the second major problem: training biases
Once we recognize that machine learning models are only good within the bounds of the data they were trained upon (e.g., the credit model described above not knowing what to do with applicants who have zero credit history), we must ask whether the selection of that data carries biases. For instance, the Google algorithm that erroneously classified black individuals as gorillas was likely trained on relatively few pictures of black people compared to white people. The resulting effect reminds me of the “all black people look the same” attitude that still sends many innocents to prison in the United States, only more “precise” in its potential inaccuracy.
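The mechanics of such skew can be sketched in a few lines. Everything below is hypothetical: assume a decision threshold was learned from training data drawn mostly from one group, then measure how often it errs on each group:

```python
# Assume a decision threshold of 0.7 was learned from training data drawn
# mostly from group A. Group B's negative examples happen to sit at higher
# feature values, so the same threshold errs far more often on group B.
# Every number here is hypothetical.
THRESHOLD = 0.7  # assumed learned value, not taken from any real system

def predict(feature):
    return 1 if feature >= THRESHOLD else 0

def error_rate(examples):
    return sum(predict(f) != label for f, label in examples) / len(examples)

group_a = [(0.3, 0), (0.4, 0), (0.8, 1), (0.9, 1)]    # well represented
group_b = [(0.72, 0), (0.75, 0), (0.8, 1), (0.9, 1)]  # scarce in training

print(error_rate(group_a))  # -> 0.0
print(error_rate(group_b))  # -> 0.5
```

The model is not “wrong” by its own lights; it simply saw too little of group B to place the threshold fairly. That is training bias in miniature.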
I referred to the trainers of ML algorithms as “teachers” above. All teachers transmit their biases to their students, whether they like it or not. It has even been shown that software engineers propagate their social biases in their supposedly neutral code.
The mathematics involved in most machine learning techniques are high-dimensional, meaning a trained model might have thousands or tens of thousands of numbers associated with it. It therefore proves intractable for a human to audit the model, that is, to determine exactly which relationships within it drive which behavior. This is a problem in situations like self-driving cars where, supposing such a car hit a pedestrian, we would expect the authorities to investigate the collision’s cause in the same manner they investigate airplane crashes. The mechanical relationships in an airplane, while complex, can be clearly traced through a chain of causes and effects in most situations. Due to the opaque nature of most ML models, however, this kind of assessment would prove inaccessible in the case of a self-driving car’s pedestrian detection algorithm.
Improving the accountability and auditability of machine learning techniques remains an area of active academic research.
unknowns that society must soon cope with
Two large AI-related unknowns loom on the horizon: Technological unemployment and the impact of ML algorithm “cross-talk”. As these come into focus, journalists will find themselves explaining them to the public.
The specter of technological unemployment, the mass and rapid dismissal of human workers due to automation (and the social consequences thereof), has haunted the West since the Industrial Revolution. However, each inflection point in the development of today’s industrial society created sufficient new jobs to replace those lost. It remains to be seen whether this trend holds with AI-driven automation. In the past, automation replaced simple tasks requiring little human brainpower to execute, leaving humans to perform more complicated reasoning and creative activities. Now careers requiring more training and intellect stand at risk; suppose the highly-educated middle class, not just the proletariat, were to find themselves unemployed in large numbers? What happens to our political, economic, and social stability in such a case?
The second unknown might be thought of in terms of an “AI ecology”. With thousands of ML algorithms modulating our daily experience, cross-talk between the algorithms is bound to occur, and in ways impossible to predict. As a simple example, when I train my currency trading algorithms from historical price data, I indirectly include the net effects of all other ML algorithms that contributed to that historical price data (algorithms owned by other traders). And when my algorithm executes a trade, their algorithms respond to my signal. The effect could be moderating or could snowball into something disastrous. We simply don’t know.
Therefore, I think of the whole production AI operating space as an ecology: The algorithms interact with each other directly and through us in ways analogous to ecological concepts. The real power lies not then in who can make the “best” or “right” algorithm (though that helps), but in who can better “bias” this complex interactive ecology toward their ends.
only ten years into this revolution
We (society) are only about ten years into this revolution; before that time, the computational demands of AI exceeded what off-the-shelf silicon could deliver, and AI remained in the laboratory. Now we inhabit a world where even novice engineers can economically deploy AI to production environments and produce significant business gains. The result may prove as radical a social change as the introduction of the internet.
It will take great journalists to guide us all through this upheaval.