What is artificial intelligence (AI)?

Artificial intelligence refers to the theory and development of computer systems that simulate human intelligence to make decisions and perform tasks.

Artificial intelligence explained

Artificial Intelligence (AI) or machine intelligence (MI) is defined by Techopedia as “a branch of computer science that focuses on building and managing technology that can learn to autonomously make decisions and carry out actions on behalf of a human being”.¹ However, this definition is far too general to serve as a blanket definition of what AI technology encompasses. AI isn’t one type of technology; it’s a broad term that can be applied to a myriad of hardware and software technologies, which are often leveraged in support of machine learning (ML), natural language processing (NLP), natural language understanding (NLU), and computer vision (CV).

Oracle Cloud Infrastructure states that “In the simplest terms, AI [refers] to systems or machines that mimic human intelligence to perform tasks and can iteratively improve themselves based on the information they collect”² and that “AI manifests in a number of forms”.²

Compared to Techopedia’s and Oracle’s broad and general definitions, IBM’s definition of AI is more specific. IBM states that AI “leverages computers and machines to mimic the problem-solving and decision-making capabilities of the human mind”.³ Focusing on these ideas of “creating intelligence” and of “machines understanding human intelligence”, IBM continues its definition of AI by citing John McCarthy’s article What is Artificial Intelligence? In it, McCarthy states that “[AI] is the science and engineering of making intelligent machines, especially intelligent computer programs. [AI is similar to] using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable”.⁴

Stretching back about 55 years before McCarthy’s paper, IBM expands its exploration of AI to mathematician, cryptanalyst, and iconic “father of computer science” Alan Turing. Turing originated the eponymous Turing test, which is meant to distinguish between a human and a computer. His 1950 paper Computing Machinery and Intelligence helped lay the foundation for modern computer science and AI, and it’s arguably the first paper to pose the question “Can machines think?”⁴

SAS notes that “[the 1950s’ initial] AI research [explored] problem solving and symbolic methods [and the 1960s’ research explored] training computers to mimic basic human reasoning”.⁶ This early work paved the way for the automation and formal reasoning that we see in computers today, including decision support systems and smart search systems that can be designed to complement and augment human abilities.

IBM presents Stuart Russell and Peter Norvig’s textbook Artificial Intelligence: A Modern Approach as a more recent counterpart to Turing’s AI philosophy. Originally published in 1994, with its most recent edition published in April 2020, Artificial Intelligence: A Modern Approach has become one of the most popular resources for studying AI.

In this textbook, Russell and Norvig explore different versions of AI. They state that “[Some versions] have defined intelligence in terms of fidelity to human performance, while others prefer an abstract, formal definition of intelligence called rationality”,⁷ and they define rationality as “doing the right thing”.⁷ Continuing their exploration of AI, the authors note that intelligence could also “be a property of internal thought processes and reasoning, [or] intelligent behavior”.⁷

Russell and Norvig break these lines of thought into two dimensions: “human vs. rational and thought vs. behavior”.⁷ These two dimensions yield four potential combinations for exploring AI, which IBM presents as the following:

Human Approach:

Systems that think like humans
Systems that act like humans

Ideal Approach:

Systems that think rationally
Systems that act rationally³

Alan Turing asked, “Can machines think?”⁵ IBM argues that Turing’s definition of AI would likely use the Human Approach of “systems that act like humans”.⁴

Russell and Norvig argue that from the two dimensions of “human vs. rational and thought vs. behavior [that] there are four possible combinations [and] there have been adherents and research programs for all four”.⁷

Techopedia speculates that the next generation of AI is “expected to inspire new types of brain-inspired circuits and architectures that can make data-driven decisions faster and more accurately than a human being can”.¹

A brief history of t[ai]me

Long before Alan Turing, most examples of AI were found in myth and literature. Greek mythology contains what is probably the foremost example of AI. In the book Machines Who Think, author Pamela McCorduck notes that the crippled Hephaestus, god of fire and the forge, “[fashioned] attendants to help him walk and assist in his forge”.⁸ For a description of Hephaestus’ automatons, McCorduck cites The Iliad, where Homer described the automatons as follows:

“These are golden, and in appearance like living young women. There is intelligence in their hearts, and there is speech in them and strength, and from the immortal gods they have learned how to do things”.⁹

Perhaps the most well-known automaton in Greek myth was Talos. Talos was a giant bronze automaton, sometimes depicted as a man or as a “bronze bull or a man with a bull’s head”,¹⁰ who guarded the island of Crete from invaders. Like Achilles, Talos had a single vulnerable spot: his ankle.

If Turing is considered “the father of computer science,” then Charles Babbage is “the father of the computer”.¹¹ Babbage designed the Difference Engine, which was meant to calculate tables of numbers such as logarithms, and its successor, the Analytical Engine. The Analytical Engine contained “a central processing unit and memory and would have been programmed with punch cards”,¹² making it the progenitor of general-purpose computers.

Arguably the next most iconic automaton comes from Mary Shelley’s 1818 genre-defining science fiction and gothic novel, Frankenstein. In the article The Link Between Mary Shelley’s Frankenstein and AI, author Charlotte Mckee writes that “There is an indisputable link between Victor Frankenstein’s creation and Artificial Intelligence”.¹² Mckee notes that “The questions Shelley raises about a man-made being are relevant in the creation of AI [and explore] the possibilities of Artificial General Intelligence [and the] many ethical concerns that link Frankenstein and AI too”.¹³

Frankenstein’s creature isn’t made to be either good or evil. Throughout the novel, it demonstrates a keen mind and eloquence, teaches itself to read and speak, and experiences the human emotions of happiness, anger, and desire. Mckee argues that it “is the very embodiment of machine intelligence. He learns like an algorithm throughout the novel”.¹³

In the early 20th century, the concept of AI, or at least of rogue machines, spread through the stage and the silver screen. Beyond early adaptations of Frankenstein, two of the most well-known and influential depictions of AI in media were a play and a silent film:

Karel Čapek’s sci-fi play R.U.R. or Rossumovi Univerzální Roboti (translated Rossum's Universal Robots) helped popularize the word “robot”¹⁴
Fritz Lang's great cinematic work of German expressionism, Metropolis, is considered one of the most influential silent films, with Roger Ebert stating “Metropolis is one of the great achievements of the silent era”¹⁵

The Turing of the screw

It is impossible to overstate Alan Turing’s influence on science, let alone on computer science or AI. Turing’s work as a codebreaker against Enigma, the German military’s cipher machine, was instrumental in decrypting Nazi Germany’s communications. The BBC estimates that Turing helped end WWII sooner and saved the lives of between 14 and 21 million people.¹⁶

The Turing test is arguably one of the pillars of AI. Initially referred to as the Imitation Game⁵ in Computing Machinery and Intelligence, the Turing test is a means for determining if a computer (or any machine) is intelligent and can think.

Turing proposed that a human interviewer could evaluate a conversation with two subjects: a human and a machine, such as a computer, that attempts to make human-like conversation. Knowing that one of the subjects is a machine imitating human speech, the interviewer would question each subject separately and record their answers. Each subject would respond in writing or by typing, so the test wouldn’t depend on how well the machine could articulate words as human speech.

If the interviewer can’t reliably determine the human from the machine, then the machine would pass the test. The machine’s answers wouldn’t have to be correct answers per se, only answers that a human might give.⁵

It’s not uncommon for people today to experience a reverse Turing test, in which the subjects must prove that they are human and not a computer. CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are probably the most recognizable reverse Turing tests.¹⁷ Common CAPTCHA examples include the following:

reCAPTCHA – although there are many different types of reCAPTCHA, one of the most recognizable is the “I’m not a robot” checkbox
Confident Captcha and other picture identification CAPTCHAs – a user is presented with several pictures or tiles and asked to select all examples of a traffic light, a kitten, and so on
Word Problem CAPTCHA – users are presented with an image of one or more words, often distorted, struck through, or blurred, and prompted to type out the word¹⁸

Has AI passed the Turing test?

Despite all modern technological advancement, as of June 2022 (the time of writing), no AI has successfully passed the Turing test.¹⁹ However, this failure has not discredited the idea of thinking, intelligent machines, nor the possibility that machines could one day pass the test.

Noel Sharkey, professor of artificial intelligence and robotics at the University of Sheffield, argues that “Despite the failure of machines to deceive us into believing they are human, Turing would be excited by the remarkable progress of AI”.¹⁹ Prof. Sharkey continues, imagining that Turing “would have danced for joy when Deep Blue defeated world champion Garry Kasparov at chess [or when IBM’s] Watson beat the two best human opponents in the history of the American game [Jeopardy]”.²⁰ Prof. Sharkey concludes that “the Turing Test remains a useful way to chart the progress of AI and I believe that humans will be discussing it for centuries to come”.²⁰

What are the different types of AI?

AI is generally broken down into categories based on “the degree [that] an AI system can replicate human capabilities”.²¹ Although no AI has yet passed the Turing test, how proficiently AI systems perform human functions helps classify them. It distinguishes the simpler, “less-evolved type[s]”²¹ of AI from the more evolved types capable of human-like functions and proficiency.

Forbes states that there are two broad classifications, or “types”, of AI based on their capabilities and functionality, specifically “classifying AI and AI-enabled machines based on their likeness to the human mind, and their ability to “think” and perhaps even “feel” like humans”.²¹ Javatpoint divides the resulting seven types of AI into two groups: a Type 1 group and a Type 2 group.

Techopedia notes that, with rising demands for lightning-fast information processing, today’s “digital processing hardware cannot keep [pace]”.¹ To meet tomorrow’s needs, researchers and developers are “taking inspiration from the brain and considering alternative architectures [of] artificial neurons and synapses [processing] information with high speed and adaptive learning capabilities in an energy-efficient, scalable manner”.¹

The Type 1 group consists of “evolving stages of AI”¹ sorted by their intelligence capabilities and includes the following examples:

Artificial narrow intelligence (ANI) or narrow AI
Artificial general intelligence (AGI) or general AI
Artificial superintelligence (ASI) or super AI or strong AI

The Type 2 group consists of AI sorted by functionality and includes the following examples:

Reactive AI
Limited Memory AI
Theory of Mind AI
Self-Aware AI²²

What are the differences between weak and strong AI?

The evolving stages of AI progress from one to the next based on the demand for faster, smarter, and more efficient information processing. The philosopher John R. Searle is considered to have coined the terms “weak vs. strong AI” in his 1980 article Minds, Brains, and Programs, where he stated “I find it useful to distinguish what I will call "strong" AI from "weak" or "cautious" AI”.²³

It’s easy to differentiate the “weak to strong” performance levels of narrow AI from those of general AI and super AI. Great Learning notes that weak AIs are “narrower applications with limited scopes [that are only] good at specific tasks [and that use] supervised and unsupervised learning to process data”.²⁴ Conversely, strong AIs use “a wider application with a [broader] scope, [have] an incredible human-level intelligence, [and use] clustering and association to process data”.²⁴

What is narrow AI?

The first stage of evolving AI, and easily the most common and available category of capability-based AI, ANI represents all AI that exists today and that has ever existed. Apple’s Siri and Amazon’s Alexa are both examples of narrow AI that you might see or even use as you go about your day-to-day life. Common predetermined functions for narrow AI include “speech recognition, natural language processing, computer vision, machine learning, and expert systems”.¹

Beyond simply being the kind of AI that exists today, narrow AI has a simple requirement: perform a specific task using human-like intelligence. The “narrow” in narrow AI refers to the AI’s limited, usually predefined range of capabilities. These AIs are often created with a single dedicated task in mind and cannot perform tasks outside of their limitations or programming.

IBM’s Watson and other supercomputer AIs are still considered narrow AI, despite their use of expert system approaches, ML, and natural language processing. Forbes notes that “these systems correspond to all the reactive and limited memory AI”²¹ and goes on to state that “Even the most complex AI that uses machine learning and deep learning to teach itself falls under ANI”.²¹

What is general AI?

The second stage of evolving AI, AGI refers to an AI that could “learn, perceive, understand, and function completely like a human being”.²¹ An AGI system could independently develop new competencies and form connections across domains. This ability would “[reduce the] time needed for training [and] make AI systems just as capable as humans by replicating our multi-functional capabilities”.²¹ General AI is the dream that a computer could one day be as smart as a human and perform the same intellectual tasks. It would have the “[equivalent of] the human mind’s ability to function autonomously according to a wide set of stimuli”.¹

At the time of writing, no general AI systems are known to exist, although some could be in development out of public view.

What is super AI?

The third and final stage of evolving AI, ASI refers to the hypothetical concept of an AI that exceeds human intelligence. A super AI would contain many of the “key characteristics of strong AI, [including the] capability to think [and] to reason, [to] make judgments, plan, learn, and communicate [independently]”.²² True to its name, an ASI would have an intellect greater than that of the best human minds in virtually every field.

What is reactive AI?

The first of the functionality-specific types of AI, reactive AI uses real-time data to make decisions. “One of the oldest forms of AI systems [with an] extremely limited capacity [reactive AI lacks] memory-based functionality [and can’t] use [their] experiences [or memory] to inform their present actions”.²¹ This AI cannot learn the way a person does and can only respond to previously defined inputs or conditions. “IBM’s Deep Blue, a machine that beat chess Grandmaster Garry Kasparov in 1997”,²¹ is a popular example of a reactive AI.
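To make “no memory” concrete, here is a minimal sketch of a reactive agent. The poker setting, the function name reactive_poker_action, and the thresholds are hypothetical, chosen only for illustration. The defining property is that the decision is a pure function of the current input: nothing is stored between calls, so the same input always yields the same action.

```python
def reactive_poker_action(hand_strength):
    """Map the current hand strength (0.0 to 1.0) straight to an action.

    The agent stores nothing between calls, so its decision depends only on
    the input it sees right now. The thresholds are hypothetical.
    """
    if hand_strength >= 0.8:
        return "raise"
    if hand_strength >= 0.4:
        return "call"
    return "fold"


# Identical inputs always produce identical actions, whatever happened before.
for strength in (0.9, 0.5, 0.1, 0.9):
    print(strength, reactive_poker_action(strength))
```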

What is limited memory AI?

The second of the functionality-specific types of AI, limited memory AI leverages data stored from past experiences for decision-making. It has the capabilities of reactive machines, and it can also store data for a limited time and learn from that historical data.

Limited memory AI is present in most AI systems and apps today, particularly those that use deep learning and are “trained by large volumes of training data that they store in their memory to form a reference model for solving future problems”.²¹

Self-driving cars are a popular example of limited memory AI. Edureka notes that today’s self-driving cars “use sensors to identify civilians crossing the road, steep roads, traffic signals [and similar road navigation information] to make better driving decisions [and leveraging its learning experience] helps to prevent any future accidents”.²⁵
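The following is a minimal sketch of the limited-memory idea, not how any real self-driving or game system works. A hypothetical agent keeps only a bounded window of recent observations and uses them when deciding. The class name, the poker framing, the window size, and the threshold rule are all assumptions. `collections.deque(maxlen=...)` is a convenient way to model memory that lasts for a limited timeframe, because older entries are discarded automatically.

```python
from collections import deque


class LimitedMemoryAgent:
    """Hypothetical agent that remembers only its last `window` observations."""

    def __init__(self, window=3):
        # A deque with maxlen silently drops the oldest entry once full,
        # modelling memory that only lasts for a limited timeframe.
        self.memory = deque(maxlen=window)

    def observe(self, opponent_bet):
        """Store one past observation (here, an opponent's bet scaled to 0..1)."""
        self.memory.append(opponent_bet)

    def act(self, hand_strength):
        # Average of the remembered bets; 0.0 when nothing has been stored yet.
        avg_bet = sum(self.memory) / len(self.memory) if self.memory else 0.0
        # Hypothetical rule: after heavy recent betting, require a stronger hand.
        threshold = 0.4 + 0.4 * min(avg_bet, 1.0)
        return "call" if hand_strength >= threshold else "fold"


agent = LimitedMemoryAgent(window=3)
print(agent.act(0.6))          # no history yet: threshold 0.4, so "call"
for bet in (1.0, 1.0, 1.0):
    agent.observe(bet)
print(agent.act(0.6))          # heavy recent betting: threshold 0.8, so "fold"
for bet in (0.0, 0.0, 0.0):
    agent.observe(bet)         # older observations fall out of the window
print(agent.act(0.6))          # recent history is calm again: "call"
```

Unlike a reactive agent, the same hand (0.6) leads to different actions depending on what the agent has recently seen.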

What is theory of mind AI?

The third of the functionality-specific types of AI, theory of mind AI incorporates user intent and similar subjective elements into its decision making. Unlike reactive AI and limited memory AI, theory of mind AI is still at an early conceptual or developmental stage. Forbes describes theory of mind AI as “the next level of AI systems”,²¹ capable of understanding human emotions, beliefs, social cues, and thought processes and of “discerning [an entity’s] needs”.²¹ Forbes expands on theory of mind AI, noting that “to truly understand human needs, AI machines [must] perceive humans as individuals whose minds can be shaped by multiple factors”.²¹

What is self-aware AI?

The fourth and final functionality-specific type of AI, self-aware AI would feature a consciousness similar to a human mind and the ability to create goals and make data-driven decisions. At the time of writing, self-aware AI remains a hypothetical idea, and it is potentially the final goal of AI research. Forbes notes that a self-aware AI would “be able to understand and evoke emotions in [others and] also have emotions, needs, beliefs, and potentially desires of its own”.²¹ Hypothetical or otherwise, self-aware AI can be readily found throughout science fiction and similar popular culture.

If each functionality-specific AI were a poker player, who would they be?

Techopedia offers up a fun example for distinguishing the different functionality-specific types of AI, prompting the reader to visualize each AI player in a poker game.

The reactive player bases all decisions on the current hand [of cards] in play
The limited memory player only considers their own and other players’ past decisions
The Theory of Mind player factors in other players’ behavioral cues
The self-aware AI player [wonders] if playing poker [for money] is the best use of their time and effort¹

What is generative AI?

Generative AI, a subset of AI, specializes in producing human-like text, audio, code, images, simulations, and videos. These AI models employ deep learning (DL) techniques to learn patterns and structures from very large datasets. The learned patterns then enable the models to create new content through sampling.
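The “learn patterns, then sample” loop can be illustrated at toy scale. The sketch below deliberately swaps in a character-level bigram model in place of deep learning, so it is not how real generative AI models work. It counts which character tends to follow which in a tiny made-up corpus, then generates new text by sampling the next character in proportion to those counts. The corpus and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Learn a 'pattern': how often each character follows each other character."""
    counts = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        counts[current][nxt] += 1
    return counts


def sample(counts, start, length, seed=0):
    """Generate new text by repeatedly sampling the next character in
    proportion to how often it followed the previous one in training."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:  # dead end: this character was never followed by anything
            break
        chars, weights = zip(*followers.items())
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


corpus = "the cat sat on the mat. the dog sat on the log."
model = train_bigram(corpus)
print(sample(model, "t", 40))
```

Every character pair in the output was seen during training, yet the overall string is usually new. That is the same basic idea, at a vastly smaller scale, as sampling from a learned distribution.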

Diverse applications of generative AI have yielded promising outcomes across various domains, with continuous research and development pushing its boundaries. Nevertheless, ethical concerns arise, particularly regarding potential misuse and the generation of deceptive, manipulative, or fake content. Consequently, the exploration of generative AI needs to be accompanied by discussions about responsible development, usage, and adoption practices.

What are machine learning and deep learning?

When it comes to discussing AI, deep learning and ML are often confused and conflated, and it’s not hard to see why. Both are subsets of AI that focus on completing tasks or goals, and examples of both can easily be found today, from self-driving cars to facial recognition software. Although the terms are often used interchangeably, much distinguishes deep learning from ML.

What is deep learning and how does it work?

Techopedia defines deep learning, a subfield of machine learning, as “an iterative approach to artificial intelligence (AI) that stacks ML algorithms in a hierarchy of increasing complexity and abstraction”.²⁶ It notes that “Each deep learning level is created with knowledge gained from the preceding layer of the hierarchy”.²⁶

In the Flatiron School blog post Deep Learning vs. Machine Learning — What’s the Difference?, author Michael Middleton states that “Deep learning models introduce an extremely sophisticated approach to ML and are set to [perform some complex tasks] because they've been specifically modeled after the human brain”.²⁷ Middleton continues the human brain comparison, noting that “Complex, multi-layered ‘deep neural networks’ are built to allow data to be passed between nodes (like neurons) in highly connected ways [resulting in] a non-linear transformation of the data that is increasingly abstract”.²⁷

IBM notes that the “deep” in deep learning refers to depth: “a neural network comprised of more than three layers—which would be inclusive of the inputs and the output—can be considered a deep learning algorithm”.³ Middleton notes that although “it takes tremendous volumes of data to ‘feed and build’ [a deep neural network], it can begin to generate immediate results, and there is relatively little need for human intervention once the programs are in place”.²⁷
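The idea of stacked layers, each built on the output of the one before it, can be sketched as a forward pass through a small network. This is a minimal NumPy sketch, not a trained model. The layer sizes are hypothetical and the weights are random, so the outputs carry no meaning. What it shows is the structure: counting input and output, the network below has five layers, which meets IBM’s “more than three layers” description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 8 input features, three hidden layers, 2 outputs.
# Counting the input and output layers, that is five layers ("more than three").
layer_sizes = [8, 16, 8, 4, 2]
weights = [rng.normal(0.0, 0.5, (m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]


def forward(x):
    """Feed x through the layers in order; each layer sees only the previous layer's output."""
    activations = [x]
    for i, (w, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ w + b
        is_last = i == len(weights) - 1
        # ReLU non-linearity on hidden layers; the output layer is left linear.
        activations.append(z if is_last else np.maximum(z, 0.0))
    return activations


batch = rng.normal(size=(5, 8))  # 5 hypothetical examples with 8 features each
for depth, a in enumerate(forward(batch)):
    print(depth, a.shape)
```

In a real deep learning system, the weights would be learned from large volumes of data (for example, by backpropagation) rather than drawn at random.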

In the identically titled Levity blog post Deep Learning vs. Machine Learning – What’s The Difference?, author Arne Wolfewicz states that in addition to analyzing data like a human mind, deep learning algorithms can perform their analysis “through supervised and unsupervised learning [and] use a layered structure of algorithms called an artificial neural network (ANN)”.²⁸ This ANN is “inspired by the biological neural network of the human brain, leading to a process of learning that’s far more capable than that of standard machine learning models”.²⁸

In the Zendesk blog post Deep learning vs. machine learning: What’s the difference?, which is also identically titled, author Patrick Grieve discusses how difficult deep learning models are to train and how they can reach incorrect conclusions. Grieve states that “like other examples of AI, [a deep learning model] requires lots of training to get the learning processes correct. But when it works as it’s intended, functional deep learning is often received as a scientific marvel that many consider to be the backbone of true artificial intelligence”.²⁹

In the Simplilearn article Top 10 Deep Learning Algorithms You Should Know in 2022, author Avijeet Biswal notes that while deep learning algorithms often use self-learning features, “[these algorithms] depend upon ANNs that mirror the way the brain computes information”.³⁰ Biswal continues that “during [their] training process, algorithms use unknown elements in the input distribution to extract features, group objects, and discover useful data patterns [and] this [training process] occurs at multiple levels, using the algorithms to build the models”.³⁰

What are examples of the primary deep learning algorithms?

The idea that deep learning is the most complex AI that’s widely used today isn’t hyperbole. Deep learning relies on a variety of model architectures, each suited to certain tasks. Avijeet Biswal lists the following as the top 10 most popular deep learning algorithms:

Convolutional neural networks (CNNs or ConvNets)
Long short-term memory networks (LSTMs)
Recurrent neural networks (RNNs)
Generative adversarial networks (GANs)
Radial basis function networks (RBFNs) or radial basis networks
Multilayer perceptrons (MLPs)
Self-organizing maps (SOMs)
Restricted Boltzmann machines (RBMs)
Deep belief networks (DBNs)
Autoencoders (AEs)³⁰

What are convolutional neural networks?

CNNs consist of multiple layers “that process and extract features from data”.³⁰ Techopedia describes the CNN as an algorithm that “can assign weights and biases to different objects in an image and differentiate one object in the image from another”.²⁶ CNNs are used for tasks including “[identifying] satellite images, [processing] medical images, [forecasting] time series, and [detecting] anomalies”.³⁰ Biswal notes that “Yann LeCun developed the first CNN in 1988 when it was called LeNet [and it] was used for recognizing characters like ZIP codes and digits”.³⁰

The following are the four layers that CNNs leverage when they process and extract features from data:

Convolution layer – employs different filters to execute the convolution operation
Rectified linear unit (ReLU) – applies an element-wise operation to the feature map and outputs a rectified feature map
Pooling layer – fed by the rectified feature map, pooling is a down-sampling operation that reduces the dimensions of the feature map. Afterwards, the pooling layer flattens the two-dimensional arrays from the pooled feature map into a single, long, continuous linear vector
Fully connected layer – formed when the pooling layer’s flattened matrix is fed in as an input; this layer classifies and identifies the images³⁰
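The four stages listed above can be traced on a single small image. This is a minimal NumPy sketch under stated assumptions: the 8×8 random image, the hand-written vertical-edge filter, the 2×2 pooling window, and the three output classes are all hypothetical. Real CNNs learn many filters and their fully connected weights during training. As in most deep learning libraries, the “convolution” below is technically a cross-correlation.

```python
import numpy as np


def conv2d(image, kernel):
    """Valid (no padding) 2-D convolution, computed as cross-correlation."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.random((8, 8))                   # hypothetical 8x8 grayscale image
kernel = np.array([[1.0, 0.0, -1.0]] * 3)    # hand-written 3x3 vertical-edge filter

feature_map = conv2d(image, kernel)          # convolution layer -> 6x6 feature map
rectified = relu(feature_map)                # ReLU -> rectified feature map
pooled = max_pool(rectified)                 # pooling -> 3x3 pooled feature map
flat = pooled.flatten()                      # flattening -> vector of length 9

n_classes = 3                                # hypothetical number of image classes
fc_weights = rng.normal(size=(flat.size, n_classes))
scores = flat @ fc_weights                   # fully connected layer -> one score per class
print(feature_map.shape, pooled.shape, flat.shape, scores.shape)
```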

What are long short-term memory networks?

LSTMs are a type of RNN designed to learn and retain information and then recall it later. They are capable of learning long-term dependencies, that is, relationships between inputs that are far apart in a sequence. Techopedia notes that LSTMs “can learn order dependence in sequence prediction problems”²⁶ and that they’re often “used in machine translation and language modeling”.²⁶ Biswal notes that LSTMs “are useful in time-series [predictions] because they remember previous inputs [and they] are typically used for speech recognition, music composition, and pharmaceutical development”.³⁰

LSTMs work the way they do because of a linked structure in which three layers, or gates, communicate. Rian Dolphin, in the article LSTM Networks | A Detailed Explanation, notes that these gates are the “forget gate, input gate and output gate”.³¹

Here is an example of an LSTM workflow:

The forget gate decides what parts of the cell state (the long-term memory of the network) are useful and should be retained based on the previous hidden state (representation of previous inputs) and the new input data in the sequence. Irrelevant data is forgotten.
The new memory network (NMN), which is a tanh-activated neural network, works with the input gate to determine what new information should be added to the network’s cell state, then updates the cell state. This decision depends on both the previous hidden state and the new input data.
The output gate makes determinations for the new hidden state based on the following factors:
- The newly updated cell state
- The previous hidden state
- The new input data³¹
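One time step of the workflow above can be written out directly. This is a minimal NumPy sketch of a standard LSTM cell, assuming one weight matrix and bias per gate that act on the concatenated previous hidden state and new input. The sizes, the random weights, and the `params` dictionary layout are hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step. params holds a weight matrix W_* of shape
    (hidden + input, hidden) and a bias b_* for each of f, i, g, o."""
    hx = np.concatenate([h_prev, x])                 # previous hidden state + new input
    f = sigmoid(hx @ params["W_f"] + params["b_f"])  # forget gate: how much of c_prev to keep
    i = sigmoid(hx @ params["W_i"] + params["b_i"])  # input gate: how much new memory to add
    g = np.tanh(hx @ params["W_g"] + params["b_g"])  # new memory network (tanh candidate)
    c = f * c_prev + i * g                           # updated cell state (long-term memory)
    o = sigmoid(hx @ params["W_o"] + params["b_o"])  # output gate
    h = o * np.tanh(c)                               # new hidden state from the updated cell state
    return h, c


input_size, hidden_size = 3, 4                       # hypothetical sizes
rng = np.random.default_rng(0)
params = {}
for gate in "figo":
    params[f"W_{gate}"] = rng.normal(0.0, 0.1, (hidden_size + input_size, hidden_size))
    params[f"b_{gate}"] = np.zeros(hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
sequence = rng.normal(size=(5, input_size))          # 5 time steps of hypothetical input
for x in sequence:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The new hidden state depends on exactly the three factors listed above: the updated cell state, the previous hidden state, and the new input (the last two through the output gate).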

What are recurrent neural networks?

RNNs are algorithms capable of remembering sequential data. They feature “connections that form directed cycles [that] allow the outputs from the LSTM to be fed as inputs to the current phase [and are capable of memorizing] previous inputs due to its internal memory”.³¹ RNNs are often used for “speech recognition, voice recognition, time series prediction [and analysis] and natural language processing”,²⁶ as well as “image captioning, handwriting recognition, and machine translation”.³⁰
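The “directed cycle” is simply the hidden state being fed back in at every step. Below is a minimal NumPy sketch of a vanilla (Elman-style) RNN forward pass. The sizes and random weights are hypothetical. Feeding the same inputs in reverse order produces a different final state, which shows that the network is sensitive to sequence order.

```python
import numpy as np


def rnn_forward(sequence, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence of input vectors.

    The hidden state h is fed back in at every step (the recurrent cycle),
    so each new state depends on the current input and on everything before it.
    """
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in sequence:
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.array(states)


rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5                       # hypothetical sizes
W_xh = rng.normal(0.0, 0.5, (input_size, hidden_size))
W_hh = rng.normal(0.0, 0.5, (hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

sequence = rng.normal(size=(4, input_size))          # 4 time steps
forward_states = rnn_forward(sequence, W_xh, W_hh, b_h)
reversed_states = rnn_forward(sequence[::-1], W_xh, W_hh, b_h)
# Same inputs, different order: the final hidden states differ.
print(np.allclose(forward_states[-1], reversed_states[-1]))
```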

What are generative adversarial networks?

GANs are composed of two algorithms that compete against one another to produce new data. Biswal expands on this description, stating that each “GAN has two components: a generator model [that] learns to generate fake data, and a discriminator model [that] learns from that false information”.³⁰

GANs are often used in “digital photo restoration and deepfake video”²⁶ and other programs to “help generate realistic images and cartoon characters, create photographs of human faces, and render 3D objects”³⁰. GANs help video game developers to “upscale low-resolution, 2D textures in old video games by recreating them in 4K or higher resolutions via image training”.³⁰

Here is an example of a GAN workflow:

“The discriminator learns to distinguish between the generator’s fake data and the real sample data.
[The initial training is presented and the] generator then produces fake data, and the discriminator quickly learns to tell that it's false.
The GAN sends the results to the generator and the discriminator to update the model”.³⁰
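The alternating updates in this workflow can be sketched with a deliberately tiny one-dimensional GAN. Everything here is an assumption made for illustration: the “real” data is a normal distribution centred on 4, the generator is a linear map G(z) = a·z + b, and the discriminator is a logistic function D(x) = sigmoid(w·x + c). Gradients are written out by hand. Real GANs use neural networks for both parts and optimizers such as Adam. A toy this simple can oscillate around the real data or collapse to a narrow output rather than converge.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def real_batch(n):
    """Hypothetical "real" data: samples from a normal distribution centred on 4."""
    return rng.normal(4.0, 0.5, n)


gen = {"a": 1.0, "b": 0.0}    # generator G(z) = a*z + b maps noise z to fake samples
disc = {"w": 0.0, "c": 0.0}   # discriminator D(x) = sigmoid(w*x + c) = P(x is real)


def discriminator_step(real, fake, lr=0.05):
    """Gradient step on -[log D(real) + log(1 - D(fake))]: push D toward 1 on real, 0 on fake."""
    d_real = sigmoid(disc["w"] * real + disc["c"])
    d_fake = sigmoid(disc["w"] * fake + disc["c"])
    grad_w = np.mean(-(1 - d_real) * real) + np.mean(d_fake * fake)
    grad_c = np.mean(-(1 - d_real)) + np.mean(d_fake)
    disc["w"] -= lr * grad_w
    disc["c"] -= lr * grad_c


def generator_step(z, lr=0.05):
    """Gradient step on -log D(G(z)): move G's output toward what D labels as real."""
    fake = gen["a"] * z + gen["b"]
    d_fake = sigmoid(disc["w"] * fake + disc["c"])
    d_loss_d_fake = -(1 - d_fake) * disc["w"]     # derivative of -log D(fake) w.r.t. fake
    gen["a"] -= lr * np.mean(d_loss_d_fake * z)   # chain rule: d fake / d a = z
    gen["b"] -= lr * np.mean(d_loss_d_fake)       # chain rule: d fake / d b = 1


for step in range(2000):
    fake = gen["a"] * rng.normal(size=64) + gen["b"]
    discriminator_step(real_batch(64), fake)   # 1. D learns to separate real from fake
    generator_step(rng.normal(size=64))        # 2. G is updated using D's feedback

print(gen, disc)
```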

What are radial basis function networks?

Techopedia describes RBFNs as “a type of supervised [ANN] that uses supervised machine learning to function as a nonlinear classifier, [a nonlinear function that uses] sophisticated functions to go further in analysis than simple linear classifiers that work on lower-dimensional vectors”.³² Biswal describes RBFNs a little differently, stating they are “special types of feedforward neural networks that use radial basis functions as activation functions [and include] an input layer, a hidden layer, and an output layer and are mostly used for classification, regression, and time-series prediction”.³⁰

RBFNs work because of the following flows and features:

A classification is performed first by measuring the input’s similarity to examples from the training set.
- RBFNs have an input vector that feeds to the input layer and a layer of RBF neurons.
The function finds the weighted sum of the inputs, and the output layer has one node per category or class of data.
- In the hidden layer, there are neurons that contain the Gaussian transfer functions. These transfer functions have outputs that are inversely proportional to the distance from the neuron's center.
- The network's output is a linear combination of the input’s radial-basis functions and the neuron’s parameters.³⁰
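The flow above can be sketched as a small RBF network. It has a Gaussian hidden layer whose output shrinks with distance from each neuron’s centre, followed by a linear output layer with one node per class. This is a minimal NumPy sketch under stated assumptions: the centres are simply the training points, the Gaussian width is a hypothetical value, and the output weights are fitted by linear least squares rather than by any particular training procedure from the cited sources. XOR is used because it is not linearly separable, yet becomes separable after the RBF transformation.

```python
import numpy as np


def rbf_features(X, centers, width):
    """Hidden layer: one Gaussian activation per RBF neuron, equal to 1 at the
    neuron's centre and decreasing with distance from it."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2 * width ** 2))


def fit_rbfn(X, y, centers, width, n_classes):
    """Fit the linear output layer (one node per class) by least squares."""
    H = rbf_features(X, centers, width)
    targets = np.eye(n_classes)[y]                # one-hot targets, one column per class
    weights, *_ = np.linalg.lstsq(H, targets, rcond=None)
    return weights


def predict(X, centers, width, weights):
    scores = rbf_features(X, centers, width) @ weights   # weighted sum of RBF outputs
    return scores.argmax(axis=1)


# XOR: not linearly separable, but separable after the RBF transformation.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
centers = X.copy()       # assumption: use each training point as an RBF centre
width = 0.5              # hypothetical Gaussian width
W = fit_rbfn(X, y, centers, width, n_classes=2)
print(predict(X, centers, width, W))   # → [0 1 1 0]
```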

What are multilayer perceptrons?

Techopedia describes MLPs as “a feedforward [ANN] that generates a set of outputs from a set of inputs”.³³ Biswal notes that MLPs “have the same number of input and output layers but may have multiple hidden layers”.³⁰ Techopedia adds that MLPs often feature “several layers of input nodes connected as a directed graph between the input and output layers [meaning] that the signal path through the nodes only goes one way. Each node, apart from the input nodes, has a nonlinear activation function”.³³

MLPs leverage “backpropagation as a supervised learning technique for training the network [and are] widely used for solving problems that require supervised learning [and] research into computational neuroscience and parallel distributed processing”.³³ MLPs are often used in applications for speech recognition, image recognition, and machine translation.

Here's an example of an MLP workflow:

MLPs feed the data to the input layer of the network.
- The layers of neurons connect in a graph so that the signal passes in one direction.
MLPs compute the input with the weights that exist between the input layer and the hidden layers.
MLPs use activation functions to determine which nodes to fire.
- Activation functions include ReLUs, sigmoid functions, and tanh.
MLPs train the model to understand the correlation and learn the dependencies between the independent and the target variables from a training data set.³⁰
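As a rough illustration of this workflow, here is a minimal sketch of a one-hidden-layer MLP trained with backpropagation on the XOR problem. The `TinyMLP` class, sigmoid activations, squared-error loss, learning rate, and XOR data are assumptions chosen for brevity, not details from the sources.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """Hypothetical MLP: n_in inputs -> one sigmoid hidden layer -> one sigmoid output."""

    def __init__(self, n_in=2, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # The signal flows one way: input layer -> hidden layer -> output node.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.5):
        h, y = self.forward(x)
        # Backpropagation for the squared error 0.5 * (y - target) ** 2.
        delta_out = (y - target) * y * (1 - y)
        delta_hidden = [delta_out * w * hj * (1 - hj) for w, hj in zip(self.w2, h)]
        # Gradient descent updates, output layer first, then hidden layer.
        for j, hj in enumerate(h):
            self.w2[j] -= lr * delta_out * hj
        self.b2 -= lr * delta_out
        for j, row in enumerate(self.w1):
            for i, xi in enumerate(x):
                row[i] -= lr * delta_hidden[j] * xi
            self.b1[j] -= lr * delta_hidden[j]

def mean_loss(model, data):
    return sum(0.5 * (model.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
model = TinyMLP()
before = mean_loss(model, XOR)
for _ in range(5000):
    for x, t in XOR:
        model.train_step(x, t)
after = mean_loss(model, XOR)
```

Because signals only flow forward and every non-input node applies a nonlinear activation, the hidden layer lets the network model XOR, which no single linear classifier can separate.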

What are self-organizing maps?

Invented by Professor Teuvo Kohonen, self-organizing maps (SOMs), sometimes referred to as self-organizing feature maps (SOFMs) or Kohonen maps, are described by Techopedia as “a type of [ANN] that uses unsupervised learning to build a two-dimensional map of a problem space, [where] the problem space can be anything from votes in U.S. Congress, maps of colors and even links between Wikipedia articles.”³⁴

Phrased another way, SOMs “enable data visualization to reduce the dimensions of data through self-organizing [ANNs]”.³⁰ SOMs leverage data visualization by “[generating] a visual representation of data on a hexagonal or rectangular grid”.³⁴ This is principally done “to solve the problem that humans cannot easily visualize high-dimensional data [and] to help users understand this high-dimensional information”.³⁰

Specifically, SOMs try to “mirror the way the visual cortex in the human brain sees objects using signals generated by the optic nerves, [making] all the nodes in the network respond differently to different inputs”.³⁴

While MLPs use backpropagation for supervised learning, SOMs leverage “competitive learning where the nodes eventually specialize rather than error-correction learning, such as backpropagation with gradient descent”.³⁴ Because SOMs learn “without using error or reward signals to train an algorithm”, they are “a kind of unsupervised learning”.³⁴

Techopedia notes that when SOMs are fed input data, they compute “the Euclidean distance or the straight-line distance between the nodes, which are given a weight”.³⁴ The best matching unit (BMU) is the network node whose weights are most similar to the input data.

As a SOM “[advances] through the problem set, the weights start to look more like the actual data [and the SOM] has trained itself to see patterns in the data [similar to how] a human [perceives these patterns]”.³⁴

SOMs are commonly used for applications involving “meteorology, oceanography, project prioritization, and oil and gas exploration”.³⁴

Here's an example of an SOM workflow:

The SOM initializes weights for each node and chooses a vector at random from the training data.
The SOM examines every node to find the one whose weights are most like the input vector.
- The winning node is called the BMU.
The SOM finds the BMU’s neighborhood; the number of neighbors shrinks over time.
The SOM updates the weights of the BMU and its neighbors, moving them toward the sample vector.
- The closer a node is to the BMU, the more its weights change.
- The further a neighbor is from the BMU, the less it learns.
The SOM repeats the process, starting from choosing a random vector, for N iterations.³⁰
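The steps above can be sketched as a small Python SOM that maps RGB colors onto a 4x4 grid. The function names, the linearly decaying learning rate and neighborhood radius, and the Gaussian neighborhood weighting are common choices assumed here for illustration, not taken from the sources.

```python
import math
import random

def init_weights(rows, cols, dim, seed=0):
    """Give every node on a rows x cols grid a random weight vector."""
    rng = random.Random(seed)
    return {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}

def find_bmu(weights, x):
    """Best matching unit: the node whose weights are closest (Euclidean) to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

def quantization_error(weights, data):
    """Average distance between each sample and its BMU's weight vector."""
    return sum(math.dist(weights[find_bmu(weights, x)], x) for x in data) / len(data)

def train_som(weights, data, iterations=1000, lr0=0.5, seed=0):
    rng = random.Random(seed)
    rows = 1 + max(r for r, _ in weights)
    cols = 1 + max(c for _, c in weights)
    radius0 = max(rows, cols) / 2
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1 - frac)                    # learning rate decays over time
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        x = rng.choice(data)                     # pick a random training vector
        bmu = find_bmu(weights, x)
        for node, w in weights.items():
            grid_dist = math.dist(node, bmu)
            if grid_dist <= radius:              # node is in the BMU's neighborhood
                # Nodes closer to the BMU on the grid learn more.
                influence = math.exp(-grid_dist ** 2 / (2 * radius ** 2))
                for i in range(len(w)):
                    w[i] += lr * influence * (x[i] - w[i])
    return weights

colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
som = init_weights(4, 4, dim=3)
before = quantization_error(som, colors)
train_som(som, colors)  # updates the weights in place
after = quantization_error(som, colors)
```

Quantization error (the average distance from each sample to its BMU) is one simple way to check that the node weights have moved toward the data.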

What are restricted Boltzmann machines?

In 1985, Geoffrey Hinton, David Ackley, and Terry Sejnowski published a learning algorithm for Boltzmann machines,³⁵ of which the RBM is a restricted variant. Techopedia describes RBMs as a “type of generative network”³⁶ “commonly used for [collaborative] filtering, feature learning and classification that leverages types of dimensionality reduction to help tackle complicated inputs”.³⁶ Biswal adds that RBMs are also used for “dimensionality reduction, regression, and topic modeling”³⁰ and that “RBMs constitute the building blocks of DBNs”.³⁰ DBNs and similarly sophisticated models are built by stacking individual RBMs together.

RBMs get their name from a structural “restriction”: each unit connects only to units in the other layer, never to units in its own layer. An RBM’s nodes also “make ‘stochastic’ [or random] decisions”.³⁶ Because of this random process, RBMs are sometimes labeled as “stochastic neural networks”.³⁰

Biswal notes that RBMs include two layers: one with visible units and another with hidden units and that “each visible unit is connected to all hidden units [and] RBMs have a bias unit that is connected to all the visible units and the hidden units, and they have no output nodes”.³⁰

An RBM’s workflow involves two phases: “forward pass and backward pass”.³⁰

In the forward pass phase, the RBM...

Accepts the inputs and translates them into a set of numbers that encodes the inputs.
Combines every input with an individual weight and one overall bias. The algorithm passes the output to the hidden layer.

In the backward pass phase, the RBM...

Takes that set of numbers and translates them to form the reconstructed inputs.
Combines each activation with an individual weight and overall bias and passes the output to the visible layer for reconstruction.
- At the visible layer, the RBM compares the reconstruction with the original input to analyze the quality of the result.³⁰
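A minimal Python sketch of these two passes follows. It trains the weights with one-step contrastive divergence (CD-1), a common RBM training method that the sources above do not describe; the `RBM` class, the toy binary patterns, and the learning rate are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class RBM:
    """Hypothetical binary RBM with a visible layer, a hidden layer, and bias units."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Every visible unit connects to every hidden unit; no within-layer connections.
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def forward(self, v):
        """Forward pass: hidden-unit probabilities from weighted inputs plus bias."""
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def backward(self, h):
        """Backward pass: reconstructed visible-unit probabilities."""
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def sample(self, probs):
        """Stochastic decision: each unit turns on with its given probability."""
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_step(self, v0, lr=0.1):
        h0 = self.forward(v0)
        v1 = self.backward(self.sample(h0))
        h1 = self.forward(v1)
        # Move weights toward the data statistics and away from the reconstruction's.
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])

    def reconstruction_error(self, v):
        """Compare the reconstruction with the original input."""
        v_rec = self.backward(self.forward(v))
        return sum((a - b) ** 2 for a, b in zip(v, v_rec)) / len(v)

patterns = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
rbm = RBM(n_visible=6, n_hidden=3)
before = sum(rbm.reconstruction_error(p) for p in patterns) / len(patterns)
for _ in range(1000):
    for p in patterns:
        rbm.cd1_step(p)
after = sum(rbm.reconstruction_error(p) for p in patterns) / len(patterns)
```

The reconstruction error compares the backward pass's output with the original input, mirroring the final step of the workflow.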

What are deep belief networks?

DBNs are a complex type of generative neural network. Techopedia defines them as “an unsupervised deep learning algorithm [where] each layer has two purposes: it functions as a hidden layer for what came before and a visible layer for what comes next”.²⁶

Biswal notes that DBNs are “generative models that consist of multiple layers of stochastic, latent variables [that] have binary values and are often called hidden units”.³⁰

Both Biswal and Techopedia describe DBNs as stacks of RBMs that “are composed of various smaller unsupervised neural networks”.³⁷ Techopedia describes DBNs as having “connections between layers [and] each RBM layer communicates with both the previous and subsequent layers”.³⁷ While these layers are connected, “the network does not include connections between units in a single layer”.³⁷

Per Techopedia, ML and neural network design pioneer Geoffrey Hinton “characterizes stacked RBMs as providing a system that can be trained in a ‘greedy’ manner and describes deep belief networks as models ‘that extract a deep hierarchical representation of training data’”.³⁷ Specifically, “the greedy learning algorithm uses a layer-by-layer approach for learning the top-down, generative weights”.³⁰ Biswal notes that “DBNs learn that the values of the latent variables in every layer can be inferred by a single, bottom-up pass”.³⁰

This greedy unsupervised ML model displays “how engineers can pursue less structured, more rugged systems where there is [less] data labeling and the technology has to assemble results based on random inputs and iterative processes”.³⁷

When generating data, a DBN will “run the steps of Gibbs sampling on the top two hidden layers, [drawing] a sample from the RBM defined by the top two hidden layers [then drawing] a sample from the visible units using a single pass of ancestral sampling through the rest of the model”.³⁰

DBNs are commonly used for applications involving “image-recognition, video-recognition, and motion-capture data”³⁰ and by “healthcare sectors for cancer and other disease detection”²⁶.

What are autoencoders?

Also known as autoassociators or Diabolo networks, autoencoders (AEs) are an “unsupervised [ANN] that provides compression and other functionality, [leveraging] a feedforward approach to reconstitute an output from an input”³⁸ in which “the input and output are identical”.³⁰

Studied by Geoffrey Hinton and others³⁹ as a way to solve unsupervised learning problems, AEs work by first compressing the input and then decompressing it into an output. This decompressed output is frequently similar to the original input. Techopedia notes that this process exemplifies “the nature of an autoencoder – that the similar inputs and outputs get measured and compared for execution results”.³⁸

Autoencoders have two main parts: an encoder and a decoder. The encoder maps the input into code and the decoder maps the code to a reconstruction of the input. The code is sometimes considered a third part as “the original data goes into a coded result, and the subsequent layers of the network expand it into a finished output”.³⁸

A “denoising” AE is a useful variant for understanding AEs. Techopedia states that a denoising AE “uses original inputs along with a noisy input, to refine the output and rebuild something representing the original set of inputs”.³⁸

AEs are commonly used for applications involving image processing, and “pharmaceutical discovery [and] popularity prediction”.³⁰

Here's an example of an AE workflow:

The encoder receives an input.
The input is transformed into a different representation.
An attempt is made to reconstruct the original input as accurately as possible.
- For example, when an image isn’t clearly visible, it’s fed into an AE neural network.
- The AE encodes the image, reducing it to a smaller representation.
- The AE then decodes that representation to generate the reconstructed image.³⁰
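The encode, compress, and decode cycle can be sketched as a tiny Python autoencoder that squeezes four one-hot inputs through a two-unit code. The `TinyAutoencoder` class, sigmoid layers, squared-error loss, and training settings are illustrative assumptions; real image autoencoders use far larger (often convolutional) networks.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyAutoencoder:
    """Hypothetical autoencoder: n_in inputs -> n_code code units -> n_in outputs."""

    def __init__(self, n_in=4, n_code=2, seed=0):
        rng = random.Random(seed)
        self.W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_code)]
        self.b_enc = [0.0] * n_code
        self.W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_code)] for _ in range(n_in)]
        self.b_dec = [0.0] * n_in

    def encode(self, x):
        """Encoder: map the input to a smaller code."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W_enc, self.b_enc)]

    def decode(self, code):
        """Decoder: map the code back to a reconstruction of the input."""
        return [sigmoid(sum(w * c for w, c in zip(row, code)) + b)
                for row, b in zip(self.W_dec, self.b_dec)]

    def loss(self, x):
        x_hat = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

    def train_step(self, x, lr=1.0):
        code = self.encode(x)
        x_hat = self.decode(code)
        # Error terms at the output (gradient of squared error, up to a constant factor).
        d_out = [(xh - xi) * xh * (1 - xh) for xh, xi in zip(x_hat, x)]
        # Backpropagate to the code layer before the decoder weights change.
        d_code = [sum(d_out[i] * self.W_dec[i][k] for i in range(len(d_out))) * c * (1 - c)
                  for k, c in enumerate(code)]
        for i, row in enumerate(self.W_dec):
            for k, c in enumerate(code):
                row[k] -= lr * d_out[i] * c
            self.b_dec[i] -= lr * d_out[i]
        for k, row in enumerate(self.W_enc):
            for j, xj in enumerate(x):
                row[j] -= lr * d_code[k] * xj
            self.b_enc[k] -= lr * d_code[k]

# The training target is the input itself.
data = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
ae = TinyAutoencoder()
before = sum(ae.loss(x) for x in data) / len(data)
for _ in range(5000):
    for x in data:
        ae.train_step(x)
after = sum(ae.loss(x) for x in data) / len(data)
```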

What is machine learning and how does it work?

Techopedia describes ML as a sub-topic of AI “that focuses on building algorithmic models that can identify patterns and relationships in data, [contextualizing] the word machine [as] a synonym for computer program and the word learning [for] how ML algorithms will automatically become more accurate as they receive additional data”.⁴⁰

Patrick Grieve defines machine learning as “An application of [AI] that includes algorithms that parse data, learn from that data, and then apply what they’ve learned to make informed decisions”.²⁹ Arne Wolfewicz defines machine learning more simply as “the general term for when computers learn from data”²⁸ and as “the intersect of computer science and statistics where algorithms are used to perform a specific task without being explicitly programmed [and] instead, they recognize patterns in the data and make predictions once new data arrives”.²⁸

The term “machine learning” was coined in 1959 by American IBMer and AI and computer gaming pioneer Arthur Samuel in his paper Some Studies in Machine Learning Using the Game of Checkers.⁴¹ As with AI, ML is not a new idea. Techopedia notes that ML’s “practical application in business was not financially feasible until the advent of the internet and recent advances in big data analytics and cloud computing[,] because training an ML algorithm to find patterns in data requires extremely large data sets”.⁴⁰

Wolfewicz notes that “the learning process of these algorithms can either be supervised or unsupervised, depending on the data being used to feed the algorithms”.²⁸ He elaborates with this example of machine learning:

“A traditional machine learning algorithm can be something as simple as linear regression, [so] imagine you want to predict your income given your years of higher education. [First], you have to define a function, e.g. income = y + x * years of education. Then, give your algorithm a set of training data [such as] a simple table with data on some people’s years of higher education and their associated income. Next, let your algorithm draw the line, e.g. through an ordinary least squares (OLS) regression. Now, you can give the algorithm some test data, e.g. your personal years of higher education, and let it predict your income”.²⁸

Wolfewicz argues that “the driving force behind machine learning is ordinary statistics [and that] the algorithm learned to make a prediction without being explicitly programmed, only based on patterns and inference”.²⁸
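Wolfewicz's income example can be written out as a short Python sketch. The data points are hypothetical, and the code calls the quote's `y` the intercept and its `x` the slope; ordinary least squares picks the line that minimizes the sum of squared prediction errors.

```python
def fit_ols(xs, ys):
    """Ordinary least squares for a single feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    # income = intercept + slope * years of education
    return intercept + slope * x

# Hypothetical training data: years of higher education -> annual income
years = [0, 2, 4, 6]
income = [30000, 40000, 50000, 60000]
intercept, slope = fit_ols(years, income)
print(intercept, slope)              # 30000.0 5000.0
print(predict(intercept, slope, 3))  # 45000.0
```

Nothing in `fit_ols` mentions income or education; the line's parameters come entirely from the training data, which is the sense in which the algorithm "learned" rather than being explicitly programmed.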

Grieve notes that while ML is complicated, “at the end of the day, [ML] serves the same mechanical function that a flashlight, car, or computer screen does”²⁹ and that ML can be interpreted as meaning “[a device continually] performs a function with the data given to it and gets progressively better over time”.²⁹

ML is often leveraged by enterprises today for predictive analytics, such as risk analysis, fraud detection, and voice and image recognition. Grieve notes that ML powers a variety of “automated tasks that span across multiple industries, from data security firms that hunt down malware to finance professionals who want alerts for favorable trades [with] AI algorithms [that] are programmed to constantly learn in a way that simulates a virtual personal assistant”.²⁹ Techopedia adds that “predictive analytics and other similar ML projects frequently require [computer scientists,] data scientists, and machine learning engineers”.⁴⁰

What algorithms are used for training machine learning?

ML training typically relies on one of three primary types of learning algorithms:

Supervised learning – The algorithm is given labeled training data as input and shown the correct answer as output. It leverages outcomes from historical data sets to predict output values for new, incoming data.
Unsupervised learning – The algorithm is given unlabeled training data. Instead of being asked to predict the correct output, it uses the training data to detect patterns and attempts to apply these patterns to other data sets that display similar behavior. A related approach, semi-supervised learning, uses a small amount of labeled data alongside a larger amount of unlabeled data during training.
Reinforcement learning – Rather than receiving training data, this learning algorithm is given a reward signal and seeks patterns in data that will give the reward. Its input frequently comes from its interaction with a digital or physical environment.⁴⁰
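To make the reward-signal idea concrete, here's a minimal Python sketch of an epsilon-greedy multi-armed bandit, a simplified reinforcement learning setting with no environment state. The reward values, `epsilon`, and function name are hypothetical, and rewards are fixed rather than noisy to keep the example short.

```python
import random

def epsilon_greedy_bandit(true_rewards, steps=1000, epsilon=0.1, seed=0):
    """Learn which action pays best purely from the reward signal."""
    rng = random.Random(seed)
    n_arms = len(true_rewards)
    estimates = [0.0] * n_arms  # running average reward observed per action
    counts = [0] * n_arms
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)            # explore: try a random action
        else:
            arm = estimates.index(max(estimates))  # exploit: best action so far
        reward = true_rewards[arm]  # reward signal from the (toy) environment
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, total

estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.9])
best_arm = estimates.index(max(estimates))
```

The agent is never told which action is correct; it only sees rewards, and the occasional random exploration is what lets it discover that the third action pays the most.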

What are data scientists?

Middleton describes data scientists as a role where you “compose the models and algorithms needed to pursue [your] industry’s goals [and you] oversee the processing and analysis of data generated by the computers”.²⁷ This role requires coding expertise, including languages like Python and Java, with “a strong understanding of the business and strategic goals of a company or industry”.²⁷

What are machine learning engineers?

Middleton describes machine learning engineers as a role where you “implement the data scientists’ models and integrate them into the complex data and technological ecosystems of the firm [and where you are] at the helm for the implementation [and] programming of automated controls or robots that take actions based on incoming data”.²⁷

Techopedia notes that machine learning operations (MLOps) is the primary focus of an ML engineer’s job. MLOps is “an approach to managing the entire lifecycle of a machine learning model”⁴⁰ from training through daily use to retirement. ML engineers tend to have knowledge of “mathematics and statistics, in addition to data modeling, feature engineering and programming”.⁴⁰

It's likely that at an enterprise, data scientists and ML engineers will work together on a variety of AI-based projects. They may work on “deciding which type of learning algorithm will work best to solve a particular business problem [or on] deciding what data should be used for training and how machine learning model outcomes will be validated”.⁴⁰

What is a machine learning model and how does it work?

Techopedia defines an ML model as “the output of an ML algorithm that’s been run on data”.⁴⁰ When it comes to differentiating between models and algorithms, the author of Finance Train’s article Difference Between Model and Algorithm notes that “an algorithm is a set of rules to follow to solve a problem, [that] it will have a set of rules that need to be followed in the right order [to] solve the problem. A model is what you build by using the algorithm”.⁴²

Here's an example of a workflow for creating an ML model:

Gather training data.
Prepare data for training.
Decide which learning algorithm to use.
Train the learning algorithm.
Evaluate the learning algorithm’s outputs.
If necessary, adjust the variables (hyperparameters) that govern the training process in order to improve output.⁴⁰
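The workflow above can be walked through end to end in a short Python sketch. Every specific here is an assumption for illustration: the synthetic two-cluster data, the 70/30 split, the k-nearest neighbors (k-NN) algorithm, and the candidate values of its hyperparameter `k`.

```python
import math
import random

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

# 1. Gather training data (hypothetical synthetic data: two noisy clusters).
rng = random.Random(42)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(50)]
data += [([rng.gauss(3, 1), rng.gauss(3, 1)], 1) for _ in range(50)]

# 2. Prepare the data: shuffle, then split into training and validation sets.
rng.shuffle(data)
train, valid = data[:70], data[70:]

# 3-4. Choose an algorithm (k-NN) and train it; k-NN simply stores the training set.
# 5-6. Evaluate each candidate hyperparameter k on validation data and keep the best.
scores = {k: accuracy(train, valid, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

Evaluating on data held out from training is what makes the hyperparameter comparison meaningful; scoring on the training set would always favor `k = 1`.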

What is bias in machine learning and how can it be prevented?

In the BMCBlogs post Bias & Variance in Machine Learning: Concepts & Tutorials, author Shanika Wickramasinghe notes that “With larger data sets, various implementations, algorithms, and learning requirements, it has become even more complex to create and evaluate ML models since all those factors directly impact the overall accuracy and learning outcome of the model”.⁴³ She continues, stating that “any issues in the algorithm or polluted data set can negatively impact the ML model”.⁴³

Techopedia states that despite the desire for transparent and explainable ML algorithms, “algorithmic transparency for machine learning can be more complicated than just sharing which algorithm was used to make a particular prediction”.⁴⁰ While many of today’s popular algorithms are freely available, the training data, where bias is often rooted, tends to be proprietary and harder to access.

What is bias in the context of machine learning algorithms? Wickramasinghe describes bias as the following:

A phenomenon that skews the result of an algorithm in favor of or against an idea
A systematic error that occurs in the machine learning model itself due to incorrect assumptions in the ML process
The error between average model prediction and the ground truth. Moreover, it describes how well the model matches the training data set
- A model with a higher bias would not match the data set closely.
- A low bias model will closely match the training data set.⁴³

Wickramasinghe adds that “characteristics of a high bias model include the following features:

Failure to capture proper data trends
Potential towards underfitting
More generalized/overly simplified
High error rate”⁴³

What is variance in machine learning and how is variance different from bias?

Much like deep learning and ML, bias and variance in ML are often confused and conflated. Wickramasinghe describes variance as the following:

The changes in the model when using different portions of the training data set.
The variability in the model prediction: how much the ML function can adjust depending on the given data set. Variance comes from highly complex models with a large number of features.
- Models with high bias tend to have low variance.
- Models with high variance tend to have low bias.⁴³

Wickramasinghe adds that “characteristics of a high variance model include the following features:

Noise in the data set
Potential towards overfitting
Complex models
Trying to put all data points as close as possible”⁴³
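Since variance describes how much a model changes across different portions of the training data, both bias and variance can be estimated by repeatedly resampling a training set. This minimal Python sketch compares a constant (mean) predictor with a one-nearest-neighbor predictor at a single test point; the sine target function, noise level, sample sizes, and function names are hypothetical choices.

```python
import math
import random

def true_f(x):
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=20, noise=0.3):
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, noise)) for x in xs]

def constant_model(train, x0):
    # Very simple model: always predicts the mean target (x0 is ignored).
    return sum(y for _, y in train) / len(train)

def nearest_neighbor_model(train, x0):
    # Very flexible model: copies the target of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def bias_variance(model, x0=0.25, trials=500, seed=0):
    """Train on many resampled training sets and measure predictions at x0."""
    rng = random.Random(seed)
    preds = [model(sample_training_set(rng), x0) for _ in range(trials)]
    mean_pred = sum(preds) / trials
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias_sq, variance

const_bias, const_var = bias_variance(constant_model)
nn_bias, nn_var = bias_variance(nearest_neighbor_model)
print(f"constant: bias^2={const_bias:.3f} variance={const_var:.3f}")
print(f"1-NN:     bias^2={nn_bias:.3f} variance={nn_var:.3f}")
```

With these settings the constant model should show the larger bias and the nearest-neighbor model the larger variance, matching the high-bias and high-variance characteristics listed above.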

What are underfitting and overfitting and when do they occur?

Underfitting and overfitting are terms for “how a model fails to match data”⁴³. Wickramasinghe notes that “the fitting of a model directly correlates to whether it will return accurate predictions from a given data set”.⁴³

Techopedia describes underfitting as “a condition that occurs when the ML model is so simple that no learning can take place”.⁴⁴ The author adds that “if a predictive model performs poorly on training data, underfitting is the most likely reason”.⁴⁴

Conversely, overfitting is described as “a condition that occurs when a machine learning or deep neural network model performs significantly better for training data than it does for new data”.⁴⁴ The author notes that when an ML model “can’t make accurate predictions about new data because it can’t distinguish extraneous (noisy) data from essential data that forms a pattern”,⁴⁴ overfitting is likely the reason.

How can bias and variance be prevented or otherwise mitigated?

Wickramasinghe states that “bias and variance are inversely connected. It is impossible to have an ML model with a low bias and a low variance”.⁴³ Therefore, if an ML model is adjusted to fit a given training data set more closely, its bias is likely to fall while its variance rises, along with the odds of inaccurate predictions on new data. Conversely, a simpler model that fits the training data less closely tends to have lower variance but a higher risk of bias.

This is a constant balancing act between variance and bias that data engineers must maintain. Wickramasinghe notes that “having a higher variance does not indicate a bad ML algorithm. Machine learning algorithms should be able to handle some variance”.⁴³

She argues that data engineers can approach the trade-off between bias and variance using the following methods:

Increase the model’s complexity – Accounting for both bias and variance, raise the model’s complexity to decrease overall bias while keeping the resulting increase in variance at acceptable levels, so the model aligns with the training data set without incurring significant variance errors.
Increase the training data set – The preferred method for dealing with overfitting models, this method allows users to increase the model’s complexity without variance errors polluting the model.
- A large data set offers more data points for the algorithm to generalize from, but the major issue with increasing the training data set is that underfitting (high bias) models aren’t very sensitive to the training data set. As such, increasing the data is the preferred solution for high variance models rather than high bias ones.⁴³

Wickramasinghe presents a table listing common algorithms and their expected behavior between bias and variance:

Algorithm	Bias	Variance
Linear Regression	High Bias	Less Variance
Decision Tree	Low Bias	High Variance
Bagging	Low Bias	High Variance (Less than Decision Tree)
Random Forest	Low Bias	High Variance (Less than Decision Tree and Bagging)⁴³

What are the similarities and differences between artificial intelligence, machine learning, and deep learning?

Wolfewicz defines AI, ML and deep learning as the following:

AI – The theory and development of computer systems able to perform tasks normally requiring human intelligence.
ML – Subset of AI that gives computers the ability to learn without being explicitly programmed.
Deep learning – Specialized subset of ML that relies on ANNs, a layered structure of algorithms inspired by the biological neural network of the human brain. Deep learning leverages a process of learning that’s far more capable than that of standard machine learning models and allows it to make intelligent decisions on its own.²⁸

Middleton argues that the key differences between machine learning and deep learning involve the following elements:

Human intervention – Grieve notes that “while basic [ML] models do become progressively better at performing their specific functions as they take in new data, they still need some human intervention. If an AI algorithm returns an inaccurate prediction, then an engineer [must] step in and [adjust]. With a deep learning model, an algorithm can determine [if] a prediction is accurate through its own neural network—no human help is required”.²⁹ Middleton notes that “[ML] requires more ongoing human intervention to get results and deep learning is more complex to establish but requires minimal intervention afterwards”.²⁷
Hardware – “[ML] programs [are] less complex than deep learning algorithms and can often run on conventional computers, but deep learning systems require far more powerful hardware and resources. This demand for power [drives] use of [graphics processing units (GPUs)]. GPUs are useful for their high bandwidth memory and ability to hide latency (delays) in memory transfer due to thread parallelism (the ability of many operations to run efficiently at the same time)”.²⁷
Time – “[ML] systems [are] set up and operate quickly but may be limited in the power of their results. Deep learning systems take more time to set up but can generate results instantaneously (although the quality is likely to improve over time as more data becomes available)”.²⁷
Approach – “[ML] [requires] structured data and uses traditional algorithms like linear regression. Deep learning employs [ANNs] and is built to accommodate large volumes of unstructured data”.²⁷
Applications – “[ML] is already in use in your email inbox, bank, and doctor’s office. Deep learning technology enables more complex and autonomous programs, like self-driving cars or robots that perform advanced surgery”.²⁷

How do machine learning and deep learning affect you today?

Much of the technology we take for granted today relies on machine learning and deep learning algorithms. Grieve notes that in customer service, today’s AI apps “are used to drive self-service, increase agent productivity, and make workflows more reliable”.²⁹

Waves of customer queries are fed into these algorithms, which aggregate and process the information before producing answers for customers. Grieve notes that ML and deep learning both help to “power [NLP and help] computers to comprehend text and speech [while] Amazon Alexa and Apple’s Siri are two good examples of ‘virtual agents’ that can use speech recognition to answer a consumer’s questions”.²⁹

Chatbots are another example of AI-infused technology that responds to customers. Grieve notes that “Zendesk’s AI chatbot, Answer Bot, [incorporates] a deep learning model to understand the context of a support ticket and learn which help articles it should suggest to a customer”.²⁹

Resources

Artificial Intelligence (AI), Techopedia, 7 October 2021.
What is AI? Learn about Artificial Intelligence, Oracle Cloud Infrastructure, 2022.
What is Artificial Intelligence (AI), IBM, 3 June 2020.
What is Artificial Intelligence?, John McCarthy, 24 November 2004.
Computing Machinery and Intelligence, Alan Turing, 1950.
Artificial Intelligence: What it is and why it matters, SAS, 2022.
Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig, 9 June 2021.
Machines Who Think: A Personal Inquiry into the History and Prospects of Artificial Intelligence, Pamela McCorduck, 2004.
The Iliad, Homer, Richmond Lattimore (translation), 1951.
TALOS, GreekMythology.com, 2021.
Charles Babbage: Father of the Computer, D.S. Halacy, 1970.
Let's build Babbage's ultimate mechanical computer, John Graham-Cumming, New Scientist, 15 December 2010.
The Link Between Mary Shelley’s Frankenstein and AI, Charlotte Mckee, Big Cloud, 2020.
The Czech Play That Gave Us the Word ‘Robot’, John M. Jordan, The MIT Press Reader, 29 July 2019.
Roger Ebert's movie home companion, Roger Ebert, Internet Archive, 1985.
Alan Turing: The codebreaker who saved 'millions of lives', Prof Jack Copeland, BBC.com, 19 June 2012.
Completely Automated Public Turing Test To Tell Computers And Humans Apart (CAPTCHA), Techopedia, 11 October 2016.
Top 16 Mostly Used Captcha Examples, IIH Global, 2022.
The Turing test: AI still hasn’t passed the “imitation game”, Big Think, 7 March 2022.
Alan Turing: The experiment that shaped artificial intelligence, Prof Noel Sharkey, BBC.com, 21 June 2012.
7 Types of Artificial Intelligence, Naveen Joshi, Forbes, 19 June 2019.
Types of Artificial Intelligence, Java T Point, 2021.
Minds, Brains, and Programs, John R. Searle, Behavioral and Brain Sciences, 1980.
What is Artificial Intelligence? How does AI work, Types, Trends and Future of it?, Great Learning Team, Great Learning, 19 January 2022.
Types Of Artificial Intelligence You Should Know, Zulaikha Lateef, edureka!, 29 July 2021.
Deep learning, Techopedia, 23 February 2022.
Deep Learning vs. Machine Learning — What’s the Difference?, Michael Middleton, Flatiron School, 8 February 2021.
Deep Learning vs. Machine Learning – What’s The Difference?, Arne Wolfewicz, Levity, 21 April 2022.
Deep learning vs. machine learning: What’s the difference?, Patrick Grieve, Zendesk Blog, 8 March 2022.
Top 10 Deep Learning Algorithms You Should Know in 2022, Avijeet Biswal, Simplilearn, 21 February 2022.
LSTM Networks | A Detailed Explanation, Rian Dolphin, Towards Data Science, 21 October 2020.
Radial Basis Function Network (RBF Network), Techopedia.
Multilayer Perceptron (MLP), Techopedia.
Self-Organizing Map (SOM), Techopedia, 9 July 2018.
A learning algorithm for Boltzmann machines, David H. Ackley, Geoffrey E. Hinton, Terrence J. Sejnowski, Cognitive Science 9, 147-169, 1985.
Restricted Boltzmann Machine (RBM), Techopedia, 1 June 2018.
Deep Belief Network (DBN), Techopedia.
Autoencoder (AE), Techopedia, 18 July 2018.
Autoencoders, minimum description length and Helmholtz free energy, G. E. Hinton, R. S. Zemel, Advances in Neural Information Processing Systems 6, 3-10, 1994.
Machine Learning (ML), Dr. Tehseen Zia, Techopedia, 16 August 2021.
Some Studies in Machine Learning Using the Game of Checkers, A.L. Samuel, IBM Journal of Research and Development, 3, 211-229, 1959.
Difference Between Model and Algorithm, Finance Train, 2021.
Bias & Variance in Machine Learning: Concepts & Tutorials, Shanika Wickramasinghe, BMCBlogs, 16 July 2021.
Overfitting, Techopedia, 3 March 2022.