What is machine learning?
Patterns. The world is full of them, and we often don’t even notice.
Human language is a great example. We understand each other because things like verb conjugations and word order typically follow a rigid structure. The same is true of music, where rhythm and scales provide a measure of order and predictability.
If computers can identify and model these patterns, they can make predictions. That, in essence, is the nature of machine learning (ML).
These predictions can be used to mimic the fundamental elements of what makes a person human, like the ability to write creatively and articulate ideas, as with chatbots like ChatGPT and Google Bard. With enough real-world examples (high-quality training data), they can — in the case of a self-driving car — identify the elements leading up to a potential collision, allowing the vehicle to avert disaster. Or, in the case of a computer network or financial system, they can find the tell-tale signs of malicious activity.
Those examples are, by no means, comprehensive. ML is an increasingly fundamental component of our highly connected, digital world. It’s present in — or provides the basis for — countless business and consumer-oriented systems, from the camera in your smartphone to the security systems in your workplace.
This post will give you a gentle overview of ML — from the earliest experiments to the most recent advancements. You’ll learn about how it works, why it works, and the growing usage of ML.
The history of machine learning
Artificial Intelligence (AI) and ML are often treated as interchangeable concepts. In reality, AI is a large family of statistical and mathematical techniques, of which ML is one branch, and each branch tends to solve different types of problems.
The foundation of ML is the logical (pun intended) descendant of a long history of progress in the theory of computation, from Ada Lovelace in the 19th century to Alan Turing. Its premise dates to 1943, long before the invention of the stored-program digital computer, when logician Walter Pitts and neuroscientist Warren McCulloch tried to create a mathematical representation of the neurons in a human brain. This was significant for two reasons.
Creating a system like a brain
First, artificial neural networks (a key component in modern ML systems) are engineered to represent a brain’s neural activity and the relationships between neurons. But, more fundamentally, computers are essentially number-crunching machines. If you can represent an idea in mathematical terms, it becomes possible to implement it in code.
With computers still in their infancy, the earliest developments in ML were primarily theoretical in nature, and the main question was 'What is computable?' (i.e., what mathematical problems can computers solve?) — like Turing’s concept of a thinking machine, as proposed in his 1950 paper Computing Machinery and Intelligence.
One such proof of concept was the 1951 Stochastic Neural Analog Reinforcement Calculator (SNARC), which implemented a 40-synapse neural network. Its role was to output the probability of a signal propagating from one end of a circuit to another. Although the experiment was simple by today’s standards and had no obvious commercial utility, it nonetheless pioneered many of the concepts that underpin today’s ML systems. For example, it included a reinforcement mechanism, where the operator could “reward” correct answers, thereby improving its accuracy.
The same decade, Cornell University and the US Office of Naval Research created the first implementation of the Perceptron — a machine built on the artificial neuron that Pitts and McCulloch had described in 1943, designed to classify objects by type. This was an early experiment in computer vision, albeit one that largely failed to meet its objectives. ML was a nascent field in 1959, and computers were underpowered.
A trigger for ML innovation
And yet, its failure was — as is often the case in technology — a trigger for innovation. It led to the development of multi-layer neural networks (networks that become 'deep', with each layer of neurons receiving input only from the previous layer), which produced more accurate predictions than a single layer of neurons.
The subsequent decades saw an explosion of interest in ML, with the discovery of new methods and algorithms (like the nearest neighbour algorithm, used in pattern recognition, and new ways of constructing and optimising artificial neural networks) and new implementations. This allowed researchers to build systems that could traverse real-world environments, play games, and analyse data.
By the late 1990s, ML was starting to deliver real business value — or, at the very least, handle tasks that were once the exclusive domain of human beings. In 1997, IBM’s Deep Blue system beat Garry Kasparov — then the world’s greatest chess player — in a six-game match, winning two games, drawing three, and losing one. Two years later, researchers proposed a method for using ML in spam detection systems.
ML grows up
This progress only accelerated after the turn of the millennium, with ML becoming — for lack of a better word — vastly more accessible. In 2002, the Idiap Research Institute in Switzerland released Torch, one of the first open-source libraries for ML applications. Open-source data — particularly data for computer vision applications — became vastly more abundant, allowing researchers and organisations to easily build ML systems that reflected real-world conditions.
And the accomplishments of ML-based systems continued to grow, with IBM’s Watson trouncing two human champions on the quiz-based game show Jeopardy, and Google’s AlphaGo system defeating a professional Go player. It wasn’t merely that ML technologies were maturing, but also that we finally had the technological capabilities to implement them in a way that was, if not useful, then certainly impressive.
The uses of ML
Nine decades after researchers conceptualised the idea of an artificial neuron, ML is a seemingly omnipresent part of our digital lives. It plays a critical, although often invisible, role in consumer and business applications, including:
- Threat detection: Security systems use ML to identify the anomalous elements that could be indicative of an attack. Examples of these elements include the header and body of an email, or traffic on a network.
- Smartphone cameras: Your phone uses ML to identify and classify the objects in a frame and adjust the camera settings accordingly, resulting in a better photograph. This technology is used on Android and iOS devices.
- Large language models (LLM): Chatbots like ChatGPT or Google Bard use ML to understand the relationships between words, concepts, and the elements of writing styles. This allows them to write convincingly about a myriad of topics, in a number of styles and formats.
- Autonomous vehicles: Self-driving cars (or those with semi-automated systems) use AI and ML to recognise objects in their surrounding environment, the consequences involved with performing a certain action, and understand how other drivers typically behave.
- Facial recognition: ML allows social networking sites to recognise individuals captured in photos and automatically “tag” them. An example of this is Meta’s DeepFace algorithm.
- Speech recognition: ML allows computers to understand the relationship between a sound and a written word, enabling accurate transcriptions of spoken language.
- Fraud detection: Financial institutions use ML to recognize anomalous spending behaviours, allowing them to block potentially fraudulent transactions as they happen.
- Manufacturing & logistics: ML allows industrial robots to work autonomously, making for smoother supply chains and safer workplaces. It helps these machines understand their environments and the objects they interact with, and respond to anomalous events that could harm a human employee.
ML is present in every sector of the economy. But how does it work? In the next section, we’ll explain the concepts behind ML. We’ll also touch on how some popular ML algorithms work.
How ML works
ML is a vast collection of algorithms that share a common objective: helping a computer identify patterns, which can then be used to make predictions or gain knowledge about a phenomenon. Each algorithm approaches the problem of learning in subtly different ways.
Training data
ML systems use real-world data to identify patterns and make predictions. An AI system’s effectiveness is often determined by the quantity and quality of that training data, and so, researchers and developers will spend a lot of time curating the material they provide to their algorithms.
Fortunately, this data is often easily accessible. ChatGPT, for example, used a portion of the CommonCrawl database — a scraped version of the internet that is formatted for use in AI/ML tasks. Developers of computer vision applications can turn to ImageNet, a large, highly organized, and publicly accessible repository of photos.
Getting the data is one problem. Curating it is another — and it’s a big one. AI/ML developers will want to remove data that is irrelevant, low-quality, or incorrectly categorized. So, in the case of a computer vision system, they’d want to remove pictures that are low-resolution, blurry, or otherwise unclear.
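As a rough illustration, a curation step like that can be just a few lines of code. The sketch below (in Python, using the Pillow imaging library) drops images below a minimum resolution before training; the directory name and size threshold are assumptions made for the sake of the example.

```python
# A rough sketch of one curation step: dropping images below a minimum
# resolution before training. The folder name and thresholds are illustrative.
from pathlib import Path
from PIL import Image

MIN_WIDTH, MIN_HEIGHT = 224, 224

def keep_image(path: Path) -> bool:
    """Return True if the image is large enough to be useful for training."""
    with Image.open(path) as img:
        width, height = img.size
    return width >= MIN_WIDTH and height >= MIN_HEIGHT

# Keep only the images that pass the resolution check.
dataset = [p for p in Path("raw_images").glob("*.jpg") if keep_image(p)]
print(f"kept {len(dataset)} images")
```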
Supervised, unsupervised (and sometimes semi-supervised)
So, let’s imagine you’re building a computer vision application. Something simple, like the hotdog-identifying app from HBO’s Silicon Valley. You want to train a system that identifies whether an image contains a hotdog or not. But first, you’re met with a challenge: how do you teach a computer what a hotdog looks like?
Here, you might use a supervised learning system. You’d feed the computer a large set of images, some containing hot dogs and some not. Because the system is ‘supervised’, these images are annotated (also referred to as labeled) with ‘hot dog’ or ‘no hot dog’. The algorithm builds a statistical model of what correlates with each label, and thus identifies the distinct visual elements that make up a hot dog. In essence, the algorithm looks at the commonality between all the pictures with a hot dog, and that commonality IS the hot dog. When presented with a new picture, if that commonality is present, the algorithm will predict that the picture contains a hot dog; if not, it will predict that it doesn’t.
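To make this concrete, here’s a minimal sketch in Python of what supervised training can look like. The library (scikit-learn) and the randomly generated stand-ins for image data and labels are my assumptions for illustration, not part of the original example.

```python
# A minimal sketch of supervised binary classification. Images are assumed to
# be flattened into fixed-length feature vectors; the data here is random.
# Labels: 1 = "has hot dog", 0 = "no hot dog".
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 64 * 64))      # 200 pretend images, 64x64 pixels each
y = rng.integers(0, 2, size=200)    # human-provided labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                      # learn from labeled examples
print("accuracy:", model.score(X_test, y_test))  # check against unseen images
```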
However, getting your data annotated might be very costly. In the absence of labels, an AI developer can turn to unsupervised algorithms (no labels — no supervision!).
Here, the system groups images with shared features — like the shape and colour of a hotdog — into clusters, which it can then use to make predictions.
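Again as a rough sketch, an unsupervised approach might use a clustering algorithm such as k-means. The data below is synthetic and scikit-learn is an assumed choice of library; the point is only that no labels are provided.

```python
# A minimal sketch of unsupervised learning: k-means groups unlabeled feature
# vectors (random stand-ins for image features) into clusters without ever
# being told what a hot dog is.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((200, 64 * 64))  # 200 unlabeled pretend images

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)  # each image is assigned to a cluster

# A human can then inspect each cluster and decide which one, if any,
# corresponds to "hot dog".
print(cluster_ids[:10])
```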
A semi-supervised approach uses a hybrid of the two methods, with the researcher or developer providing a mix of labeled and unlabeled images. Usually, labeled data represents only a small portion of what’s provided. Because labelling is a time-consuming and laborious task, this approach gives developers many of the benefits of supervised learning, but cuts data preparation time significantly.
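One way to sketch this in code is scikit-learn’s SelfTrainingClassifier, which wraps a supervised model and lets it assign its own confident predictions as labels for the unlabeled examples (marked with -1). The library choice and the synthetic data are assumptions for illustration.

```python
# A minimal sketch of semi-supervised learning: only the first 50 examples
# keep their human labels; the rest are marked -1 (unlabeled).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.random((300, 64 * 64))
y = rng.integers(0, 2, size=300)
y[50:] = -1  # unlabeled examples

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y)            # trains on labeled data, then pseudo-labels the rest
print(model.predict(X[:5]))
```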
Researchers will then evaluate the effectiveness of their trained models. They’ll look at metrics such as accuracy (how often the algorithm’s predictions match the original labels) or precision (of the pictures the algorithm labels ‘hot dog’, how many actually contain one).
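As an illustration, both metrics can be computed directly from a set of labels and predictions; the values below are made up, and scikit-learn’s metrics module is an assumed choice.

```python
# A minimal sketch of evaluating predictions against the original labels.
from sklearn.metrics import accuracy_score, precision_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # human labels: 1 = hot dog, 0 = no hot dog
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # the model's predictions

print("accuracy: ", accuracy_score(y_true, y_pred))   # share of correct predictions
print("precision:", precision_score(y_true, y_pred))  # share of predicted hot dogs that really are
```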
ML algorithms and models
Developers and researchers have large libraries of ML algorithms at their disposal. These work under different concepts, making them ideally suited to different tasks and problems. Here are some of the most noteworthy:
- Deep Neural Networks: These models are mathematically inspired by how brains process information and make decisions. They consist of layers of interconnected neurons (nodes), including hidden layers between input and output. When an input — like an image or video — passes through, the network breaks it into discrete elements, each of which is weighted and combined, layer by layer, to produce a prediction.
- Decision Trees: A decision tree uses data to construct an if/else logic tree, where each split branches out into a hierarchical flow chart.
- Random Forest: A random forest model is essentially a scaled-up version of a decision tree model. It consists of multiple “trees” operating in parallel, with the expectation that many trees can — by the wisdom of the crowd — produce a more accurate or useful result than a single tree.
- Linear Regression: A linear regression model makes predictions based on the relationship between a dependent variable (what we want to predict) and one or more independent variables. (What we think helps predict what we want to predict, for instance, years of experience and diploma may predict salary.)
- Naive Bayes: A Naive Bayes model is a probabilistic model that classifies objects based on a series of attributes (assumed to be independent of one another) and their observed probabilities. This model is particularly useful for text classification tasks, such as spam filtering and sentiment analysis, as sketched below.
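To illustrate the last of these, here’s a minimal Naive Bayes spam-filter sketch using scikit-learn (an assumed library choice). The messages and labels are made up; a real system would train on thousands of labelled emails.

```python
# A minimal sketch of Naive Bayes text classification for spam filtering.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win a free prize now", "cheap loans click here",    # spam
    "meeting moved to 3pm", "see you at lunch tomorrow"  # not spam
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)  # word counts act as the attributes

model = MultinomialNB()
model.fit(X, labels)

# Classify a new, unseen message.
print(model.predict(vectorizer.transform(["free prize at lunch"])))
```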
The limits of ML
ML is a major reason why technology products are getting faster, smarter, and more capable. And yet, there are places where it underperforms, either due to technical limitations or to flaws in the development and training process. These include:
- Bias: When you use training data that isn’t representative of society at large, you end up with a system that can only make accurate predictions about a small subset of the population, and that can therefore be inaccurate and harmful for other groups. An example would be a facial recognition system that’s only trained on pictures of white people.
- Ethics: ML systems are unable to process — or even identify — areas of ethical concern and respond appropriately. The use of ML itself presents ethical concerns, including user safety, privacy, and accountability. These concerns must be considered and addressed during the development and implementation phase.
- ML reproducibility: Many of the findings produced by AI researchers cannot be peer-validated or reproduced by a third-party participant. This is often the result of poorly documented development processes, discrepancies in the environments where the experiments are reproduced, and the use of non-deterministic algorithms that can produce different results for the same input.
- Interpretability: Understanding how ML systems process data and make predictions is often a difficult — if not impossible — task.
- Causality: ML doesn’t understand the concept of cause and effect, but rather makes predictions based on the relationship between objects and entities. While it can identify correlation, it can’t identify causation.
These problems are not insurmountable. ML technology constantly improves, and AI developers are increasingly aware of the problems posed by bias, as well as the importance of user rights.
Understanding an issue is the first step to solving it, and according to a 2023 survey of global tech leaders, 73% recognise an issue of data bias within their organisations. Many are taking action, with 65% citing bias as a factor when considering AI/ML vendors, and 76% believing that the best way to address bias is with a coherent and centralised response, rather than one that only applies to a specific silo within the organisation.
Progress takes time, however, and in the interim, it’s important for users, developers, and organisations to recognise the limitations of ML — as well as its incredible promise.
The future role of ML in our lives
ML has already achieved a high degree of prominence in our working and personal lives, and as time goes on, that role will only grow. ML has untapped potential across every sector of the economy, from healthcare to manufacturing, and beyond.
According to Forrester, AI/ML will reduce the time it takes to access retail healthcare — like that provided by a pharmacist — by 25%. PwC predicts AI/ML will add $15.7tn to the global economy by 2030, contributing 14.5% to North American GDP alone. It’s a tool for greater productivity, improved health outcomes, and better lives.
It’s a journey that’s almost a century in the making, and has involved the contributions of countless researchers, companies, governments, and enthusiasts — and it’s not yet over. It only seems like we’ve reached a culmination because, for the first time ever, we have the computational muscle to bring the ideas that were first envisioned in the 1940s and 1950s to life. Our abilities have finally aligned with our ambitions.
But this journey must be managed carefully, with rigorous consideration made to user privacy, safety, sustainability, and the suitability of ML for each given task. If we get this right, we pave the way for future improvements for all and greater AI/ML adoption by all.