A
- Accountability
- Accountability means that the people or organisations who design, deploy, or use an AI system are responsible for its outcomes and for how it affects others. It involves being able to explain decisions, address problems, follow laws and guidelines, and take action when something goes wrong.
Accountability ensures that humans, not the AI system, are answerable for how the technology is used.
- Accuracy
- Accuracy refers to how correct something is. It is a way of measuring how often an ML model makes a correct prediction.
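As a toy illustration (not part of the glossary itself), accuracy can be computed as the fraction of a model's predictions that match the true labels. The data below is made up for the example:

```python
# Made-up predictions and labels for illustration only.
predictions = ["cat", "dog", "cat", "bird", "dog"]
true_labels = ["cat", "dog", "dog", "bird", "dog"]

# Accuracy = number of correct predictions / total predictions.
correct = sum(p == t for p, t in zip(predictions, true_labels))
accuracy = correct / len(true_labels)
print(accuracy)  # 4 correct out of 5 -> 0.8
```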
- AGI (Artificial General Intelligence)
- Artificial General Intelligence (AGI) refers to a hypothetical type of AI that could perform a wide range of tasks at a level similar to humans, across many different areas. Unlike today’s AI systems, which are designed for specific tasks, AGI would be able to learn, reason, and adapt very broadly. AGI does not currently exist; it is an idea discussed in research.
- AI Agent
- An AI agent is a system that can carry out tasks for users by following instructions and using different tools or steps. Instead of only answering questions, an AI agent can do actions such as searching for information, organising files, running tasks in order, or helping complete a goal.
- AI alignment
- AI alignment is the process of designing and guiding an AI system so that its outputs match human goals, values, and safety expectations. It aims to reduce harmful or unintended behaviour, though it cannot eliminate all risks.
Alignment helps AI systems stay within the goals and safety standards set by people.
- AI Generated Content
- AI‑generated content is any text, image, audio, video, or other material produced by an AI system. It is created by models that generate outputs based on patterns learned from training data, rather than being created directly by a human. Examples include AI‑written text, AI‑made images, synthetic voices, and automatically generated summaries.
- AI Literacy
- AI literacy is the set of skills and understanding people need to use, question, and make informed decisions about artificial intelligence. It includes knowing what AI can and cannot do, how it works at a basic level, how to use it responsibly, and how to recognise risks such as bias or misuse.
- AI project lifecycle
- The artificial intelligence (AI) project lifecycle refers to the different steps it might take to design and build a machine learning (ML) model. The steps include defining the problem, preparing the data, training the model, testing the model, evaluating the model, and explaining the model.
- AI psychosis
- AI psychosis is an informal term describing how spending a lot of time with AI chatbots can sometimes leave people more confused, paranoid, or out of touch with reality, especially if they are already feeling vulnerable. It is not an official medical condition, but it captures how using chatbots can sometimes make mental health issues worse.
- AI slop
- AI slop is an informal term used to describe low‑quality AI‑generated content. This can include material that is repetitive, inaccurate, confusing, or clearly produced with little or no human review. The term is often used for cheap, mass‑produced social media content, where AI‑generated posts, images, or videos are created quickly to attract attention, fill space, or generate clicks without focusing on accuracy or usefulness.
- Algorithmic bias
- Algorithmic bias occurs when a model consistently produces results that disadvantage or favour certain groups because of patterns in the data, the design choices in the system, or the way the model is trained.
- Anthropomorphism
- Anthropomorphism is when human qualities — such as emotions, thoughts, intentions, or consciousness — are incorrectly attributed to non‑human things, including animals, objects, or technologies. In AI, anthropomorphism occurs when people assume a system “thinks,” “feels,” or “understands,” even though it is only following mathematical patterns and rules.
Anthropomorphism is treating something non‑human as if it were human.
- API (Application Programming Interface)
- An API is a set of rules and formats that allows one software system to request data from another system in a controlled way. It defines how systems can interact and share without showing their internal code or workings.
- Artificial Intelligence (AI)
- Artificial intelligence (AI) is the design and study of systems that perform tasks that appear to mimic intelligent behaviour. These tasks include recognising patterns, using language, making predictions, and solving problems. AI systems do not think or feel like humans; they work by analysing large amounts of data and learning patterns or rules from it so they can make useful decisions or suggestions.
- Audio recognition
- Audio recognition is a technology that processes and identifies sounds in an audio signal. These sounds can include speech, music, environmental noises, alarms, or other acoustic patterns. The system analyses features of the sound — such as frequency, rhythm, or waveform — and classifies or labels them based on patterns in training data.
- Avatar
- An avatar is a digital character or image that represents a person in an online space. It can be a cartoon figure, a photo, a 3D model, or even an animated character. Avatars are used in games, virtual worlds, video chats, and social platforms to show who a user is or how they want to appear.
B
- Bias/Biased
- Bias is a consistent tendency for something to be treated or represented in an uneven or unbalanced way. In data and machine learning, bias often appears when certain groups, situations, or types of information are included more often than others, leading to results that are not fair or accurate for everyone.
- Black Box
- A black box is a system whose internal workings are not easily visible or understandable to users or developers. In AI, a model is called a black box when it produces outputs without providing clear information about how it arrived at those results. This can make it difficult to check for errors, bias, or unexpected behaviour.
- Breadth-first search
- Breadth‑first search is a method for exploring a graph or tree by visiting all the nodes at one level before moving on to the next level.
It works outward in layers, step by step.
It’s useful when you want to find the shortest path in terms of number of steps.
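The layer-by-layer exploration described above can be sketched in a short program. This is a minimal illustration using a made-up graph, not a definitive implementation:

```python
from collections import deque

def breadth_first_search(graph, start, goal):
    """Return the shortest path (fewest steps) from start to goal, or None."""
    queue = deque([[start]])   # each queue entry is a path explored so far
    visited = {start}
    while queue:
        path = queue.popleft() # take the oldest path first: layer by layer
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

# A small made-up graph for illustration.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(breadth_first_search(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```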
C
- Chatbot
- A chatbot is a computer program designed to have text‑based conversations. It uses rules or AI methods to respond to messages, answer questions, or help users complete tasks.
- Class
- A class is a category or label that a machine‑learning model can assign to a piece of data. For example, in an image‑classification task, classes might be cat, dog, or bird. During training, the model learns patterns that help it sort new data into the correct class.
- Classification
- Classification is a type of machine‑learning task where data is sorted into predefined categories, called classes. During training, the model is given labelled examples, which allow it to form mathematical patterns that map inputs to the correct class. When new data is provided, the model uses these patterns to assign the most likely class. Examples include sorting emails into “spam” or “not spam,” or identifying whether an image shows a dog, a cat, or a bird.
- Classifier
- A classifier is a type of machine‑learning model designed to assign data to one of several categories (called classes). It uses patterns formed during training from labelled examples to map new inputs to the most likely class. Examples of classifiers include models that sort emails into spam or not spam, identify objects in images, or label text as positive or negative.
- Cluster
- A cluster is a group of data points that are more similar to each other than to data points in other groups. Unsupervised learning algorithms organise data into these groups based on patterns such as distance, similarity, or shared features – without using labels.
- Collaborative Filtering
- Collaborative filtering is a technique used in recommendation systems that suggests items based on patterns of similarity between users or between items. It works by analysing what groups of users have liked, chosen, or rated, and then recommending items that similar users have interacted with.
For example, if two people have chosen lots of the same films, and one of them watches a new film, collaborative filtering may recommend that film to the other person.
- Computer Vision
- Computer vision is an area of AI that focuses on ways to analyse visual information from images or videos to then make decisions. It takes useful details (like objects, people, actions, or patterns) and turns them into structured information that a system can use for tasks like classification, detection, or tracking.
- Confidence
- Confidence is a number or score that shows how strongly a model supports a particular prediction. It shows how likely the model considers its chosen output to be correct, based on patterns in the data it was trained on.
Higher confidence means the model assigns a higher probability to its prediction, but it does not guarantee accuracy.
- Confidence threshold
- A confidence threshold is the minimum confidence score a model must reach before its prediction is accepted as valid. If the model’s confidence for a prediction is below this threshold, the system may reject the output, label it as uncertain, or ask for another action (such as human review).
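A toy sketch of this idea (the labels, scores, and threshold value below are made up for illustration):

```python
# Accept a prediction only if its confidence score meets the threshold;
# otherwise flag it for another action, such as human review.
def apply_threshold(label, confidence, threshold=0.8):
    if confidence >= threshold:
        return label
    return "uncertain - needs human review"

print(apply_threshold("cat", 0.93))  # above the cutoff, so the label is accepted
print(apply_threshold("cat", 0.55))  # below the cutoff, so the output is flagged
```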
It’s a cutoff point that determines when a model’s prediction is considered reliable enough to use.
- Content-based Filtering
- Content‑based filtering is a recommendation technique that suggests items based on the features of those items. It works by analysing characteristics such as genre, keywords, style, or description, and recommending items with similar features to those a user has previously selected.
For example, if someone has bought graphic novels with bright colours and superhero themes, a content‑based system recommends other books with similar visual styles or themes.
- Context window
- A context window is the maximum amount of text a language model can process at one time when generating a response. If the input is longer than this limit, the model only uses the most recent portion.
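As a rough illustration of "only the most recent portion": real models count tokens rather than words, but the idea can be sketched with a word list (the sentence and window size below are made up):

```python
# If the input exceeds the context window, keep only the most recent units.
def fit_to_context(words, window_size):
    return words[-window_size:]

text = "the quick brown fox jumps over the lazy dog".split()
print(fit_to_context(text, 4))  # ['over', 'the', 'lazy', 'dog']
```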
- Crawling
- Crawling is the process where automated programs (called crawlers or bots) visit web pages to collect information about their content and links. It is mainly used to discover and map web pages so they can be indexed by search engines, and it is often used to gather large datasets for training AI systems like LLMs.
D
- Data bias
- Data bias means the data used to train a model does not fairly represent all groups, situations, or possibilities. Some types of data may appear too often, and others not enough, which can lead to uneven or unfair results when the model is used.
For example, some facial recognition models are biased against faces of certain skin tones, because the ML models have been trained using mostly images of faces of one skin tone.
- Data centres
- A data centre is a special building or room that stores, processes, and manages large amounts of digital information. It contains servers, networking equipment, cooling systems, and power supplies that keep data available and secure. Data centres support many digital services, including cloud storage, AI model training, and online applications.
- Data Cleaning
- Data cleaning is the process of fixing or removing errors, duplicates, and inconsistencies in a dataset. It may involve correcting mistakes, filling in missing information, standardising formats, or removing data that is not useful. Clean data helps machine‑learning models perform more accurately and reliably.
- Data collection
- Data collection is the process of gathering information from various sources so it can be used in an AI or data‑analysis project. The data may come from surveys, sensors, devices, websites, databases, images, or recordings. Good data collection ensures that the dataset represents the real‑world situation the project is trying to address.
- Data Poisoning
- Data poisoning is a type of attack where someone intentionally changes, corrupts, or adds misleading data to a training dataset in order to influence how a machine‑learning model behaves. Because models form patterns based on their training data, poisoned data can cause them to make incorrect, unsafe, or biased predictions once deployed.
Data poisoning is the deliberate tampering with training data to harm or manipulate a model’s performance.
- Data-driven
- A data‑driven AI system learns how to make decisions by studying large amounts of example data, rather than by following a list of step‑by‑step instructions written by humans. It finds patterns in the data and uses those patterns to make predictions or choices. Because it learns from examples, it can adapt and improve as it sees more data.
- Decision Trees
- A decision tree is a model in machine learning that makes predictions by following a series of simple yes/no or either/or questions. Each question leads to a new branch, and the final branch (a “leaf”) gives the outcome.
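The question-and-branch structure can be sketched as nested conditions. This hand-written example is for illustration only; real decision trees are learned from data, and the weather scenario below is made up:

```python
# A tiny hand-written decision tree: each `if` is a question,
# each returned string is a leaf giving the outcome.
def decide(raining, temperature_c):
    if raining:                 # first question
        return "stay inside"
    if temperature_c < 10:      # second question, on the "no rain" branch
        return "wear a coat"
    return "play outside"       # leaf: the final outcome

print(decide(raining=False, temperature_c=20))  # "play outside"
print(decide(raining=True, temperature_c=20))   # "stay inside"
```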
- Deep learning
- Deep learning is a type of machine learning that uses neural networks with many layers. Each layer transforms the data in small ways, allowing the system to recognise complex patterns such as faces in images or meaning in text. Deep learning powers many modern AI tools.
- Deepfake
- A deepfake is audio, image, or video content that has been digitally altered using AI techniques to make it appear as if a real person said or did something they never actually said or did. Deepfakes often copy a person’s face, voice, or movements and insert them into new, fabricated content. They can be used creatively, but they also pose risks when used to deceive, impersonate, or spread false information.
- Depth-first search
- Depth‑first search is a method for exploring a graph or tree by following one path as far as it can go before backtracking and trying another path.
It goes deep first, then returns to explore other branches.
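The deep-then-backtrack behaviour can be sketched recursively. This is a minimal illustration using a made-up graph, not a definitive implementation:

```python
def depth_first_search(graph, start, goal, visited=None):
    """Follow one path as far as possible, backtracking when it dead-ends."""
    if visited is None:
        visited = set()
    visited.add(start)
    if start == goal:
        return [start]
    for neighbour in graph.get(start, []):
        if neighbour not in visited:
            path = depth_first_search(graph, neighbour, goal, visited)
            if path:                       # found the goal down this branch
                return [start] + path
    return None                            # dead end: backtrack

# A small made-up graph for illustration.
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"]}
print(depth_first_search(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```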
It’s useful for tasks like checking all possible paths or searching through large structures.
- Diffusion Model
- A diffusion model is a type of generative AI that creates images (and sometimes other content) by starting with random noise—like static on a TV—and gradually removing the noise step by step. Each step makes the image clearer, guided by patterns the model learned during training. Diffusion models power many modern AI image‑generation tools.
- Disinformation
- Disinformation is false or misleading information that is created and shared on purpose to deceive people. It is designed to cause confusion, influence opinions, or hide the truth. Disinformation can appear in many forms, such as fake news stories, altered images or videos, or messages that twist facts.
E
- Emulated empathy
- Emulated empathy is when an AI system generates language that sounds caring, supportive, or emotionally aware, even though it does not experience feelings. It follows patterns in data about how people express empathy, producing responses designed to be comforting or understanding without actually feeling emotions. The system imitates empathetic language but does not experience empathy.
- Ethics
- Ethics refers to the principles and values that help people decide what is right, fair, and responsible. In technology and AI, ethics guides how systems should be designed and used so that they are safe, respectful, and do not cause harm.
- Explainability
- Explainability refers to how clearly and effectively an AI system’s results or decisions can be understood. It involves showing the key factors, patterns, or steps that led to an output, so people can see why the system produced a particular result. Good explainability helps users check for errors, bias, or unexpected behaviour.
Explainability is about how well someone can understand why a model produced a particular output.
F
- Fitting
- Fitting is the process of adjusting a model’s parameters so that the outputs match the training data as closely as possible.
- Foundation Model
- A foundation model is a very large AI model trained on massive amounts of data so it can be adapted for many different tasks. Instead of being built for one specific purpose, it can be fine‑tuned to perform a wide range of activities, such as answering questions, generating images, or analysing documents.
- Frontier AI
- Frontier AI refers to the most advanced and powerful AI systems available at a given time. These models push the boundaries of what AI can do, often involving very large training datasets or wide‑ranging abilities.
Because frontier AI models are so powerful, they offer new possibilities in areas like science, education, and healthcare — but they also need strong safety measures to reduce risks such as errors, bias, or misuse.
G
- Generative Adversarial Networks (GANs)
- A Generative Adversarial Network (GAN) is a two‑part system where one part makes content and the other part judges it, helping the system to create realistic content, such as images, audio, or videos.
One part (the generator) creates new examples. The other part (the discriminator) checks whether those examples look real or fake based on the training data. Through this back‑and‑forth process, the generator gets better at producing content that looks convincing.
GANs are often used for creating synthetic images, deepfakes, and artwork.
- Generative AI (GenAI)
- Generative AI is a type of AI system that creates content—such as text, images, or audio—using patterns learned from large amounts of data. It can be helpful for creativity and problem‑solving, but it can also produce content that closely follows the style of real artists or creators, which raises questions about permission and fairness.
- GPU (Graphics Processing Unit)
- A GPU is a special computer chip designed to handle lots of calculations at the same time, especially those needed to create images, animations, and graphics on screens. In AI, GPUs speed up the training of models by quickly performing the huge number of mathematical operations needed to learn from data.
- Guardrails
- Guardrails are the rules, limits, and safety measures built around an AI system to reduce the chances of it producing harmful, inaccurate, or inappropriate outputs. These can include filtered topics, restricted actions, safety checks, and instructions that guide how the system should respond.
Guardrails lower the risk of harm but cannot guarantee that every output will be correct or safe, so human oversight is still important.
H
- Hallucination
- An AI hallucination is when an AI system produces an output that is incorrect, unsupported by data, or entirely made up, even though it may sound plausible. This can include invented facts, false references, or descriptions of events that never occurred. Hallucinations happen because the system generates text based on patterns in data rather than verified knowledge.
- Human-in-the-loop
- Human‑in‑the‑loop (HITL) refers to an approach where humans are actively involved at key stages of an AI system’s operation. Humans stay involved to guide, supervise, or approve an AI system’s actions. This involvement can include checking outputs, correcting mistakes, providing feedback, or making final decisions. HITL is used to improve accuracy, increase safety, and ensure that important judgments are made by people rather than being left entirely to automated systems.
I
- Image recognition
- Image recognition is a technology that analyses an image and identifies what is shown in it, such as objects, people, animals, or scenes. It works by examining visual features — like shapes, colours, and patterns — and matching them to patterns learned from training data.
Examples of image recognition include self-driving cars that identify road markings and other vehicles, and medicine, where it is used to highlight areas in scans that might need to be checked by a human expert.
- Inference
- Inference is the step where a trained machine‑learning model uses what it learned from training data to produce an output (like a prediction, classification, answer, or label) for new, unseen input data.
- Interpretability
- Interpretability is the degree to which a person can understand how an AI model produces its outputs. It focuses on making the model’s internal processes — such as features, patterns, or steps — understandable enough that someone can see how a result was generated.
Interpretability is about how easily a person can understand how a model works internally — for example, which features it uses and how they influence the output.
- Iterate
- To iterate means to repeat a process or set of steps multiple times, often making small adjustments at each step, in order to improve a result or move closer to a goal.
In machine learning, models iterate during training by repeatedly updating their parameters to reduce errors.
K
- Knowledge graph
- A knowledge graph is a map that shows how different pieces of information are connected. It is made up of nodes (things like people, places, objects, ideas) and edges (the relationships between them). This structure shows how pieces of information link together, making it easier for computer systems to organise, search, and use that information.
L
- Large language model (LLM)
- A large language model is a type of AI system that processes and generates text by using mathematical patterns learned from very large amounts of language data. It predicts the most likely next words or sentences based on those patterns, which allows it to answer questions, summarise text, translate languages, and create written content.
- Linear Regression
- Linear regression is a method that draws the best‑fitting straight line through a set of data points. Once the line is drawn, it can be used to predict a number, such as guessing someone’s height from their age or predicting temperature from the time of day.
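Drawing the best-fitting line can be done with the standard least-squares formulas. The data points below are made up so the fitted line is exactly y = 2x:

```python
# Fit a straight line y = slope * x + intercept by least squares.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]   # made-up points lying exactly on y = 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Slope = covariance of x and y divided by variance of x.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(slope, intercept)       # 2.0 0.0
print(slope * 6 + intercept)  # predicts 12.0 for x = 6
```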
M
- Machine Learning (ML)
- Machine learning is a way of creating computer systems that find patterns in data and use those patterns to make predictions or decisions. Instead of programming every step, developers give the system lots of examples, and the system adjusts itself so it can handle new inputs in similar situations. ML is about using patterns in data to solve problems.
- Matrix
- A matrix is a table of numbers arranged in rows and columns used for organising and calculating data. Matrices are used in AI and machine learning to store data, represent images, or perform calculations that involve many numbers at once.
- Misinformation
- Misinformation is false or misleading information that is shared by mistake. The person sharing it believes it to be true and does not intend to cause harm. Misinformation can spread quickly, especially online, when people pass along incorrect facts, rumours, or misunderstandings without checking their accuracy.
- Model
- A model is a structure created during the training process of a machine‑learning system. It captures patterns and relationships found in the training data and uses them to produce outputs—such as predictions, classifications, or recommendations—when new data is provided.
A model is the result of training that maps inputs to outputs based on patterns in data.
- Model Parameters
- Model parameters are the numbers inside a machine‑learning model that control how it transforms input data into an output. During training, these numbers are adjusted again and again so the model can reduce errors and better match the patterns in the training data.
- Model Weights
- Model weights are settings in a machine‑learning model that control how much influence each piece of input data has on the final output. During training, these weights are adjusted so the model can respond more accurately to different types of inputs. In many models (especially neural networks) weights play the biggest role in shaping how the model behaves.
- Multimodal Model
- A multimodal model is an AI system designed to process and combine more than one type of data — such as text, images, audio, or video — to produce outputs. It uses mathematical patterns from each data type and connects them so it can perform tasks that require information from multiple sources, like describing an image using text or answering questions about a video.
N
- Natural language processing (NLP)
- Natural Language Processing is a field of AI that develops methods for computer systems to analyse, interpret, and generate human language using patterns found in text or speech data. It enables tasks like translation, speech recognition, and text generation.
- Nearest-Neighbour
- Nearest‑Neighbour is a method that makes decisions by looking at the most similar examples. When the system is given something new, it finds items in its dataset that are closest or most alike — and uses those to decide the label or category.
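A minimal sketch of a 1-nearest-neighbour classifier (the 2D points and labels below are made up for illustration):

```python
# Classify a new point by copying the label of its most similar example.
def nearest_neighbour(examples, new_point):
    def squared_distance(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    closest = min(examples, key=lambda ex: squared_distance(ex[0], new_point))
    return closest[1]   # the label of the closest example

# Made-up labelled examples: (point, label).
examples = [((1, 1), "cat"), ((1, 2), "cat"), ((8, 9), "dog")]
print(nearest_neighbour(examples, (2, 2)))  # closest examples are cats -> "cat"
```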
For example, if you show it a new drawing of an animal, it looks for drawings that are most similar and uses those to guess what the new animal is. It decides based on the closest match.
- Neural Network
- A neural network is a type of machine‑learning model made up of interconnected layers that process data step by step. Each layer changes the data slightly, helping the system find patterns such as shapes in images, words in text, or sounds in audio. Neural networks are used in many modern AI systems because they can handle complex tasks.
O
- Optimise
- To optimise means to adjust a model or system so that it performs as well as possible according to a specific goal. In machine learning, optimisation usually involves changing the model parameters to make it faster, more accurate, and more efficient.
- Over-fitting/Under-fitting
- Overfitting happens when a model matches the training data too closely, including noise or random details that are not useful for general patterns. As a result, the model performs well on the training data but poorly on new, unseen data.
Underfitting happens when a model is too simple to capture the important patterns in the training data. It performs poorly on both the training data and new data because it has not formed patterns that describe the problem well.
P
- Perception
- Perception in AI refers to the ways a system collects and interprets information from its environment (like images, sound, or sensor data) to form a representation of the world, so it can make informed decisions and take meaningful actions.
It is the process of changing sensor data into structured information a system can act on.
- Prediction
- A prediction is an estimate or guess a model makes about something based on patterns in data. In machine learning, predictions can be things like the label for an image, the next word in a sentence, or a number such as tomorrow’s temperature.
- Prompt
- A prompt is the text or instructions given to a language model to specify the task you want it to perform. It provides the information or direction the system uses to generate an output.
- Prompt engineering
- Prompt engineering is the process of designing and refining prompts so a language model produces useful, accurate, or specific outputs. This can include choosing clear wording, ordering information, giving examples, or structuring the request in a particular way. It does not change the model itself — only the way the task is presented to it.
R
- Reasoning model
- A reasoning model is a type of AI system designed to handle tasks that require multi‑step thinking, problem‑solving, or following logical chains. Instead of just predicting the next word or pattern, it uses structured approaches—such as planning, checking, or breaking tasks into parts—to produce more reliable answers for complex problems.
- Recommender system
- A recommender system is a tool that suggests items to users based on patterns in data. It looks at things like past choices, similarities between items, or what groups of users prefer, and uses that information to recommend content such as videos, books, games, or products.
- Reinforcement learning
- Reinforcement learning is a type of machine learning where a system tries actions, receives feedback (called rewards or penalties), and uses that feedback to adjust its behaviour over time. Its aim is to find a sequence of actions that leads to the highest overall reward.
Examples include:
systems that learn to play games by improving their moves based on scores
robots that adjust actions to navigate a space
- Reproducibility
- Reproducibility means that an AI system’s results can be obtained again by someone else using the same data, the same settings, and the same steps. When a system is reproducible, its outputs are consistent and can be checked or verified by others. This helps build trust and makes it easier to identify errors or unexpected behaviour.
- Robot
- A robot is a machine that can perform tasks automatically, often using sensors, motors, and programmed instructions. Some robots follow fixed routines, while others can adjust their actions based on data from their environment. Robots can be used in places like factories, homes, hospitals, or outer space to carry out tasks that might be repetitive, difficult, or unsafe for humans.
Robots come in many shapes. Some are humanoid, while others are fixed mechanical arms that move objects, such as in a factory.
- Rule-based / Symbolic AI
- A rules‑based or symbolic AI system works by following clearly written instructions created by humans. These instructions—called rules—tell the system exactly what to do in every situation it can handle. The AI uses symbols, logic, and step‑by‑step reasoning to decide what actions to take.
S
- Scraping
- Scraping is the process of extracting specific data from web pages — such as text, images, or tables — using automated tools. The goal is to collect structured information from a site’s content. Scraped data may be processed using AI tools to filter, label, or group the extracted data.
- Sign recognition
- Sign language recognition is a technology that analyses hand shapes, movements, facial expressions, and body positions from video or sensor data and identifies them as signs from a sign language. It uses patterns learned from training examples to classify the signs and convert them into text or another form of output.
- Small language model
- A small language model is a lighter, faster AI system that needs less computing power to do language tasks. It can answer simple questions, summarise text, or help with writing, but on a smaller scale than a large language model.
- Speech-to-text
- Speech‑to‑text is a technology that processes spoken audio and converts it into written text. It analyses sound patterns in the speech signal and maps them to words based on models trained on audio and language data. Speech‑to‑text is used in captioning, transcription tools, voice assistants, and accessibility technologies.
- Stochastic
- Stochastic describes a process that involves some level of randomness or uncertainty. In AI and statistics, a stochastic method includes outcomes that can vary each time it runs, even when starting with the same conditions.
- Supervised learning
- Supervised learning is a type of machine learning where the training data includes the correct answers (labels). The model builds mathematical relationships between the inputs and the labels so it can map new, unseen data to the most likely label.
Examples include:
sorting emails into spam or not spam
recognising objects in images
predicting whether a review is positive or negative
- Sycophancy
- Sycophancy in AI refers to when a model adjusts its answers to agree with what it thinks the user wants to hear, even if the response becomes inaccurate. This happens because the model follows patterns in its training data that reward agreement, politeness, or positive tone, rather than correctness. The model produces overly agreeable answers instead of reliable ones.
T
- Text-to-speech
- Text‑to‑speech is a technology that converts written text into spoken audio. It analyses the characters and words in the text, determines how they should be pronounced, and generates synthetic speech using pre‑built voice models. Text‑to‑speech is used in screen readers, navigation systems, voice assistants, and accessibility tools.
- Token
- A token is a small unit of text — such as a whole word, part of a word, or even punctuation — that a language model processes. Instead of handling text all at once, the model breaks it into tokens and works with those units to analyse or generate language.
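As a very rough illustration, text can be split into word-level units. Real language models use learned subword tokenisers, so their tokens often differ from whole words; the sentence below is made up:

```python
# Splitting text into word-level tokens (a simplification of real tokenisers).
text = "Tokens are small units of text"
tokens = text.split()
print(tokens)        # ['Tokens', 'are', 'small', 'units', 'of', 'text']
print(len(tokens))   # the model would process 6 units, not one long string
```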
- Training
- Training is the process where a machine‑learning model is repeatedly adjusted so it can map inputs to outputs effectively. During training, the model processes lots of examples, compares its predictions with the correct answers, and updates its parameters to reduce errors. Training shapes the model so it can make useful predictions on new data. The quality of training largely depends on the quality of the data used.
- Training data
- Training data is the set of examples used to train a machine‑learning model. Each example includes input information, and in supervised learning, the correct output label. The patterns in the training data influence how the model behaves when it is then used.
- Transparency
- Transparency means providing clear information about how an AI system was built, what data it uses, how it works at a high level, and what its limitations are. Transparent systems make it possible for people to understand important aspects of the system without revealing sensitive or confidential details.
U
- Unsupervised learning
- Unsupervised learning is a type of machine learning that works with data that has no labels. The model looks for structure or patterns in the data, such as grouping similar items together or finding common features.
Examples include:
clustering customers into groups with similar behaviour
grouping documents by common themes
organising songs or films into groups based on features like tempo, genre, or mood, without labels
V
- Vector
- A vector is an ordered list of numbers that represents data. In AI, vectors are used to represent information in a mathematical form, for example turning words, images, or other data into numbers so they can be processed by a model.
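A toy illustration of data as vectors (the numbers are made up): once information is an ordered list of numbers, a model can do arithmetic on it, such as the dot product, which is a common building block for measuring similarity:

```python
# Two pieces of data represented as vectors of numbers.
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Dot product: multiply matching entries and add the results.
dot = sum(x * y for x, y in zip(a, b))
print(dot)  # 1*4 + 2*5 + 3*6 = 32.0
```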
- Video recognition
- Video recognition is a technology that analyses video footage to identify and label what is happening in the frames. This can include detecting objects, actions, people, movements, or events over time. It works by processing each frame of the video, tracking changes between frames, and classifying patterns based on training data.
- Voice assistant
- A voice assistant is a software system that processes spoken instructions and provides responses or carries out tasks such as setting reminders, searching for information, or controlling devices. It uses speech recognition to convert spoken words into text, then works out the requested action.
- Voice recognition
- Voice recognition is a technology that processes audio recordings of speech to identify the words being spoken, and converts them into text or machine‑readable data.
Voice recognition is used in tools like speech‑to‑text apps, captioning systems, and voice‑controlled devices.
