Glossary

AI, Data & Analytics

472 terms in this category.

Ablation Study

Removing model components to measure their contribution. Understanding what matters.

Abstractive Summarization

Generating new text that summarizes the source, unlike extractive summarization, which copies sentences verbatim.

Abstractive Summary

Generating new text summarizing source material.

Accuracy

The percentage of correct predictions out of total predictions. Simple but misleading for imbalanced datasets, where precision and recall are more informative.

Acoustic Model

Speech recognition component that maps audio signals to phonetic units.

Activation Function

A mathematical function applied to a neuron's weighted input to produce its output. ReLU (most common), sigmoid, tanh, and GELU. Introduces non-linearity, letting networks model complex relationships.
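The four activation functions named above can be sketched in plain Python with the standard `math` module. This is a minimal scalar illustration, not a framework implementation; the GELU shown is the exact form based on the normal CDF (frameworks also offer a tanh approximation).

```python
import math

def relu(x: float) -> float:
    """ReLU: passes positive inputs through unchanged, maps negatives to 0."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x: float) -> float:
    """Hyperbolic tangent: squashes input into (-1, 1), centered on 0."""
    return math.tanh(x)

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for fn in (relu, sigmoid, tanh, gelu):
    print(fn.__name__, [round(fn(v), 3) for v in (-2.0, 0.0, 2.0)])
```

Without such a non-linearity, stacked linear layers collapse into a single linear map.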

Neural network layer output visualization.

Active Learning

ML technique where the model selects the most informative unlabeled examples for human annotation. Reduces labeling cost.

Adapter

Small trainable modules inserted into a frozen model. Parameter-efficient fine-tuning.

Adversarial Attack

Inputs designed to fool ML models. Small, imperceptible image perturbations can cause misclassification. Adversarial training, which trains on such examples, improves robustness.

Agent Framework

Library for building AI agents. LangChain, CrewAI, AutoGen.

Agentic AI

AI systems that operate with autonomy: planning multi-step tasks, using tools, and making decisions. Goes beyond question answering to carrying out tasks.

AI Accelerator

Hardware optimized for AI. GPU, TPU, NPU.

AI Agent

An AI system that can plan, use tools, and take actions autonomously. Browses the web, writes code, manages files.

AI Alignment

Ensuring AI systems act according to human values and intentions. A core challenge as AI becomes more capable. RLHF and Constitutional AI are alignment techniques.

AI Assistant

Interactive AI helper. Claude, ChatGPT, Gemini.

AI Chip

Specialized processor for AI workloads. GPU, TPU.

Optimizing models for specific hardware targets.

AI Ethics

Moral principles guiding AI development. Fairness, transparency, accountability.

AI Fairness

Ensuring AI treats all groups equitably.

AI Governance

Policies and processes managing AI use. Risk management, compliance.

AI Infrastructure

Computing resources for training and inference.

AI Literacy

Understanding AI capabilities and limitations.

AI Model

Trained system making predictions from data.

AI Pipeline

End-to-end workflow from data to deployment.

AI Platform

Infrastructure for building AI. Vertex AI, SageMaker.

AI Reasoning

Model ability to draw logical conclusions.

AI Regulation

Government rules for AI. EU AI Act, executive orders. Compliance requirements.

AI Research

Scientific investigation advancing AI capabilities.

AI Responsibility

Accountable AI development and deployment.

AI Risk

Potential negative outcomes from AI systems.

AI Safety

Research ensuring AI systems are beneficial and don't cause unintended harm. Robustness, interpretability, and alignment are core research areas.

AI Strategy

Organizational plan for AI adoption.

AI Transparency

Making AI decision process understandable.

AI Winter

Period of reduced AI funding and interest.

Algorithmic Bias

Systematic unfairness in algorithm outputs, caused by training data or design choices.

Annotation

Labeling data for ML training. Bounding boxes, text spans, categories.

Anomaly Detection

Identifying unusual patterns that don't conform to expected behavior. Fraud detection, system monitoring, and quality control.

Anomaly Score

Numerical measure of how unusual a data point is.

Apache Kafka

A distributed streaming platform. Pub/sub messaging, event sourcing, and log aggregation. Processes millions of events per second.

Apache Spark

Distributed computing engine. Large-scale data processing. PySpark, Spark SQL.

API Endpoint (AI)

URL for model inference. POST /completions. Rate limited, authenticated.

Architecture Search

Automatically finding optimal neural network architecture. NAS.

Artificial Intelligence

A computing field focused on creating systems that simulate human intelligence. Includes machine learning, NLP, and computer vision.

Aspect-Based Sentiment

Sentiment about specific product aspects.

Attention Head

A component in transformer models computing attention over input sequences. Multi-head attention runs multiple attention heads in parallel, each able to learn different relationships.

Attention Mechanism

Allows neural networks to focus on relevant parts of the input. Self-attention in Transformers weighs relationships between all tokens in a sequence.
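A minimal sketch of scaled dot-product attention, the core of Transformer self-attention, using plain Python lists instead of tensors. It assumes a single head and no masking; the matrices `Q`, `K`, `V` below are hypothetical toy values chosen so the effect is easy to see.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: n_q x d, K: n_k x d, V: n_k x d_v, all as lists of lists.

    Returns (outputs, weights): each output row is a weighted average of the
    rows of V, with weights = softmax(q . k / sqrt(d)) over all keys k.
    """
    d = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

# One query that points the same way as the first key.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out, w = scaled_dot_product_attention(Q, K, V)
print(w, out)
```

The query attends more to the key it is most similar to, so the output leans toward that key's value row.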

Attention Score

Weight indicating relevance between tokens. Higher score = more attention.

AutoML

Automatically selecting models and hyperparameters. H2O, Auto-sklearn.

Automated Labeling

Using models to generate training labels.

Automation

Using technology to execute tasks without human intervention. CI/CD, scripts, cron jobs, and workflows. Zapier and n8n are workflow automation platforms.

Autoregressive Model

A model generating output one token at a time, each conditioned on previous tokens. GPT and all decoder-based LLMs are autoregressive.

Backpropagation

An algorithm computing the gradient of the loss with respect to each neural network weight, layer by layer, from the output back to the input. Essential for training neural networks.

Bag of Words

Text represented as unordered word frequency counts; word order is discarded.
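A minimal bag-of-words sketch using `collections.Counter`. The tokenizer here (lowercase, keep runs of letters and apostrophes) is a simplifying assumption; real pipelines use proper tokenizers and usually a fixed vocabulary.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase, split into word tokens, and count them; order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

bow = bag_of_words("The cat sat on the mat. The mat was red.")
print(bow.most_common(3))
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))
```

The last line shows the main limitation: two sentences with opposite meanings get identical representations.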

Batch Inference

Processing multiple inputs simultaneously. Higher throughput than real-time.

Batch Normalization

Normalizing layer inputs during training. Stabilizes and speeds training.

Batch Processing

Processing large data volumes in scheduled blocks (daily, hourly). MapReduce, Spark, and dbt. Complementary to stream processing.

Batch Size

Number of samples per training step.

Bayesian Inference

Updating probability estimates as new evidence arrives. The prior belief, combined with the likelihood of the evidence, gives the posterior belief (posterior ∝ likelihood × prior). Used in spam filters.
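The spam-filter case can be worked through with Bayes' rule directly. The probabilities below are hypothetical, chosen only to make the arithmetic easy to follow: 20% of mail is spam, and a given word appears in 60% of spam and 5% of legitimate mail.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E),
    where P(E) = P(E|H) * P(H) + P(E|not H) * (1 - P(H))."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / evidence

# Hypothetical numbers for illustration only.
p = posterior(prior=0.2, p_e_given_h=0.6, p_e_given_not_h=0.05)
print(round(p, 3))
```

Seeing the word raises the spam probability from 0.2 to 0.12 / (0.12 + 0.04) = 0.75.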

Beam Search

Decoding strategy that keeps the top-K candidate sequences at each step.
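A minimal beam search sketch. The "model" is a hypothetical lookup table of next-token probabilities that depends only on the last token; a real LLM would compute these from the whole prefix. Scores are summed log-probabilities, and a beam width of 1 reduces to greedy decoding.

```python
import math

def beam_search(next_probs, start, beam_width, max_len, eos="<eos>"):
    """Keep the beam_width highest-scoring partial sequences at each step.

    next_probs(tokens) -> {token: probability} for the next position.
    Scores are summed log-probabilities, so higher is better.
    """
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos:
                candidates.append((tokens, score))  # finished: carry over as-is
                continue
            for tok, p in next_probs(tokens).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(tokens[-1] == eos for tokens, _ in beams):
            break
    return beams

# Hypothetical toy "model": next-token probabilities keyed by the last token.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"c": 0.5, "d": 0.5},
    "b": {"c": 0.9, "d": 0.1},
    "c": {"<eos>": 1.0},
    "d": {"<eos>": 1.0},
}

def model(tokens):
    return TABLE[tokens[-1]]

greedy = beam_search(model, "<s>", beam_width=1, max_len=5)[0]
beam = beam_search(model, "<s>", beam_width=2, max_len=5)[0]
print(greedy[0], round(math.exp(greedy[1]), 2))
print(beam[0], round(math.exp(beam[1]), 2))
```

Greedy decoding commits to "a" (0.6) and ends with total probability 0.3, while a beam of width 2 keeps "b" alive and finds the better sequence with probability 0.4 × 0.9 = 0.36.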

Benchmark

A standardized test measuring model performance. MMLU for knowledge, HumanEval for coding, HellaSwag for commonsense reasoning. Enables comparison across models.

Standardized model test. MMLU for knowledge, HumanEval for coding.

BERT

Bidirectional Encoder Representations from Transformers: Google's model understanding text by looking at context in both directions.

Bias

Systematic error in a model producing unfair results. Can originate from biased training data, discriminatory features, or design choices.

Bias-Variance Tradeoff

Balance between model simplicity and flexibility. High bias = underfitting; high variance = overfitting.

Processing input in both directions. BERT attends to context on both the left and the right of each token at the same time.

Bidirectional Model

Processing input in both directions. BERT.

Big Data

Datasets so large or complex that traditional tools can't process them. Often defined by the 5 Vs: Volume, Velocity, Variety, Veracity, and Value.

Binary Classification

Predicting one of two classes. Spam/not spam, positive/negative.

BLEU Score

A metric evaluating machine translation quality by comparing n-gram overlap with reference translations. Higher is better. Widely used for MT, though it correlates imperfectly with human judgment.

Business Intelligence

BI: using data to make informed business decisions. Dashboards, reports, and ad-hoc analysis. Power BI, Tableau, and Metabase are popular tools.

Categorical Variable

Variable with discrete categories. Color, country, type. Encode for ML.

Causal Inference

Determining cause-and-effect relationships from data, beyond correlation. A/B tests establish causality; causal models help when experiments aren't possible.

Chain-of-Thought

A prompting technique where the model reasons step-by-step before answering. Dramatically improves accuracy on math, logic, and multi-step reasoning tasks.

Chatbot

Software simulating human conversation. Rule-based (simple) or AI-powered (LLMs). ChatGPT, Claude, and Gemini are cutting-edge examples.

Class Imbalance

When training data has unequal class distribution (99% negative, 1% positive). Models bias toward the majority class. SMOTE, resampling, and class weighting help.

Classification

An ML task assigning categories to data: spam/not-spam, cat/dog, positive/negative. Logistic regression, SVM, and random forests are common algorithms.

CLIP

Contrastive Language-Image Pre-training. OpenAI model learning joint text and image embeddings.

Clustering

Grouping similar data without predefined categories. K-means, DBSCAN, and hierarchical clustering. Used in customer segmentation.

CNN

Convolutional Neural Network: a neural network using convolution layers to detect spatial patterns. Dominant in image classification.

Code Generation (AI)

AI writing code from descriptions. GitHub Copilot, Claude, GPT-4.

Cognitive Computing

AI systems simulating human thought processes.

Collaborative Filtering

Recommendation based on similar users' preferences. Netflix, Spotify.

Compute Budget

Total computation allocated for training. Measured in GPU-hours or FLOPs.

Computer Vision

AI field enabling computers to understand images and video. Object detection, segmentation, OCR, and face recognition. OpenCV is a widely used library.

Conditional Generation

Generating content based on conditions. Text-to-image from prompt.

Confusion Matrix

A table showing predicted vs actual classifications. True positives, false positives, true negatives, false negatives. Precision, recall, and F1 are derived from it.

Constitutional AI

Training AI with principles. Model self-critiques and revises. Anthropic's approach.

Content Moderation

AI filtering inappropriate content. Text, image, video classification.

Context Window

The maximum number of tokens an LLM can process at once. GPT-4 Turbo supports 128K tokens and Claude 200K. Larger windows enable processing longer documents.

Continuous Variable

Numerical variable that can take any value within a range. Temperature, price.

Contrastive Learning

Training models by comparing similar and dissimilar pairs. CLIP learns image-text associations, SimCLR learns visual representations.

Conversational AI

AI engaging in human-like dialogue.

Convolutional Layer

Neural network layer detecting spatial patterns. Filters slide over input.

Correlation

Statistical relationship between variables.

Cosine Similarity

A metric measuring similarity between two vectors by the cosine of the angle between them. 1 = same direction, 0 = orthogonal, -1 = opposite directions.
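Cosine similarity is the dot product divided by the product of the vector lengths. A minimal sketch in plain Python; raising an error for zero vectors is a design choice here, since the angle is undefined in that case.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); undefined if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2], [2, 4]))   # same direction, different length
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal
print(cosine_similarity([1, 0], [-1, 0]))  # opposite
```

Because length is divided out, `[1, 2]` and `[2, 4]` score as identical, which is why it suits embeddings where direction carries the meaning.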

Cross-Entropy Loss

Classification loss function measuring how far predicted probabilities are from the true labels.
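For a single example, cross-entropy reduces to the negative log of the probability the model gave the true class. A minimal sketch; the `eps` floor is an assumption added to avoid `log(0)`, similar in spirit to the clipping frameworks apply.

```python
import math

def cross_entropy(probs, target, eps=1e-12):
    """Loss for one example: -log of the probability assigned to the true class.

    probs: predicted class probabilities (should sum to 1).
    target: index of the true class. eps guards against log(0).
    """
    return -math.log(max(probs[target], eps))

def mean_cross_entropy(batch_probs, targets):
    """Average loss over a batch, as typically minimized during training."""
    return sum(cross_entropy(p, t) for p, t in zip(batch_probs, targets)) / len(targets)

print(cross_entropy([0.7, 0.2, 0.1], 0))  # fairly confident and right: small loss
print(cross_entropy([0.7, 0.2, 0.1], 2))  # confident and wrong: large loss
```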

Cross-Validation

Technique splitting data into K folds, training on K-1 and validating on the remaining fold, repeated K times. More reliable than a single train/test split.

CUDA

NVIDIA's parallel computing platform for GPU programming. The standard for training deep learning models on NVIDIA GPUs. cuDNN accelerates deep neural network primitives.

Dashboard

A visual panel presenting key metrics and KPIs in one place. Real-time or periodic. Grafana for infrastructure, Power BI for business metrics.

Data Annotation Tool

Software for labeling training data.

Data Augmentation

Artificially expanding training data through transformations: rotation, flipping, cropping for images; paraphrasing for text.

Data Bias

Systematic data collection or representation error.

Data Catalog

A metadata management tool organizing and documenting data assets. Data discovery, lineage, and governance. DataHub, Amundsen.

Data Cleaning

Removing errors, duplicates, inconsistencies from datasets.

Data Collection

Gathering data for analysis or ML. Surveys, scraping, sensors, APIs.

Data Distribution

Statistical pattern of values in dataset.

Data Drift

When the production data distribution shifts away from the training data, degrading model performance. Monitoring tools detect drift.

Data Engineering

Building systems for collecting, storing, processing data. Pipelines.

Data Ethics

Moral principles for data collection and use.

Data Exploration

Initial investigation of datasets. Statistics, visualizations, patterns.

Data Flywheel

A loop in which more usage generates more data, which improves the model, which attracts more usage.

Data Governance

Policies and processes for managing data securely, compliantly, and with quality. Ownership, cataloging, lineage, and access control.

Data Imbalance

Unequal representation of classes. Oversampling, undersampling, SMOTE.

Data Ingestion

Loading raw data into a processing pipeline.

Data Integration

Combining data from multiple sources into unified view.

Data Labeling

Annotating data with correct answers for supervised learning. Manual (Labelbox, Scale AI) or semi-automated. Label quality directly limits model quality.

Data Lake

A repository storing raw data in any format (structured, semi-structured, unstructured) at low cost. S3, Azure Data Lake, and Delta Lake.

Data Lineage

Tracking data origin and transformations. Audit trail for compliance.

Data Mart

Subset of a data warehouse for a specific department or use case.

Data Mesh

Decentralized data architecture organized by domain.

Data Mining

The process of discovering patterns and relationships in large datasets. Clustering, association, and classification are common techniques.

Data Modeling

Defining data structure and relationships. ER diagrams, dimensional modeling.

Data Normalization

Scaling features to standard range. Min-max, z-score normalization.
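Both normalization schemes fit in a few lines with the standard `statistics` module. This sketch uses the population standard deviation and returns zeros for a constant feature; both are assumptions, and in a real pipeline the min, max, mean, and standard deviation would be computed on training data only and reused for validation and test data.

```python
import statistics

def min_max_scale(xs):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]  # constant feature: no spread to scale
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    if sigma == 0:
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]

prices = [10.0, 20.0, 30.0, 40.0]
print(min_max_scale(prices))
print(z_score(prices))
```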

Data Pipeline

An automated sequence of steps moving data from source to destination. Ingestion, transformation, validation, and loading.

Data Pipeline (Detail)

Automated data movement: ingest, transform, load.

Data Platform

Infrastructure for data storage, processing, analysis.

Data Preprocessing

Preparing raw data for ML. Cleaning, encoding, scaling, splitting.

Data Privacy

Protecting personal data. Anonymization, pseudonymization, consent.

Data Profiling

Analyzing data quality and characteristics. Statistics, distributions.

Data Quality

Ensuring data is accurate, complete, consistent, and up-to-date. Poor quality data produces wrong insights. Great Expectations automates data quality checks.

Data Sampling

Selecting a representative data subset.

Data Science

An interdisciplinary field using statistics, programming, and domain knowledge to extract insights from data. Python, R, and SQL are core tools.

Data Transformation

Converting data format or structure. Encoding, aggregation, normalization.

Data Versioning

Tracking dataset changes over time. DVC, LakeFS. Reproducibility.

Data Visualization

Graphical representation of data: charts, graphs, maps, and heatmaps. D3.js, Recharts, and Plotly for web. Good visualizations make patterns easy to see.

Data Warehouse

A centralized system storing structured data from multiple sources for analysis. Snowflake, BigQuery, and Redshift are major cloud data warehouses.

Data Wrangling

Cleaning and transforming messy data.

Dataset

A structured collection of data used to train and evaluate ML models. Hugging Face Datasets and Kaggle are popular repositories.

Decision Boundary

Surface separating classes in feature space.

Decision Tree

A tree-shaped model making predictions through sequential decisions. Interpretable and visual. Random forests and gradient boosting are tree-based ensembles.

Decoder

Transformer component generating output tokens. GPT is decoder-only.

Deep Learning

A subset of ML using neural networks with multiple layers. Foundation of LLMs, image recognition, and content generation.

Deep Reinforcement Learning

RL with deep neural networks.

Denoising

Removing noise from data or images.

Dense Layer

Fully connected neural network layer. Every neuron connects to all inputs.

Dependency Parsing

Analyzing grammatical sentence structure.

Deployment (ML)

Putting trained model into production. API serving, edge deployment.

Diffusion Model

A generative model that learns to denoise images. Starts from pure noise and iteratively removes it. Stable Diffusion and DALL-E.

Dimensionality Reduction

Reducing the number of features while preserving important information. PCA, t-SNE, and UMAP. Enables visualization of high-dimensional data.

Discriminator

GAN component distinguishing real from generated data.

Distillation

Training a smaller model (student) to mimic a larger model (teacher). The student achieves similar performance with fewer parameters.

Document AI

AI processing documents: extraction, classification.

Document Embedding

Vector representation of entire document. Doc2Vec, sentence transformers.

Domain Adaptation

Adapting model from one domain to another. Transfer learning variant.

Domain Knowledge

Subject matter expertise informing model design.

DPO

Direct Preference Optimization: a simpler alternative to RLHF. Directly optimizes the model from preference data without training a separate reward model.

Dropout

Randomly disabling neurons during training. Regularization technique. Prevents overfitting.

Early Stopping

Halting training when validation performance stops improving. Prevents overfitting by finding the optimal training duration.

Edge AI

Running AI models directly on devices (phones, IoT, cameras) instead of the cloud. Lower latency, privacy, and works offline.

Edge Inference

Running model predictions on edge devices.

Elastic Net

Regression combining L1 (lasso) and L2 (ridge) regularization, balancing feature selection with stability.

ELT

Extract, Load, Transform: a modern variant of ETL where raw data is loaded first and transformed at the destination.

Embedding Dimension

Number of values in an embedding vector. 768 for BERT-base, 1536 for text-embedding-3-small.

Embedding Model

Model converting data to vector representations. text-embedding-3, CLIP.

Embedding Space

Vector space where similar items are close.

Embeddings

Numerical representation (vector) of text, images, or other data. Similar texts have close embeddings. Foundation of semantic search.

Encoder

Transformer component processing input. BERT is encoder-only.

Encoder-Decoder

Architecture where an encoder compresses input into a representation and a decoder generates output from it. BERT is encoder-only, GPT is decoder-only, and T5 is encoder-decoder.

End-to-End Learning

Training complete system directly from input to output. No manual features.

Ensemble Method

Combining multiple models for better predictions. Bagging, boosting.

Entity Extraction

Identifying and extracting structured information from unstructured text. Names, dates, amounts, addresses. Builds on named entity recognition (NER).

Entity Linking

Connecting mentioned entities to a knowledge base.

Epoch

One complete pass through the entire training dataset. Models typically train for many epochs. Too few = underfitting, too many = overfitting.

Error Analysis

Examining model mistakes to improve performance. Confusion matrix, examples.

ETL

Extract, Transform, Load: extracting data from sources, transforming it (cleaning, aggregating), and loading it into a data warehouse.

Extract, Transform, Load data pipeline pattern.

Evaluation Metric

Measure of model performance. Accuracy, F1, BLEU, perplexity.

Experiment Tracking

Recording ML experiment parameters and results. MLflow, W&B.

Explainable AI

XAI: techniques to make AI model decisions understandable by humans. SHAP, LIME, and attention maps explain why a model made a prediction.

Extractive Summarization

Selecting important sentences from source. No new text generated.

Extractive Summary

Selecting important sentences from source text.

F1 Score

The harmonic mean of precision and recall. Balances both metrics. F1 = 2 × (precision × recall) / (precision + recall).
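Accuracy, precision, recall, and F1 all come from the same confusion-matrix counts. A minimal sketch for binary labels; returning 0.0 when a denominator is zero is a convention assumed here (scikit-learn exposes this choice through a `zero_division` parameter). The imbalanced toy data is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Imbalanced toy data: 1 positive in 100; the "model" always predicts negative.
y_true = [0] * 99 + [1]
y_pred = [0] * 100
print(classification_metrics(y_true, y_pred))
```

The always-negative model scores 99% accuracy but 0 recall and 0 F1, which is exactly why accuracy alone misleads on imbalanced data.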

Feature

Input variable for an ML model. Age, income, pixel values. Feature engineering.

Feature Engineering

Creating, transforming, and selecting variables (features) to improve ML models. Normalization and categorical encoding are common techniques.

Feature Extraction

Deriving useful features from raw data. CNN features from images.

Feature Flag (ML)

Toggling model features in production.

Feature Importance

Ranking which features most influence predictions. SHAP, permutation.

Feature Scaling

Normalizing features to similar ranges.

Feature Selection

Choosing the most relevant variables for the model. Reduces overfitting, improves performance, and speeds up training.

Feature Store

A centralized repository for ML features. Consistent features across training and serving. Feast (open-source) and Tecton.

Feature Vector

Array of features representing a data point.

Federated Learning

Training models across decentralized devices without sharing raw data. Each device trains locally and shares model updates, not data.

Feedback Loop

User interactions informing model improvement.

Few-Shot Learning

Providing a few examples in the prompt to guide model behavior. The model generalizes from examples without fine-tuning.

Few-Shot Prompt

Prompt containing examples of desired behavior. In-context learning.

Fine-Tuning

Adapting a pre-trained model for a specific task with additional data. More efficient than training from scratch. LoRA is a popular parameter-efficient method.

FLOPs

Floating Point Operations. Measures computation. GigaFLOPs, TeraFLOPs.

Foundation Model

Foundation Model

A large AI model trained on broad data, adaptable to many tasks. GPT-4, Claude, Llama, and Stable Diffusion are foundation models.

Frequency Analysis

Analyzing occurrence patterns in data.

Model whose weights aren't updated during training. Only added layers train.

GAN

Generative Adversarial Network: two neural networks (generator and discriminator) competing. The generator creates fakes; the discriminator tries to tell them apart from real data.

Gaussian Distribution

Normal distribution bell curve. Common in nature.

Generalization

Model performing well on unseen data. Goal of ML training.

Generative AI

AI creating new content: text, images, code, music. LLMs, diffusion models.

Generator (GAN)

GAN component creating synthetic data.

Genetic Algorithm

A metaheuristic inspired by natural evolution. Populations of candidate solutions evolve through selection, crossover, and mutation.

GPT

Generative Pre-trained Transformer: OpenAI's autoregressive language model family. Predicts the next token. GPT-4 powers ChatGPT.

GPU (Computing)

Graphics Processing Unit: a processor with thousands of cores optimized for parallel operations. Essential for training deep learning models.

GPU Cluster

Multiple GPUs for parallel training. A100, H100.

Gradient

Vector of partial derivatives of the loss with respect to the parameters. Points toward steepest increase, so optimizers step in the opposite direction.

Gradient Clipping

Limiting gradient magnitude to prevent explosion. Stabilizes training.

Gradient Descent

An optimization algorithm adjusting model parameters in the direction that minimizes error. SGD, Adam, and AdaGrad are variants.
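A one-dimensional sketch of plain gradient descent on a hypothetical function, f(x) = (x - 3)², whose gradient 2(x - 3) is known in closed form. Real training uses gradients from backpropagation over millions of parameters, but the update rule is the same.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - learning_rate * grad(x)."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3); minimum at x = 3.
def grad_f(x):
    return 2.0 * (x - 3.0)

print(gradient_descent(grad_f, x0=0.0, learning_rate=0.1))            # converges toward 3
print(gradient_descent(grad_f, x0=0.0, learning_rate=1.1, steps=20))  # too high: diverges
```

Here each step multiplies the distance to the minimum by (1 - 2 × learning_rate): 0.8 at a rate of 0.1 (convergence) and -1.2 at 1.1 (oscillating divergence), illustrating the learning-rate tradeoff.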

Graph Neural Network

Neural networks operating on graph-structured data. Nodes exchange information with neighbors. Applications: social networks, molecules, and recommendation systems.

Greedy Decoding

Selecting highest probability token at each step.

Grid Search

Exhaustive hyperparameter combination testing.

Ground Truth

Correct labels in training data. What the model should predict.

Grounding

Connecting AI outputs to verified sources of truth. Search-augmented generation, citations, and fact-checking ground responses.

Guardrails

Constraints on AI behavior. Content filters, output validation, safety checks.

Hallucination

When an AI model generates confident but factually incorrect information. A major challenge for LLMs. RAG and grounding reduce it.

Hidden Layer

Neural network layer between input and output. 'Deep' = many hidden layers.

Hugging Face (Platform)

ML model hub with 500K+ models. Transformers library, Datasets, Spaces for demos. The GitHub of machine learning.

Hugging Face Spaces

Free hosting for ML demos on Hugging Face. Supports Gradio, Streamlit, and Docker. Share interactive models with the community.

Human in the Loop

Human involvement in AI decision process. Review, correction, approval.

Hybrid Search

Combining keyword and semantic search. BM25 + embeddings. Better retrieval.

Hyperparameter

A parameter set before training begins, not learned from data. Learning rate, batch size, number of layers, and dropout rate are examples.

Image Augmentation

Creating training variations: flip, rotate, crop.

Image Classification

Assigning category to entire image. Cat/dog, malignant/benign.

Image Generation

Creating images from descriptions. Stable Diffusion, DALL-E, Midjourney.

Image Recognition

Identifying objects or patterns in images.

Image Segmentation

Classifying each pixel in an image into categories. Semantic (class per pixel), instance (individual objects), and panoptic (both combined).

Generating text describing image content. Captioning, OCR.

Imitation Learning

Learning from expert demonstrations.

In-Context Learning

Learning from examples in the prompt.

Inference

Running a trained model to generate predictions or outputs. Different from training. Inference optimization (batching, caching, quantization) reduces cost and latency.

Information Retrieval

Finding relevant documents from a large collection. Search engines, recommendation systems, and RAG. BM25 (keyword) and dense embeddings (semantic) are common retrieval methods.

Intent Classification

Determining user's purpose from text.

Interpretability

Understanding why model made a prediction. SHAP, attention visualization.

Jupyter Notebook

An interactive document combining code, visualizations, and text. Standard for data science exploration. JupyterLab, Google Colab.

K-Fold Cross-Validation

Splitting data into K equal parts, training K times with each part as validation. Provides robust performance estimates.
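The fold bookkeeping behind K-fold cross-validation can be sketched as an index generator. This version does not shuffle and spreads any remainder over the first folds; both are assumptions, and libraries such as scikit-learn's `KFold` make shuffling optional.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, val_indices) for each of k folds over n samples.

    Fold sizes differ by at most one. Indices are not shuffled here; real
    pipelines usually shuffle (or stratify) first.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))
for train, val in folds:
    print(len(train), val)
```

Every sample lands in exactly one validation fold, so averaging the K scores uses all the data for evaluation.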

K-Means

Clustering algorithm partitioning data into K groups by distance.

KNN

K-Nearest Neighbors: classifies data points based on the K closest training examples. Simple, with no training phase. Used for classification and regression.
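A minimal KNN classifier sketch: sort training points by Euclidean distance (`math.dist`, Python 3.8+) and take a majority vote among the K nearest. The toy points and labels are hypothetical, and ties in the vote fall to whichever label `Counter.most_common` returns first.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k training points closest to x."""
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```

"No training phase" shows up directly: all the work happens at prediction time, which is why KNN gets slow on large datasets.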

Knowledge Base

Structured information repository for AI queries.

Knowledge Distillation

Training small model to mimic large one.

Knowledge Graph

A structured representation of entities and their relationships. Google Knowledge Graph powers search cards. Neo4j stores graph data.

Label

The target value associated with each training example in supervised learning. In image classification, the label is the image's category.

Label Noise

Incorrect labels in training data. Human annotation errors or systematic biases. Degrades model quality. Confident learning helps find mislabeled examples.

Language Model

Model predicting next tokens given context. GPT, Claude, Llama.

Language Understanding

AI comprehending text meaning and intent.

Large Language Model

LLM — massive transformer model trained on text. Billions of parameters.

Latency

Time to generate a model prediction. Critical for real-time applications. Batching, quantization, caching, and model distillation reduce it.

Latent Diffusion

Diffusion in compressed latent space. Faster than pixel-space. Stable Diffusion.

Latent Space

A compressed representation of data learned by a model. Similar items are close together in latent space. Used in embeddings and generative models.

Layer

Neural network building block. Dense, convolutional, attention, normalization.

Leaderboard

Ranking models by benchmark performance.

Learning Curve

Plot of model performance vs training data amount.

Learning Rate

The step size for updating model weights during training. Too high = divergence, too low = slow training. Learning rate schedules adjust it during training.

Learning Rate Schedule

Adjusting learning rate during training.

Linear Regression

Predicting continuous values with linear relationship. y = mx + b.

LLM

Large Language Model: a deep learning model trained on vast amounts of text. GPT-4, Claude, Llama, and Gemini are examples.

LLM Agent

LLM with tool use and planning capabilities.

LLM Benchmark

Standardized test comparing language models.

LLM Evaluation

Assessing LLM quality across dimensions: accuracy, helpfulness, harmlessness, and honesty. Human evaluation and automated benchmarks.

LLM Fine-Tuning

Adapting LLM with domain-specific data.

Deploying LLM for inference. vLLM, TGI.

Logistic Regression

A classification algorithm predicting the probability of a binary outcome. Despite the name, it's a classification method, not regression.

Logits

Raw model output before softmax. Unnormalized scores.
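Softmax is what turns logits into probabilities. A minimal sketch; subtracting the largest logit first is the standard overflow guard and does not change the result.

```python
import math

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1.

    Subtracting the max logit first leaves the result unchanged mathematically
    but prevents overflow in math.exp for large logits.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))
print(softmax([1000.0, 1000.0]))  # would overflow without the max subtraction
```

The ranking of logits is preserved: the largest logit always gets the largest probability.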

Models handling very long inputs. 200K+ tokens. Document analysis.

LoRA

Low-Rank Adaptation: an efficient fine-tuning method adding small trainable matrices to frozen model weights. Dramatically reduces the number of trainable parameters.

Loss Function

A function measuring how wrong a model's predictions are. Cross-entropy for classification, MSE for regression. Training minimizes it.

Loss Landscape

Visualization of the loss function across parameter space.

Low-Rank Adaptation

LoRA efficient fine-tuning technique.

Machine Learning

A subset of AI where systems learn patterns from data without explicit programming. Supervised, unsupervised, and reinforcement learning are the main paradigms.

Ensemble combining predictions by voting. Multiple models, take mode.

MapReduce

Distributed processing model: the map step processes data in parallel, the reduce step aggregates the results.

Probabilistic model where next state depends only on current.

Masked Language Model

Training by predicting masked tokens. BERT's pre-training objective.

Matrix Multiplication

Core mathematical operation in neural networks. GPU-accelerated.

Maximum Likelihood

Estimation finding parameters maximizing data probability.

MCP

Model Context Protocol: an open standard by Anthropic for connecting AI models to external tools and data sources.

Mean Squared Error

MSE — average of squared prediction errors. Regression loss function.

Meta-Learning

Learning to learn. Few-shot adaptation from prior tasks.

Mini-Batch

A subset of training data processed together in one forward/backward pass. Batch size 32-256 is typical. Balances computational efficiency against gradient noise.

Mixture of Experts

Architecture routing inputs to specialized sub-networks. Efficient scaling.

MLOps

DevOps practices applied to machine learning: model versioning, training pipelines, monitoring, and deployment. MLflow is a common tool.

HTTP interface for model predictions. POST /predict.

Model Architecture

Structure defining how model processes data. Layers, connections, dimensions.

Model Bias

Systematic prediction error from training data.

Model Card

Documentation describing an ML model's intended use, performance, limitations, and ethical considerations.

Model Checkpoint

Saved model state during training. Resume training, select best epoch.

Model Complexity

Number of parameters and architectural depth.

Model Compression

Reducing model size. Quantization, pruning, distillation.

Model Deployment

Moving trained model to production environment.

Model Drift

Model performance degrading over time as data distributions change.

Model Evaluation

Assessing model quality on test data.

Model Explainability

Understanding model decision factors.

Model Fine-Tuning

Training pre-trained model on specific data. Adapts to task.

Model Inference

Using trained model for predictions. Optimization for speed and cost.

Model Monitoring

Tracking model performance in production.

Model Optimization

Improving model speed, size, or accuracy.

Model Parameter

Learned values during training. Weights and biases. Billions in LLMs.

Model Pipeline

Sequence of preprocessing and model steps. Scikit-learn Pipeline.

Model Pruning

Removing unimportant weights/connections. Smaller, faster models.

Model Registry

A centralized repository tracking model versions, metadata, and deployment status. MLflow Model Registry and Weights & Biases.

Model Selection

Choosing best model for specific task.

Model Serving

Deploying trained models to handle inference requests. TensorFlow Serving, TorchServe, and vLLM. Batching and caching improve throughput.

Model Training

The process of feeding data to an ML model so it learns patterns. Involves forward pass, loss calculation, and backpropagation.

Model Validation

Evaluating model on held-out data during training.

Model Versioning

Tracking model iterations. MLflow, W&B.

Monte Carlo Method

Computational methods using random sampling to obtain numerical results. Monte Carlo simulation estimates probabilities.

Multiclass Classification

Classification with 3+ categories. Softmax activation. One-vs-all.

Multi-Label Classification

Each input can have multiple labels simultaneously.

Processing multiple data types: text, image, audio.

Multi-Task Learning

Training model on multiple tasks simultaneously. Shared representations.

Multimodal AI

AI models processing multiple data types: text, images, audio, video. GPT-4V, Claude, and Gemini understand both text and images.

Named Entity

Proper noun: person, organization, location, date.

Named Entity Recognition

NER: identifying and classifying named entities in text: persons, organizations, locations, dates. spaCy and Hugging Face provide NER models.

Natural Language

Human communication language as opposed to code.

Natural Language Generation

NLG — AI generating human-like text. Chatbots, summarization.

Natural Language Processing

NLP — AI understanding and generating human language.

Natural Language Understanding

NLU — AI understanding the meaning and intent behind text. Sentiment analysis, intent classification, and slot filling.

Negative Sampling

Training with randomly selected negative examples.

Neural Architecture

Design of neural network layers and connections.

Neural Network

A computational model inspired by the human brain. Artificial neurons organized in layers process data. CNNs for images, Transformers for text.

Performance improving with more compute and data.

NLP

Natural Language Processing: an AI subfield enabling computers to understand and generate human language.

Noise

Random variation in data. Training noise can help generalization.

Normalization

Scaling data to a standard range (0-1 or mean=0, std=1). Improves model training convergence. Batch normalization and layer normalization apply it inside neural networks.

Numeric Feature

Numerical input variable. Age, price, temperature. May need scaling.

NumPy

Python library for numerical computing with multi-dimensional arrays. Foundation of the Python scientific ecosystem. Vectorized operations are much faster than Python loops.

Object Detection

Locating and classifying objects within images. YOLO (You Only Look Once), Faster R-CNN, and DETR. Applications: autonomous driving.

Object Recognition

Identifying objects in images. Classification + localization.

OCR

Optical Character Recognition: converting images of text into machine-readable text. Tesseract (open-source), Google Cloud Vision.

Offline Evaluation

Evaluating model on historical data. Before deployment.

One-Hot Encoding

Representing categorical variables as binary vectors. Cat = [1,0,0], Dog = [0,1,0]. Creates sparse, high-dimensional data.
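A minimal one-hot encoder that reproduces the Cat/Dog example above. Indexing categories by first appearance is an assumption made here so the output matches that example; library encoders typically sort categories or accept an explicit list.

```python
def one_hot_encode(values):
    """Map each category to a binary vector with a single 1.

    Categories are indexed in order of first appearance, so the encoding
    depends on the input order; production code would fix the category list.
    """
    categories = list(dict.fromkeys(values))  # ordered, de-duplicated
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors

cats, vecs = one_hot_encode(["Cat", "Dog", "Bird", "Cat"])
print(cats)
print(vecs)
```

Vector length equals the number of categories, which is where the sparsity and high dimensionality come from for features with many values.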

Online Learning

Model updating continuously with new data. Adapts in real-time.

Open Source Model

Publicly available model weights. Llama, Mistral, Stable Diffusion.

Open Vocabulary

Model handling words not in training vocabulary.

Optimizer

Algorithm updating model weights. Adam, SGD, AdamW. Controls learning.

Out-of-Distribution

Data different from training distribution. Model may fail.

Outlier

Data point significantly different from others.

Overfitting

When a model memorizes training data instead of learning generalizable patterns. Performs well on training data but poorly on new data.

Overfitting Detection

Identifying when model memorizes training data.

PaLM

Google's large language model family.

Pandas

Python library for tabular data manipulation and analysis. DataFrames are the central structure. Read CSV, filter, aggregate.

Parameter Count

Number of trainable values in a model. GPT-4 is rumored to have about 1.7T parameters; OpenAI has not confirmed this.

Parameter-Efficient Fine-Tuning

PEFT: adapting models by training only a small number of new or existing parameters.

Part-of-Speech Tagging

Labeling words as noun, verb, adjective, etc.

Pearson Correlation

Statistical measure of linear relationship strength.

Perceptron

The simplest neural network: a single neuron with weighted inputs and an activation function. Can learn only linearly separable patterns.

Perplexity

Language model evaluation metric: the exponential of the average negative log-likelihood per token. Lower = better predictions.
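Given the probability a model assigned to each actual next token, perplexity is a one-liner. A minimal sketch; the probability lists are hypothetical inputs rather than outputs of a real model.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-likelihood per token.

    token_probs: probability the model assigned to each actual next token.
    """
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

print(perplexity([0.25] * 8))        # as uncertain as a uniform pick among 4 tokens
print(perplexity([0.9, 0.8, 0.95]))  # confident, mostly correct predictions
```

A perplexity of N can be read as the model being, on average, as uncertain as choosing uniformly among N tokens; a perfect model scores 1.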

Pipeline

Sequential data processing and model steps.

Pooling

Reducing spatial dimensions in CNNs. Max and average pooling are the common variants.
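
The sketch below shows non-overlapping 2x2 pooling (stride equal to the window size) over a single-channel feature map stored as nested lists. `pool2d` is illustrative; real frameworks operate on batched tensors instead.

```python
def pool2d(grid, size=2, mode="max"):
    """Non-overlapping pooling (stride == size) over a 2-D list of numbers."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - size + 1, size):
        row = []
        for c in range(0, cols - size + 1, size):
            window = [grid[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
print(pool2d(feature_map))              # [[6, 4], [8, 9]]
print(pool2d(feature_map, mode="avg"))  # [[3.75, 2.25], [4.5, 4.0]]
```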

Positive Class

The target class in binary classification.

Power BI

Microsoft's BI platform. Interactive dashboards, DAX language, and integration with Excel and Azure. Dominant in many companies.

Pre-training

Training a model on a large general dataset before fine-tuning.

Precision

Of all positive predictions, how many were actually positive. High precision = few false positives. Important when false positives are costly.
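
Precision = TP / (TP + FP). The plain-Python sketch below follows that formula. Returning 0.0 when there are no positive predictions is one convention; scikit-learn exposes this choice through its `zero_division` parameter.

```python
def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): share of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
print(precision(y_true, y_pred))  # 2 TP, 1 FP -> 0.666...
```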

Prediction

Model output for given input. Classification label or regression value.

Prediction Interval

Range within which a future observation is likely to fall, at a stated probability (e.g., 95%).

Predictive Model

A statistical or ML model trained to predict future outcomes based on historical data. Regression, classification, and time series forecasting are common types.

Preprocessing Pipeline

Chained data cleaning and transformation steps.

Principal Component Analysis

PCA: dimensionality reduction that finds orthogonal directions of maximum variance. Reduces the number of features while retaining most of the variance.

Probability Distribution

Function describing likelihood of outcomes.

Production Model

Model deployed and serving live predictions.

Prompt

Input text given to language model. Instructions, context, examples.

Prompt Engineering

The art of crafting effective instructions for LLMs. System prompts, few-shot examples, chain-of-thought, and structured output.

Prompt Injection

Manipulating AI via malicious prompt input.

Prompt Template

Reusable prompt structure with variable placeholders.

Pruning

Removing unnecessary model weights. Smaller, faster with minimal quality loss.

Python

A versatile language dominating data science, ML, automation, and backend development. Simple syntax and a massive ecosystem (pip).

PyTorch

Meta's deep learning framework. Dynamic computation graphs, Pythonic API, and strong in research. Dominant in academia.

Quantization

Reducing model precision (32-bit to 8-bit or 4-bit) to decrease size and speed up inference. GPTQ, GGUF, and AWQ are quantization methods and formats.
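
This sketch shows the simplest scheme: symmetric "absmax" int8 quantization with one scale per tensor. It is plain round-to-nearest, not GPTQ or AWQ, which add calibration to reduce error. The helper names are illustrative.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    # One scale for the whole tensor; fall back to 1.0 if every value is 0
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Approximate the original floats from the stored integers."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.0, 0.49]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print(max(abs(a - b) for a, b in zip(weights, restored)))  # at most scale / 2
```

Each value now needs 1 byte instead of 4, at the cost of a rounding error bounded by half the scale.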

R

A language specialized in statistics and data analysis. ggplot2 for visualization, tidyverse for manipulation. Popular in academia and research.

RAG

Retrieval-Augmented Generation: combining LLMs with external knowledge retrieval. The model searches a database before generating a response.

RAG Pipeline

Retrieval-Augmented Generation workflow. Query → retrieve → generate.

Random Forest

An ensemble of decision trees, each trained on a random subset of the data. Reduces overfitting through averaging. Robust, with some interpretability through feature importances.

Random Search

Random hyperparameter combination testing.

Real-Time Analytics

Analyzing data the moment it's generated. Live dashboards, alerts, and instant decisions. ClickHouse, Druid, and Materialize.

Recall

Of all actual positives, how many were correctly identified. High recall = few false negatives. Important when missing positives is costly.
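
Recall = TP / (TP + FN). The plain-Python sketch below follows that definition. The 0.0 fallback when there are no actual positives is a convention, not a universal rule.

```python
def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): share of actual positives the model found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1]
print(recall(y_true, y_pred))  # 2 TP, 2 FN -> 0.5
```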

Recall@K

Proportion of all relevant items that appear in the top K results.

Recommender System

ML system suggesting relevant items to users. Collaborative filtering (users who liked X also liked Y) and content-based filtering.

Regression

An ML task predicting continuous numerical values: house price, temperature, sales. Linear regression, polynomial regression, and gradient boosting.

Regularization

Techniques preventing overfitting by penalizing model complexity. L1 (Lasso) encourages sparsity, L2 (Ridge) penalizes large weights.

Reinforcement Learning

ML where an agent learns by trial and error, receiving rewards or penalties. Foundation of AlphaGo, robotics, and RLHF training.

Reinforcement Learning Environment

The world an RL agent interacts with. OpenAI Gym, MuJoCo for robotics, Atari for games. The agent takes actions and receives observations and rewards.

Representation Learning

Learning useful data representations automatically. Deep learning core.

Resampling

Creating new samples from existing data.

Residual Connection

Skip connection adding input to layer output.

Retrieval Model

Model finding relevant documents for queries.

Reward Model

Model scoring outputs for RLHF. Trained on human preferences.

RLHF

Reinforcement Learning from Human Feedback: training AI models using human preferences. Humans rank outputs, a reward model learns from those rankings, and the AI model is then optimized against the reward model.

RNN

Recurrent Neural Network: neural network processing sequential data. Hidden state carries information across timesteps.

ROC Curve

Plot of true positive rate vs. false positive rate across classification thresholds.

RPA

Robotic Process Automation: bots automating repetitive tasks in graphical interfaces, such as filling forms and extracting data.

Sample Size

Number of data points in a dataset. More data usually means better performance.

Sampling Strategy

Method for selecting training data subsets.

Scaling Laws

Predictable relationships between compute, data, parameter count, and model performance.

Scikit-learn

Python ML library with classification, regression, clustering, and preprocessing algorithms. Consistent interface (fit/predict).

Self-Supervised Learning

Learning from unlabeled data. Masked prediction, contrastive.

Semantic Search

Search understanding meaning rather than just keywords. Uses embeddings to find conceptually similar content. Often backed by a vector database.

Semantic Similarity

Measuring meaning closeness between texts.

Semi-Supervised Learning

Learning from a mix of labeled and unlabeled data.

Sentiment Analysis

Determining emotional tone in text: positive, negative, or neutral. Used for brand monitoring, customer feedback, and social media analysis.

Sequence-to-Sequence

Input sequence to output sequence. Translation, summarization.

Sigmoid Function

Activation squashing output to 0-1 range.
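
The sigmoid is σ(x) = 1 / (1 + e^(-x)). This sketch branches on the sign of x so that `math.exp` never receives a large positive argument, which would raise `OverflowError`.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^-x), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite as e^x / (1 + e^x); e^x is then at most 1
    z = math.exp(x)
    return z / (1.0 + z)

print(sigmoid(0))      # 0.5
print(sigmoid(10))     # close to 1
print(sigmoid(-1000))  # close to 0, with no OverflowError
```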

Softmax

Function converting logits to probability distribution.
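
Softmax computes exp(z_i) / Σ_j exp(z_j). Subtracting the largest logit first leaves the result unchanged but prevents overflow. A plain-Python sketch:

```python
import math

def softmax(logits):
    """exp(z_i) / sum_j exp(z_j), shifted by max(z) for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```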

Sparse Model

Model with many zero-valued parameters.

Speech-to-Text

Converting spoken audio to written text. Whisper (OpenAI, open-source), Google Speech-to-Text, and AWS Transcribe.

Statistical Test

Method determining if results are significant.

Stop Words

Common words (e.g., "the", "is") often removed during text processing.

Stratified Split

Maintaining class proportions when splitting data.
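
The sketch below performs a stratified split by shuffling each class separately and taking the same fraction from each. The helper and its `seed` parameter are hypothetical; scikit-learn's `train_test_split` offers this through its `stratify` argument.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # shuffle within the class only
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

labels = ["spam"] * 20 + ["ham"] * 80
train, test = stratified_split(labels)
print(sum(labels[i] == "spam" for i in test), len(test))  # 4 20
```

With 20% spam overall, the test set keeps exactly 20% spam, whereas a purely random split could drift.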

Stream Processing

Continuous real-time data processing as it arrives. Kafka, Flink, and Spark Streaming. Different from batch processing, which handles data in scheduled chunks.

Structured Output

Model generating data in specific format. JSON, XML, function calls.

Subword Tokenization

Breaking words into subword units. BPE.

Supervised Learning

ML where the model trains on labeled data (input → expected output). Classification and regression are the main tasks. The model learns to map inputs to outputs.

Synthetic Data

Artificially generated data mimicking real data. Used to train models when real data is scarce or sensitive. Produced with GANs and simulation.

System Prompt

Instructions defining AI behavior. Context, rules, persona.

Tableau

Data visualization and BI platform. Drag-and-drop to create complex visualizations. Strong in visual data exploration.

Tabular Data

Data organized in rows and columns. CSV, databases.

Target Variable

Value model tries to predict. Label.

Task-Specific Model

Model trained for one specific task. More efficient than general.

Teacher Model

Large model whose outputs are used to train a smaller student model (knowledge distillation).

Temperature

A parameter controlling LLM output randomness. Low temperature (0.0) = deterministic, predictable. High temperature (1.0 and above) = more random, creative.
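
Temperature divides the logits before softmax. The sketch below shows how the top token's probability changes with T. The logits are made-up numbers. Treating T = 0 as a special case reflects the common practice of switching to greedy (argmax) decoding there.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T, then apply softmax: T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("use greedy argmax decoding for T = 0")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```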

Tensor

A multi-dimensional array of numbers, generalizing vectors and matrices. TensorFlow and PyTorch operate on tensors.

TensorFlow

Google's deep learning framework. Keras as high-level API, TensorBoard for visualization, TFLite for mobile. Complete ecosystem.

Test Set

Data used only for final evaluation. Never seen during training.

Text Classification

Assigning categories to text. Sentiment, topic, intent.

Text Embedding

Vector representation of text. Semantic meaning captured.

Text Generation

Creating new text from context. LLMs, autocomplete, creative writing.

Text Mining

Extracting information from text documents. NLP techniques.

Text-to-Image

Generating images from text descriptions.

Text-to-Speech

Converting written text to spoken audio. ElevenLabs, Google TTS, and Amazon Polly. Neural TTS produces natural-sounding speech.

Text-to-SQL

Converting natural language to SQL queries.

TF-IDF

Term frequency-inverse document frequency: weights a term by how often it appears in a document, discounted by how many documents contain it.
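
The sketch below implements one common TF-IDF variant: term count divided by document length, times log(N / document frequency). Libraries differ in the details. For example, scikit-learn's `TfidfVectorizer` smooths the IDF and normalizes vectors by default, so its numbers will not match.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document {term: weight}, with weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = ["the cat sat", "the dog sat", "the cat purred"]
w = tf_idf(docs)
print(w[2]["purred"] > w[2]["cat"] > w[2]["the"])  # True: "the" is in every doc, so its weight is 0
```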

Time Series

Data points ordered by time. Stock prices, sensor readings, and website traffic. Forecasting with ARIMA, Prophet, and neural networks.

Token

The basic unit LLMs process. A word, subword, or character depending on the tokenizer. GPT-4's tokenizer averages roughly 4 characters of English text per token.

Tokenization (NLP)

Splitting text into processable tokens.

Tokenizer

Splits text into tokens (words, subwords, or characters) for model processing. BPE (Byte-Pair Encoding) is common. Different models use different tokenizers.

Tool Use

AI calling external functions: search, calculator, APIs. Also called function calling.

Top-K Sampling

Selecting from K most likely next tokens.
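
Here is a sketch of top-K sampling over a hypothetical next-token distribution. `random.choices` treats weights as relative, which renormalizes the K surviving tokens implicitly. The helper name is illustrative.

```python
import random

def top_k_sample(probs, k, rng=random):
    """Keep the k most likely token indices and sample one of them."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

probs = [0.5, 0.2, 0.15, 0.1, 0.05]  # hypothetical next-token probabilities
rng = random.Random(42)
samples = [top_k_sample(probs, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # only indices 0 and 1 can ever appear
```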

Top-P Sampling

Nucleus sampling: selects from the smallest set of tokens whose cumulative probability exceeds P. Top-P 0.9 means considering only the most likely tokens that together make up 90% of the probability mass.
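
The sketch below implements nucleus sampling. It sorts tokens by probability, keeps adding them until the cumulative probability reaches P, then samples within that set. The distribution is hypothetical, and implementations differ on whether the threshold test is `>` or `>=`.

```python
import random

def top_p_sample(probs, p, rng=random):
    """Sample a token index from the smallest high-probability set whose
    cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # random.choices renormalizes the nucleus weights implicitly
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token probabilities
rng = random.Random(0)
samples = {top_p_sample(probs, p=0.9, rng=rng) for _ in range(1000)}
print(sorted(samples))  # the 0.05 token (index 3) is never chosen
```

Unlike top-K, the size of the candidate set adapts: a confident distribution yields a small nucleus, a flat one a large nucleus.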

TPU

Tensor Processing Unit: Google's custom chip optimized for tensor operations. More efficient than GPUs for certain ML workloads.

Train/Test Split

Dividing data into training and testing sets (typically 80/20). Train on training data, evaluate on unseen test data.

Training Data

Data used to train model. Quality and quantity matter enormously.

Training Loop

Iterative process: forward pass, compute loss, backward pass, update weights.
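
The minimal sketch below walks through the four steps, fitting a line y = w·x + b with plain gradient descent on mean squared error. The data (generated from y = 2x + 1), learning rate, and epoch count are illustrative.

```python
# Toy data generated from y = 2x + 1
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, lr = 0.0, 0.0, 0.05

for _ in range(2000):
    # Forward pass: predictions for every example
    preds = [w * x + b for x, _ in data]
    # Compute loss: mean squared error
    loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, data)) / len(data)
    # Backward pass: gradients of the MSE with respect to w and b
    grad_w = sum(2 * (p - y) * x for p, (x, y) in zip(preds, data)) / len(data)
    grad_b = sum(2 * (p - y) for p, (_, y) in zip(preds, data)) / len(data)
    # Update weights: step against the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 3), round(b, 3))  # approaches 2.0 and 1.0
```

Frameworks such as PyTorch compute the backward pass automatically, but the loop has the same shape.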

Training Set

Data subset used for model learning.

Transfer Learning

Using a model trained on one task as a starting point for another, e.g., fine-tuning BERT for sentiment analysis.

Transformer

The neural network architecture behind modern AI. A self-attention mechanism processes all tokens in parallel. GPT and BERT are built on it.

Transformer Block

Self-attention plus feed-forward layer unit.

Truncation (ML)

Cutting sequences to maximum model length.

Tuning

Adjusting model for better performance.

Type I Error

False positive: incorrectly rejecting a true null hypothesis.

Type II Error

False negative: failing to reject a false null hypothesis.

Underfitting

A model too simple to capture data patterns. Performs poorly on both training and new data. Solution: a more complex model.

Unsupervised Learning

ML without labeled data. The model discovers patterns and structures on its own. Clustering (K-means) and dimensionality reduction.

VAE

Variational Autoencoder: a generative model learning compressed representations (latent space) of data. Used in image generation.

Validation Set

Data for tuning during training. Separate from test set.

Variance

A model's sensitivity to fluctuations in training data. High variance = overfitting. The bias-variance tradeoff is fundamental to ML.

Variational Inference

Approximate Bayesian inference technique.

Vector

Ordered list of numbers. Embeddings are vectors. Distance measures similarity.

Vector Database

A database optimized for storing and searching embeddings. Pinecone, Weaviate, ChromaDB, and pgvector. Essential for RAG.

Vector Index

Data structure for fast nearest-neighbor search. HNSW, IVF, flat.

Vector Search

Finding similar vectors by distance. Cosine similarity, Euclidean.
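
The sketch below is a brute-force ("flat") search using cosine similarity. The 3-dimensional vectors are made up for illustration. Real embeddings have far more dimensions, and vector databases use indexes such as HNSW to avoid scoring every vector.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, vectors, k=2):
    """Flat search: score every stored vector and return the top-k ids."""
    scored = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical toy embeddings
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.0, 0.2, 0.95],
}
print(nearest([1.0, 0.1, 0.0], vectors))  # ['cat', 'kitten']
```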

Vision Transformer

ViT — applying transformer architecture to images. Patches as tokens.

Vocabulary

Set of tokens known to a model. Tokenizer defines vocabulary.

Weight

Learnable parameter in neural network. Multiplied with inputs.

Weight Initialization

Setting initial values for model weights before training.

Word Embeddings

Dense vector representations of words where similar words have similar vectors. Word2Vec, GloVe, and FastText. Precursors to the contextual embeddings of transformer models.

Word2Vec

Early word embedding model. Skip-gram, CBOW.

XGBoost

Gradient boosting library. Fast, accurate, tabular data champion.

Zero-Shot Learning

Asking the model to perform a task without any examples. Relies on the model's pre-trained knowledge. Works well for common tasks.

Zero-Shot Prompt

Prompt without examples. Model uses pre-trained knowledge only.
