
Sequential Decision Analytics


Let’s Start With a Problem You Can’t Solve

You’re managing a fleet of 500 trucks. Each driver has 15 attributes (location, hours of service, equipment type, certifications, home base). You have 800 loads that need moving today, and you’ll get 200 more tomorrow — except you don’t know which 200.

How many possible states exist in your system? 10²⁰.

How many possible decisions? 10³⁰⁰.

Your integer programming course didn’t prepare you for this.

Not because the math was wrong. Because the framing was wrong.

You don’t need the optimal decision for this exact moment. You need a systematic way to make good decisions, repeatedly, as new information keeps arriving.

Welcome to sequential decision analytics.

The Core Insight: We’re Searching for Functions, Not Vectors

For 70 years, optimization has meant one thing:

min C(x) subject to Ax ≤ b

Find the vector x that minimizes cost. Elegant. Powerful. And fundamentally limited.

Because real problems look like this:

Decision → Information → Decision → Information → Decision → …

You make a decision. The world responds. New information arrives. You decide again. What you did yesterday constrains what you can do today. What you do today affects your options tomorrow.

You’re not searching for the best vector x. You’re searching for the best policy π: a function that maps states to decisions (A Universal Framework for Sequential Decision Problems, ORMS Today).

That’s not a small change. That’s a completely different mathematical object. We went from searching over R^n (vector spaces) to searching over function spaces.

And we don’t have a “simplex algorithm for functions.”

The Five Elements: A Universal Modeling Framework

Here’s what makes sequential decision analytics universal: any sequential decision problem can be modeled using exactly five elements: state variables, decision variables, exogenous information variables, a transition function, and an objective function (A Universal Framework for Sequential Decision Problems, ORMS Today).

Let me show you with that trucking problem:

State (St): Everything you know when making decision t

  • Where each driver is right now
  • Their hours of service remaining
  • Which loads are available
  • Time until each load must be picked up

Decision (xt): What you control

  • Which driver gets which load
  • Who deadheads to a better location
  • Who goes home for the weekend

Exogenous Information (Wt+1): What the world throws at you

  • New loads arrive
  • Drivers call in sick
  • Traffic delays deliveries
  • Customers cancel orders

Transition Function: How the state evolves

  • St+1 = f(St, xt, Wt+1)
  • Driver locations update after assignments
  • Hours of service decrease
  • Load boards refresh

Objective Function: What you’re trying to maximize

  • Revenue minus costs over a planning horizon
  • Expected profit accounting for uncertain future
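
To make the five elements concrete, here is a minimal Python sketch of the same structure on a toy inventory problem. Every name, cost, and the demand model here is an illustrative assumption, not a standard API:

```python
import random

# A minimal sketch of the five elements on a toy inventory problem.
# All names and numbers are illustrative, not a standard API.

def sample_exogenous(rng):
    """W_{t+1}: random demand that arrives after we decide."""
    return {"demand": rng.randint(0, 10)}

def transition(state, decision, info):
    """S_{t+1} = f(S_t, x_t, W_{t+1}): update inventory after ordering and demand."""
    filled = min(state["stock"] + decision["order"], info["demand"])
    return {"stock": state["stock"] + decision["order"] - filled, "t": state["t"] + 1}

def reward(state, decision, info):
    """Contribution: revenue from filled demand minus ordering and holding costs."""
    filled = min(state["stock"] + decision["order"], info["demand"])
    return 10 * filled - 2 * decision["order"] - 0.5 * state["stock"]

def policy(state):
    """X^pi(S_t): any rule mapping state -> decision goes here."""
    return {"order": max(0, 8 - state["stock"])}   # a simple order-up-to rule

def simulate(horizon=30, seed=0):
    rng = random.Random(seed)
    state, total = {"stock": 5, "t": 0}, 0.0
    for _ in range(horizon):
        decision = policy(state)                    # decide with what we know now
        info = sample_exogenous(rng)                # then the world responds
        total += reward(state, decision, info)      # objective: sum of contributions
        state = transition(state, decision, info)   # state evolves
    return total

print(simulate())
```

Swap in a different `policy` function and nothing else changes: the other four elements stay fixed while you search over policies.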

This framework works for trucking. It works for inventory management. It works for medical treatment protocols. It works for energy storage. It works for financial trading.

The problems span finance, energy, transportation, health, e-commerce, supply chains, and include pure learning problems that arise in laboratory or field experiments (Princeton).

Same five elements. Every time.

The Jungle: 15 Fields, 8 Notations, No Common Language

Here’s where it gets interesting.

Sequential decision problems have attracted at least 15 distinct fields of research, using eight distinct notational systems, producing a vast array of analytical tools (Wiley Online Library; Amazon).

You’ve got:

  • Dynamic programming (calling it “Bellman equations”)
  • Reinforcement learning (calling it “Q-learning”)
  • Stochastic programming (calling it “scenario trees”)
  • Optimal control (calling it “model predictive control”)
  • Stochastic search (calling it “multi-armed bandits”)
  • Simulation optimization (calling it “what-if analysis”)

Each community has its own conferences. Its own journals. Its own notation. Its own blind spots.

The dynamic programming people think in discrete states and backwards recursion. The control theorists think in continuous state spaces and differential equations. The stochastic programming folks think in scenario trees. The reinforcement learning crowd thinks in neural networks.

They’re all solving the same problem. They just don’t know it.

While deterministic optimization enjoys an almost universally accepted canonical form, stochastic optimization has been described as a jungle of competing notational systems and algorithmic strategies (SIAM; Semantic Scholar).

The Four Classes: Every Method Ever Invented

After working on these problems for 40 years — trucking, rail, energy, healthcare, finance, materials science — a pattern emerged.

Every method for making decisions, whether from academic literature or used in practice, falls into exactly four fundamental classes (Reinforcement Learning and Stochastic Optimization, CASTLE).

Not four popular methods. Not four recommended approaches. Four classes that exhaust the space of possibilities.

Class 1: Policy Function Approximations (PFAs)

Direct mapping from state to action.

Structure: xt = Xπ(St|θ)

You look at the state, apply a function, get a decision. No optimization. No simulation. Just input → output.

Examples:

  • “Assign driver to nearest profitable load”
  • “Order inventory when stock drops below reorder point”
  • “Treat patient based on symptom profile”
  • Neural network that outputs actions

Why it works:

  • Fast. Milliseconds to make decisions.
  • Scalable. Works with 10,000 drivers as easily as 10.
  • Robust. No optimization solver to crash.

Why academics ignore it:

  • “Too simple.”
  • “Not optimal.”
  • “Just a heuristic.”

Why industry loves it:

  • It works.
  • It’s fast.
  • You can actually deploy it.

The challenge? Designing good features (the φ functions that summarize the state) and tuning the parameters (the θ values). That’s where the art is.
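
As a rough illustration, here is what a PFA might look like in code: a parameterized “nearest profitable load” rule. The fields, the crude profit estimate, and the threshold θ are illustrative assumptions, not a production dispatch rule:

```python
from math import dist

# A minimal PFA sketch: assign each driver to the nearest load whose estimated
# profit clears a threshold theta. All fields and numbers are illustrative.

def pfa_assign(drivers, loads, theta_min_profit=100.0):
    """Direct state -> decision mapping: no solver, just a parameterized rule."""
    assignments, taken = {}, set()
    for d in drivers:
        best, best_dist = None, None
        for i, load in enumerate(loads):
            if i in taken:
                continue
            deadhead = dist(d["loc"], load["origin"])
            profit = load["revenue"] - 1.5 * deadhead         # crude profit estimate
            if profit < theta_min_profit:
                continue
            if best_dist is None or deadhead < best_dist:     # nearest qualifying load
                best, best_dist = i, deadhead
        if best is not None:
            assignments[d["id"]] = best
            taken.add(best)
    return assignments

drivers = [{"id": "d1", "loc": (0, 0)}, {"id": "d2", "loc": (5, 5)}]
loads = [{"origin": (1, 0), "revenue": 400}, {"origin": (6, 5), "revenue": 120}]
print(pfa_assign(drivers, loads))
```

Milliseconds per decision, no solver to crash, and the only tuning is the θ threshold.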

Class 2: Cost Function Approximations (CFAs)

Use an optimization model as your policy.

Structure: xt = arg min over x of C̄(St, x|θ)

CFAs are parameterized optimization models. Widely used in industry on an ad-hoc basis, they have been largely overlooked by the research literature (Reinforcement Learning and Stochastic Optimization, CASTLE).

This is where you take everything you learned about optimization and use it as a method for making decisions.

Example: Every day at 6am, you solve an optimization model:

  • Ignore uncertainty (use expected values)
  • Plan 7 days ahead
  • Implement day 1
  • Tomorrow, do it again with updated information

This is not solving the full stochastic problem. This is using a simplified optimization model as a decision rule.
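
Here is a hedged sketch of that recipe as code: a parameterized deterministic cost model, searched over candidate orders, with a buffer parameter θ that you tune. The forecast, costs, and θ are illustrative assumptions:

```python
# A minimal CFA sketch: pick today's order by optimizing a deterministic
# approximation of the next 7 days, with a tunable buffer theta.
# The forecast, costs, and theta values are illustrative assumptions.

def cfa_order(stock, forecast, theta_buffer=1.2,
              order_cost=2.0, shortage_cost=10.0, holding_cost=0.5):
    """x_t = argmin over x of C_bar(S_t, x | theta): search candidate orders
    against a point-forecast cost model inflated by theta_buffer."""
    best_x, best_cost = 0, float("inf")
    for x in range(0, 200):                       # candidate decisions
        inv, cost = stock + x, order_cost * x
        for d in forecast:                        # deterministic 7-day lookahead
            demand = theta_buffer * d             # theta inflates forecast demand
            served = min(inv, demand)
            cost += shortage_cost * (demand - served) + holding_cost * max(inv - demand, 0.0)
            inv = max(inv - demand, 0.0)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x                                 # implement day 1, re-solve tomorrow

print(cfa_order(stock=10, forecast=[12, 9, 14, 11, 10, 13, 8]))
```

The θ buffer is what makes this a policy rather than a one-shot solution: you tune it so the rule performs well across many simulated days, not just today.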

Why this is brilliant:

  • Leverages 70 years of algorithm development
  • Scales way better than you’d expect
  • Handles complex constraints naturally
  • Easy to explain to management

CFAs can feature much simpler scaling behavior than even the simpler PFAs, despite what intuition might suggest (Reinforcement Learning and Stochastic Optimization, CASTLE).

Why academics miss it: They’re looking for “the optimal policy.” They don’t realize that a parameterized optimization model IS a policy — and often a very good one.

Class 3: Value Function Approximations (VFAs)

Learn the value of being in each state.

Structure: xt = arg min over x of [C(St, x) + γ·E{V̄(St+1) | St, x}]

VFAs cover all methods based on Bellman’s equation, which use an approximation of the downstream value of landing in a state (A Universal Framework for Sequential Decision Problems, ORMS Today).

This is the academic favorite. Reinforcement learning. Approximate dynamic programming. Q-learning. Deep RL. All VFAs.

The idea:

  • Estimate V(s) = “value of being in state s”
  • Choose actions that minimize immediate cost + future value
  • Update your estimates based on experience

Why it’s powerful:

  • Theoretically grounded
  • Provably converges (under conditions)
  • Can find truly optimal policies (in theory)

Why it’s hard:

  • Curse of dimensionality. Need to estimate V(s) for every state.
  • Sample complexity. Needs millions of experiences to learn.
  • Exploration. Must try suboptimal actions to learn they’re bad.
  • Computational cost. Training takes forever.

This approach has attracted tremendous attention under names such as approximate dynamic programming, adaptive dynamic programming, neurodynamic programming, and most commonly today, reinforcement learning (A Universal Framework for Sequential Decision Problems, ORMS Today).

The gap between theory and practice is largest here. Beautiful algorithms that work on GridWorld. Struggle on real problems.
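
For concreteness, here is a minimal tabular Q-learning sketch, one VFA method among many. The toy environment and every parameter are illustrative assumptions:

```python
import random
from collections import defaultdict

# A minimal tabular Q-learning sketch (one VFA method). The toy environment
# and all parameters are illustrative assumptions, not a real problem.

def q_learning(env_step, actions, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)                        # Q[(state, action)] -> estimated value
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy exploration: sometimes try actions that look bad
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(state, act)])
            next_state, r, done = env_step(state, a, rng)
            target = r + (0 if done else gamma * max(Q[(next_state, act)] for act in actions))
            Q[(state, a)] += alpha * (target - Q[(state, a)])   # Bellman-style update
            state = next_state
    return Q

def toy_env(state, action, rng):
    """Walk right to reach state 5; actions are +1 or -1, with slightly noisy rewards."""
    next_state = max(0, min(5, state + action))
    done = next_state == 5
    return next_state, (10.0 if done else -1.0) + rng.uniform(-0.1, 0.1), done

Q = q_learning(toy_env, actions=[-1, 1])
print(max([-1, 1], key=lambda a: Q[(0, a)]))      # learned action at state 0
```

On six states this converges in seconds. Replace the toy walk with a 10²⁰-state trucking problem and you see exactly why the curse of dimensionality and sample complexity dominate.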

Class 4: Direct Lookahead Approximations (DLAs)

Plan into the future to decide what to do now.

DLAs explicitly plan into the future to help make a decision now, and split into two subclasses: deterministic and stochastic lookaheads (A Universal Framework for Sequential Decision Problems, ORMS Today).

4a. Deterministic Lookahead:

  • Ignore uncertainty
  • Plan N steps ahead using expected values
  • Optimize over the deterministic forecast
  • Implement first decision
  • Re-plan tomorrow

Also called: rolling horizon, receding horizon, model predictive control.

Industry loves this. It works. It’s intuitive. It uses standard optimization tools.
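
Here is a minimal rolling-horizon sketch on a toy battery: optimize over a short point forecast of prices, implement only the first decision, then re-plan next period. Prices, capacity, and horizon are illustrative assumptions:

```python
from itertools import product

# A minimal deterministic-lookahead (rolling horizon) sketch for a toy battery.
# -1 = discharge, 0 = hold, +1 = charge. All numbers are illustrative.

def lookahead_decision(charge, price_forecast, capacity=3):
    """Enumerate all action sequences over the short horizon (deterministic
    model of the future), keep only the first action of the best plan."""
    best_first, best_profit = 0, float("-inf")
    for plan in product((-1, 0, 1), repeat=len(price_forecast)):
        level, profit, feasible = charge, 0.0, True
        for action, price in zip(plan, price_forecast):
            level += action
            if not 0 <= level <= capacity:
                feasible = False
                break
            profit += -action * price             # buying costs money, selling earns it
        if feasible and profit > best_profit:
            best_first, best_profit = plan[0], profit
    return best_first

# Rolling horizon: re-plan every period as the forecast rolls forward.
prices = [30, 25, 60, 80, 40, 20, 70]
charge = 1
for t in range(len(prices) - 3):
    action = lookahead_decision(charge, prices[t:t + 4])
    charge += action
    print(f"t={t}: price={prices[t]}, action={action}, charge={charge}")
```

In practice the enumeration is replaced by a linear or integer program, but the structure is the same: deterministic model, optimize, implement the first step, repeat.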

4b. Stochastic Lookahead:

  • Model uncertainty with scenarios
  • Build a scenario tree
  • Optimize over the tree
  • Implement first-stage decision

This is classical stochastic programming. Two-stage models. Multi-stage models. Scenario trees.

Works well when:

  • Future has limited scenarios
  • Planning horizon is short (2–3 stages)
  • Can solve the tree efficiently

Falls apart when:

  • Too many scenarios → computational explosion
  • Long planning horizon → tree becomes unmanageable
  • High-dimensional state space → can’t enumerate scenarios
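
When the scenario count is manageable, a two-stage version fits in a few lines: sample demand scenarios, score each candidate first-stage decision against all of them, keep the best on average. The demand model and costs below are illustrative assumptions:

```python
import random

# A minimal two-stage stochastic-lookahead sketch. Costs and the demand
# model are illustrative; the recourse here is simply "sell what you can".

def stochastic_lookahead_order(n_scenarios=200, price=10.0, cost=4.0, salvage=1.0, seed=0):
    rng = random.Random(seed)
    scenarios = [rng.gauss(100, 25) for _ in range(n_scenarios)]   # sampled future demands
    best_x, best_avg = 0, float("-inf")
    for x in range(0, 201, 5):                                     # candidate first-stage orders
        total = 0.0
        for demand in scenarios:
            sold = min(x, max(demand, 0.0))
            total += price * sold + salvage * (x - sold) - cost * x
        avg = total / n_scenarios
        if avg > best_avg:
            best_x, best_avg = x, avg
    return best_x, best_avg

x, value = stochastic_lookahead_order()
print(f"first-stage order: {x}, expected profit ~ {value:.1f}")
```

Add more stages, more decisions, or more scenarios per stage and the tree grows multiplicatively, which is exactly where the approach breaks down.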

The Critical Insight: Different Data, Different Winners

Here’s what changed my thinking.

Even for the same problem, each of the four classes of policies may work best, depending on the data. In the context of an energy storage problem, five different problem instances were created and solved using each of the four classes plus a hybrid. Each policy worked best on one of the five problems (From the jungle of stochastic optimization to… Sequential Decision Analytics, CASTLE).

Read that again.

Same problem. Different data. Different winner.

Sometimes a simple PFA rule beats everything. Sometimes you need VFA learning. Sometimes CFA optimization is best. Sometimes DLA planning wins.

You can’t know beforehand. You have to test.

And often, the answer is: combine them.

  • Use PFA for fast decisions
  • Use CFA for complex constraints
  • Use VFA to learn patterns
  • Use DLA when you have time to plan

Hybrid policies beat pure methods more often than not.

Real Applications: From Theory to Practice

Trucking: 66,000 Drivers, Real-Time Decisions

As of 2011, systems based on this framework were being used to dispatch over 66,000 drivers for 20 of the largest truckload carriers in the U.S. (Princeton).

The problem: Match drivers to loads, accounting for:

  • Hours of service regulations
  • Driver home time preferences
  • Equipment compatibility
  • Future positioning needs
  • Uncertain future demand

Modeling a truck driver requires a 15-dimensional attribute vector (location, domicile, hours of service, hazmat flags, citizenship, equipment characteristics, etc.). With spatial dimensions, this quickly introduces 10²⁰ different combinations (Commentary: Optimizing a truck fleet using artificial intelligence, FreightWaves).

Traditional optimization? Can’t handle it. Dynamic programming? Curse of dimensionality. Reinforcement learning? Too slow.

The solution: Hybrid CFA/VFA approach.

  • Use approximate value functions to estimate future opportunity cost
  • Embed in optimization model as dual costs
  • Solve daily assignment problem
  • Update value estimates based on outcomes
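
Here is a hedged sketch of that hybrid idea, not the production system: an assignment model (the CFA) whose costs are adjusted by estimated downstream values of where each move leaves the driver (the VFA). The value table and all numbers are invented for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# A hedged sketch of a hybrid CFA/VFA: an assignment model whose costs include
# an estimated downstream value of the driver's landing region. The value table
# and all numbers are illustrative assumptions, not the production system.

value_of_region = {"northeast": 300.0, "midwest": 150.0, "south": 50.0}  # learned offline in practice

drivers = [{"id": "d1", "region": "midwest"}, {"id": "d2", "region": "south"}]
loads = [
    {"id": "L1", "dest": "northeast", "revenue": 900.0, "move_cost": {"midwest": 400.0, "south": 650.0}},
    {"id": "L2", "dest": "south",     "revenue": 700.0, "move_cost": {"midwest": 500.0, "south": 200.0}},
]

# Cost matrix: immediate net cost minus the estimated value of the state the
# driver lands in (the VFA term steers drivers toward good future positions).
cost = np.zeros((len(drivers), len(loads)))
for i, d in enumerate(drivers):
    for j, load in enumerate(loads):
        immediate = load["move_cost"][d["region"]] - load["revenue"]
        future = value_of_region[load["dest"]]
        cost[i, j] = immediate - future

rows, cols = linear_sum_assignment(cost)          # optimal one-to-one assignment
for i, j in zip(rows, cols):
    print(drivers[i]["id"], "->", loads[j]["id"])
```

The optimization model makes today’s assignments; the value estimates keep today’s assignments from wrecking tomorrow’s options.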

Results: One customer with 120 trucks saved $4 million in year one through better utilization. They sold off about 20 vehicles they didn’t need anymore (Startup focus: Optimal Dynamics drives logistical decisions with AI, Office of Innovation).

Dispatchers now plan 60% more freight in the same time while exceeding operational efficiency goals, and fleet managers have moved away from roughly 80% of routine planning tasks (Optimal Dynamics).

Energy: 175,000 Time Periods, Grid-Scale Optimization

The SMART energy resource planning model handled 175,000 time periods for grid-scale optimization (Biography, CASTLE).

Energy storage decisions:

  • When to charge batteries (electricity cheap?)
  • When to discharge (prices high?)
  • Account for degradation
  • Handle renewable intermittency
  • Optimize over 20-year horizon

You can’t enumerate the state space across 175,000 time periods. Classical dynamic programming fails immediately.

Solution: Approximate dynamic programming with post-decision state variables. Update value functions recursively. Handle massive state spaces with function approximation.

Works. Scales. Deploys.
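
Here is a hedged sketch of the post-decision-state idea on a toy storage problem; the price model, discount, and step size are illustrative assumptions, not the SMART model:

```python
import random

# A hedged sketch of approximate dynamic programming with a post-decision state
# on a toy battery. The post-decision state is the battery level right after the
# charge/discharge decision; its value is learned recursively from simulated
# prices. Price model, discount, and step size are illustrative assumptions.

CAPACITY, ACTIONS, GAMMA = 4, (-1, 0, 1), 0.9    # -1 = discharge, +1 = charge

def feasible(level, action):
    return 0 <= level + action <= CAPACITY

def contribution(price, action):
    return -action * price                        # charging costs money, discharging earns it

def adp_post_decision(iterations=2000, horizon=24, alpha=0.05, seed=0):
    rng = random.Random(seed)
    v_bar = [0.0] * (CAPACITY + 1)                # value of each post-decision battery level
    for _ in range(iterations):
        level = 2
        for _ in range(horizon):
            price = rng.uniform(20, 80)           # exogenous information: the next price
            # choose the action that looks best given the current value estimates
            best_a = max((a for a in ACTIONS if feasible(level, a)),
                         key=lambda a: contribution(price, a) + GAMMA * v_bar[level + a])
            v_hat = contribution(price, best_a) + GAMMA * v_bar[level + best_a]
            # in this toy model the current level equals the previous post-decision
            # level, so this is the recursive update of that state's value estimate
            v_bar[level] += alpha * (v_hat - v_bar[level])
            level += best_a
    return v_bar

print([round(v, 1) for v in adp_post_decision()])
```

The same recursion carries over to far bigger state spaces once the lookup table is replaced by a function approximation.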

Healthcare: Personalized Treatment Protocols

Applications have included drug discovery, drug delivery, blood management, dosage decisions, personal health, and health policy (CASTLE: Computational Stochastic Optimization and Learning).

Treatment sequencing:

  • Patient responds to drug A → continue or switch to B?
  • Monitor biomarkers → adjust dosage?
  • Side effects appear → what’s optimal action?

This is exactly a sequential decision problem:

  • State: Patient characteristics, treatment history, current biomarkers
  • Decision: Drug choice, dosage level
  • Information: How patient responds (uncertain)
  • Objective: Maximize health outcomes over time

Different patients need different policies. Some respond to standard protocols (PFA). Others need personalized optimization (CFA). Some benefit from adaptive learning (VFA).

The framework handles all of it.

What Makes This Different From “Reinforcement Learning”

People hear “sequential decisions” and think “oh, reinforcement learning.”

Not quite.

Reinforcement learning is one method within one class (VFAs) of sequential decision analytics.

RL focuses on learning from experience. SDA recognizes that:

  • Sometimes you don’t need learning (you have a good model)
  • Sometimes you can’t afford learning (need decisions now)
  • Sometimes simple rules outperform fancy algorithms
  • Sometimes you should combine multiple approaches

Making decisions is the most advanced form of AI. Rules (1970s-era AI) guide computers but must be designed by people. Learning fits a statistical model to an external dataset. Making decisions requires a model of the physical system and its metrics, but does not require a training dataset (Professor Powell’s decision analytics series, Topics 1–19, Optimal Dynamics, Medium).

RL is part of the story. It’s not the whole story.

And honestly? The methods that are widely used in industry (CFAs) have been largely overlooked by the research literature, while academics obsess over VFAs that are theoretically elegant but computationally intractable (Reinforcement Learning and Stochastic Optimization, CASTLE).

There’s a gap. SDA closes it.

The Academic-Industry Disconnect

Here’s a truth nobody likes to say out loud:

What academics research and what industry actually uses are almost disjoint sets.

Academia focuses on:

  • Provably optimal algorithms
  • Convergence guarantees
  • Novel neural architectures
  • Papers in top journals

Industry needs:

  • Something that works Monday morning
  • Runs in production without crashing
  • Explains decisions to executives
  • Actually scales to real problem sizes

The academic community insists on researching and teaching the most complex policies, which are rarely used in practice. We should be teaching all four classes of policies, so that whatever a student ends up doing in practice, they will be using a policy drawn from one or more of these classes (Sequential Decision Analytics, CASTLE).

The most widely deployed methods (simple PFAs and CFAs) get almost no research attention. The most researched methods (sophisticated VFAs) rarely make it to production.

This is backwards.

How to Actually Apply This

Let’s get practical. You have a sequential decision problem. Now what?

Step 1: Model the Five Elements

Don’t jump to solutions. Model first, then solve.

Start with a plain-English narrative describing the big picture. Then identify the core elements without mathematics: What metrics are we trying to impact? What decisions do we control? What do we know? What uncertainties exist? (Sequential Decision Analytics and Modeling, CASTLE)

Write down:

  • St = {…} (everything you know when deciding)
  • xt = {…} (what you can control)
  • Wt+1 = {…} (what random events occur)
  • St+1 = f(…) (how the state updates)
  • maxπ E{Σt r(St, Xπ(St))} (what you’re optimizing: expected total reward, searched over policies π)

Be precise. Be complete. Get this right.

Step 2: Characterize Your Problem

Ask these questions:

How big is your state space?

  • Small (< 10⁶ states) → Might use exact DP or VFA
  • Large (> 10⁶) → Need PFA or CFA
  • Massive (> 10¹²) → Definitely PFA or simple CFA

How fast do decisions need to be?

  • Real-time (< 1 second) → PFA only
  • Near-real-time (< 1 minute) → PFA or simple CFA
  • Offline planning (minutes-hours) → Any method works

How complex are your constraints?

  • Simple (linear constraints) → Any method
  • Complex (routing, matching, networks) → CFA or hybrid
  • Very complex (regulatory, physical) → CFA likely best

Do you have historical data?

  • Lots of data + time to train → VFA might work
  • Little data or fast deployment → PFA or CFA
  • No data but good models → CFA or deterministic DLA

What’s your computational budget?

  • Tight budget → PFA
  • Moderate → CFA or simple VFA
  • Unlimited → Try everything, pick best

Step 3: Start Simple, Add Complexity

Phase 1: Build a working PFA

  • Simplest rule that could possibly work
  • “Assign nearest driver to nearest load”
  • “Order up to S when inventory hits s”

Get something running. Establish baseline performance.

Phase 2: Try a CFA

  • Build optimization model of your problem
  • Simplify (ignore some uncertainties)
  • Solve it as a policy
  • Often beats PFA by 20–30%

Phase 3: Consider VFA or DLA

  • If you have time and need more improvement
  • VFA if learning from data makes sense
  • DLA if planning ahead helps and computation allows

Phase 4: Hybridize

Combine classes into hybrid policies that can work even better than the pure classes (From the jungle of stochastic optimization to… Sequential Decision Analytics, CASTLE).

  • PFA for fast screening + CFA for final decisions
  • CFA with VFA opportunity costs
  • DLA with PFA fallbacks when time runs out

Step 4: Evaluate and Iterate

Finding the best policy means evaluating candidate policies to determine which performs best. Most often, policies can be evaluated in a simulator (Sequential Decision Analytics and Modeling, CASTLE).

Build a simulator of your system. Test policies. Measure performance. Iterate.

Don’t expect perfection on first try. Expect learning.
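
A minimal sketch of that loop: run each candidate policy over the same simulated sample paths and compare average performance. The toy inventory model and both policies are illustrative assumptions:

```python
import random
import statistics

# A minimal policy-evaluation sketch: simulate each candidate policy on the
# same sample paths and compare averages. The toy inventory model and both
# policies are illustrative assumptions.

def simulate_policy(policy, horizon=60, seed=0):
    rng = random.Random(seed)                     # same seed -> same demand path for every policy
    stock, total = 5, 0.0
    for _ in range(horizon):
        order = policy(stock)
        demand = rng.randint(0, 10)
        sold = min(stock + order, demand)
        total += 10 * sold - 2 * order - 0.5 * stock
        stock = stock + order - sold
    return total

def pfa_policy(stock, order_up_to=8):
    return max(0, order_up_to - stock)            # order-up-to rule

def lazy_policy(stock):
    return 5 if stock == 0 else 0                 # only reorder when empty

for name, policy in [("PFA order-up-to", pfa_policy), ("lazy reorder", lazy_policy)]:
    results = [simulate_policy(policy, seed=s) for s in range(200)]   # 200 sample paths
    print(f"{name}: mean profit {statistics.mean(results):.1f} "
          f"(stdev {statistics.stdev(results):.1f})")
```

Using the same seeds for every policy is a cheap variance-reduction trick: differences you see come from the policies, not from luck.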

Common Mistakes (And How to Avoid Them)

Mistake 1: Trying to solve the “full” stochastic problem

You don’t need the optimal solution. You need a good decision now. A simple CFA that you can run every day beats an optimal VFA you can’t compute.

Mistake 2: Ignoring simple methods because they’re “not sophisticated”

Industry runs on PFAs and CFAs. They work. They’re fast. They’re robust. Don’t dismiss them because they’re not using deep learning.

Mistake 3: Forgetting that you’re searching for a function, not a vector

Traditional optimization finds x*. You need X^π(·). That function needs to work across all states you might encounter, not just today’s state.

Mistake 4: Not testing on realistic problem instances

Your policy might work on the average case but fail on edge cases. Test it. Stress test it. Break it. Then fix it.

Mistake 5: Optimizing the wrong objective

Make sure your objective function captures what you actually care about. “Minimize makespan” might not be what matters if service quality suffers.

The Future: Where This Goes Next

Integration with Modern Machine Learning

Transformers can be used for sequential decision-making, where actions are tokens generated by the model and sequences correspond to coherent decisions (Sequential Decision-Making with Transformers, Taewoon Kim).

We’re starting to see:

  • Foundation models for decision-making
  • Transfer learning across related problems
  • Few-shot policy learning
  • Natural language policy specification

The gap between large language models and sequential decision analytics is closing.

Autonomous Systems at Scale

AI-based decision support systems have the potential to support autonomous decision-making in industrial settings (AI-Based Decision Support Systems in Industry 4.0: A Review, ScienceDirect).

Not just recommendations. Actual decisions. Thousands per day. With human oversight but not human bottlenecks.

Multi-Agent Coordination

Moving beyond single-agent problems:

  • Fleet of autonomous vehicles coordinating
  • Supply chain partners sharing decisions
  • Market mechanisms with strategic agents

The framework extends. The math gets harder.

Explanation and Trust

As systems make more decisions autonomously:

  • Why did the policy choose this action?
  • Can we trust it in novel situations?
  • How do we maintain human oversight?

We need interpretable policies, not just accurate ones.

What You Should Do Next

If you’re working on problems where:

  • Decisions happen repeatedly over time
  • New information arrives between decisions
  • Uncertainty matters
  • Today’s choices affect tomorrow’s options

You need this framework.

Start here:

  1. Read Powell’s “Sequential Decision Analytics and Modeling” (free PDF available)
  2. Model one of your problems using the five elements
  3. Implement a simple PFA baseline
  4. Try a CFA approach
  5. Measure, compare, iterate

Don’t try to master everything. Start with what works.

For academics:

Please. Please. Start teaching all four policy classes. Not just VFAs. Not just what’s theoretically elegant. What actually gets used.

Our students graduate thinking “reinforcement learning” is the answer to sequential decisions. It’s an answer. One of four classes. And often not the best one.

For industry:

The ad-hoc methods you’re using? They’re probably CFAs. Which is great! But you can make them better by:

  • Recognizing they’re policies (functions, not solutions)
  • Tuning the parameters systematically
  • Combining with other policy classes
  • Understanding when they’ll work and when they won’t

The Bottom Line

We’ve been solving optimization problems the same way for 70 years. It’s time to evolve.

Not because the old ways are wrong. Because the problems we face have changed.

Static → Dynamic
Certain → Uncertain
One-shot → Repeated
Complete information → Evolving information

Sequential Decision Analytics is a field centered on the broad class of sequential decision problems, drawing on the four classes of policies, which span every solution approach that might be used (A Universal Framework for Sequential Decision Problems, ORMS Today).

This is where optimization is going. Not as a replacement for what came before, but as its natural next step.

The math is different. The thinking is different. But the goal is the same: make the best decisions we can, with the information we have, as the world keeps changing around us.

That’s what matters. That’s what works. That’s where we’re headed.