What changed in six years, and what didn't
A survey of the third edition of Machine Learning for Trading — generative AI, nine case studies, five libraries, and the process that holds it together.
The second edition of Machine Learning for Trading shipped on 31 July 2020; two months earlier, OpenAI had posted the GPT-3 paper to arXiv. The third edition ships in June 2026. Relative to the first edition (December 2018), the second added a handful of early deep-learning applications. The second-to-third gap is far larger, because it spans some of the most consequential six years AI and ML have ever seen.
The field moved. Two questions the book tries to answer did not:
How to develop a trading strategy end-to-end; the book now includes nine case studies from equities and futures to ETFs, FX, and crypto, with holding periods from minutes to months.
How to evaluate a strategy without fooling yourself with a plausible-looking backtest; the new ml4t-diagnostic library ships state-of-the-art overfitting guards — from the Deflated Sharpe ratio to the Rademacher Anti-Serum — and the chapters cover process discipline and the relevant tests in detail.
How the landscape changed
Generative AI and autonomous agents are rapidly becoming part of the research workflow. Three new chapters respond directly:
Retrieval-augmented generation for financial research (Ch 22),
Knowledge graphs (Ch 23), and
Autonomous agents (Ch 24).
Alongside these three, Chapter 10 compresses the second edition’s three NLP chapters — sentiment, topic modeling, word embeddings — into a single chapter organized around transformer-based embeddings as a pipeline stage. Topic modeling and word2vec mostly drop out.
Deep learning diversified, then dispersed. The second edition had a dedicated six-chapter deep-learning part; the third has none. The deeper shift is that finance has begun to develop its own domain-specific architectures, from latent-factor models to end-to-end portfolio learning, rather than importing deep learning from other domains unchanged. The material now travels with the application it serves.
GANs and diffusion models are used for synthetic data (Ch 5).
Transformers support the text feature pipeline (Ch 10).
Tabular DL sits alongside gradient boosting (Ch 12).
Sequence models land in Chapter 13. Gu, Kelly, and Xiu’s 2019 conditional autoencoder and Chen, Pelger, and Zhu’s 2021 deep-learning stochastic discount factor anchor the latent-factor chapter (Ch 14).
End-to-end portfolio learning sits in Chapter 17.
Deep reinforcement learning, with three concrete applications — optimal execution, market making, and deep hedging — stays in Chapter 21.
Chapter 13 takes a deliberately skeptical view of deep learning for time series. On financial data, foundation models are harder to extract value from off the shelf than in other domains. The chapter’s cross-dataset rollup asks where deep learning actually lands when LSTMs, TCNs, attention variants, and a foundation model run across the case-study datasets. Deep learning is a tool with specific strengths, not a blanket replacement.
Causal analysis and conformal prediction have continued to gain importance, and the third edition responds with two additions at the chapter and section level:
Causal machine learning (Ch 15) is a new chapter: Pearl-style identification, double ML for isolating factor effects, Bayesian structural time series, time-series causal discovery.
Conformal prediction is now a standard pipeline stage in Chapter 11, not an advanced topic.
Both matter more now than at any earlier point because LLMs and agents have collapsed the cost of generating plausible-looking hypotheses, and the counterweight is formal robustness.
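To make the "standard pipeline stage" framing concrete, here is a minimal split-conformal sketch in plain NumPy. A toy least-squares fit stands in for any point predictor; the variable names and the example itself are illustrative, not the book's Chapter 11 code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends linearly on x plus noise (stand-in for any model/target).
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.5, size=500)

# Split: proper training set and a held-out calibration set.
x_train, y_train = x[:300], y[:300]
x_cal, y_cal = x[300:], y[300:]

# Fit any point predictor on the training set (here: least squares).
beta = np.polyfit(x_train, y_train, deg=1)

def predict(x_):
    return np.polyval(beta, x_)

# Calibration: absolute residuals serve as conformity scores.
scores = np.abs(y_cal - predict(x_cal))

# For miscoverage alpha, take the ceil((n+1)(1-alpha))/n empirical quantile.
alpha = 0.1
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Distribution-free ~90% prediction interval for a new point: prediction +/- q.
x_new = 1.5
lo, hi = predict(x_new) - q, predict(x_new) + q
```

The point of the pipeline-stage framing is that nothing above depends on the model class: swap the least-squares fit for a gradient-boosted or deep model and the calibration step is unchanged.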
Chapter 9 adds a new perspective: ARIMA, GARCH, spectral, regime-switching, and Bayesian time-series models are treated as feature extractors for a downstream predictor rather than as standalone forecasters.
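A minimal sketch of that feature-extractor perspective, using a RiskMetrics-style EWMA volatility (a restricted GARCH(1,1)) as the fitted model. The function name and setup are illustrative assumptions, not the book's Chapter 9 code: the point is only that the model's output becomes a feature column, not the forecast itself.

```python
import numpy as np

def ewma_volatility(returns, lam=0.94):
    """RiskMetrics-style EWMA variance, a restricted GARCH(1,1):
    sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}**2."""
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns[:20].var()  # seed with an initial sample variance
    for t in range(1, len(returns)):
        sigma2[t] = lam * sigma2[t - 1] + (1 - lam) * returns[t - 1] ** 2
    return np.sqrt(sigma2)

rng = np.random.default_rng(1)
rets = rng.normal(scale=0.01, size=1000)  # toy daily returns
vol = ewma_volatility(rets)

# Use the fitted volatility as a *feature column* for a downstream predictor,
# stacked alongside raw returns, rather than as a standalone forecast.
features = np.column_stack([rets, vol])
```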
Operational reality moved from the edges of the book to the center. From strategy implementation to deployment, details matter in practice:
Chapter 18 is a dedicated chapter on transaction costs — taxonomy, microstructure-regime link, Almgren–Chriss as the unifying framework, and the guardrails for when costs kill a strategy.
Chapter 19 is dedicated to risk management — VaR and CVaR, path risk, stress testing, adaptive controls without leakage, and kill switches.
Chapter 25 covers live trading through Interactive Brokers, Alpaca, and QuantConnect.
Chapter 26 covers MLOps and governance.
None of the four had a counterpart in the second edition. The backtrader and zipline backtesters have been replaced by ml4t-backtest, and we also demonstrate vectorized alternatives like vectorBT.
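The Almgren–Chriss framework that Chapter 18 uses as its unifying lens has a well-known closed form worth sketching. Under linear temporary and permanent impact, the optimal liquidation path is x(t) = X · sinh(κ(T − t)) / sinh(κT) with urgency κ = sqrt(λσ²/η). The function and parameter values below are illustrative assumptions, not the book's code.

```python
import numpy as np

def ac_trajectory(X, T, n_steps, sigma, eta, lam):
    """Almgren-Chriss optimal liquidation holdings (continuous-time closed form):
    x(t) = X * sinh(kappa * (T - t)) / sinh(kappa * T),
    where kappa = sqrt(lam * sigma**2 / eta) trades off risk against impact."""
    kappa = np.sqrt(lam * sigma**2 / eta)
    t = np.linspace(0.0, T, n_steps + 1)
    return X * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)

# Liquidate 1e6 shares over one day; higher risk aversion (lam) front-loads
# the schedule, accepting more impact cost to cut price risk sooner.
slow = ac_trajectory(1e6, T=1.0, n_steps=10, sigma=0.02, eta=1e-6, lam=1e-6)
fast = ac_trajectory(1e6, T=1.0, n_steps=10, sigma=0.02, eta=1e-6, lam=1e-2)
```

With near-zero risk aversion the schedule is close to linear (TWAP-like); raising λ bends it toward early execution, which is exactly the cost-versus-risk dial the chapter's guardrails are about.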
Two foundation-level additions.
Market microstructure gets its own chapter (Ch 3): tick, volume, and dollar bars as information-driven sampling, limit-order-book reconstruction, continuous-futures construction.
Synthetic financial data moved from an advanced topic in 2E Chapter 21 to a foundation chapter (Ch 5), and broadened well beyond GANs to include Monte Carlo baselines, diffusion models, LLM-based structured-data synthesis, and an explicit fidelity–utility–privacy evaluation framework.
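The information-driven sampling idea behind Chapter 3's bars fits in a few lines: instead of sampling by clock time, close a bar whenever cumulative traded dollar value crosses a threshold. The sketch below, on simulated trades, is an illustrative assumption of mine (names and threshold included), not the chapter's implementation.

```python
import numpy as np

def dollar_bars(prices, sizes, bar_dollars):
    """Group a trade stream into dollar bars: a bar completes each time
    cumulative traded dollar value (price * size) crosses the threshold.
    Returns an array of (open, high, low, close) rows."""
    bars, cum, start = [], 0.0, 0
    for i, (p, s) in enumerate(zip(prices, sizes)):
        cum += p * s
        if cum >= bar_dollars:
            chunk = prices[start : i + 1]
            bars.append((chunk[0], chunk.max(), chunk.min(), chunk[-1]))
            cum, start = 0.0, i + 1
    return np.array(bars)

# Simulated trade stream: a random-walk price and random trade sizes.
rng = np.random.default_rng(2)
prices = 100.0 + np.cumsum(rng.normal(scale=0.05, size=5000))
sizes = rng.integers(1, 500, size=5000).astype(float)
bars = dollar_bars(prices, sizes, bar_dollars=2_000_000.0)
```

Because activity, not the clock, drives sampling, busy periods produce more bars and quiet periods fewer, which is the statistical motivation for these bar types.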
Data and infrastructure caught up. Polars replaces pandas across notebooks where the migration is worthwhile. Commercial data sources sit alongside free ones, because free data has become increasingly rare over the past six years, and has serious limitations. Crypto is more central, and platforms like Alpaca make it materially easier to move from research prototype to paper trading and then to small-scale live execution than it was in 2020. Prediction markets — Kalshi, Polymarket — appear to be a new research frontier.
What didn’t change
The constant is process discipline. If anything, the third edition gives it more weight than the second.
Backtesting is one stage in a research pipeline, not the finish line. The book breaks the research-to-deployment arc into dedicated chapters rather than a single chapter on simulation. Chapter 16 is the simulation stage: the ml4t-backtest library, event-driven and vectorized modes, walk-forward with purging and embargo. Chapter 17 is portfolio construction: equal-weight and risk parity as hard benchmarks, the Markowitz curse, hierarchical risk parity, regime-adaptive allocation without discrete switching, and end-to-end portfolio learning. Chapter 18 handles costs, Chapter 19 handles risk, and Chapter 20 synthesizes across the nine case studies — reporting what generalized, what didn’t, and what was deliberately left on the table.
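The purging-and-embargo idea in Chapter 16 can be sketched minimally: in a walk-forward split, drop training observations whose labels could overlap the test window by leaving a gap of `embargo` bars before each test fold. This is a simplified stand-in of mine, not the ml4t-backtest API, which per the text also supports combinatorial schemes.

```python
import numpy as np

def purged_walk_forward(n, n_splits, embargo):
    """Yield (train_idx, test_idx) pairs for an expanding walk-forward split,
    dropping the `embargo` training bars adjacent to each test window so that
    labels computed over overlapping horizons cannot leak across the boundary."""
    fold = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        test_start, test_end = k * fold, min((k + 1) * fold, n)
        train = np.arange(0, max(test_start - embargo, 0))
        test = np.arange(test_start, test_end)
        yield train, test

for train, test in purged_walk_forward(n=1000, n_splits=4, embargo=10):
    # No training bar sits inside the embargo zone before the test window.
    assert test.min() - train.max() > 10
```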
Statistical discipline is threaded through the chapters. These anchor references organize content in Chapters 7, 11, 16, and 20:
Deflated Sharpe ratio — Bailey and López de Prado (2014)
Rademacher Anti-Serum — Paleologo (2025), Elements of Quantitative Investing, §8.3
Purged, embargoed, and combinatorial cross-validation — López de Prado (2018), Advances in Financial Machine Learning
Probability of Backtest Overfitting — Bailey, Borwein, López de Prado, and Zhu (2015)
Multiple-testing corrections in factor research — Harvey, Liu, and Zhu (2016)
Conformal prediction — the Vovk, Gammerman, and Shafer lineage
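As a flavor of what these guards compute, here is a compact sketch of the first item, the Deflated Sharpe ratio, using only the standard library's normal distribution. The function name, argument names, and toy numbers are my illustrative assumptions, not the ml4t-diagnostic API.

```python
import numpy as np
from statistics import NormalDist

def deflated_sharpe_ratio(sr, n_trials, var_sr, T, skew=0.0, kurt=3.0):
    """Deflated Sharpe ratio (Bailey & Lopez de Prado, 2014): probability that
    the observed per-period Sharpe `sr` over `T` observations beats the
    expected maximum Sharpe of `n_trials` skill-less trials whose Sharpes
    have cross-trial variance `var_sr`."""
    N = NormalDist()
    gamma = 0.5772156649  # Euler-Mascheroni constant
    # Expected maximum Sharpe under the null, given the number of trials.
    sr_star = np.sqrt(var_sr) * (
        (1 - gamma) * N.inv_cdf(1 - 1 / n_trials)
        + gamma * N.inv_cdf(1 - 1 / (n_trials * np.e))
    )
    # Probabilistic Sharpe ratio evaluated against the deflated benchmark.
    z = ((sr - sr_star) * np.sqrt(T - 1)) / np.sqrt(
        1 - skew * sr + (kurt - 1) / 4 * sr**2
    )
    return N.cdf(z)

# An annualized Sharpe of 2.0 over one year of daily data looks great --
# until you learn it was the best of 100 trials. The deflated probability
# drops well below conventional confidence levels.
dsr = deflated_sharpe_ratio(
    sr=2.0 / np.sqrt(252), n_trials=100, var_sr=0.5 / 252, T=252
)
```

This is the quantitative form of the section's argument: the more plausible-looking backtests you generate, the higher the bar a single Sharpe ratio must clear.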
Hands-on implementation remains front and center, growing substantially in scope and scale. The third edition is built around nine case studies across asset classes and frequencies:
ETFs
Broad US equities
US firm characteristics
NASDAQ-100 microstructure on minute-bar TAQ data
S&P 500 equities joined with options analytics
S&P 500 options as a volatility strategy
CME futures
FX majors
Crypto perpetuals, with funding as a structural signal
Roughly 170 case-study notebooks carry each case through the same pipeline stages — setup, labels, features, model-based features, evaluation, linear, GBM, tabular DL, sequence DL, latent factors, causal, backtest, portfolio construction, costs, risk, synthesis. Cross-case rollups appear at the ends of the model chapters and in a dedicated synthesis chapter.
The second edition taught, model by model, on different datasets. The third edition teaches one pipeline across nine datasets, with explicit protocols for reporting across cases. The cross-case grid is the clearest pedagogical difference between the editions.
More than ‘just a book’: 450+ notebooks, 100+ primers, 56 agent skills, and five libraries
The third edition ships with roughly 450 notebooks, over one hundred primer articles, 56 agent skills across nine categories (concepts, data, features, validation, backtest, portfolio, production, advanced AI, workflows), and five open-source Python libraries:
ml4t-data — sourcing, validation, and point-in-time data pipelines.
ml4t-engineer — feature and label engineering with 120+ financial indicators.
ml4t-diagnostic — model evaluation, overfitting guards, and uncertainty quantification.
ml4t-backtest — event-driven and vectorized strategy simulation with walk-forward controls.
ml4t-live — broker adapters for live execution (Interactive Brokers, Alpaca, QuantConnect).
The agent skills exist because coding agents increasingly participate in implementation. A skill encodes the canonical approach to a specific task — a walk-forward split with purging and embargo, a deflated Sharpe check on a set of backtests, a cost-sensitivity sweep — in a form a reader’s agent can consume without reinventing it. The book carries the argument; the skill shortens the distance between the argument and a correct implementation when an agent does the typing.
The Agent Lab on ml4trading.io is our implementation of Bridgewater’s AIA Forecaster — the multi-agent research pipeline that Chapter 24 shows how to build. It publishes Platt-calibrated probabilities on live Kalshi and Polymarket questions.
About Insights
This is Issue 1 of Insights, a twice-weekly letter running through the June launch and after. Each issue takes one claim from the book — a library, a primer article, an agent skill, a case study, or a new paper — and goes deeper than the book alone has room for. The next issue covers what the five libraries, the primer set, and the 56 skills do that 27 chapters alone cannot. The one after opens the Agent Lab.
If you subscribed to the second-edition list: welcome back.


