Anna Gustafson
Critical Question
Tidalwave began with a critical question: How will the U.S. sustain its forces in a conflict with China? President Xi Jinping has directed his military to be prepared to take Taiwan by 2027. Every day, the window to prepare for such an invasion shrinks, and there remain significant, unaddressed vulnerabilities in U.S. sustainment capacity. Of these, fuel and munitions systems are suspected to be among the largest obstacles to sustaining American military power projection in the Indo-Pacific.
At the same time, the People’s Republic of China (PRC) is not invulnerable. Its own ability to project and sustain forces also rests on brittle fuel and munitions systems, which are vulnerable to both kinetic and non-kinetic pressure from the U.S. Thus, it becomes a matter of defining and characterizing these pressure points—the very purpose of Tidalwave.
Upon identifying both the problem and the opportunity facing the U.S., it became evident that in all too many cases no program of record existed to address these needs. Tidalwave was thus constructed explicitly to determine the sustainment requirements in a protracted conflict to reveal U.S. vulnerabilities, expose Chinese vulnerabilities, and make recommendations to strategically address both, focusing on the four key systems:
- The U.S. Indo-Pacific Munitions System;
- The U.S. Indo-Pacific Fuel System;
- The PRC Munitions System; and
- The PRC Fuel System.
The Hybrid Simulation
To pursue this goal, the Allison Center at Heritage adopted a hybrid simulation architecture built around human-led design and AI-enabled execution. The Allison Center partnered internally with Heritage’s Center for Data Analysis and externally with The Watch, an AI company specializing in tradecraft and structured analytic workflows, to compile, vet, and model U.S. and PRC forces; and with The University of Central Florida’s (UCF) Modeling and Simulation Center, to build a visual, campaign-level representation of the simulation.
The project thus became a three-part effort:
- A simulation of fuel and munitions sustainment across a potential year-long conflict;
- A visualization layer to make those dynamics intelligible to policymakers; and
- This report, which documents the methodology, analysis, results, and recommendations.
The strategic objective of Tidalwave was to employ a rigorous, human-led simulation, enhanced by AI for additional validation and broader data coverage, to identify gaps, deficiencies, and corresponding solutions: resolving anticipated shortfalls in the U.S. ability to project and sustain forces while exploiting adversary vulnerabilities in a protracted conflict.
To achieve this objective, the study rejected the binary choice between traditional wargaming (high human judgment, limited data volume) and purely algorithmic modeling (high data volume, limited operational context). Instead, the methodology utilizes a hybrid architecture that meshes professional military judgment with agentic AI processing power and structured analytic pipelines.
Parameters
Within this architecture, several factors shaped the design of the models, simulation, and scenarios. To achieve the project’s goal while maintaining analytical comprehensiveness, the effort was bounded by a defined set of parameters that limit its scope without constraining its strategic value.
First, the conflict scenarios are limited to a conventional, non-nuclear scope. Theater-level nuclear escalation pathways will be modeled in Azure Dragon, a follow-on to Tidalwave.
As air and naval forces constitute the majority of usable combat power in a probable conflict, those platforms are the focus of Tidalwave. Deficiencies in transport make meaningful land-force participation unlikely, reinforcing our decision to concentrate on air and naval domains. Because our analysis operates at the strategic level, the model centers on the systems that support the projection of force—in other words, the systems that provide the most critical resources enabling each side to sustain a prolonged conflict. We identified those as fuel and ammunition systems.
Partners and allies currently require additional capabilities and capacity to significantly alter a potential conflict’s outcome. Given that the conflict itself will almost certainly be concentrated in the Indo-Pacific, the simulation is anchored in that theater, though it still accounts for global supply disruptions. At the same time, the simulation does not attempt to incorporate the full range of potential disruptions inside the United States (e.g., the effects of cyberattacks, strikes on the continental U.S. [CONUS], etc.), as these would require assumptions too extensive to support defensible outcomes in a quantitative simulation.
Importantly, the simulation does not encompass the resource requirements for assets to arrive in theater (e.g., from CONUS or other theaters), though it is safe to conclude that incorporating this dynamic would only exacerbate the deficiencies identified. The simulation is contemporary and employs the current disposition of the U.S. as of late 2025, unaffected by projected additions to the Joint Force or associated infrastructure and resources on either side. It is also not intended to evaluate specific platform performance, though capabilities were assessed by subject-matter experts (SMEs) and included as appropriate.
As a strategic tool, Tidalwave was not intended to evaluate operational maneuver or contrasting tactics, though these were addressed in sufficient detail to generate realistic attrition and consumption. The focus remains on the strategic, critical vulnerabilities of both sides. The simulation does, however, account for escalation deliberately: while the range of scenarios covers a spectrum, the limited presence of opposing forces both incentivizes People’s Liberation Army (PLA) aggression and complicates managing escalation.
Overarching Key Intelligence Questions
At its most basic level, Tidalwave confronted a quantitative problem:
In a conflict prompted by a PRC invasion of Taiwan, how much fuel and ammunition would U.S. and PLA forces actually consume, and when would these key systems collapse and result in culmination?
Answering that required identifying:
- Which platforms and in what number are most likely to be employed;
- Where and how they would operate; and
- What their wartime consumption rates and expenditure rates would be under realistic tempos and attrition.
This core question was translated into seven Key Intelligence Questions (KIQs) that together prompt a systems-level inquiry of fuel and munitions endurance, chokepoints, and leverage primarily over the first six to 12 months of conflict:
- PLA Fuel Sustainment and Vulnerabilities: For PLA naval and air combat forces operating across the Western Pacific, how do varying levels and sequences of U.S./allied interdiction on PLA fuel imports, refining, storage, and tactical distribution affect PLA sortie rates, on-station endurance, and the emergence of binding fuel chokepoints and operational degradation over the course of a high-end conflict?
- PLA Munitions Sustainment and Vulnerabilities: For PLA forces employing key missile, air-defense, anti-ship, air-to-air, and torpedo munitions in a Western Pacific conflict, how do different expenditure profiles, U.S./allied disruptions of production and distribution, and constraints on foreign-sourced inputs affect the PLA’s ability to sustain required rates of fire, avoid local and systemic depletion, and preserve campaign-relevant munitions capacity over the pre-D-day build-up and over the course of a potential war?
- U.S. Fuel Throughput in a Contested Indo-Pacific: For U.S. and allied joint forces dependent on JP-8, JP-5, and F-76 in the Western Pacific, how do PLA operations and maritime interdiction—alongside alternative U.S. and allied posture, routing, and resilience options for fuel movement—affect actual fuel throughput, distribution reliability, and the onset of fuel-driven constraints on sortie generation and operational options over the course of a high-end conflict?
- U.S. Munitions Endurance, Stockpiles, and Launch Capacity: For U.S. Indo-Pacific forces that depend on key munitions, how do alternative mixes of initial stockpiles, industrial surge, contested logistics throughput, and PLA disruption of forward storage and movement affect time to depletion at critical nodes, sustainable rates of fire, and the extent to which munitions availability constrains operational plans and campaign endurance over the pre-D-day period and over the course of the conflict?
- Foreign-Sourced Chokepoints in PLA Fuel and Munitions: For PLA fuel and munitions system chokepoints that rely on foreign-sourced materials, additives, catalysts, components, and software, how do different portfolios and sequences of U.S. and allied export controls, sanctions, targeted cyber operations, and interdiction actions affect the PLA’s ability to refine, produce, and deliver usable fuel and munitions at required scale and speed, and accelerate the onset of system-level shortfalls or collapse over the pre-crisis and course of the conflict?
- Foreign-Sourced and Domestic Chokepoints in U.S. Fuel and Munitions: For the U.S. fuel and munitions industrial and logistics base, including foreign-sourced inputs and domestic bottleneck plants and distribution nodes, how do alternative portfolios and timelines of short-, mid-, and long-term actions—such as stockpiling, source diversification, capacity expansion, and resilience investments—affect the reduction of single-point failures, increase surge capacity, and close the most consequential vulnerability gaps for a Taiwan-relevant conflict across those three time horizons?
- Comparative Timelines, Leverage, and Optimal Sequencing: For PLA and U.S. fuel and munitions sustainment systems viewed as competing logistical architectures in a Taiwan contingency, how do different cross-domain portfolios and sequences of U.S. and allied non-kinetic, kinetic, industrial, and posture levers shift the relative timelines of degradation, recovery, and exhaustion, change which side holds logistical leverage, and alter expected campaign outcomes across the pre-crisis, D-day, Day 0 to 30, and Day 30 to 150 windows?
Knowledge Review
At each phase, the analysis passed through both The Watch’s AI-enabled structured rinse cycle—covering sourcing, verification, tradecraft, audit, and formatting—and a separate SME review process led by experts retired from the Air Force, Navy, and Marine Corps as well as energy experts and PLA analysts. This, along with tradecraft standards, ensured that every factual claim was traceable to at least one of the more than 7,000 curated sources.
This knowledge review, data collection, and modeling methodology followed a three-stage workflow that functioned as the quality-control engine for initial data analysis. Each stage was human-led and AI-enabled.
- Analytic teams set the questions, source-evaluation criteria, and acceptance thresholds for factual and modeling inputs;
- Narrowly scoped and repetitive tasks were delegated to specialized AI “agents” operating in The Watch Intelligence Platform under strict protocols and constraints; and
- Agents executed structured procedures, such as systematic literature harvesting, data extraction, and other sensitive cross-checking techniques developed by former senior intelligence professionals at The Watch.
These prompts enforced structured analytic techniques, constrained model behavior to reproducible steps rather than free-form generation, and ensured that every machine-produced output was reviewable, auditable, and subordinate to human tradecraft.
The Watch’s Intelligence Platform integrated:
- OpenAI’s ChatGPT 4o, 5, and 5.1;
- Google Gemini 2.5 Pro and 3.0 Pro;
- xAI’s SuperGrok Expert and Heavy;
- Anthropic Claude Sonnet 4.5 and Opus 4.5; and
- Internal software, small language models, and proprietary processes specialized in interrogating, cross-checking, and reconciling outputs against each other.
For each task required for data collection, AI Pipelines accelerated the literature review across government, academic, industry, and media sources by:
- Triaging documents;
- Extracting candidate figures, parameters, and methods; and
- Flagging contradictions and outliers.
Analysts then determined which questions could be answered directly from existing data and where only ranges, analogs, or qualitative judgments were possible. Where the open-source record was thin or contested, those gaps were explicitly recorded to drive new modeling work and to inform later confidence statements.
Generation and Synthesis
The KIQs were then deconstructed into structured sub-questions and subordinate research tasks to capture the platform-level and rate-level mechanics needed to answer the core consumption problem, including:
- Platform inventory and posture: Which ships, aircraft, launchers, and key support nodes are relevant for the simulation, and where are they based at D-day?
- Wartime operating patterns: Which sortie rates, transit cycles, time-on-station, and reload patterns are plausible under each scenario’s operational tempo and attrition assumptions?
- Platform-level fuel demand: How much fuel such as JP-5, JP-8, or F-76 does each platform type consume per flying hour, steaming day, or mission cycle under wartime operating conditions?
- Munitions expenditure: Given firing doctrines and probability-of-kill assumptions, how many rounds of each key munition are expended per day by platform and by theater node?
- Endurance timelines: Given starting inventories, production, and contested throughput, on what day do specific platforms, bases, or the whole theater hit critical depletion thresholds?
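The endurance-timeline question above reduces to simple daily-balance arithmetic. A minimal sketch, with all figures hypothetical rather than drawn from the model:

```python
def first_deficit_day(stock, daily_demand, daily_resupply, horizon=365):
    """Return the first day on which stock cannot cover the day's demand,
    or None if no deficit occurs within the horizon.

    All quantities share one unit (e.g., barrels); the figures below are
    purely illustrative.
    """
    for day in range(1, horizon + 1):
        stock += daily_resupply       # resupply arrives at the start of the day
        if stock < daily_demand:      # the day's demand cannot be served in full
            return day
        stock -= daily_demand         # serve the day's demand
    return None

# Hypothetical node: 90,000 bbl on hand, burning 1,500 bbl/day,
# receiving 500 bbl/day through a contested route.
print(first_deficit_day(90_000, 1_500, 500))  # -> 91
```

The same loop, with location-tagged stocks, generalizes to the per-node depletion thresholds the report describes.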
For knowledge review, the KIQ/sub-KIQ hierarchy was treated as a filing system, and the team asked, “What do we already know about each box?” This methodology incorporated:
- Prior U.S. and PLA logistics studies;
- Open-source work on PRC oil imports and refining;
- U.S. Combat Logistics Force (CLF) and sealift inventories;
- Existing wargame reports and historical campaign logistics cases; and
- Commercial energy and shipping data.
Collected data were tagged by:
- System (U.S. or PLA/Fuel or Munitions);
- Function (stock, throughput, regeneration, foreign-sourced inputs, etc.); and
- KIQ so that tensions and gaps were visible.
AI agents bulk-extracted certain key entities (platform types, flow volumes, node capacities, trade dependencies) and clustered sourcing and documents around specific KIQs. Analysts then rated source quality, resolved conflicts, and identified first-order gaps, such as PLA reserve volumes, PLAN replenishment capacity, and U.S. forward JP-8 holdings, which had to be bounded before simulation outputs would be credible.
Resource Picture
In the resource-picture phase, the obtained data were mapped against the KIQs and the 365-day scenario frame, and the team inventoried:
- Order-of-battle data and force structure data;
- Platform technical manuals and fuel consumption statistics;
- Shipping manifests and tanker/port statistics;
- Refinery and port capacities;
- Commercial energy flows; and
- Prior munitions and fuel cost curves.
At the same time, internal and external SMEs reviewed key findings, identified gaps, and helped inform modeling decisions.
The Watch then selected its tooling and compute constraints:
- A spreadsheet-based, daily-balance engine with linked platform demand tabs and attrition/munition-expenditure modules;
- Separate PLA campaign models for fuel and munitions; and
- AI support for data ingestion, parameter range identification, and sensitivity sweeps.
For example, PLA aircraft tables included the aircraft type, any associated carrier or base, the number of aircraft, the flight profile, likely fuel capacity, fuel consumption per hour per aircraft, fuel consumption per sortie per aircraft, number of sorties per day per aircraft, total fuel consumption per day, etc. This stage produced a defensible assessment of where the data allowed for deep analysis (e.g., U.S. force structure and consumption) and where the data required a bounding analysis (e.g., PLA reserve policy, clandestine shipping) and explicit confidence statements.
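The roll-up that such a table supports can be sketched as follows; the aircraft names and every figure below are hypothetical placeholders, not values from the Tidalwave tables:

```python
from dataclasses import dataclass

@dataclass
class AircraftEntry:
    # One row of a (hypothetical) aircraft fuel-demand table.
    aircraft_type: str
    count: int                   # aircraft at this base or carrier
    fuel_per_sortie_bbl: float   # fuel burned per sortie, per aircraft
    sorties_per_day: float       # sorties per day, per aircraft

    def fuel_per_day(self) -> float:
        """Total daily fuel demand for this row."""
        return self.count * self.fuel_per_sortie_bbl * self.sorties_per_day

# Illustrative table; names and numbers are invented.
table = [
    AircraftEntry("Fighter-A", 24, 40.0, 2.0),
    AircraftEntry("Bomber-B", 6, 200.0, 0.5),
]
total = sum(row.fuel_per_day() for row in table)
print(total)  # -> 2520.0 (1,920 + 600)
```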
Methodology Building
Scenario Construct
Based on SME-validated data and analytic bounding, the model was built around four scenarios differentiated by preparation (reserve size, pre-war actions) and escalation (interdiction and lever use).
Each of the four scenarios captures a distinct combination of operational tempo, risk appetite, and lever use (self-preservation vs. offensive pressure on the adversary’s sustainment), under shared structural assumptions:
- 365-day horizon;
- Day 30 onset of major combat;
- Day 0 forward stock positions; and
- Common geography (Taiwan Strait, East China Sea, South China Sea, Philippine Sea, and beyond the First Island Chain).
This framework ensured that data collection, model design, and campaign sequencing all answered a specific question for a specific scenario cell, rather than driving toward generic or “average” cases.
- Scenario A: High U.S. Pressure / High PRC Pressure. Scenario A assumes high preparation and escalation from both the U.S. and China. China starts with a large, 1.2 billion-barrel Strategic Petroleum Reserve (SPR) and an early 30 percent redirection of civilian fuel to the PLA, while the U.S. interdicts 50 percent of crude imports, 80 percent of oil tankers, and 30 percent of fuel flow at production, refinement, and distribution points.
- Scenario B: High U.S. Pressure / Low PRC Pressure. Scenario B assumes high preparation and escalation from the U.S. and low from China. China starts with a 600 million-barrel SPR and a 15 percent redirection of civilian fuel to the PLA, with the U.S. applying the same pressure as in Scenario A.
- Scenario C: Low U.S. Pressure / Low PRC Pressure. Scenario C assumes low preparation and escalation from both the U.S. and China. China starts with a 600 million-barrel SPR and a 15 percent redirection of civilian fuel to the PLA, with the U.S. applying a 50 percent interdiction of crude imports that later trails down to 25 percent, 40 percent oil tanker degradation, and 15 percent fuel flow degradation at production, refinement, and distribution points.
- Scenario D: Low U.S. Pressure / High PRC Pressure. Scenario D assumes low preparation and escalation from the U.S. and high from China. China starts with a large, 1.2 billion-barrel SPR and an early 30 percent redirection of civilian fuel to the PLA, with the U.S. applying a 50 percent interdiction of crude imports that later trails down to 25 percent, 40 percent oil tanker degradation, and 15 percent fuel flow degradation at production, refinement, and distribution points.
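The four scenario cells can be captured as a small configuration table. This is an illustrative sketch; the field names are invented, the values follow the scenario descriptions, and the time-phased interdiction in C and D (trailing to 25 percent) is noted only as a comment:

```python
# Hypothetical encoding of the four scenario cells.
# prc_spr_bbl: starting PRC Strategic Petroleum Reserve, barrels
# civil_redirect: share of civilian fuel redirected to the PLA
# us_*: U.S. interdiction/degradation shares applied to PRC flows
SCENARIOS = {
    "A": {"prc_spr_bbl": 1_200_000_000, "civil_redirect": 0.30,
          "us_crude_interdiction": 0.50, "us_tanker_interdiction": 0.80,
          "us_flow_degradation": 0.30},
    "B": {"prc_spr_bbl": 600_000_000, "civil_redirect": 0.15,
          "us_crude_interdiction": 0.50, "us_tanker_interdiction": 0.80,
          "us_flow_degradation": 0.30},
    "C": {"prc_spr_bbl": 600_000_000, "civil_redirect": 0.15,
          "us_crude_interdiction": 0.50,  # trails down to 0.25 over time
          "us_tanker_interdiction": 0.40,
          "us_flow_degradation": 0.15},
    "D": {"prc_spr_bbl": 1_200_000_000, "civil_redirect": 0.30,
          "us_crude_interdiction": 0.50,  # trails down to 0.25 over time
          "us_tanker_interdiction": 0.40,
          "us_flow_degradation": 0.15},
}
```

Encoding the cells this way makes it explicit that each run answers a question for one specific scenario cell rather than an averaged case.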
Methodology building turned the KIQ hierarchy and resource picture into a concrete analytic and modeling design. The backbone of the logistics model is a daily time-step, with four linked modules:
- U.S. Fuel;
- U.S. Munitions;
- PLA Fuel; and
- PLA Munitions.
Each module takes platform-level demand, stock, and throughput as inputs and produces as outputs:
- First-deficit days;
- Collapse windows; and
- Magnitude and location of unserved demand.
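Under these stated inputs and outputs, the daily-balance logic of one module might be sketched as follows; the function and all figures are hypothetical illustrations, not the report’s actual engine:

```python
def run_module(days, stock, production, demand, throughput_cap):
    """Daily-balance sketch for one module (e.g., one side's fuel system).

    stock: starting inventory; production: new supply per day;
    demand: required consumption per day; throughput_cap: maximum
    deliverable per day through the distribution network.
    Returns (first-deficit day, collapse day, cumulative unserved demand).
    """
    first_deficit = collapse = None
    unserved = 0.0
    for day in range(1, days + 1):
        stock += production
        deliverable = min(demand, throughput_cap, stock)
        shortfall = demand - deliverable
        if shortfall > 0 and first_deficit is None:
            first_deficit = day       # first day demand goes unmet
        unserved += shortfall
        stock -= deliverable
        if stock <= 0 and collapse is None:
            collapse = day            # stocks exhausted
    return first_deficit, collapse, unserved

# Hypothetical: 10,000 units on hand, 200/day produced, 500/day demanded,
# network can move at most 450/day.
print(run_module(365, 10_000, 200, 500, 450))  # -> (1, 40, 99500.0)
```

Chaining four such modules and letting munitions attrition feed back into platform availability gives the coupling the cross-cutting structures describe.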
Two cross-cutting structures are then added:
- A munitions attrition and launcher-availability model to capture the coupling between platforms, fuel, and munitions; and
- A PLA campaign-sequencing model that converts node-level chokepoints into ordered kinetic and non-kinetic lever packages.
For each KIQ, the model specifies:
- Which tables or models would produce primary evidence;
- Which ranges or bounding cases would be run; and
- What structured tradecraft tools would be used (key-assumptions checks on scenario and stock assumptions; argument maps for key judgments; explicit alternative hypotheses).
AI Pipelines contained specific roles, such as parameter estimation support, sensitivity analysis, automated ICD-203/206 compliance checks, and maintaining a live map between model outputs and the KIQs they informed. Humans retained ownership of model structure, scenario logic, and all final judgments.
Where high-quality data existed, AI helped to aggregate, normalize, and reconcile them (e.g., harmonizing fuel definitions or sortie-rate assumptions across disparate sources). Where data did not exist, analysts built bottom-up models:
- Platform-level fuel curves;
- Wartime tempo multipliers;
- Daily burn and resupply equations for JP-8/F-76 and key munitions; and
- Scenario-based attrition and logistics bottlenecks.
AI functions remained limited to specific tasks: math, bookkeeping, and structured pipelines. Human analysts reviewed all outputs and made the final calls on inclusion, ranges, and caveats.
Early Simulation Architecture
An Excel-based, year-long conflict simulation was created from the collected data, modeling the production, transport, storage, and distribution of fuel and munitions with platform- and fuel-type fidelity. The core engine:
- Tracks daily production and stocks;
- Models attrition, replenishment, interdiction, and repairs; and
- Simulates consumption using realistic burn rates, sortie cycles, and operational tempos.
The model produces time-stamped endurance metrics such as first-deficit days, collapse windows, and unserved demand for each platform, munition, and fuel type.
This architecture integrates three layers of analysis:
- Discovery: AI agents ingested and synthesized approximately 14,000 pages of technical documentation to build foundational datasets for fuel and munitions logistics.
- Spreadsheet Modeling: The Watch’s model processed the output of the discovery phase into a spreadsheet that identified platform deployment and attrition as well as fuel and munitions usage by geographic area.
- Expert Adjudication: Former intelligence officers, operational logisticians, and other SMEs reviewed key variables (e.g., sortie generation rates), adjusting algorithms to align with doctrine, history, and reasonable operations behavior.
Within The Watch platform, Tidalwave was executed through the Foundational Intelligence, Anticipatory Analysis, Actionable Analysis, and Argumentative Analysis pipelines, combining sourcing, tradecraft, and formatting agents with specialized campaign-level modeling stacks.
Knowledge Development
Knowledge development filled in identified gaps from the initial runs of the simulation. On the U.S. side, that meant building platform-level fuel and munitions tables: counts and homeports, fuel capacities, burn rates by mission and power setting, sortie lengths and rates, standard weapon loadouts, and reload constraints across ships, aircraft, and logistics vessels. On the PLA side, that meant constructing analogous tables for aircraft, ships, replenishment assets, ports, refineries, pipelines, and inland transport modes, plus separate matrices of foreign-sourced catalysts, specialized fuels, components, and service dependencies. SME judgments adjudicated gaps in the open record that needed more clarity, such as the sortie calculations for both the U.S. and the PLA.
At this point, construction of the simulation model completed its first phase through demand libraries of aircraft, ships, and key node-level infrastructure, all gathered from open-source intelligence. Fuel systems were modeled by starting with platform demand first. How many ships, aircraft, and vehicles would be used? How much fuel would they burn daily? And how long could reserves last? This demand-first approach grounded sustainment modeling in consumption curves.
Munitions systems were modeled by starting with probability-of-kill (PK) ratios, multiplied by sortie rates and firing doctrines. After SME review, PK ratios for air-to-air and anti-ship munitions were adjusted. Additionally, a survivability factor was created based on a platform’s radar signature, defensive systems, redundancy, and damage control to produce more realistic kill probabilities.
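The effects-down arithmetic described here can be sketched in a few lines; all numbers and the simple multiplicative survivability discount are hypothetical illustrations, not the SME-adjusted values:

```python
def daily_expenditure(sorties_per_day, shots_per_sortie):
    """Rounds expended per day under a given firing doctrine."""
    return sorties_per_day * shots_per_sortie

def expected_kills(rounds_fired, base_pk, survivability):
    """Expected kills, discounting base PK by a target survivability factor.

    survivability runs from 0 (no added survivability) toward 1
    (effectively unkillable); the linear discount is an assumption
    made for illustration only.
    """
    effective_pk = base_pk * (1.0 - survivability)
    return rounds_fired * effective_pk

# Hypothetical: 40 sorties/day, 2 shots each, base PK 0.6,
# target survivability factor 0.5.
rounds = daily_expenditure(sorties_per_day=40, shots_per_sortie=2)
print(rounds, expected_kills(rounds, base_pk=0.6, survivability=0.5))
```

Running the required-kills relation in reverse (rounds needed = kills / effective PK) is what converts campaign objectives into daily munitions demand.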
Essentially, fuel was modeled from consumption up, munitions from effects down—converging in one architecture wherein attrition and survivability dynamics directly shaped endurance. Throughout this process, our human and AI methods enforced ICD-203 and ICD-206 standards, which establish the analytic and sourcing rules for all U.S. intelligence products. This yielded 182,135 derived observations, findings, and judgments across 499 variables per system. These standards required the analysis to be objective, accurate, and independent of political influence (ICD-203) while also ensuring that every key judgment is transparently sourced, with reliability and limitations clearly communicated (ICD-206).
Human and AI Teaming. To execute the research plan at scale, analysts paired tasking with AI-enabled execution. A Brainstorming Agent proposed source leads and initial extracts for each analytical framework. Human analysts then screened those leads, rejected low-quality material, and rewrote draft arguments before any content entered the model. The hybrid workflow enabled the ingestion, cross-referencing, and synthesis of over 150 primary documents totaling approximately 14,000 pages. Over the course of the project, a combined total of more than 150 TB was ingested, including translated Chinese, Japanese, and Filipino sources.
A Source Validation and Synthesis Agent then operated under a strict “source-first” directive to recheck citations, perform 16,594 live lookups and large-language-model queries, and normalize data. Afterwards, analysts again audited and, where necessary, corrected or discarded its work. Live searches confirmed every potential source against publicly accessible documents as well as injected datasets to accelerate the process and broaden the source aperture.
Together, analysts and AI agents ingested and structured more than 7,000 sources from government, industry, commercial, academic, and operational sectors. AI handled bulk scraping and tabular normalization while humans undertook the interpretation and conflict resolution and made the final decisions about what entered the demand libraries and chokepoint matrices. Each data element was stored with an explicit rationale and source pointer so it could be challenged later.
This combination ensured that intelligence assessments were not only rigorous and evidence-based but would also allow policymakers to evaluate the strength, credibility, and potential uncertainty behind each conclusion. Accordingly, every factual claim surfaced by AI agents is traceable to one of 5,004 ICD-203/206-compliant source reference citations after analysts reviewed the citation and underlying source text. Any claim that could not be independently verified by a human reviewer was removed or explicitly flagged as an assumption, not presented as fact. This human gate at the end of every AI pipeline was created as a check to ensure that all analysis rested on confirmed data and disciplined judgment, not on unchecked AI “hallucination.”
Verification and Interrogation
A critical differentiator of the system is its “interrogative” design. Once a draft is synthesized, it enters a rigorous Audit Rinse process—an AI-powered peer review. The agent acting as a “Claim-Source Auditor” meticulously scrutinizes the text for logical fallacies, evidentiary gaps, and unsubstantiated certainty. It is specifically programmed to detect and flag:
- False inference chains: Where individually true facts are arranged to create a misleading conclusion.
- Citation drift: Where a cited source does not fully support the claim being made.
- Unstated assumptions: Hidden premises that could undermine the integrity of the key judgment.
This internal stress test, guided by a sophisticated sequence of prompts, ensures that only claims grounded in solid evidence and sound reasoning survive to the final report. SME reviewers then repeated this audit with manual spot-checks, red-team reads, and alternative-hypothesis drills, treating the AI’s critique as a first pass rather than a substitute for traditional peer review. This stage institutionalized a healthy skepticism of AI output directly into the workflow.
Performing Analysis
Once all data were collected and verified, performing analysis meant actually running the hybrid simulation and relating the results to the KIQs. For each of the four scenarios, the daily-balance engines for U.S. and PLA fuel pulled platform demand from the force-structure tabs, set intensity and tempo parameters, and dialed in different combinations of damage to ports, depots, CLF/replenishment assets, and shipping routes. Parallel munitions modules translated sortie and firing patterns into daily expenditure, tracked stockpiles by location, and then layered in attrition to show when lack of launchers, not lack of weapons, became the binding constraint.
The model was systematically stress-tested by toggling civil-to-military redirection, PLA reserve contributions, levels of non-kinetic and kinetic pressure on PRC nodes, U.S. tempo-management and substitution strategies, and industrial surge knobs. AI was used as a workhorse to spin out large families of runs, organize outputs, and highlight non-obvious inflection points, while humans focused on interpreting patterns, identifying structurally similar behaviors across runs, and deciding which parameter combinations were plausible enough to brief.
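A stress test of this kind amounts to sweeping the model over combinations of lever settings. A toy sketch, with hypothetical levers and figures throughout:

```python
import itertools

def days_to_depletion(stock, daily_demand, daily_resupply):
    """Days until stock is exhausted, or None if resupply covers demand."""
    net = daily_demand - daily_resupply
    return None if net <= 0 else stock // net  # whole days, illustrative

# Sweep hypothetical levers: reserve size, civil-to-military redirection,
# and interdiction pressure on resupply (all values invented).
reserves = [600, 1_200]          # millions of barrels
redirect = [0.15, 0.30]          # share of civilian fuel redirected
interdiction = [0.25, 0.50]      # share of resupply cut

for r, c, i in itertools.product(reserves, redirect, interdiction):
    demand = 10 * (1 - c)        # hypothetical daily demand, millions bbl
    resupply = 5 * (1 - i)       # hypothetical daily resupply, millions bbl
    print(r, c, i, days_to_depletion(r, demand, resupply))
```

Each printed row is one run of a run family; in the actual workflow, the engine behind `days_to_depletion` would be the full daily-balance model rather than this one-line net drain.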
Our systems analysis was rooted in Critical Factors Analysis to define how the systems functioned, and in CARVER scoring, a method developed by the U.S. military for prioritizing and assessing potential targets, to identify the top critical vulnerabilities. The CARVER method identifies elements of the PLA Fuel System and PLA Munitions System that are vulnerable to exploitation and strategic targeting. The acronym CARVER stands for:
C—Criticality: How essential is the target to the overall system?
A—Accessibility: How easily can the target be reached, attacked, or affected?
R—Recuperability: How quickly can the system recover if the targets are damaged, disrupted, or destroyed?
V—Vulnerability: How susceptible is the target to damage, disruption, or exploitation with the resources available?
E—Effect: Which broader impacts—political, economic, psychological—would occur if the target is hit or compromised?
R—Recognizability: How easy is it to identify and assess the target within the conditions of the system?
Each factor is scored on a numeric scale of 1 to 5. In combination, these scores determined the greatest overall deficiencies in the PRC fuel and munitions systems.
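The scoring roll-up can be illustrated as a simple sum of the six 1-to-5 factor scores; the target names and scores below are hypothetical:

```python
CARVER_FACTORS = ("criticality", "accessibility", "recuperability",
                  "vulnerability", "effect", "recognizability")

def carver_score(scores: dict) -> int:
    """Sum of the six CARVER factor scores (each 1-5); a higher total
    indicates a higher-priority vulnerability."""
    assert set(scores) == set(CARVER_FACTORS)
    assert all(1 <= v <= 5 for v in scores.values())
    return sum(scores.values())

# Hypothetical nodes in a fuel system, scored 1-5 on each factor.
targets = {
    "refinery_x": dict(criticality=5, accessibility=3, recuperability=4,
                       vulnerability=4, effect=5, recognizability=5),
    "pipeline_y": dict(criticality=4, accessibility=4, recuperability=2,
                       vulnerability=3, effect=3, recognizability=4),
}
ranked = sorted(targets, key=lambda t: carver_score(targets[t]), reverse=True)
print(ranked, carver_score(targets["refinery_x"]))
```

Some CARVER variants weight the factors unequally; an unweighted sum is used here purely for illustration.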
Reliability and Tradecraft Alignment
In the final stage, a Tradecraft Review Agent supported human reviewers by checking the draft against IC analytic standards for rigor, objectivity, and logical argumentation, as defined by ICD-203. It confirmed that Key Judgments were stated up front in active voice, answered the core “What” and “So what” questions, were supported by multiple independent lines of evidence, and clearly identified alternative hypotheses. A separate Formatting Agent applied precise IC style rules, from calibrated likelihood language and confidence levels to citation formats (ICS 206-01) and report layouts. Analysts and editors served as the final arbiters of whether assessments were ready for decision-makers.
This process uniquely combined data analysts, intelligence analysts, domain SMEs, and open-source AI tools and platforms in competition. To date, we know of no similar undertaking to model the complete fuel and munitions systems of the PRC and the U.S., nor to simulate both joint forces competing in a complex, protracted conflict.
Evaluate Analysis
In evaluation, assessments and model outputs were treated as hypotheses to be attacked, not answers to be accepted. Key assumptions were checked on the scenario frame, stock and reserve estimates, loss and repair rates, and foreign-sourced dependency logic. For each, the team asked: “If this is wrong, does the core judgment change in direction, magnitude, or not at all?” Where answers were sensitive, the analysis either reduced the assumptions, added caveats, or reframed the claim to match what the evidence could actually support. These choices were made by analysts, not by AI, to keep final judgments anchored in human responsibility and IC tradecraft.
Source Base and Evidence Profile
At the evidence level, the hybrid simulation rests on an explicitly coded open-source corpus of 5,004 source reference citations—corresponding to roughly 12,000 sourced facts—distributed across government/Department of War (~34 percent), academic and think-tank (~21 percent), media (~22 percent), industry (~12 percent), and proprietary Tidalwave simulations and memoranda (~11 percent). These citations span more than 1,100 distinct source entities. Even the most heavily used external contributors each account for well under 3.5 percent of the total, and no external institution or proprietary stream provided more than a small single-digit share. This composition deliberately avoids dependence on any single data set or narrative and instead forces judgments to emerge from converging or competing streams of evidence.
The source base is also recent. Roughly 70 percent of citations (3,534 of 5,004) are dated 2020 or later, with 2023–2025 alone supplying more than half of the corpus. Older material appears primarily where only historical or foundational references exist—such as legacy doctrine, World War II–era logistics baselines, or enduring technical manuals. This temporal profile anchors Tidalwave’s inputs in post-COVID-19 supply chains, current PLA modernization, and contemporary U.S. industrial and logistics constraints rather than in pre-A2/AD conditions.
Within that corpus, the highest-confidence elements are U.S. and allied technical data, Western industrial and shipping statistics, and transparent internal calculations built directly from those inputs. Judgments about U.S. fuel and munitions baselines and physical bottlenecks rest on this foundation. Greater uncertainty attaches to PLA consumption and reserves, wartime attrition rates, and allied or commercial political behavior, all of which rely more heavily on modeling, analogy, and scenario assumptions. As a result, Tidalwave’s fuel-collapse and munitions-collapse dates should be read as conditional failure windows—if inputs fall in these ranges, these failure modes emerge—rather than as point predictions of specific calendar days.
Finally, the project treated its own sourcing as an object of analysis. Using the coded citation corpus, analysts and AI jointly profiled the sources’ category mix, entity concentration, and recency. This process helped to expose over- or under-concentrations and to map where judgments rest on especially strong or weak evidence. That “analysis of the sourcing of the analysis” feeds back into future collection and modeling priorities. This thorough methodology gives readers a transparent answer about which data any given assessment rests on and where residual risk remains.