Sessions

Plenary Sessions

Opening Plenary Session: TPC Vision

Monday, June 1, 14:00

This talk marks a pivotal moment in the evolution of scientific discovery, as AI, advanced computing, and human expertise converge to unlock a new era of continuous, real‑time innovation. It highlights the Genesis Mission as a bold national effort uniting national laboratories, industry, and academia to build an unprecedented scientific platform capable of accelerating breakthroughs at extraordinary speed. More than a technological shift, it is a call to action — an invitation for researchers and institutions to help shape a future where bold collaboration, shared purpose, and transformative innovation redefine what humanity can achieve.

Dario Gil, Under Secretary for Science, Department of Energy

The NAIRR Pilot, launched more than two years ago to connect the US-based research and education communities to critical AI resources including compute, data, software, models and expertise, is now supporting over 700 projects and enabling more than 7,000 students. The NAIRR Pilot has produced high-impact discoveries and novel AI models for science domains, and spurred start-up companies. The pilot is now transitioning to a sustainable model with a funded operations center that will be announced later this year. In this talk, NSF will discuss this novel public-private partnership and interagency collaboration model, key outcomes, lessons learned from the NAIRR pilot, and future directions.

Katie Antypas, Director, Office of Advanced Cyberinfrastructure, US National Science Foundation

Rick Stevens, Associate Laboratory Director – CELS and Argonne Distinguished Fellow, Argonne National Laboratory | Professor of Computer Science, The University of Chicago

Plenary Session 2: Industry / Lab / Academia

Monday, June 1, 16:30

Agentic science reframes scientific practice around human–AI teams that co-generate hypotheses, run experiments, and analyze results. Drawing on examples from our replication project, we develop a resource model spanning input tokens, human review time, HPC cycles for training and testing models, and compute for running experiments and analyzing data. We ask: how do we practically accelerate science, and at what cost? Holding scientist headcount fixed, we explore how per-scientist AI investment shifts productivity — rethinking metrics beyond paper counts toward hypothesis depth, saturation, quality, and "deeper" science with greater per-paper impact. We frame all this within the DOE Genesis Mission.

Rick Stevens, Associate Laboratory Director – CELS and Argonne Distinguished Fellow, Argonne National Laboratory | Professor of Computer Science, The University of Chicago

Satoshi Matsuoka, Director, RIKEN R-CCS

Scientific discovery is being transformed by the convergence of high-performance computing, AI for science, and quantum computing, spanning seamlessly from on-premises to cloud. Compressing time to innovation is now the defining challenge, so let’s explore real-world examples of how the industry is empowering the scientific community to harness this boundary-free compute model for breakthrough discovery.

Thierry Pellegrino, Global Head of Advanced Computing, Amazon Web Services

The next leap in AI for science will not come from models alone, but from the systems that sustain them. Building on DOE’s Exascale Initiative and Project, the Genesis Mission marks a shift from standalone machines to persistent, AI-enabled discovery platforms. AI accelerates hypothesis generation, while high-fidelity simulation ensures validation and trust. As workflows become agentic, computation shifts toward tightly integrated systems spanning orchestration, data, and execution. DOE’s LUX and Discovery exemplify this transition, enabling continuous discovery at scale. Leadership will be defined by the ability to build, operate, and sustain trusted scientific ecosystems.

Thomas Zacharia, Senior Vice President, Strategic Technical Partnership and Public Policy, AMD

Plenary Session 3: Frontier Models and Systems

Tuesday, June 2, 8:30

Modern AI systems are often cast as products: a model, a chatbot, an API. But when we think about using AI for science, that framing is too narrow. What scientific communities need is open AI infrastructure: data, codebases, pre- and post-training recipes, documentation, evaluations, and access to model flows across stages of development, not just a final released set of model weights. This talk will use Ai2’s Olmo project portfolio as a case study in what it means to build that infrastructure in the open. Drawing on recent results from our team, including work that exposes and studies multiple stages of model construction rather than only final models, we will show that openness at the level of infrastructure is not only a scientific virtue, but a practical necessity. If researchers are going to build AI systems for their own communities, and if universities, nonprofits, and governments are going to harness AI to serve the public interest, they must be able to invest in, contribute to, and use open infrastructure. Our goal is not to reproduce commercial AI, but to create a healthier open ecosystem to accelerate scientific discovery.

Noah A. Smith, Vice Provost for AI, Charles and Lisa Simonyi Endowed Chair for Artificial Intelligence and Emerging Technologies, and Professor, Paul G. Allen School of Computer Science & Engineering, University of Washington

The Bavarian proposal for the "Blue Swan" European AI Gigafactory addresses the critical need for specialized computing infrastructure to train large-scale foundation models within the European research and industrial landscape. The technical framework integrates high-end GPUs into a coherent, HPC-oriented cluster architecture, building on Leibniz Supercomputing Center's (LRZ’s) pioneering work on holistic energy efficiency. This includes hot water direct liquid cooling, utilization of 100% renewable energy sources, and reuse of waste heat to achieve carbon-neutral operations. On this technological basis, Blue Swan employs a scientific approach to analyze demand and enable and scale industrial applications in dedicated domains. It also integrates national European data spaces to facilitate latency-optimized interoperability between industrial applications and academic research. Thus, Blue Swan intends to serve as a substantial AI resource and a technological validation point for energy-efficient, sovereign AI infrastructures in the exascale range.

Dieter Kranzlmüller, Chairman of the Board of the Leibniz Supercomputing Centre (LRZ) | Full Professor, Ludwig-Maximilians-University Munich (LMU)

Large models are usually measured by what goes in: parameters, tokens, compute. Agents shift the focus to what happens over time: plans, tool calls, experiments, revisions, failures, recoveries, and discoveries. This talk frames agents as the machinery that turns trillion-parameter models from predictors into participants in scientific and technical work. We’ll discuss why agentic systems change how we think about scale, evaluation, reliability, and control, and why the next frontier may be not just bigger models, but larger and more consequential processes built around them.

Ian Foster, Data Science and Learning Division Director, Argonne National Laboratory

Japan’s LLM ecosystem is rapidly moving from adaptation to original capability building. This talk will present lessons from the Swallow and LLM-jp projects: open collaboration, Japanese-centric data curation, multilingual evaluation, and scalable training on domestic infrastructure. It will also present recent work showing how pre-training data can be rewritten to improve math and code performance, and how mixture-of-experts sparsity should be optimized for reasoning in terms of active FLOPs and tokens per parameter. Together, these efforts point toward transparent, reproducible, and locally grounded LLM development with global scientific impact.

Rio Yokota, Professor, Institute of Science Tokyo | Team Principal, RIKEN Center for Computational Science

Plenary Session 4: Workforce & Emerging Leaders

Tuesday, June 2, 11:00

Karthik Duraisamy, Professor of Aerospace Engineering, and Director, Michigan Institute for Computational Discovery and Engineering, University of Michigan

This talk presents MIST, a family of molecular foundation models with an order of magnitude more parameters and training data than prior work. MIST models predict more than 400 structure-property relationships and achieve state-of-the-art performance across diverse benchmarks spanning physiology to electrochemistry. The talk will cover MIST's capacity to solve real-world problems across chemical space, from multiobjective electrolyte screening to olfactory perception mapping, along with a systematic application of mechanistic interpretability methods that uncovers generalizable scientific concepts learned by the model and reveals how such models encode chemical knowledge. It will also introduce innovations in training methodology, including hyperparameter-penalized neural scaling laws that reduce the computational cost of model development by an order of magnitude. Together, these methods and findings represent significant progress toward accelerating materials discovery using foundation models.

Anoushka Bhutani, PhD Student, Mechanical Engineering and Scientific Computing, University of Michigan

Language models (LMs) often struggle to generate diverse, human-like creative content, raising concerns about the long-term homogenization of human thought through repeated exposure to similar outputs. Yet scalable methods for evaluating LM output diversity remain limited, especially beyond narrow tasks such as random number or name generation, or beyond repeated sampling from a single model. Infinity-Chat is a large-scale dataset of 26K diverse, real-world, open-ended user queries that admit a wide range of plausible answers with no single ground truth. It is accompanied by the first comprehensive taxonomy for characterizing the full spectrum of open-ended prompts posed to LMs. This talk presents a large-scale study of mode collapse using Infinity-Chat, revealing a pronounced Artificial Hivemind effect in the open-ended generation of LMs. Overall, Infinity-Chat provides the first large-scale resource for systematically studying real-world open-ended queries to LMs, yielding critical insights to guide future research on mitigating the long-term AI safety risks posed by the Artificial Hivemind.

Liwei Jiang, PhD Student, Paul G. Allen School of Computer Science & Engineering, University of Washington

Lunch and Panel Discussion

Tuesday, June 2, 12:30

Moderator: Earl Joseph, CEO, Hyperion Research

Hal Finkel, U.S. Department of Energy
Jay Boisseau, Google Cloud
Molly Presley, Hammerspace
Samantika Sury, HPE

Lunch and Panel Discussion

Wednesday, June 3, 12:30

Moderator: Addison Snell, Co-Founder & Chief Executive Officer, Intersect360 Research

Arvind Ramanathan, Argonne National Laboratory
Eliott Jacopen, RIKEN R-CCS

Plenary Session 5: TPC Collaborative Initiatives

Wednesday, June 3, 16:00

Valerie Taylor, Director, Mathematics and Computer Science Division, Argonne National Laboratory

Scientific hypothesis generation is the most consequential step of the research workflow: every downstream action, from experiment design to simulation, observation, and analysis, inherits its quality from it. Researchers are starting to use frontier LLMs and multi-agent co-scientists as Hypothesis Generation Tools (HGTs) to accelerate scientific discovery, with the potential to exploit multidisciplinary knowledge more effectively than any individual scientist. In practice, HGTs and humans work on different time scales: while a researcher may spend months coming up with a few hypotheses, HGTs can generate thousands of hypotheses for a single scientific question in a matter of hours or days. Although many HGT-generated hypotheses can be discarded quickly today, rejecting hypotheses will become harder as HGTs improve. This raises a novel question: what should a practical hypothesis generation process for science look like in the AI context? This talk will discuss the current state of the art in hypothesis generation, available HGTs, an analysis of the generated hypotheses, some positive results, and evaluation outcomes. We will also introduce questions that the science community will have to answer: How many hypotheses should an HGT generate per problem? What scale of resources (tokens, GPUs) is needed to serve institutions with thousands of researchers? How can the generated hypotheses be filtered and ranked? Do we have the infrastructure to test tens, hundreds, or thousands of hypotheses per research problem?

Franck Cappello, R&D Lead, Senior Computer Scientist, Argonne National Laboratory

AI capability is increasingly important inside academia for both inference and training. Although options from industry are available, resources for AI development in academia are substantially smaller than their industrial counterparts. While AI training can be supported relatively simply by the batch-scheduled, GPU-enabled clusters found in many HPC centers, inference workflows present new challenges. This talk will cover how inference is being addressed at the Texas Advanced Computing Center (the National Science Foundation’s Leadership-Class Computing Facility), examine the unmet needs we see in our user base, and summarize TPC discussions between TACC and other centers around the world on how inference may be addressed in academic research.

Dan Stanzione, Executive Director, Texas Advanced Computing Center (TACC) | Associate Vice President for Research, UT-Austin

Breakout Groups

TPC26 breakout groups are designed to identify, form, and pursue collaborations that will accelerate the development of new AI capabilities and services for scientific discovery. Some sessions are organized by TPC working groups, others are prospective working groups or birds-of-a-feather gatherings. Each session comprises a small set of lightning talks followed by group discussion, and all TPC26 participants are encouraged to submit lightning talk proposals.

The five-way parallel breakout schedule is loosely organized around eight themes: Infrastructure to Enable Shared Data & Computing; Open Frontier Models; Open Frontier AI Systems; Software Infrastructure/Frameworks; Open Suite for Evaluating Model Skills, Knowledge, Reasoning, and Safety; Driving Challenge Applications; Training- and Deployment-Level Safety and Alignment; and Workforce Development.

Driving Challenge Applications (Challenge Applications)

Identify challenge applications for driving and evaluating the Infrastructure to Enable Shared Data & Computing, Open Frontier Models (Model Architecture & Performance Evaluation), and Open Frontier AI Systems tracks. Rather than centrally picking winners and losers, ask the community to volunteer (and drive) scientific challenge applications, aiming for diversity on multiple axes (including industry applications).

AI for Material Sciences: Session 1

Tuesday, June 2, 14:00
Eliu Huerta, Argonne National Laboratory
Xiaoyun Wang, NVIDIA

This session will showcase how AI is revolutionizing materials discovery across quantum science, semiconductors, chemistry, energy, and advanced manufacturing. Bringing together world-class leaders from academia and industry, the session features keynote talks by Ted Sargent (Northwestern), Cameron J. Owen (Lila Sciences), Laura McGorman (Meta), and Arvin Kakekhani (PsiQuantum), alongside a special NVIDIA tutorial by Xiaoyun Wang and lightning talks from rising innovators in the field. Expect bold ideas, frontier AI methods, foundation models for science, autonomous experimentation, and next-generation computational workflows that are redefining the pace of scientific discovery.

Human-in-the-Loop and Mixed Acceleration for Next-Generation Catalysis Edward (Ted) Sargent (Northwestern University)
Discovering Structural Rules in Scattering Amplitudes via Information Lattice Learning Hazu Yu (Kocree Inc.)
Nahual: A Sequence Model for Language and Atoms Austin Cheng (University of Toronto)
AI and Experimental Automation Emma Bouchard (Carnegie Mellon University)
Open Source AI for Innovations in Energy and Materials Sciences Laura McGorman (Meta)

AI for Material Sciences: Session 2

Tuesday, June 2, 16:00
Eliu Huerta, Argonne National Laboratory
Xiaoyun Wang, NVIDIA

This session will showcase how AI is revolutionizing materials discovery across quantum science, semiconductors, chemistry, energy, and advanced manufacturing. Bringing together world-class leaders from academia and industry, the session features keynote talks by Ted Sargent (Northwestern), Cameron J. Owen (Lila Sciences), Laura McGorman (Meta), and Arvin Kakekhani (PsiQuantum), alongside a special NVIDIA tutorial by Xiaoyun Wang and lightning talks from rising innovators in the field. Expect bold ideas, frontier AI methods, foundation models for science, autonomous experimentation, and next-generation computational workflows that are redefining the pace of scientific discovery.

GridAI Model Team

Wednesday, June 3, 8:30
Kibaek Kim, Argonne National Laboratory
Teja Kuruganti, Oak Ridge National Laboratory

This working group is organized around GridAI, a Genesis Mission seed project to scope a scalable AI platform for power grid modeling, analysis, and decision support. This session will introduce GridAI to the TPC community, describe the seed project’s goals and team, and feature invited contributions from participating institutions on grid-relevant AI directions. An open forum will follow to discuss shared challenges in data, modeling, software, and HPC infrastructure, and to explore connections with scientific ML and energy systems modeling. Researchers in AI/ML, scalable algorithms, optimization, and complex systems are invited to join and help shape the GridAI agenda.

Scalable Heterogeneous Graph Learning for Grid AI Foundation Models Massimiliano Lupo Pasini (Oak Ridge National Laboratory)

BOF: Bio-Foundation Models, Agentic systems, and Biosecurity

Wednesday, June 3, 11:00
Arvind Ramanathan, Argonne National Laboratory
Newton Wahome, CEPI
Sarah Carter, CEPI

Bio-foundation models and agentic AI systems are reshaping biological discovery — from genome-scale language models and protein structure prediction to autonomous laboratory workflows — while simultaneously surfacing critical biosecurity risks that demand urgent community attention. This BOF convenes leading researchers from industry and national laboratories to examine scalable biological AI architectures, agentic orchestration for autonomous science, and governance frameworks for dual-use risks. Topics span model scaling behavior, biosafety-by-design, and policy-aligned deployment. Attendees will gain actionable insight into responsible bio-AI development, preparing the HPC and AI research community for safe, high-impact biological discovery at scale.

AI and Strategic Decision Support

Wednesday, June 3, 14:00
Frank Alexander, Argonne National Laboratory
Manish Parashar, The University of Utah

Building on the consortium’s collaborative development of foundation models for science and engineering, we’ll examine applications in disaster response, supply chain resilience, pandemic management, and other areas. The discussion will connect TPC’s work on scalable architectures, scientific data curation, and exascale optimization to breakthrough capabilities in time-critical decision support.

From a Napkin To a Workflow — Opportunities and Challenges for Workflow Composition Ewa Deelman (University of Southern California)
Evaluating Epistemic Non-Triviality of Large Language Reasoning Models in Scientific Hypothesis Generation Tirthankar Ghosal (Oak Ridge National Laboratory)

Open Frontier Models: Model Architecture & Performance Evaluation (Open Models)

Build frontier-scale, open AI models using shared data and computing infrastructure (from the Infrastructure to Enable Shared Data and Computing track), harnessing distributed resources across TPC partner institutions. Ensure that all core components are openly available to enable transparency, reuse, and scientific progress.

Model Architectures and Performance Evaluation: Session 1

Wednesday, June 3, 11:00
Rio Yokota, Institute of Science Tokyo
Murali Emani, Argonne National Laboratory

Architectures for AI models are evolving rapidly, with frequent innovations in transformer variants (for example, to reduce the cost of attention and of the KV cache in long-context and agentic reasoning), in their framework support, in parallelism strategies, and in system-level optimizations. Identifying the optimal architecture and framework for training foundation models on scientific data is vital to unlocking the next generation of AI for science. Equally crucial is efficient inference, which enables the practical use of pre-trained models in downstream scientific applications. This multi-session track will bring together researchers and practitioners to discuss cutting-edge strategies for large-scale training, inference, and agentic scaling, alongside robust workflows to integrate them.

Scaling MoE Inference with vLLM on Aurora Padma Apparao (Intel)
Olmo Hybrid: From Theory to Practice and Back William Merrill (Toyota Technological Institute at Chicago)
RingX: Scalable and Efficient Long-Context Learning for Scientific Foundation Models on HPC Junqi Yin (Oak Ridge National Laboratory)
AIMNet2: A Foundational Machine-Learned Interatomic Potential for General Chemistry, Reactions, and Open-Shell Systems Shams Mehdi (Carnegie Mellon University)

Model Architectures and Performance Evaluation: Session 2

Wednesday, June 3, 14:00
Rio Yokota, Institute of Science Tokyo
Murali Emani, Argonne National Laboratory

Architectures for AI models are evolving rapidly, with frequent innovations in transformer variants (for example, to reduce the cost of attention and of the KV cache in long-context and agentic reasoning), in their framework support, in parallelism strategies, and in system-level optimizations. Identifying the optimal architecture and framework for training foundation models on scientific data is vital to unlocking the next generation of AI for science. Equally crucial is efficient inference, which enables the practical use of pre-trained models in downstream scientific applications. This multi-session track will bring together researchers and practitioners to discuss cutting-edge strategies for large-scale training, inference, and agentic scaling, alongside robust workflows to integrate them.

AI Software Infrastructure/Frameworks (Software Stack)

Develop software infrastructure and middleware to support the training, deployment, and integration of complex frontier-scale AI models and systems. Provide the technical backbone for the Open Frontier Models (Model Architecture & Performance Evaluation) and Open Frontier AI Systems tracks, while enabling integration with experimentation platforms, laboratories, instruments, and other real-world scientific environments.

BOF: AI Frameworks for Multimodal Data Access and Use

Tuesday, June 2, 14:00
Ilkay Altintas, San Diego Supercomputer Center
Manish Parashar, The University of Utah

AI is accelerating discovery, redefining the workforce, and transforming society, yet fragmented data and computing limit innovation. Federated, AI-ready frameworks can bridge distributed data repositories and computing resources through interoperable, production-grade services that enable seamless access, integration, and composability of data, models, and workflows across edge, cloud, and HPC. These frameworks support multimodal analysis, simulation, and end-to-end workflows. This BOF will examine architectures, highlight existing efforts, and explore paths toward a cohesive national ecosystem. Talks and discussions will identify needs, share use cases, and provide practical entry points for leveraging national cyberinfrastructure.

AI Software Infrastructure Frameworks

Tuesday, June 2, 16:00
Mohamed Wahib, RIKEN R-CCS
Rio Yokota, Institute of Science Tokyo

This working group session will discuss the software infrastructure and middleware required to support frontier-scale AI for science, including frameworks for training, deployment, orchestration, and integration with HPC systems, scientific workflows, laboratories, instruments, and experimentation platforms. The session aims to identify common software challenges and collaboration opportunities needed to make large-scale AI systems usable, reliable, and interoperable across scientific environments.

Scaling MoE to Exascale Software on Aurora Padma Apparao (Intel)

DWARF: Data Workflows, Agents, Reasoning, and Frameworks

Session 1
Wednesday, June 3, 8:30
Session 2
Wednesday, June 3, 8:30
Robert Underwood, Argonne National Laboratory
Neeraj Kumar, Pacific Northwest National Laboratory
Ian Foster, Argonne National Laboratory

This multi-session track explores emerging systems and strategies for building intelligent, scalable platforms to accelerate scientific discovery. Talks and discussions will cover the design of agent-based architectures, integration of scientific workflows with large language models, scalable data pipelines, and novel reasoning frameworks. The session encourages participation from both developers of the software infrastructure and users of that infrastructure. Participants will engage in dialogue on the future of scientific AI infrastructure and the coordination required to realize a distributed, agent-enabled discovery ecosystem.

Best Practices for Scientific Workflows Robert Underwood (Argonne National Laboratory)
When More Cores Hurts: The Vector Database Scaling Paradox in HPC Seth Ockerman (University of Wisconsin)
Agents with Agency Yadu Babuji (University of Chicago)
Deploying Agentic AI Across the DOE Accelerator Complex: The MOAT Experience Thorsten Hellert (Lawrence Berkeley National Laboratory)
Unified User Experience Across Heterogeneous GPU Clusters with Diamond Zhao Zhang (Rutgers University)
Enabling Autonomous Scientific Discovery Through Agentic AI and High-Performance Computing Murat Keceli (Argonne National Laboratory)
Building Reusable and Trustworthy AI Co-Scientists: Lessons From Multi-Domain Scientific Deployments Chandrachur Bhattachar (Argonne National Laboratory)

BOF: Trillion Parameter Models for the Edge and Computing Continuum

Wednesday, June 3, 11:00
DK Panda, The Ohio State University
Barney Maccabe, University of Arizona

The computing continuum extending to the edge is emerging as a common environment for many applications, such as transportation, fire prevention, agriculture, medicine, and manufacturing. Typical computing environments at the edge (IoT devices, drones, etc.) lack sufficient computing and storage capacity, which raises the challenge of how to use trillion-parameter models in this environment. This BOF will focus on the latest state-of-the-art solutions in this direction as well as future opportunities and challenges.

BOF: Large Language Models for Scientific Software

Wednesday, June 3, 11:00
Mohammad Alaul Haque Monil, Oak Ridge National Laboratory
Keita Teranishi, Oak Ridge National Laboratory

Agentic AI is redefining HPC research by introducing intelligent, autonomous capabilities. This BOF explores how large language model (LLM) agents enable code translation, modernization, modeling, and tuning, allowing legacy scientific applications to be efficiently adapted for modern HPC architectures. Beyond code transformation, agentic systems can orchestrate and optimize complex, end-to-end HPC workflows with minimal human intervention. Participants will discuss emerging tools, challenges, and opportunities in deploying LLM-driven agents for scalable, reproducible, and adaptive research pipelines. The session aims to foster collaboration and share insights on advancing autonomous HPC systems powered by agentic AI technologies.

Scaling MoE Inference with vLLM on Aurora Padma Apparao (Intel)
Applications of Deep Knowledge Graph Geoffrey Fox (University of Virginia)
Curating Agentic Workflows with Knowledge Graphs and Operational Experience Ana Gainaru (Oak Ridge National Laboratory)
A Tale of Two Agentic Frameworks: Empirical Studies of LangChain/LangGraph and AG2 in Autonomous Scientific Workflows Meifeng Lin (Brookhaven National Laboratory)
Enabling Autonomous Scientific Discovery Through Agentic AI and HPC Murat Keceli (Argonne National Laboratory)

Infrastructure to Enable Shared Data & Computing (Shared Infrastructure)

Collectively build scientific training data resources and shared computing infrastructure for model training and further fine-tuning for general-purpose and domain-specific settings. Establish scalable and sustainable capabilities that serve as the foundation for the rest of the tracks.

BOF: Trustworthy Privacy-Preserved Federated Learning for Science

Tuesday, June 2, 14:00
Olivera Kotevska, Oak Ridge National Laboratory
Kibaek Kim, Argonne National Laboratory
Ravi Madduri, Argonne National Laboratory

Federated learning offers a promising approach for enabling collaborative scientific discovery while preserving the privacy of sensitive data across institutions. This BOF will bring together researchers and practitioners to discuss trustworthy, privacy-preserving federated learning frameworks tailored for scientific workloads. The discussion will focus on challenges such as secure model aggregation, data confidentiality, system scalability, and integration with distributed research infrastructures. Objectives include identifying common requirements, sharing emerging techniques, and fostering collaborations within the TPC community. The session is particularly relevant to TPC participants interested in distributed computing, secure data sharing, and scalable AI methods that support cross-institutional scientific research.

NeuroFL: OBI's Intelligence Network for Brain Health Bryce Pickard (Ontario Brain Institute)
OmniFed: Towards Configurable Cross-Silo Federated Learning Sahil Tyagi (Oak Ridge National Laboratory)
Differentially Private Federated Averaging with James-Stein Estimator Minseok Ryu (Arizona State University)
Socio-Technical Infrastructure: Operationalizing FL Systems Mohammed Manzari (Deloitte)
Are You Ready for Production Federated Learning? Holger Roth (NVIDIA)
Federated LLM Training Across NNSA Labs Max Carlson (Sandia National Laboratories)
Scalable Cross-Facility Federated Learning for Scientific Foundation Models on Multiple Supercomputers Yijiang Li (Argonne National Laboratory)
The Next Frontier: Federated AI with Flower William Lindskog-Munzing (Flower Labs)
Towards Trustworthy Federated AI: Privacy, Ownership Protection, and Model Editing Olivera Kotevska (Oak Ridge National Laboratory)

BOF: 50-State AI Plan: A Grassroots Approach to Building a US AI Continent

Tuesday, June 2, 16:00
Barr von Oehsen, Pittsburgh Supercomputing Center
Jack Wells, NVIDIA

This BOF explores the emerging fabric of state and regional initiatives that will empower American leadership in AI and quantum computing. Using Pennsylvania, Tennessee, Utah, Massachusetts, New York, New Jersey, and California as case studies, we examine how state-led initiatives are aligning regional assets with America's AI Action Plan, the Genesis Mission, and the National AI Research Resource (NAIRR) in building a US AI Continent. Crucially, we discuss how these and other federal initiatives can leverage state-level "factories" to advance research, workforce development, and economic growth by embracing grassroots innovation tailored to local and regional strengths. We highlight essential partnerships between academia, state governments, industry, philanthropy, and federal agencies to drive these efforts. Participants will discuss strategies for democratizing access to high-performance computing resources, streamlining technical deployment, and building the AI workforce pipeline. By fostering inclusive, state-led hubs, we can attract investment, train thousands of workers, and ensure US technological sovereignty in AI.

Catalyzing the Utah Responsible AI Innovation Ecosystem Manish Parashar (The University of Utah)
Experimental Design for Foundation Models: From Uncertainty to Risk Yu Wen (Stony Brook University)
The Pittsburgh Supercomputing Center, An Integrated National Resource for AI for Science Paola Buitrago (Pittsburgh Supercomputing Center)
From Vision to Momentum: The AI Tennessee Blueprint Tabitha Samuel (University of Tennessee)
The Keystone AI + Quantum Factory Barr von Oehsen (Pittsburgh Supercomputing Center)

BOF: Self-Driving Labs for Accelerating Scientific Discovery at Scale

Wednesday, June 3, 8:30
Arvind Ramanathan, Argonne National Laboratory
Rio Yokota, Institute of Science Tokyo

Self-driving laboratories (SDLs) are transforming scientific discovery by integrating AI-driven hypothesis generation, robotic experimentation, and closed-loop active learning into fully autonomous workflows. This workshop convenes leading researchers from national laboratories, academia, and industry to examine the computational foundations of SDLs — spanning foundation models for experimental design, agentic orchestration across heterogeneous instruments, high-throughput data pipelines, and HPC integration at scale. Applications span drug discovery, materials design, enzyme engineering, and critical minerals extraction. Attendees will gain actionable insight into deploying autonomous discovery platforms, benchmarking SDL performance, and building the open infrastructure needed to accelerate science at exascale.

BOF: Energy-Efficient and Sustainable AI

Wednesday, June 3, 14:00
Siddhartha Jana, Intel
Natalie Bates, Lawrence Berkeley National Laboratory
Shaohui Liu, Massachusetts Institute of Technology

This BOF will focus on the sustainability challenges of large-scale AI, aligned with the mission of advancing responsible and scalable AI for science. As trillion-parameter models demand unprecedented compute, energy, and data resources, critical questions arise around carbon footprint, infrastructure efficiency, and equitable access. The forum will convene researchers, industry practitioners, and infrastructure providers to examine trade-offs between performance and sustainability, share best practices in energy-efficient model design and deployment, and identify collaborative pathways for greener AI. By fostering cross-sector dialogue, this session aims to shape actionable strategies for sustainable AI at extreme scale.

BOF: Bold New World of Heterogeneous AI Computing

Wednesday, June 3, 11:00
Satyam Srivastava, d-Matrix
Tom St. John, Gimlet Labs

As AI workloads diversify, no single accelerator wins on every axis of cost, power, and performance. Consequently, traditional monolithic architectures are hitting critical bottlenecks in scaling and efficiency. This BOF brings together perspectives from researchers, architects, and hardware vendors on building production systems that combine GPUs, ASICs, and more. Speakers will share insights spanning architecture and integration, software stacks and portability, workload scheduling across heterogeneous fabrics, and real-world performance benchmarks. Attendees will gain a strong grasp of the trade-offs, tooling gaps, and emerging best practices for designing AI infrastructure that treats heterogeneity as a foundational feature.

Macroheterogeneity: Enabling Hybrid HPC and AI Workflows Samantika Sury (HPE)
Efficient and Scalable Agentic AI with Heterogeneous Systems Zain Asgar (Gimlet Labs)
Prefill Here, Decode There: Disaggregated LLM Serving Across GPUs and LPUs Vineeth Gutta (NVIDIA)
The NAPA Project: Inference Systems Jason Haga (National Institute of Advanced Industrial Science and Technology (AIST))

Open Frontier AI Systems (AI Systems)

Develop frontier AI systems for science that incorporate reasoning models (starting with state-of-the-art closed models and eventually including open frontier models), and develop domain foundation models, knowledge graphs, agentic systems and orchestration, simulators, and experiments.

BOF: AI Agents as Scientific Collaborators: Building Human-Agent Research Teams

Tuesday, June 2, 14:00
Charlie Catlett, Argonne National Laboratory
Rick Stevens, Argonne National Laboratory

Scientific AI agents are moving from tools to participants — attending conferences, contributing to research, and coordinating across institutions. This BOF features lightning talks delivered by AI agents alongside their human collaborators, showcasing live experiments in agentic research workflows. We then open the floor to explore a proposed international collaboration: a multi-institutional human-agent team working together to accelerate scientific discovery, reduce duplicated effort, and raise the quality of science.

Agent-Enabled Paper to Code Generation for ML Reproducibility Zhao Zhang (Rutgers University)
Towards Intelligent CFD Workflow in the Era of Large Language Models Shaowu Pan (Rensselaer Polytechnic Institute)
An AI Research Assistant for Automating the Computational Catalysis Pipeline Ruchika Mahajan (Stanford University)
Model Capabilities Driving New Paradigms of Agentic Patterns Matt Baughaman (Princeton Plasma Physics Laboratory)
Toward Agentic Closed-Loop AI for Battery Science: From SpectraQuery to Multimodal Experimental Agents Sreya Vangara (Stanford University)

Toward Scientific AI Platforms: Inference, Agents, and AI Services at HPC Facilities: Session 1

Wednesday, June 3, 8:30
Venkatram Vishwanath, Argonne National Laboratory
Ilkay Altintas, San Diego Supercomputer Center

Building on last year’s session on inference-for-science services, this session expands its focus to the broader landscape of scientific AI platforms at HPC facilities. As foundation models, domain-specific AI systems, and agentic workflows gain traction, HPC centers are actively developing infrastructure for scalable inference, AI agents, AI-ready data services, model gateways, and facility-scale AI services. This session will convene members from international HPC centers, application teams, vendors, and the open source community to share emerging best practices for deploying reliable, reproducible, and secure AI services for science. Discussion topics will span simulation and workflow integration, heterogeneous architectures, orchestration, agent skills, observability, sustainability, and workforce development. The session will gather use cases, identify shared technical and operational gaps, and define next steps for continued collaboration across the TPC community.

Toward Scientific AI Platforms: Inference, Agents, and AI Services at HPC Facilities: Session 2

Wednesday, June 3, 11:00
Venkatram Vishwanath, Argonne National Laboratory
Ilkay Altintas, San Diego Supercomputer Center

Building on last year’s session on inference-for-science services, this session expands its focus to the broader landscape of scientific AI platforms at HPC facilities. As foundation models, domain-specific AI systems, and agentic workflows gain traction, HPC centers are actively developing infrastructure for scalable inference, AI agents, AI-ready data services, model gateways, and facility-scale AI services. This session will convene members from international HPC centers, application teams, vendors, and the open source community to share emerging best practices for deploying reliable, reproducible, and secure AI services for science. Discussion topics will span simulation and workflow integration, heterogeneous architectures, orchestration, agent skills, observability, sustainability, and workforce development. The session will gather use cases, identify shared technical and operational gaps, and define next steps for continued collaboration across the TPC community.

BOF: Human-AI Collaboration

Wednesday, June 3, 14:00
Anurag Acharya, Pacific Northwest National Laboratory
Patrick Emami, National Renewable Energy Laboratory

This BOF centers on frameworks for Human-AI co-intelligence in scientific discovery, positioning collaboration, not autonomy, as the primary design goal. Building on our concept of Human-AI Virtual Laboratories, we will explore how mixed-initiative interaction, role differentiation, and coordinated workflows could enable scientists and AI systems to function as true teammates. The discussion aims to surface key design principles spanning agency, communication, and coordination, alongside open challenges in building such systems in practice. We will also briefly discuss evaluation as a supporting concern, focusing on how to assess collaborative effectiveness in these new paradigms.

Open Suite for Evaluating Model Skills, Knowledge, Reasoning, & Safety (Evaluation)

Develop an open suite of tools, methods, and benchmarks for evaluating the scientific skills, knowledge, reasoning, agentic capabilities, and safety/security of frontier models and AI systems.

Open Suite for Evaluating Model Skills

Tuesday, June 2, 14:00
Rio Yokota, Institute of Science Tokyo
Mohamed Wahib, RIKEN R-CCS

In this working group session, we will explore the challenges in building foundational AI models for science, as well as the technical and policy challenges of training foundational models across geographical boundaries and across different scientific domains. We will focus on identifying key bottlenecks in compute, networking, and coordination, and on what it would take to overcome them. We'll also look at how these challenges play out within individual scientific domains.

Benchmarking LLM-Generated Parallel Code for Task-Based Workflow Programming Models Eduardo Iraola de Aceve (Barcelona Supercomputing Center)

Agentic Reasoning with Scientific Foundation Models

Tuesday, June 2, 16:00
Ayan Biswas, Los Alamos National Laboratory
Christine Sweeney, Los Alamos National Laboratory

Scientific AI is rapidly shifting from predictive models toward agents that can plan, use tools, run simulations, analyze data, and automate parts of the research workflow. This session asks what “reasoning” means when scientific AI systems act, not just answer. Scientific agents must satisfy constraints that general-purpose LLM agents often do not: physical consistency, numerical validity, uncertainty quantification, provenance, reproducibility, security, and appropriate human oversight. We will discuss how agentic automation changes evaluation for SciML and AI for science. Which tasks should agents automate? How should they decide when to invoke solvers, query data, generate hypotheses, or ask for human input? How do we distinguish scientific reasoning from brittle tool use, brute-force search, or persuasive but invalid explanations? The session will touch upon topics such as community needs for benchmarks, testbeds, safety practices, provenance standards, and evaluation frameworks for reliable, interpretable, and useful scientific agents.

Training- and Deployment-Level Safety and Alignment (Safety & Alignment)

Develop methods to embed safety and alignment into the training and deployment of frontier-scale models and AI systems. Focus on system-level mechanisms that maintain alignment with scientific objectives and constraints, and with broader societal values, at extreme scale and in high-impact settings.

BOF: Safety and Alignment in Agentic Systems

Tuesday, June 2, 16:00
Charlie Catlett, Argonne National Laboratory

Agentic systems are being built or proposed for a diverse range of applications, from literature review to laboratory automation. Each domain brings common as well as unique safety requirements and alignment/containment strategies. This BOF seeks to find common approaches that might be applied across different application domains, ideally identifying architectural and design approaches that are broadly applicable to developing operational agentic systems for science and engineering.

A Comprehensive Multilingual Jailbreak Evaluation of Open-Source Large Language Models Kashyap Manjusha (University of Illinois Urbana-Champaign)
A Full-Stack Approach to Frontier Model Safety: Red-Teaming, Interpretability, Unlearning, and Formal Verification Sumit Kumar Jha (University of Florida)
The Need for New Safety Measures for LLMs in Scientific Applications Saket Chaturvedi (Argonne National Laboratory)

Expanding and Deepening the AI Workforce (Workforce)

Identify and report on progress in developing the workforce required to achieve the rest of the tracks, with particular attention to emerging and evolving roles across the frontier AI stack. Examine needs across all career stages and share recent experiences and lessons learned to inform sustainable talent development.

BOF: Who Builds the Future? Workforce Challenges in Trillion-Parameter Scientific Computing

Wednesday, June 3, 8:30
Lois Curfman, Argonne National Laboratory

The computational science and research software engineering (RSE) communities are at a turning point as trillion-parameter-scale AI systems reshape code generation, simulation, and scientific workflows. Yet software ecosystems, team structures, expected roles, and collaboration practices were designed for a pre-AI era. Addressing this shift requires more than technical integration; it demands rethinking how we organize, incentivize, and sustain research software at scale. While progress has focused on models and infrastructure, critical workforce issues — including cross-disciplinary collaboration, evolving human and AI contributions, team dynamics, and incentive structures — remain underexplored. This BOF invites community input on these emerging challenges and opportunities.

How Fast Can We Evolve the Workforce? Roscoe Giles (Boston University)
From Writing Software to Ensuring It's Written Well: The Evolving RSE Role Arfon Smith (Schmidt Sciences)
Enabling Research Software Engineers to Leverage AI at APL: Policies, Tools, Training, and Use Cases John Vandegriff (Johns Hopkins University)
Designing Scientific Computing Ecosystems for the AI Era Lois Curfman (Argonne National Laboratory)
Lessons Learned from Hosting a Frontier AI and LLM Tutorial Series for a Mixed Audience Meifeng Lin (Brookhaven National Laboratory)
Iterative Co-Design, Co-Development, and Co-Delivery: Accelerating S&T Productivity Mary Ann Leung (Sustainable Horizons Inc.)