Tutorials

TPC places special emphasis on education by offering tutorials. These training opportunities help participants learn new techniques, experiment with state-of-the-art tools, and engage with mentors and experienced practitioners.

TPC will be providing the following full- and half-day tutorials on Sunday, May 31 and Monday, June 1:

TPC26 Tutorials

These tutorials have been refined over the past 18 months. The introductory “AI for Science” tutorial, for example, has been presented to hundreds of people at Supercomputing Asia in February 2024, the TPC European Kickoff Workshop in June 2024, the University of Michigan’s annual Conference on Foundation Models and AI Agents for Science, and numerous local training events.

Sponsors are welcome to offer half-day tutorials on the morning of Monday, June 1, on subjects of interest to TPC members. Download the sponsorship prospectus here.

Tutorials are open to all conference attendees for an additional fee.

AI FOR SCIENCE: AGENTIC SYSTEMS AND SCIENTIFIC DISCOVERY

This is a lessons-learned tutorial designed to equip researchers with practical insights and conceptual grounding for applying AI systems to scientific challenges. Twelve talks from TPC participants across national laboratories, universities, and industry present concrete experience building AI systems that reason, plan, execute, and refine across simulation, experiment, and literature. The program covers the full arc from autonomous experimentation to production deployment, organized in four sessions:

Session 1: Closed-Loop AI and Self-Driving Laboratories: Deployed agentic systems running real experimental campaigns.
Session 2: Agentic AI for Chemistry and Materials Discovery: Domain research assistants and foundation models for molecules and materials.
Session 3: Agentic Architectures, Frameworks, and Coordination Protocols: Lessons building multi-agent systems, including MCP-based orchestration.
Session 4: Production Systems: Industry, HPC Runtimes, and Safety: Industrial deployment, runtime infrastructure, and safety for agentic AI.

The talks showcase AI in applications spanning accelerator operations, catalysis, battery science, drug discovery, computational fluid dynamics, and semiconductor engineering. Each illustrates the use of AI frameworks and tools such as LangGraph, AG2, Claude Code, MCP, ChemGraph, Osprey, SpectraQuery, and Dragon.

Learning Takeaways

Participants will develop a practical understanding of building and deploying AI systems for scientific discovery, including:

Designing closed-loop, agentic workflows that integrate experiment, simulation, and reasoning
Selecting and composing agentic frameworks and coordination protocols (LangGraph, AG2, MCP, and beyond)
Scaling multi-agent systems across HPC platforms and distributed scientific facilities
Navigating the operational realities of moving from prototype to production: runtimes, industrial deployment, safety, and trust

Attendees will come away with concrete patterns, deployment lessons, and emerging directions drawn directly from working AI-for-science systems across the TPC community.

This tutorial is being organized by Dan Stanzione of the Texas Advanced Computing Center, University of Texas at Austin, and will feature 12 lightning talks.

Session One

Closed-Loop AI and Self-Driving Laboratories

Session Two

Agentic AI for Chemistry and Materials Discovery

Session Three

Agentic Architectures, Frameworks, and Coordination Protocols

Session Four

Production Systems: Industry, HPC Runtimes, and Safety

Enabling Reproducible AI Workflows for the NAIRR Ecosystem

This tutorial introduces researchers to the latest reproducible Artificial Intelligence (AI) and Machine Learning (ML) workflows and tools available through the NSF-funded Tapis v3 platform, which provides both an Application Programming Interface (API) and a User Interface (UI). Using Tapis, researchers can discover AI/ML models and tools and deploy them directly to compute resources within the NAIRR and ACCESS ecosystems, supporting the NAIRR goal of providing computation, data, software, models, training, and educational materials to advance research, discovery, and innovation.

Through hands-on exercises, participants will gain experience developing AI/ML workflows and deploying them on a variety of HPC and cloud resources such as Jetstream2, Chameleon, Vista, and Stampede3. We will emphasize the Tapis core APIs alongside specialized APIs such as Tapis Workflows, Tapis Pods, ML Hub, and FlexServ, all integrated within the user-friendly TapisUI. Using these production-grade services, we will demonstrate how to create and run trustworthy, reproducible scientific machine learning workflows. By the end of this tutorial, researchers will be able to develop, deploy, and maintain their own ML workflows efficiently.

Learning Takeaways

Participants will learn to authenticate securely with Tapis to access its core and advanced APIs, enabling them to create, execute, and deploy scientific AI/ML workflows. By constructing well-defined workflows for real-world use cases, attendees will gain a foundational understanding of how to leverage HPC resources for research and how to use production-quality APIs to build transparent, reproducible ML pipelines.

This tutorial will be conducted by Anagha Jamthe, Wei Zhang, and Christian Garcia of the Texas Advanced Computing Center, University of Texas at Austin.

Attendee Preparation: To participate in the hands-on portions of the tutorial, attendees should create and activate a TACC account here before the tutorial to avoid delays during setup.

Session One

Overview of NAIRR Infrastructure and TapisUI

Session Two

Models, Large Models, and Third-Party Registries

Session Three

Prompt Engineering: Computer Vision Models with Jupyter

Session Four

Fine-Tuning and Analytics

Using AI to Accelerate Innovation & Discovery

This hands-on tutorial will equip computational scientists, engineers, developers, and students with practical skills for using AI models, tools, and agentic systems to maximize productivity, innovation, and discovery. The tutorial begins with a foundation in both predictive and generative AI methods, including how to minimize errors and improve the accuracy and usefulness of results. It then covers the capabilities of LLMs and multi-modal models through demos and hands-on labs. Most of the tutorial shows attendees how AI technologies can augment every phase of discovery and innovation, from start to finish: deep literature research, ideation and hypothesis generation, research and development planning, application prototyping and development, code optimization, surrogate creation, and data analysis. For every phase, the emphasis is on how AI technologies can assist and empower researchers (not replace them), and which tools are most useful for each task (and why).

Tutorial examples and labs will use the latest production and research/prototype Google technologies available at the time of the tutorial, including Gemini, NotebookLM, Gemini CLI, Code Assist, Antigravity, AlphaEvolve, and Co-Scientist. However, the core principles and strategies are designed to be portable, enabling scientists to use comparable AI models and tools effectively in their own work (and even in most of the labs). Several AI science applications developed by Google DeepMind (WeatherNext, AlphaFold 3, AlphaGenome, and others) will be used to show the capabilities of AI-powered scientific discovery.

Learning Takeaways

Participants will learn how AI-powered tools can help in every phase of the computational research/application development process:

Prompt engineering: how to get the best results, and why some methods work better than others
Leveraging context windows to improve accuracy and usefulness of answers
Providing sources/content to further improve accuracy and usefulness of responses
Using agentic systems to further improve accuracy, produce useful results, and make decisions

This tutorial will be conducted by Jay Boisseau (Advanced Computing Strategist) and Megan Gawlik (Outbound Product Manager, HPC) of Google Cloud.

Session One

Deep Literature Research, Novel Hypothesis Generation, and Innovative Research Planning

Session Two

AI-Supercharged Code Prototyping, Development, Execution, and Optimization

Session Three

Understanding and Using AI Agents

Session Four

Developing and Using AI Agents and Surrogates

EVALUATION OF AI MODEL SCIENTIFIC SKILLS

This is a hands-on tutorial designed to equip researchers with practical skills and conceptual grounding in applying large language models (LLMs) to scientific challenges. LLMs are increasingly capable of solving complex problems, creating opportunities to apply them to science. However, even the most sophisticated models can struggle with simple reasoning tasks and make mistakes.

This tutorial focuses on best practices for evaluating LLMs for science applications. It guides participants through methods and techniques for testing LLMs at basic and intermediate levels, starting with the fundamentals of LLM design, development, application, and evaluation, with a focus on scientific applications. Participants will also learn complementary methods for rigorously evaluating LLM responses in both benchmark and end-to-end scenario settings. The tutorial features a hands-on session in which participants use LLMs to solve provided problems.

Learning Takeaways

Participants will learn the principles and approaches for using LLMs as scientific assistants and for evaluating their scientific knowledge and reasoning skills. Topics include:

Use cases of LLMs for scientific applications
The impact of prompting on performance
Basics of LLM evaluation
Evaluation of LLMs for science and engineering
Advanced techniques for evaluating LLMs for science and engineering
Hands-on practice

This tutorial will be conducted by Franck Cappello, R&D Lead and Senior Computer Scientist; Sandeep Madireddy, Computer Scientist and AI Researcher; Neil Getty, Assistant Computer Scientist; and Robert Underwood, Assistant Computer Scientist; all at Argonne National Laboratory.

Session One

Use Cases and Basic Evaluation Techniques

Session Two

Advanced Evaluation Techniques

Session Three

Hands-On Work

Building Scalable Agentic Systems for Science: Concepts, Architectures, and Hands-On with Academy

Agentic systems, in which autonomous agents collaborate to solve complex problems, are emerging as a transformative methodology in AI. However, adapting agentic architectures to scientific cyberinfrastructure — spanning HPC systems, experimental facilities, and federated data repositories — introduces new technical challenges. In this half-day tutorial, we introduce participants to the design, deployment, and management of scalable agentic systems for scientific discovery. We will present Academy, a Python-based middleware platform built to support agentic workflows across heterogeneous research environments.

Learning Takeaways

Participants will learn core agentic system concepts, including asynchronous execution models, stateful agent orchestration, and dynamic resource management. A guided hands-on session will help attendees build and launch their own agentic systems. This tutorial is designed for researchers, developers, and cyberinfrastructure professionals interested in advancing AI-driven science with next-generation autonomous systems.
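To make two of these concepts concrete, here is a minimal sketch of asynchronous execution and stateful agents using only Python's standard asyncio library. It is not Academy's API: the CounterAgent class and orchestrate function are hypothetical names chosen for illustration, and dynamic resource management is left out entirely.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CounterAgent:
    """Hypothetical stateful agent: keeps a running total across messages.
    Plain asyncio only; this is not Academy's API."""
    name: str
    inbox: asyncio.Queue = field(default_factory=asyncio.Queue)
    total: int = 0

    async def run(self) -> None:
        # Process messages until a None sentinel arrives.
        while True:
            msg = await self.inbox.get()
            if msg is None:
                break
            self.total += msg  # state persists between messages


async def orchestrate(workloads: dict[str, list[int]]) -> dict[str, int]:
    """Start one agent per workload, feed it messages, and collect final state."""
    agents = {name: CounterAgent(name) for name in workloads}
    # Each agent runs as its own concurrent task (asynchronous execution model).
    tasks = [asyncio.create_task(a.run()) for a in agents.values()]
    for name, items in workloads.items():
        for item in items:
            await agents[name].inbox.put(item)
        await agents[name].inbox.put(None)  # tell the agent to shut down
    await asyncio.gather(*tasks)
    return {name: a.total for name, a in agents.items()}


if __name__ == "__main__":
    print(asyncio.run(orchestrate({"sim": [1, 2, 3], "analysis": [10, 20]})))
```

The orchestrator only places messages in inboxes, so each agent proceeds independently and keeps its own state; middleware of the kind described above targets what this toy omits, such as agents spread across heterogeneous resources.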

This tutorial will be conducted by Ian Foster, Data Science and Learning Division Director at Argonne National Laboratory, and Yadu Babuji, University of Chicago.

Session One

Introduction to Agentic Systems and Academy

Session Two

Hands-On Implementation of Agentic Systems

End-to-End and Open-Source AI for Science: Training, Inference, and Agents Using AMD GPUs

Combining demonstrations with hands-on access to AMD Developer Cloud, this tutorial is designed to equip scientists with the tools needed to leverage AI in scientific workflows using AMD’s open-source ROCm stack (frameworks, compilers, libraries, and tools).

The program will leverage open data (e.g., Wikipedia), open models (e.g., AMD Instella, GPT-OSS, OpenFold), and recipes inspired by real use cases to demonstrate AI model training from first principles, domain-specific fine-tuning, optimized model inference with distillation, interleaving modeling/simulation codes with AI (at full and mixed precision), and orchestrating agentic frameworks on AMD GPUs.

We will also demonstrate AMD Primus, a flexible framework for large-scale foundation model training, and Enterprise AI Suite, for model hosting and serving, both applied to scientific domains.

Learning Takeaways

Participants will follow a guide built for high developer velocity, with practical patterns and performance insights they can apply to their own scientific workloads and infrastructure. Specifically, attendees will learn how to:

Get started and learn best practices for porting to AMD systems using open-source tools, models, and code
Build models, surrogates, and digital twins by training and fine-tuning AI models
Use tools to profile, troubleshoot, and optimize multi-GPU and multi-node training and inference jobs
Integrate “Computational Science Assistant” agents within simulation codes to apply AI in autonomous-lab scenarios

This tutorial will be conducted by the AMD AI for Science Team, consisting of Ashwin Aji, Ryan Yard, Mike Schulte, and Jon Belof.

Session One

AMD AI Workflows, Ecosystem, Deployment, and Profiling Stack Overview

Session Two

AI4Science Studio: Agent-Driven Workflows for Scientific AI Models

HETEROGENEOUS COMPUTING AND AGENTIC WORKFLOWS FOR SCIENTIFIC DISCOVERY ON AWS

This hands-on tutorial equips researchers and computational scientists with practical skills for leveraging heterogeneous computing architectures and agentic AI workflows to accelerate scientific discovery on AWS. The tutorial is organized in two sessions.

The first session introduces classical HPC solutions and Amazon Braket, the AWS quantum computing service, covering hybrid quantum-classical resources that support scientific research in academic and private industry settings. Participants will explore AWS HPC services and solutions (AWS Batch, AWS Parallel Computing Service, and AWS ParallelCluster) alongside the integration and deployment of hybrid quantum-classical workloads. The session features a recent implementation of a hybrid quantum-classical auxiliary-field quantum Monte Carlo workflow and its application to modeling chemical reaction energies.

The second session demonstrates how agentic AI workflows offer a new paradigm for scientific computing: an AI agent receives a research question, reasons about the computational approach, retrieves benchmark datasets from a catalog, configures and launches simulations on cloud HPC, and analyzes the results — synthesizing findings, identifying anomalies, and recommending next steps without manual intervention. Participants will explore how agentic reasoning can accelerate the hypothesis-to-computation cycle, making production-scale scientific computing more accessible and repeatable.
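The question-to-recommendation loop described above can be sketched in a few lines of Python. This is a toy illustration, not AWS code: the catalog, the retrieve_dataset and run_simulation tools, the placeholder numbers, and the keyword rule standing in for an LLM's reasoning are all hypothetical, and no AWS service is called.

```python
from dataclasses import dataclass

# Hypothetical benchmark catalog: dataset name -> {system: reference value}.
# Values are placeholders, not real benchmark data.
CATALOG = {"reaction_energies": {"system_a": 1.00, "system_b": 2.00}}


def retrieve_dataset(name: str) -> dict[str, float]:
    """Tool: fetch benchmark reference values from the catalog."""
    return CATALOG[name]


def run_simulation(system: str) -> float:
    """Tool: stand-in for a job launched on cloud HPC.
    Returns canned values so the sketch runs offline."""
    canned = {"system_a": 1.02, "system_b": 2.30}
    return canned[system]


@dataclass
class Finding:
    system: str
    computed: float
    reference: float

    @property
    def rel_error(self) -> float:
        return abs(self.computed - self.reference) / abs(self.reference)


def agent(question: str, tolerance: float = 0.05) -> dict:
    # 1. Reason about the approach: a keyword rule stands in for an LLM planner.
    if "reaction" not in question.lower():
        return {"status": "no plan", "recommendations": ["clarify the research question"]}
    # 2. Retrieve the benchmark dataset from the catalog.
    references = retrieve_dataset("reaction_energies")
    # 3. Configure and "launch" one simulation per benchmark system.
    findings = [Finding(s, run_simulation(s), ref) for s, ref in references.items()]
    # 4. Analyze: flag results whose relative error exceeds the tolerance.
    anomalies = [f.system for f in findings if f.rel_error > tolerance]
    # 5. Recommend next steps without manual intervention.
    recommendations = [f"rerun {s} with tighter settings" for s in anomalies]
    return {
        "status": "done",
        "findings": findings,
        "anomalies": anomalies,
        "recommendations": recommendations or ["accept results"],
    }


if __name__ == "__main__":
    result = agent("Compute reaction energies for the benchmark set")
    print(result["anomalies"], result["recommendations"])
```

In a real pipeline, step 1 would be an LLM call and step 3 would submit jobs to a service such as AWS Batch; keeping each step as a separate tool function is what lets an agent swap those pieces in.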

Learning Takeaways

Participants will develop practical skills across heterogeneous computing and agentic orchestration for science:

AWS HPC services and solutions: AWS Batch, AWS Parallel Computing Service, and AWS ParallelCluster
Amazon Braket, the fully managed AWS quantum computing service
Running hybrid quantum-classical applications with Amazon Braket
Scaling hybrid quantum-classical applications with multi-service solutions
Designing agentic pipelines that automate the scientific computing workflow from problem formulation through simulation execution and result verification
Leveraging dataset catalogs and agent tool-use patterns to enable autonomous data retrieval and simulation configuration
Deploying computational science workloads on elastic cloud HPC infrastructure through agent-driven orchestration

This tutorial will be conducted by Evan Donato, Senior Specialist Solutions Architect, AWS; Evgeny Epifanovsky, Senior Staff Technical Program Manager, IonQ; Tyler Takeshita, Senior Applied Scientist, AWS; and Lowell Wofford, Principal Technical Product Manager, AWS.

Session One

Heterogeneous Quantum and Classical Computing on AWS

Session Two

Agentic Workflows for AI-Driven Scientific Computing on AWS

Please email info@tpc26.org with any questions.