Introduction

Over the past nine years, the landscape of stream processing research has evolved significantly, driven by advancements in real-time analytics, AI integration, and distributed architectures. In response to these shifts, we are pleased to present the 9th Workshop on Stream Processing, Stream-based AI & Stream Data Management in Big Data at IEEE Big Data 2025, bringing together the latest contributions in the field.

Stream processing remains a cornerstone of the Big Data ecosystem, with its applications expanding across IoT, AI-driven decision-making, financial technology, and cybersecurity. Innovations in serverless computing, edge stream analytics, continual learning, evolving graphs and event-driven data mesh architectures are now defining the future of real-time computing, reflecting the dynamic progress in this domain.

Key Evolutions in Stream Processing
Stream processing systems have become core components of modern data-intensive architectures, where they manage and analyze continuous data flows [18, 19]. These systems increasingly integrate specialized frameworks for data management and analytics, enabling efficient real-time integration across diverse applications [20].

A key evolution in this space is the incorporation of advanced data management functionalities, allowing seamless interaction between transactional and analytical workloads [21]. One notable example is the deep integration of stateful stream processing with Hybrid Transactional and Analytical Processing (HTAP). This paradigm enables real-time analytics on streaming data with minimal latency by unifying "hot" (real-time) and "cold" (historical) data streams. Concepts such as Stream Tables and Materialized Views are leveraged to facilitate low-latency decision-making, demonstrating how stream processing is evolving beyond traditional event processing into a foundational pillar of modern data infrastructure [1,2].
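As a rough illustration of the materialized-view idea, the sketch below keeps a per-key running total that is bootstrapped from historical ("cold") rows and then updated one real-time ("hot") event at a time, so analytical reads never rescan the history. The class name `MaterializedSumView`, its methods, and the account keys are hypothetical and chosen only for this example; they do not correspond to any particular HTAP or streaming system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class MaterializedSumView:
    """Hypothetical per-key running sum, maintained incrementally."""

    def __init__(self, cold_rows: Iterable[Tuple[str, float]] = ()) -> None:
        self._totals: Dict[str, float] = defaultdict(float)
        # Bootstrap the view from historical ("cold") rows.
        for key, amount in cold_rows:
            self._totals[key] += amount

    def apply(self, key: str, amount: float) -> None:
        # Fold one real-time ("hot") event into the view in constant time.
        self._totals[key] += amount

    def query(self, key: str) -> float:
        # Analytical read served directly from the view.
        return self._totals.get(key, 0.0)


view = MaterializedSumView(cold_rows=[("acct-1", 100.0), ("acct-2", 40.0)])
for event in [("acct-1", 25.0), ("acct-3", 10.0), ("acct-1", -5.0)]:
    view.apply(*event)

print(view.query("acct-1"))  # 120.0
```

A production system would additionally need durable state, fault tolerance, and transactional guarantees, all of which this sketch leaves out.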

New initiatives in data mesh architectures are also driving real-time data products and self-serve streaming infrastructures [3].

Parallel to this, AI-powered stream processing has reached a new level of maturity, with adaptive streaming machine learning models that address concept drift, support incremental learning, and perform real-time anomaly detection [4,6,7]. Emerging frameworks such as Apache Flink and Ray Streaming [9,5] extend streaming pipelines to natively support machine learning workflows.

Graph-Based Stream Processing and Temporal Analytics
Graph-based stream processing is a crucial component of real-time analytics, particularly for applications requiring temporal graph analytics, such as fraud detection, cybersecurity, social network analysis, and supply chain optimization [10, 11]. Advances in this field enable real-time anomaly detection, link prediction, and dynamic knowledge graph construction, enhancing the scalability of streaming event processing.

Recent breakthroughs in graph neural networks (GNNs) have further expanded real-time stream processing capabilities, supporting tasks such as social network monitoring [12], dynamic fraud detection, and network anomaly detection.

Federated Learning, Edge Stream Processing, and Privacy-Aware Architectures
Federated learning and edge computing are reshaping real-time analytics, reducing reliance on centralized cloud architectures. Privacy-preserving stream analytics has become essential [8], enabling IoT devices to process data locally while ensuring secure and distributed intelligence.

Continual Learning in Stream-Based Applications
Continual learning [15, 16, 17] enables models to assimilate new information while preserving previously acquired knowledge. In stream processing and real-time analytics, this capability lets models adapt to evolving data patterns and to concept drift, that is, situations where the statistical properties of the data change over time. Implementing continual learning within streaming architectures allows systems to update predictive models on the fly, sustaining accuracy and responsiveness. This adaptability is particularly useful in dynamic environments such as financial markets, cybersecurity, and IoT applications, where real-time decision-making hinges on the ability to learn from continuous data flows.
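To make the idea of updating a model under concept drift concrete, here is a minimal sketch, assuming a toy predictor that outputs the running mean of the target and a simple windowed error-threshold heuristic for drift. The heuristic is a stand-in for illustration only and is not one of the published drift detectors; the class name `DriftAwareMean` and the `window` and `threshold` parameters are hypothetical.

```python
from collections import deque


class DriftAwareMean:
    """Hypothetical online predictor: predicts the running mean of the target."""

    def __init__(self, window: int = 20, threshold: float = 2.0) -> None:
        self.window = window
        self.threshold = threshold
        self.count = 0
        self.mean = 0.0
        self.recent_errors = deque(maxlen=window)
        self.drifts = 0

    def predict(self) -> float:
        return self.mean

    def learn(self, y: float) -> None:
        # Test-then-train: record the prediction error before updating.
        self.recent_errors.append(abs(y - self.predict()))
        window_full = len(self.recent_errors) == self.window
        if window_full and sum(self.recent_errors) / self.window > self.threshold:
            # Mean recent error is too high: treat it as drift and
            # discard the old state so the model relearns from here.
            self.drifts += 1
            self.count, self.mean = 0, 0.0
            self.recent_errors.clear()
        # Incremental mean update, one observation at a time.
        self.count += 1
        self.mean += (y - self.mean) / self.count


model = DriftAwareMean()
stream = [0.0] * 100 + [10.0] * 100  # abrupt change halfway through
for y in stream:
    model.learn(y)

print(model.drifts, model.predict())  # 1 10.0
```

Without the reset, a plain running mean would end at 5.0 on this stream. Note that resetting discards everything learned so far, whereas continual learning methods also aim to retain useful earlier knowledge.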

New Frontiers: Digital Twins, Decentralized Streaming, and Smart Systems
Stream processing in digital twins is a rapidly growing field, enabling real-time data pipelines for industrial automation, smart cities, and healthcare applications [13]. Similarly, decentralized streaming architectures are gaining traction, particularly in blockchain monitoring, decentralized finance (DeFi) analytics [14], and secure streaming frameworks.

With these evolving trends, the 2025 workshop will serve as a premier venue for researchers and industry leaders to discuss the latest advances in real-time streaming, event processing, and AI-driven stream analytics. We invite contributions on:

  • Stateful stream processing & hybrid transactional-analytical models
  • AI-powered stream mining, anomaly detection, and federated learning
  • Temporal graph streaming, graph neural networks, and dynamic event detection
  • Edge-based streaming analytics and privacy-preserving architectures
  • Real-time data products in data mesh and event-driven architectures
  • Streaming digital twins for industrial automation and smart cities
  • Decentralized streaming for blockchain, cybersecurity, and DeFi analytics

This workshop will bring together leading researchers and practitioners to explore cutting-edge advancements and shape the future of real-time data streaming.

Research Topics

The topics of interest include but are not limited to:

Keynotes

Keynote 1: Scaling Agentic Workflows with Apache Beam - Danny McCormick

Use of LLMs and agents is steadily growing in prominence and importance across the data processing ecosystem and more broadly across all of software. Today, though, many agentic prototypes fail to reach production for familiar reasons, including: overly complex resource management, the inability to colocate the correct context, high deployment costs, and a lack of guardrails for non-deterministic agents. Agentic workflows can solve some of these problems by providing boundaries for an agent to operate within, but there is a lack of scalable tools to do this well.

This talk will discuss why streaming data processing systems are broadly well positioned to safely scale and deploy agentic workflows. It will then explore how Apache Beam can be used to build and scale complex, agentic workflows, focusing on recent advancements in Beam ML and infrastructure, and possible future areas of investment.

About the Speaker

Danny McCormick is a staff software engineer at Google and a member of the Apache Beam Project Management Committee (PMC). Within Beam, he is particularly focused on Beam ML, the YAML SDK, and improving Beam’s infrastructure. Prior to working on Beam at Google, Danny worked at GitHub, where he helped launch GitHub Actions.

Keynote 2: CapyMOA, Adaptive Machine Learning for Data Streams and Online Continual Learning - Heitor Gomes

Machine learning is increasingly popular, with numerous applications across different scenarios. In this talk, I will explore problems where the data stream abstraction is particularly beneficial, such as situations where the data is evolving (i.e., experiencing concept drift), and show how these same challenges underpin online continual learning, where models must continuously learn, adapt, and retain useful knowledge as new data arrives. I will also introduce some specific challenges this setting presents, addressing both theoretical issues and practical considerations. Finally, I will introduce our open-source software efforts, specifically CapyMOA, designed to facilitate further discoveries and advancements in the field.

About the Speaker

Heitor Murilo Gomes is a Senior Lecturer and leads the Adaptive AI Lab at Victoria University of Wellington (VUW) in New Zealand. His research centres on machine learning for data streams, with interests spanning ensemble methods, semi-supervised learning, concept drift, and the intersection of online continual learning and streaming data. Heitor is a core developer and project lead for CapyMOA and a maintainer of MOA (Massive Online Analysis), contributing broadly to open-source tools for online and stream learning. Prior to joining VUW, he was Co-Director of the AI Institute at the University of Waikato, where he supervised PhD students, led research projects, and taught the “Data Stream Mining” course (2020–2022). More information: personal page.

Information

IMPORTANT DATES

SUBMISSION DEADLINE
October 5, 2025
DECISION NOTIFICATION
November 4, 2025
CAMERA-READY SUBMISSION DEADLINE
November 23, 2025
WORKSHOP
December 8-11, 2025

PUBLICATIONS

Your paper should be written in English and formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (Templates). The paper should not exceed 6 pages.

All accepted papers will be published in the Workshop Proceedings by the IEEE Computer Society Press.


PROGRAM CO-CHAIRS

  • Sabri Skhiri
    EURA NOVA, BE
  • Albert Bifet
    Télécom ParisTech, FR
  • Alessandro Margara
    Politecnico di Milano, IT

PROGRAM COMMITTEE MEMBERS (TBC)

  • Oscar Romero
    UPC Barcelona Tech, ES
  • Fabricio Enembreck
    Pontifícia Universidade Católica do Paraná, BR
  • José del Campo Ávila
    Universidad de Málaga, ES
  • Riccardo Tommasini
    Institut National des Sciences Appliquées (INSA) Lyon, FR