Introduction

Over the past nine years, the landscape of stream processing research has evolved significantly, driven by advancements in real-time analytics, AI integration, and distributed architectures. In response to these shifts, we are pleased to present the 9th Workshop on Stream Processing, Stream-based AI & Stream Data Management in Big Data at IEEE Big Data 2025, bringing together the latest contributions in the field.

Stream processing remains a cornerstone of the Big Data ecosystem, with its applications expanding across IoT, AI-driven decision-making, financial technology, and cybersecurity. Innovations in serverless computing, edge stream analytics, continual learning, evolving graphs and event-driven data mesh architectures are now defining the future of real-time computing, reflecting the dynamic progress in this domain.

Key Evolutions in Stream Processing
Stream processing systems have become core components of modern data-intensive architectures, where they manage and analyze continuous data flows [18, 19]. These systems increasingly integrate specialized frameworks for data management and analytics, enabling efficient real-time integration across diverse applications [20].

A key evolution in this space is the incorporation of advanced data management functionalities, allowing seamless interaction between transactional and analytical workloads [21]. One notable example is the deep integration of stateful stream processing with Hybrid Transactional and Analytical Processing (HTAP). This paradigm enables real-time analytics on streaming data with minimal latency by unifying "hot" (real-time) and "cold" (historical) data streams. Concepts such as Stream Tables and Materialized Views are leveraged to facilitate low-latency decision-making, demonstrating how stream processing is evolving beyond traditional event processing into a foundational pillar of modern data infrastructure [1,2].
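The relationship between a stream and a table mentioned above can be illustrated with a minimal sketch. It assumes a changelog of (key, value) records in which a value of None marks a deletion; `materialize` and `RunningTotalView` are hypothetical names chosen for illustration and do not correspond to the API of any cited system.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumed record shape: (key, value); a value of None is a delete marker.
ChangeRecord = Tuple[str, Optional[int]]


def materialize(changelog: Iterable[ChangeRecord]) -> Dict[str, int]:
    """Fold a changelog stream into a table holding the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # delete marker removes the row if present
        else:
            table[key] = value    # upsert: a newer record overwrites the older one
    return table


class RunningTotalView:
    """Hypothetical materialized view: a running sum per account, updated per event."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}

    def apply(self, account: str, amount: int) -> int:
        """Apply one event and return the refreshed total for that account."""
        self.totals[account] = self.totals.get(account, 0) + amount
        return self.totals[account]


if __name__ == "__main__":
    print(materialize([("a", 1), ("b", 2), ("a", 3), ("b", None)]))
    view = RunningTotalView()
    view.apply("acct-1", 50)
    print(view.apply("acct-1", -20))
```

Because the view is updated per event rather than recomputed from history, a query against it reads the current state directly, which is the property the low-latency use cases above rely on.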

New initiatives in data mesh architectures are also driving real-time data products and self-serve streaming infrastructures [3].

In parallel, AI-powered stream processing has matured, with adaptive streaming machine learning models that address concept drift, support incremental learning, and perform real-time anomaly detection [4, 6, 7]. Frameworks such as Apache Flink and Ray Streaming [5, 9] extend streaming pipelines to natively support machine learning workflows.

Graph-Based Stream Processing and Temporal Analytics
Graph-based stream processing is a crucial component of real-time analytics, particularly for applications requiring temporal graph analytics, such as fraud detection, cybersecurity, social network analysis, and supply chain optimization [10, 11]. Advances in this field enable real-time anomaly detection, link prediction, and dynamic knowledge graph construction, enhancing the scalability of streaming event processing.

Recent breakthroughs in graph neural networks (GNNs) have further expanded real-time stream processing capabilities, supporting tasks such as social network monitoring [12], dynamic fraud detection, and network anomaly detection.

Federated Learning, Edge Stream Processing, and Privacy-Aware Architectures
Federated learning and edge computing are reshaping real-time analytics, reducing reliance on centralized cloud architectures. Privacy-preserving stream analytics has become essential [8], enabling IoT devices to process data locally while ensuring secure and distributed intelligence.

Continual Learning in Stream-Based Applications
Continual learning [15, 16, 17] enables models to assimilate new information while preserving previously acquired knowledge. In stream processing and real-time analytics, this capability is used to adapt to evolving data patterns and concept drift, that is, situations where the statistical properties of the data change over time. Implementing continual learning within streaming architectures allows systems to update predictive models on the fly, sustaining accuracy and responsiveness. This adaptability is particularly useful in dynamic environments such as financial markets, cybersecurity, and IoT applications, where real-time decision-making hinges on the ability to learn from continuous data flows.
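As a rough illustration of on-the-fly model updates under concept drift, the sketch below uses a deliberately simple model: an exponentially weighted running mean evaluated in test-then-train (prequential) order, plus a windowed error check that raises an alarm when recent errors grow. All names, the learning rate, the window size, and the threshold are assumptions made for this example; it is not the method of any cited work.

```python
from collections import deque
from typing import Deque, Iterable, List


class OnlineMeanModel:
    """Incremental model: an exponentially weighted running mean of the target."""

    def __init__(self, learning_rate: float = 0.2) -> None:
        self.learning_rate = learning_rate
        self.estimate = 0.0

    def predict(self) -> float:
        return self.estimate

    def learn_one(self, y: float) -> None:
        # Move the estimate a fraction of the way toward the new observation.
        self.estimate += self.learning_rate * (y - self.estimate)


def prequential_errors(stream: Iterable[float], model: OnlineMeanModel) -> List[float]:
    """Test-then-train: score each item before the model learns from it."""
    errors: List[float] = []
    for y in stream:
        errors.append(abs(y - model.predict()))
        model.learn_one(y)
    return errors


def detect_drift(errors: Iterable[float], window: int = 5, threshold: float = 2.0) -> List[int]:
    """Return the indices at which the mean error over the last `window` items exceeds `threshold`."""
    recent: Deque[float] = deque(maxlen=window)
    alarms: List[int] = []
    for i, err in enumerate(errors):
        recent.append(err)
        if len(recent) == window and sum(recent) / window > threshold:
            alarms.append(i)
    return alarms


if __name__ == "__main__":
    # Synthetic stream with an abrupt change of mean at index 30.
    stream = [1.0] * 30 + [10.0] * 30
    model = OnlineMeanModel()
    errors = prequential_errors(stream, model)
    print("first alarm index:", detect_drift(errors)[0])
    print("final estimate:", round(model.estimate, 2))
```

The fixed learning rate trades stability for adaptability: a larger value follows drift faster but forgets earlier data sooner, which is the tension continual learning methods aim to manage.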

New Frontiers: Digital Twins, Decentralized Streaming, and Smart Systems
Stream processing in digital twins is a rapidly growing field, enabling real-time data pipelines for industrial automation, smart cities, and healthcare applications [13]. Similarly, decentralized streaming architectures are gaining traction, particularly in blockchain monitoring, decentralized finance (DeFi) analytics [14], and secure streaming frameworks.

With these evolving trends, the 2025 workshop will serve as a premier venue for researchers and industry leaders to discuss the latest advances in real-time streaming, event processing, and AI-driven stream analytics. We invite contributions on the following themes:

Stateful stream processing & hybrid transactional-analytical models
AI-powered stream mining, anomaly detection, and federated learning
Temporal graph streaming, graph neural networks, and dynamic event detection
Edge-based streaming analytics and privacy-preserving architectures
Real-time data products in data mesh and event-driven architectures
Streaming digital twins for industrial automation and smart cities
Decentralized streaming for blockchain, cybersecurity, and DeFi analytics

This workshop will bring together leading researchers and practitioners to explore cutting-edge advancements and shape the future of real-time data streaming.

REFERENCES

[1] Matthias J. Sax, Guozhang Wang, Matthias Weidlich, and Johann-Christoph Freytag. 2018. Streams and Tables: Two Sides of the Same Coin. In Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics (BIRTE '18).
[2] Nico Kruber, A Journey to Beating Flink's SQL Performance. https://www.ververica.com/blog/a-journey-to-beating-flinks-sql-performance, February 2020.
[3] Worapol Alex Pongpech (2023). A Distributed Data Mesh Paradigm for an Event-based Smart Communities Monitoring Product. Procedia Computer Science, 220, 584-591.
[4] Albert Bifet, Ricard Gavaldà, Geoffrey Holmes, and Bernhard Pfahringer. 2018. Machine Learning for Data Streams: with Practical Examples in MOA. MIT Press. ISBN (electronic): 9780262346047.
[5] Sonia Horchidan, Emmanouil Kritharakis, Vasiliki Kalavri, and Paris Carbone. 2022. Evaluating model serving strategies over streaming data. In Proceedings of the Sixth Workshop on Data Management for End-To-End Machine Learning (DEEM '22).
[6] Nasir, W., & Jack, H. (2025). Real-Time Machine Learning Pipelines: Optimizing Stream Processing for Scalable AI Applications. ResearchGate.
[7] Alam, M. A., Nabil, A. R., & Mintoo, A. A. (2024). Real-Time Analytics in Streaming Big Data: Techniques and Applications. Journal of Science and Technology. ResearchGate.
[8] C. B. Mawuli et al., "FedStream: Prototype-Based Federated Learning on Distributed Concept-Drifting Data Streams," in IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 53, no. 11, pp. 7112-7124, Nov. 2023, doi: 10.1109/TSMC.2023.3293462.
[9] Frank Sifei Luan, Ziming Mao, Ron Yifeng Wang, Charlotte Lin, Amog Kamsetty, Hao Chen, Cheng Su, Balaji Veeramani, Scott Lee, SangBin Cho, Clark Zinzow, Eric Liang, Ion Stoica, and Stephanie Wang. 2025. The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution. arXiv preprint arXiv:2501.12407. https://arxiv.org/abs/2501.12407.
[10] Algubelli, B. R., & Malikireddy, S. K. R. (2020). Deep Graph Neural Networks for Detecting Anomalies in Large-Scale Data Streams. ResearchGate.
[11] Bo Yan, Cheng Yang, Chuan Shi, Yong Fang, Qi Li, Yanfang Ye, and Junping Du. 2023. Graph Mining for Cybersecurity: A Survey. ACM Trans. Knowl. Discov. Data 18, 2, Article 47.
[12] Yao Ma, Ziyi Guo, Zhaocun Ren, Jiliang Tang, and Dawei Yin. 2020. Streaming Graph Neural Networks. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20).
[13] A. B. A. Alaasam, G. Radchenko and A. Tchernykh, "Stateful Stream Processing for Digital Twins: Microservice-Based Kafka Stream DSL," 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON), Novosibirsk, Russia, 2019, pp. 0804-0809, doi: 10.1109/SIBIRCON48586.2019.8958367.
[14] Li, D., Zhang, K., Wang, L. et al. A Geth-based real-time detection system for sandwich attacks in Ethereum. Discov Computing 27, 11 (2024). https://doi.org/10.1007/s10791-024-09445-6.
[15] Y. Huang, Y. Liu, H. S. Gunawi, B. Li, and C. Hwang. 2025. Alchemist: Towards the Design of Efficient Online Continual Learning System. arXiv preprint arXiv:2503.01066.
[16] Besnard, Q., & Ragot, N. (2024). Continual Learning for Time Series Forecasting: A First Survey. Engineering Proceedings, 68(1), 49. https://doi.org/10.3390/engproc2024068049.
[17] Shin, J., Lee, S., You, Y., Park, J., Oh, J., & Yi, M. (2025). Accelerating Manufacturing Prototyping: A Continual Learning Approach for Imbalanced Sequential Image Generation. In the proceedings of the 4th Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE).
[18] A. Margara et al., "A model and survey of distributed data-intensive systems", ACM Computing Surveys, 2023.
[19] M. Fragkoulis et al., A survey on the evolution of stream processing systems, VLDB Journal, 2024.
[20] L. De Martini, A. Margara, "Safe Shared State in Dataflow Systems", DEBS 2024.
[21] K. Psarakis et al., "Transactional Cloud Applications Go with the (Data)Flow", CIDR 2025.

Research Topics

The topics of interest include but are not limited to:

Stateful Stream Processing and Hybrid Transactional-Analytical Processing (HTAP).
AI-Enhanced Stream Mining and Anomaly Detection in Real-Time Systems.
Federated Learning for Distributed Stream Processing Architectures.
Temporal Graph Streaming for Dynamic Event Recognition and Predictive Analytics.
Edge-Based Streaming Analytics and Low-Latency Decision-Making.
Privacy-Preserving Stream Processing and Secure Data Sharing.
Event-Driven Architectures and Stream Processing in Data Mesh Frameworks.
Real-Time Data Products and Continuous Query Optimization in Stream SQL.
Streaming Digital Twins for Industrial Automation and Predictive Maintenance.
IoT-Enabled Stream Processing for Smart Cities and Intelligent Transportation.
Decentralized Streaming Architectures for Blockchain and Decentralized Finance (DeFi).
Real-Time Blockchain Analytics for Fraud Detection and Smart Contract Monitoring.
Online Learning.
Complex Event Processing (CEP) and Complex Event Recognition (CER) in Streaming Environments.
High-Performance Stream Processing Frameworks and Next-Gen Distributed Systems.
Multimodal Streaming Data Fusion for Cross-Domain Real-Time Insights.
Scalable Knowledge Graphs and Semantic Reasoning in Stream Analytics.

Keynotes

Keynote 1: Scaling Agentic Workflows with Apache Beam - Danny McCormick

Use of LLMs and agents is steadily growing in prominence and importance across the data processing ecosystem and more broadly across all of software. Today, though, many agentic prototypes fail to reach production for familiar reasons, including: overly complex resource management, the inability to colocate the correct context, high deployment costs, and a lack of guardrails for non-deterministic agents. Agentic workflows can solve some of these problems by providing boundaries for an agent to operate within, but there is a lack of scalable tools to do this well.

This talk will discuss why streaming data processing systems are broadly well positioned to safely scale and deploy agentic workflows. It will then explore how Apache Beam can be used to build and scale complex, agentic workflows, focusing on recent advancements in Beam ML and infrastructure, and possible future areas of investment.

About the Speaker

Danny McCormick is a staff software engineer at Google and a member of the Apache Beam Project Management Committee (PMC). Within Beam, he is particularly focused on Beam ML, the YAML SDK, and improving Beam's infrastructure. Prior to working on Beam at Google, Danny worked at GitHub, where he helped launch GitHub Actions.

Keynote 2: CapyMOA, Adaptive Machine Learning for Data Streams and Online Continual Learning - Heitor Gomes

Machine learning is increasingly popular, with numerous applications across different scenarios. In this talk, I will explore problems where the data stream abstraction is particularly beneficial, such as situations where the data is evolving (i.e., experiencing concept drift), and show how these same challenges underpin online continual learning, where models must continuously learn, adapt, and retain useful knowledge as new data arrives. I will also discuss some specific challenges this setting presents, addressing both theoretical issues and practical considerations. Finally, I will introduce our open-source software efforts, specifically CapyMOA, designed to facilitate further discoveries and advancements in the field.

About the Speaker

Heitor Murilo Gomes is a Senior Lecturer and leads the Adaptive AI Lab at Victoria University of Wellington (VUW) in New Zealand. His research centres on machine learning for data streams, with interests spanning ensemble methods, semi-supervised learning, concept drift, and the intersection of online continual learning and streaming data. Heitor is a core developer and project lead for CapyMOA and a maintainer of MOA (Massive Online Analysis), contributing broadly to open-source tools for online and stream learning. Prior to joining VUW, he was Co-Director of the AI Institute at the University of Waikato, where he supervised PhD students, led research projects, and taught the "Data Stream Mining" course (2020–2022).

Information

IMPORTANT DATES

SUBMISSION DEADLINE: October 5, 2025
DECISION NOTIFICATION: November 4, 2025
CAMERA-READY SUBMISSION DEADLINE: November 23, 2025
WORKSHOP: December 8-11, 2025

PUBLICATIONS

Your paper should be written in English and formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (Templates). The paper should not exceed 6 pages.

All accepted papers will be published in the Workshop Proceedings by the IEEE Computer Society Press.

SUBMIT PAPER

PROGRAM CO-CHAIRS

Sabri Skhiri
EURA NOVA, BE
Albert Bifet
Télécom ParisTech, FR
Alessandro Margara
Politecnico di Milano, IT

PROGRAM COMMITTEE MEMBERS (TBC)

Oscar Romero
UPC BarcelonaTech, ES
Fabricio Enembreck
Pontifícia Universidade Católica do Paraná, BR
José del Campo Ávila
Universidad de Málaga, ES
Riccardo Tommasini
Institut National des Sciences Appliquées (INSA) Lyon, FR

9th Workshop on Stream Processing, Stream-based AI & Stream Data Management in Big Data

COLOCATED WITH THE 2025 IEEE INTERNATIONAL CONFERENCE ON BIG DATA