Introduction

For the eighth year in a row, we are happy to announce a new edition of the workshop. Stream processing and real-time analytics have become some of the most important topics in Big Data. Notably, the industry tends to develop more robust, powerful and intelligent stream processing applications. IoT applications, online predictive maintenance, fraud detection for instant payments, scoring of consumers on websites and in shops, claims analysis and cost estimates, and image processing for surveillance, food, and agriculture are only a few potential applications of real-time stream processing and analytics.

The introduction of stateful stream processing [9,14,16,29] has enabled the development of a new kind of real-time application. Indeed, hot and cold data have been combined into a single real-time data flow using the concept of Stream Tables [15,16,18,30]. The concept of duality between Streams and Tables is not recent. It was first introduced in 2003 as a “Relation to Stream” transformation, called STREAM [20]. However, it is only with the emergence of state management [14,31] that Stream Tables can now be used in real time and in a completely distributed manner while still guaranteeing exactly-once semantics and recovery mechanisms [28].

Furthermore, stateful stream processing has been applied in data management using Stream & Complex Event Processing (CEP) or Composite Event Recognition (CER) [33]. New architecture patterns were proposed to address data pipelines and data management within the enterprise. For instance, the authors in [11,12] proposed new designs for the Extract, Transform and Load (ETL) steps based on stream processing. Thus, by breaking down silos between Enterprise Data Warehouses (EDW) and Big Data lakes [13], doors have been opened to completely redesign the way data are transported, stored and used within the Big Data environment. More recently, Friedman et al. described in [21] how a Data Hub can be implemented to store and distribute data within an enterprise context.

In the past few years, researchers and practitioners in the area of data stream management and CEP/CER [1, 2, 3, 4, 5] have developed systems to process unbounded streams of data and quickly detect situations of interest. Nowadays, big data technologies provide a new ecosystem to foster research in this area [6]. Highly scalable distributed stream processors, the convergence of batch and stream engines, and the emergence of state management and stateful stream processing (such as Apache Spark [9], Apache Flink [10], Kafka Streams [18, 19], Google Dataflow [17] and Microsoft Trill [26]) have opened up new opportunities for highly scalable and distributed real-time analytics. Going further, these technologies also provide a solid foundation for algorithms that complement CEP/CER in the use cases required by the industry. As a result, with the stateful nature of stream processors [14], stream SQL statements [27] can be applied directly in the streaming engine and dynamic tables can be created [12, 15, 18].

In addition, formalisms for reasoning about durative events were introduced to improve CER [22, 23, 24]. This led to the introduction of Stream Reasoning for improving Stream Mining tasks, autonomous cars and drones, and many other use cases [32].

For the present workshop, and following the discussion above, submissions studying scalable online learning, incremental learning on stream processing infrastructures, Complex Event Processing and Composite Event Recognition are welcome. We also encourage submissions on data stream management, data architecture using stream processing and Internet of Things (IoT) data streaming. Additionally, we appreciate submissions studying the usage of stream processing in innovative architectures.

After the success of the first seven editions of this workshop, co-located with the IEEE Big Data conference since 2016, this latest edition will be an excellent opportunity to bring together actors from academia and industry to discuss, explore and define new opportunities and use cases. The workshop will benefit both researchers and practitioners interested in the latest research in real-time and stream processing. It will showcase prototypes or products leveraging big data technologies as well as online learning models, efficient algorithms for scalable CEP/CER and context detection engines, and new architectures leveraging stream processing.

Finally, as our workshop places emphasis on reproducibility, we also encourage authors to make available all data used for empirical evaluations, the related software as well as clear instructions for reproducing the presented experiments. This can be added as a form of supplementary material. The reviewers will be encouraged to consider this material.

Research Topics

The topics of interest include but are not limited to:

Keynotes

Keynote 1: Serverless Stream Processing on Apache Flink: Early Lessons Learned - Konstantin Knauf

Over the past year, our team at Confluent has been dedicated to building the first truly serverless stream processing offering on top of Apache Flink. While Apache Flink is the de facto standard in the field and has a proven track record at companies such as Netflix, Goldman Sachs and Apple, building a serverless solution on top of it came with unique challenges on every layer of the stack: from programming interfaces and observability to runtime and scheduling.

In this session, I will talk about what it means to us to be a “truly serverless” stream processor and discuss the solutions we have already implemented to reach this bar. We will also explore how our work is reflected in the Apache Flink Open Source community and share our vision for the project.

About the Speaker

Konstantin is a member of the Apache Flink PMC, a long-term contributor to the project and Group Product Manager at Confluent. He joined the company early this year as part of the acquisition of Immerok, which he had co-founded with a group of long-term community members last year. Formerly, as Head of Product at Ververica, Konstantin supported multiple teams working on Apache Flink in both discovery and delivery. Before that, he led the pre-sales team at Ververica, helping their clients as well as the Open Source community to get the most out of Apache Flink.

Keynote 2: Combining expressivity and performance in distributed data processing - Luca De Martini & Alessandro Margara

Modern distributed platforms for scalable data analysis offer a high-level programming model that results in simple and concise definitions of the processing tasks, abstracting away most of the concerns associated with concurrency and distribution, but at the cost of a large performance gap with custom programs that use low-level primitives to control distribution and resource usage. To investigate whether it is possible to reduce this performance gap without affecting ease of use, we developed Noir, a novel platform for data analysis that combines a simple programming model, similar to but more expressive than state-of-the-art solutions, with an efficient engine that benefits from the performance and safety characteristics of the Rust language.

In this talk, we will describe the key design and implementation choices in Noir that contribute to this level of performance.

About the Speakers

Luca De Martini is a PhD Student at Politecnico di Milano. He received his Dr. Eng. degree in Computer Science and Engineering from Politecnico di Milano. His research interests include distributed systems, high-performance computing and computer security. His current research focuses on the performance aspects of distributed systems.

Alessandro Margara is an Associate Professor at Politecnico di Milano. He received his PhD in Information Technology from Politecnico di Milano and was a postdoctoral researcher at the Vrije Universiteit (VU) Amsterdam and at the Università della Svizzera italiana (USI). His research interests include event stream processing and software engineering for distributed and data-intensive systems. He leads the distributed software engineering group at Politecnico di Milano, which focuses on building efficient abstractions for large-scale distributed systems.

Information

IMPORTANT DATES

SUBMISSION DEADLINE
October 20, 2023 (extended)
DECISION NOTIFICATION
November 16, 2023
CAMERA-READY SUBMISSION DEADLINE
November 22, 2023
WORKSHOP
December 15-18, 2023

PUBLICATIONS

Your paper should be written in English and formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (Templates). The length of the paper should not exceed 6 pages.

All accepted papers will be published in the Workshop Proceedings by the IEEE Computer Society Press.

PROGRAM CO-CHAIRS

  • Sabri Skhiri
    EURA NOVA, BE
  • Albert Bifet
    Télécom ParisTech, FR
  • Alessandro Margara
    Politecnico di Milano, IT

PROGRAM COMMITTEE MEMBERS (TBC)

  • Till Rohrmann
    Ververica/Alibaba, DE/CN
  • Vijay Raghavan
    University of Louisiana, US
  • Raju Gottumukkala
    University of Louisiana, US
  • Jian Chen
    University of North Alabama, US
  • Nam-Luc Tran
    SWIFT, BE
  • Guido Salvaneschi
    TU Darmstadt, DE
  • Fabricio Enembreck
    Pontifícia Universidade Católica do Paraná, BR
  • José del Campo Ávila
    Universidad de Málaga, ES
  • Amine Ghrab
    EURA NOVA, BE
  • Thomas Peel
    GSK, BE
  • Oscar Romero
    UPC Barcelona Tech, ES
  • Hai-Ning Liang
    Xi’an Jiaotong-Liverpool University, CN