Introduction

For the seventh year in a row, we are happy to announce a new edition of the workshop. Stream processing and real-time analytics have become some of the most important topics in Big Data. Notably, the industry tends to develop more robust, powerful and intelligent stream processing applications. IoT applications, online predictive maintenance, fraud detection for instant payments, scoring of consumers on websites and in shops, claims analysis and cost estimates, and image processing for surveillance, food, and agriculture are only a few potential applications of real-time stream processing and analytics.

The recent introduction of stateful stream processing [9,14,16,29] has enabled the development of a new kind of real-time application. Indeed, hot and cold data have been combined into a single real-time data flow using the concept of Stream Tables [15,16,18,30]. The duality between Streams and Tables is not a recent concept: it was first introduced in 2003 as a “Relation to Stream” transformation in the STREAM system [20]. However, it is only with the emergence of state management [14,31] that Stream Tables can be used in real time and in a completely distributed manner while still guaranteeing exactly-once semantics and recovery mechanisms [28].
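As a minimal sketch of the duality between Streams and Tables described above, the following plain-Python example folds a changelog stream into a table and re-emits the table as a stream. It is not tied to the API of any engine cited here; the function names `stream_to_table` and `table_to_stream`, the tombstone convention (a `None` value deletes a key) and the sample records are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# A changelog record: (key, value). A value of None is treated as a delete
# (tombstone). This convention is an assumption of the sketch.
Record = Tuple[str, Optional[int]]


def stream_to_table(changelog: Iterable[Record]) -> Dict[str, int]:
    """Fold a changelog stream into a table holding the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone: remove the row if present
        else:
            table[key] = value    # upsert: later records overwrite earlier ones
    return table


def table_to_stream(table: Dict[str, int]) -> Iterator[Record]:
    """Emit one changelog record per row; replaying them rebuilds the table."""
    for key, value in table.items():
        yield (key, value)


# Hypothetical sample data.
changelog = [("alice", 1), ("bob", 5), ("alice", 3), ("bob", None)]
table = stream_to_table(changelog)
replayed = stream_to_table(table_to_stream(table))
```

Replaying the emitted stream rebuilds the same table, which is the sense in which the two representations are interchangeable. A distributed engine would additionally partition the state by key and checkpoint it, which this sketch leaves out.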

Furthermore, stateful stream processing has been applied in data management using Stream & Complex Event Processing (CEP) or Composite Event Recognition (CER) [20]. New architecture patterns were proposed to address data pipelines and data management within the enterprise. For instance, the authors in [11,12] proposed new designs for the Extract, Transform and Load (ETL) steps based on stream processing. Thus, by breaking down silos between Enterprise Data Warehouses (EDW) and Big Data lakes [13], doors have been opened to completely redesign the way data are transported, stored and used within the Big Data environment. More recently, Friedman et al. described in [21] how a Data Hub can be implemented to store and distribute data within an enterprise context.

In the past few years, researchers and practitioners in the area of data stream management and CEP/CER [1, 2, 3, 4, 5] have developed systems to process unbounded streams of data and quickly detect situations of interest. Nowadays, big data technologies provide a new ecosystem to foster research in this area [6]. Highly scalable distributed stream processors, the convergence of batch and stream engines, and the emergence of state management and stateful stream processing (such as Apache Spark [9], Apache Flink [10], Kafka Streams [18, 19], Google Dataflow [17], Microsoft Trill [26]) have opened up new opportunities for highly scalable and distributed real-time analytics. Going further, these technologies also provide a solid foundation for algorithms that complement CEP/CER in the use cases required by industry. As a result, with the stateful nature of stream processors [14], stream SQL statements [27] can be applied directly in the streaming engine and dynamic tables can be created [12, 15, 18].
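To make "detecting situations of interest" concrete, here is a minimal sketch of a CEP-style sequence pattern: one event type followed by another on the same key within a time window, using per-key state. It is plain Python and does not reproduce the API of any system cited above; the function name `detect_sequence`, the event layout and the sample events are hypothetical, and events are assumed to arrive in timestamp order.

```python
from typing import Dict, Iterable, List, Tuple

# An event: (timestamp_seconds, key, event_type).
# Assumption: events arrive ordered by timestamp.
Event = Tuple[float, str, str]


def detect_sequence(events: Iterable[Event], first: str, second: str,
                    window: float) -> List[Tuple[str, float, float]]:
    """Return (key, t_first, t_second) for each `first` event followed by a
    `second` event on the same key within `window` seconds."""
    pending: Dict[str, float] = {}  # per-key state: time of the latest `first`
    matches: List[Tuple[str, float, float]] = []
    for ts, key, kind in events:
        if kind == first:
            pending[key] = ts
        elif kind == second and key in pending:
            start = pending.pop(key)
            if ts - start <= window:
                matches.append((key, start, ts))
    return matches


# Hypothetical sample data.
events = [
    (0.0, "card-1", "login"),
    (2.0, "card-2", "login"),
    (4.0, "card-1", "payment"),   # 4 s after login: inside the window
    (30.0, "card-2", "payment"),  # 28 s after login: outside the window
]
matches = detect_sequence(events, "login", "payment", window=10.0)
```

The `pending` dictionary plays the role of the keyed state that a stateful stream processor would manage and checkpoint. Out-of-order events and state expiry are deliberately not handled here.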

In addition, formalisms for reasoning about durative events have been introduced to improve CER [22, 23, 24]. This led to the introduction of Stream Reasoning to improve Stream Mining tasks, autonomous cars and drones, and many other use cases [32].

For the present workshop, and following the discussion above, submissions studying scalable online learning, incremental learning on stream processing infrastructures, Complex Event Processing and Composite Event Recognition are welcome. We also encourage submissions on data stream management, data architecture using stream processing and Internet of Things (IoT) data streaming. Additionally, we appreciate submissions studying the usage of stream processing in new innovative architectures.

After the success of the first six editions of this workshop, co-located with the IEEE Big Data conference since 2016, this latest edition will be an excellent opportunity to bring together actors from academia and industry to discuss, explore and define new opportunities and use cases. The workshop will benefit both researchers and practitioners interested in the latest research in real-time and stream processing. It will showcase prototypes or products leveraging big data technologies as well as online learning models, efficient algorithms for scalable CEP/CER and context detection engines, and also new architectures leveraging stream processing.

Finally, as our workshop places emphasis on reproducibility, we also encourage authors to make available all data used for empirical evaluations, the related software as well as clear instructions for reproducing the presented experiments. This can be added as a form of supplementary material. The reviewers will be encouraged to consider this material.

REFERENCES

[1] E. Alevizos, A. Skarlatidis, A. Artikis, and G. Paliouras. “Probabilistic complex event recognition: A survey”. ACM Comput. Surv., 50(5):71:1– 71:31, 2017.
[2] Cugola, Gianpaolo, and Alessandro Margara. "Complex event processing with T-REX" Journal of Systems and Software 85.8: 1709-1728. 2012.
[3] I. Kolchinsky, I. Sharfman, and A. Schuster. “Lazy evaluation methods for detecting complex events”. In Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, DEBS '15, pages 34-45. ACM, 2015.
[4] Abadi, Daniel J et al. "The Design of the Borealis Stream Processing Engine." CIDR 4: 277-289. 2005.
[5] Agrawal, Jagrati et al. "Efficient pattern matching over event streams." Proceedings of the 2008 ACM SIGMOD international conference on Management of data 9 Jun. 2008: 147-160.
[6] N. Giatrakos, E. Alevizos, A. Artikis, A. Deligiannakis, and M. Garofalakis. “Complex event recognition in the big data era: A survey.” VLDB Journal, 2019.
[7] Confluent blog post: Event Sourcing, CQRS, Stream Processing and Apache Kafka: What’s the connection?
[8] Confluent blog post: A practical guide to build a stream data platform
[9] Matei Zaharia et al.: “Discretized Streams: Fault-Tolerant Streaming Computation at Scale”. Proceedings of the SOSP Conference. 2013.
[10] Paris Carbone et al.: “Apache Flink™: Stream and Batch Processing in a Single Engine”. In the Bulletin of the IEEE Computer Society Technical Committee on Data Engineering. 2015.
[11] Neha Narkhede, ETL is dead, Long Live Streams. December 2016.
[12] Tathagata Das, Real-time Streaming ETL with Structured Streaming in Apache Spark 2.1. January 2017.
[13] Michael Armbrust, Databricks Delta: A Unified Data Management System for Real-time Big Data. October 2017.
[14] Paris Carbone et al., “State Management in Apache Flink™, Consistent Stateful Distributed Stream Processing”. In the proceedings of VLDB 2017.
[15] Fabian Hueske, Continuous Queries on Dynamic Tables. April 2017.
[16] Nico Kruber, A Journey to Beating Flink's SQL Performance. February 2020.
[17] Tyler Akidau et al. "The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing". In the Proceedings of the VLDB Endowment, vol. 8, pp. 1792-1803. 2015.
[18] KStream Concepts, KTables, consulted in March 2020.
[19] Abhishek Gupta, Learn stream processing with Kafka Streams: Stateless operations, March 2020.
[20] Arasu et al., “STREAM: The Stanford Data Stream Management System”. In the proceedings of SIGMOD 2003.
[21] Ted Friedman et al., “Implementing the Data Hub: Architecture and Technology Choices”. Gartner Report, August 2018.
[22] Foundation of Composite Event Recognition - Dagstuhl Seminar, February 2020.
[23] Artikis, A., Sergot, M.J., Paliouras, G.: An event calculus for event recognition. IEEE Trans. Knowl. Data Eng. 27(4), 895–908. 2015.
[24] Daniele Dell'Aglio, Emanuele Della Valle, Frank van Harmelen, Abraham Bernstein: Stream reasoning: A survey and outlook. Data Sci. 1(1-2): 59-83. 2017.
[25] Harald Beck, Minh Dao-Tran, Thomas Eiter: LARS: A Logic-based framework for Analytic Reasoning over Streams. Artif. Intell. 261: 16-70. 2018
[26] Chandramouli, Badrish & Goldstein, Jonathan & Barnett, M. & Deline, Robert & Fisher, D. & Platt, John & Terwilliger, James & Wernsing, J., “Trill: A high-performance incremental query processor for diverse analytics”. 2014. 8. 401-412.
[27] Edmon Begoli, Tyler Akidau, Fabian Hueske, Julian Hyde, Kathryn Knight, and Kenneth Knowles. “One SQL to Rule Them All - an Efficient and Syntactically Idiomatic Approach to Management of Streams and Tables”. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19)
[28] G. van Dongen and D. V. D. Poel, "A Performance Analysis of Fault Recovery in Stream Processing Frameworks," in IEEE Access, vol. 9, pp. 93745-93763, 2021, doi: 10.1109/ACCESS.2021.3093208.
[29] Stateful computation - Apache Flink [Online]. Available at https://flink.apache.org
[30] Matthias J. Sax, Guozhang Wang, Matthias Weidlich, and Johann-Christoph Freytag. 2018. “Streams and Tables: Two Sides of the Same Coin”. In Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics (BIRTE '18). Association for Computing Machinery, New York, NY, USA,
[31] Fragkoulis, Marios, Paris Carbone, Vasiliki Kalavri and Asterios Katsifodimos. “A Survey on the Evolution of Stream Processing Systems.” ArXiv abs/2008.00842 (2020).
[32] Daniel Alvarez-Coello, Daniel Wilms, Adnan Bekan, Jorge Marx Gómez. “Towards a Data-Centric Architecture in the Automotive Industry”. 2021. Procedia Computer Science, Volume 181, Pages 658-663, ISSN 1877-0509.

Research Topics

The topics of interest include but are not limited to:

New stream processing architecture for big data.
Complex Event Processing (CEP) for big data, pattern matching engines for big data.
Composite Event Recognition (CER).
Language for streaming applications.
Stream Reasoning.
Scalable real-time decision algorithms.
Scalable stream processing architecture, algorithms or models.
Stream mining.
Online & incremental learning.
Stream SQL and other continuous query languages on big data frameworks.
Data pipelines & Data management with Streams.
Stream ETL and Real-Time Data Warehouse.
Stream Mining and algorithms.
Online & Incremental Learning and algorithms.
New or innovative architecture pattern leveraging stream processing.
IoT analytics.

Keynotes

Keynote 1: SQL Extensions To Support Streaming Data

For 40 years SQL has been the dominant language for data access and manipulation. Now that an increasing proportion of data is being processed in a streaming way, tool vendors (commercial and open source) have begun using SQL-like syntax in their event stream processing tools. Over the last couple of years, several of these vendors - including AWS, Confluent, Google, IBM, Microsoft, Oracle, Snowflake and SQLstream - have got together with the Data Management group at INCITS (who maintain the SQL standard) to work on streaming extensions. This talk will look at:

Why is this happening?
Who is involved?
How does the process work?
What progress has been made?
When can we expect to see a standard?

About the Speaker: Fabian Hueske works as a software engineer on streaming things at Snowflake. He is a PMC member of Apache Flink and one of the three original authors of the Stratosphere research system from which Apache Flink was forked in 2014. Fabian was a co-founder of data Artisans (now Ververica), a Berlin-based startup devoted to fostering Flink. He holds a PhD in computer science from TU Berlin and is the author of O'Reilly's "Stream Processing with Apache Flink".

Keynote 2: A story of batch and stream processing at Google and elsewhere

The batch and stream processing space today is rich with alternatives. Most systems have coalesced around SQL or fluent unified APIs to process streams of data. The current state of affairs is the result of a history of experimentation and exploration of different intuitions. This talk is an attempt to connect the dots from systems and APIs built inside Google and elsewhere. Specifically, this talk looks at systems like MapReduce, FlumeJava, Photon and MillWheel, and how the insights from these systems led to the Dataflow model and Apache Beam.

About the Speaker: Pablo Estrada is a PMC Member for Apache Beam, and a Software Engineer in Google Cloud Dataflow for the last 6 years. He holds a B.S. from the National University of Mexico (UNAM) and an M.S. from Seoul National University.

Keynote 3: Govern your streams through a data products foundation

With the rise of streaming technology, more and more real-time data pipelines are created alongside all the existing project or enterprise data pipelines. This brings additional pressure on the data governance function, which in most companies was already overwhelmed. We argue that a data products foundation is necessary to handle this increasingly heterogeneous data landscape and provide the required environment to govern data efficiently and address complex problems like purpose-based access rights, consent management, metadata-driven stream management or data lineage visualization.

About the Speaker: Marc Delbaere is a B2B software executive with a solid data background. He has built strong businesses from the ground up in entrepreneurial mode and managed more mature businesses at global level for large corporations (IBM and SWIFT). He drove the creation from the ground up of a SaaS platform (MyStandards) which has become the market default for managing financial exchange formats between banks and their enterprise customers. As Global Head of Corporates at SWIFT, he launched and managed several global initiatives, most recently an international payment pre-validation service. He joined Digazu in 2021 as CEO. Marc Delbaere studied business administration and engineering at the Solvay Business School and the University of Brussels, where he specialized in actuarial sciences.

Programme

The workshop is held on Wednesday December 15

Time	Title	Author(s)
09:00 - 09:40	Keynote 1: SQL Extensions To Support Streaming Data	Fabian Hueske - Snowflake
09:40 - 10:20	Keynote 2: A story of batch and stream processing at Google and elsewhere	Pablo Estrada - Google
10:20 - 11:00	Keynote 3: Govern your streams through a data products foundation	Marc Delbaere - Digazu
11:20 - 11:30	Coffee Break
11:30 - 11:50	BigD329 Effective Weighted k-Nearest Neighbors for Dynamic Data Streams	Maroua Bahri
11:50 - 12:10	BigD665 Prequential Model Selection for Time Series Forecasting based on Saliency Maps	Shivani Tomar, Seshu Tirupathi, Dhaval Vinodbhai Salwala, Ivana Dusparic, Elizabeth Daly
12:10 - 12:30	BigD359 An LSTM Encoder-Decoder Approach for Unsupervised Online Anomaly Detection in Machine Learning Packages for Streaming Data	Nabil Belacel, Rene Richard, Zhicheng Xu
12:30 - 12:50	BigD287 Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study	Lorena Poenaru-Olaru, Luis Cruz, Arie van Deursen, Jan Rellermeyer
12:50 - 13:10	BigD657 Region-based Sub-Snapshot (RegSnap): Enhanced Fault Tolerance in Distributed Stream Processing with Partial Snapshot	Takdir Takdir, Hiroyuki Kitagawa
13:10 - 13:15	Closing Remarks

Accepted Papers

Dynamic Ensemble Size Adjustment for Memory Constrained Mondrian Forest. Martin Khannouz and Tristan Glatard. Department of Computer Science and Software Engineering, Concordia University, Montreal, Quebec, Canada.
Effective Weighted k-Nearest Neighbors for Dynamic Data Streams. Maroua Bahri. Inria Paris, France.
Prequential Model Selection for Time Series Forecasting based on Saliency Maps. Shivani Tomar, Seshu Tirupathi, Dhaval Vinodbhai Salwala, Ivana Dusparic, Elizabeth Daly. IBM Research Ireland & Trinity College Dublin.
An LSTM Encoder-Decoder Approach for Unsupervised Online Anomaly Detection in Machine Learning Packages for Streaming Data. Nabil Belacel, Rene Richard, Zhicheng Xu. National Research Council Canada, Canada.
Are Concept Drift Detectors Reliable Alarming Systems? - A Comparative Study. Lorena Poenaru-Olaru, Luis Cruz, Arie van Deursen, Jan Rellermeyer. TU Delft, Netherlands & Leibniz University Hannover, Germany.
Region-based Sub-Snapshot (RegSnap): Enhanced Fault Tolerance in Distributed Stream Processing with Partial Snapshot. Takdir Takdir, Hiroyuki Kitagawa. University of Tsukuba, Japan & International Institute for Integrative Sleep Medicine, Japan.
EXOS: Explaining Outliers in Data Streams. Egawati Panjei, Le Gruenwald, Eleazar Leal, Abinash Borah, Carlos Sanchez. The University of Oklahoma, United States & University of Minnesota Duluth, United States & Apple Inc., United States.

Information

IMPORTANT DATES

SUBMISSION DEADLINE: October 10, 2022 (extended)
DECISION NOTIFICATION: November 10, 2022
CAMERA-READY SUBMISSION DEADLINE: November 20, 2022
WORKSHOP: December 17-20, 2022

PUBLICATIONS

Your paper should be written in English and formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (Templates). The length of the paper should not exceed 6 pages.

All accepted papers will be published in the Workshop Proceedings by the IEEE Computer Society Press.

SUBMIT PAPER

PROGRAM CO-CHAIRS

Sabri Skhiri
EURA NOVA, BE
Albert Bifet
Télécom Paris Tech, FR
Alessandro Margara
Politecnico di Milano, IT

PROGRAM COMMITTEE MEMBERS

Till Rohrmann
Ververica/Alibaba, GE/CN
Vijay Raghavan
University of Louisiana, US
Raju Gottumukkala
University of Louisiana, US
Jian Chen
University of North Alabama, US
Nam-Luc Tran
SWIFT, BE
Guido Salvaneschi
TU Darmstadt, GE
Fabricio Enembreck
Pontifícia Universidade Católica do Paraná, BR
José del Campo Ávila
Universidad de Málaga, ES

Amine Ghrab
EURA NOVA, BE
Thomas Peel
GSK, BE
Oscar Romero
UPC Barcelona Tech, ES
Hai-Ning Liang
Xi’an Jiaotong-Liverpool University, CN

7th Workshop on Real-time Stream Analytics, Stream Mining, CER/CEP & Stream Data Management in Big Data

COLOCATED WITH THE 2022 IEEE INTERNATIONAL CONFERENCE ON BIG DATA