Stream processing and real-time analytics have become some of the most important topics in Big Data. Industry is increasingly developing more robust, powerful and intelligent stream processing applications. Fraud detection for instant payments, scoring of consumers on websites and in shops, claims analysis and cost estimation, and image processing for surveillance, food and agriculture are only some of the potential applications of real-time stream processing and analytics.
The recent introduction of stateful stream processing [9, 14, 16] has enabled the development of a new kind of real-time application. In particular, hot and cold data can now be combined into a single real-time data flow using the concept of Stream Tables [15, 16]. The duality between Streams and Tables is not itself new: it was first introduced in 2003 as the “Relation to Stream” transformation in STREAM [18]. However, it is only with the emergence of state management [14] that Stream Tables can be used in real time and in a fully distributed manner.
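The duality described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the function names (`stream_to_table`, `table_to_stream`, `enrich`) and the sample records are hypothetical, deletions (tombstones) are ignored, and everything runs in a single process rather than in a distributed engine.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Change = Tuple[str, int]  # (key, new value): one changelog record


def stream_to_table(changelog: Iterable[Change]) -> Dict[str, int]:
    """Fold a changelog stream into a table holding the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in changelog:
        table[key] = value  # a later record overwrites an earlier one
    return table


def table_to_stream(before: Dict[str, int], after: Dict[str, int]) -> List[Change]:
    """Emit one change record per key whose value differs between two snapshots."""
    return [(k, v) for k, v in sorted(after.items()) if before.get(k) != v]


def enrich(
    events: Iterable[Tuple[str, int]], table: Dict[str, int]
) -> Iterator[Tuple[str, int, Optional[int]]]:
    """Join each incoming (hot) event with the current (cold) table row for its key."""
    for key, amount in events:
        yield key, amount, table.get(key)


# Hypothetical changelog of per-customer credit limits (cold data).
changelog = [("alice", 10), ("bob", 5), ("alice", 12)]
limits = stream_to_table(changelog)

# Hypothetical payment events (hot data) enriched with the table contents.
payments = [("alice", 3), ("carol", 7)]
enriched = list(enrich(payments, limits))
```

Replaying the output of `table_to_stream({}, limits)` through `stream_to_table` rebuilds the same table, which is the sense in which the two representations are interchangeable in this simplified setting.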
Furthermore, stateful stream processing has been applied to data management using Stream and Complex Event Processing (CEP). New architecture patterns have been proposed to address data pipelines and data management within the enterprise. For instance, the authors of [11, 12] proposed new designs for the Extract, Transform and Load (ETL) steps based on stream processing. By breaking down the silos between enterprise data warehouses (EDW) and Big Data lakes [13], this work has opened the door to completely redesigning the way data are transported, stored and used within the Big Data environment. More recently, Friedman et al. described how a Data Hub can be implemented to store and distribute data within an enterprise context.
In the past few years, researchers and practitioners in the areas of data stream management [1, 2, 3] and CEP [4, 5, 6] have developed systems to process unbounded streams of data and quickly detect situations of interest. Today, big data technologies provide a new ecosystem to foster research in this area. Highly scalable distributed stream processors, the convergence of batch and stream engines, and the emergence of state management and stateful stream processing (for example in Apache Spark [9], Apache Flink [10] and Kafka Streams [17]) have opened up new opportunities for highly scalable and distributed real-time analytics. These technologies also provide a solid foundation for algorithms that complement CEP in the use cases required by industry. Finally, thanks to the stateful nature of stream processors [14], stream SQL statements can be applied directly in the streaming engine and dynamic tables can be created [12, 15, 16].
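As a rough illustration of how keyed state turns a continuous query into a dynamic table, the sketch below emulates a grouped count (conceptually similar to a `SELECT key, COUNT(*) ... GROUP BY key` statement evaluated continuously). It is plain Python rather than the API of any engine cited above; `continuous_count` and the sample events are hypothetical, and the state is kept in memory with no fault tolerance or partitioning.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def continuous_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Keep a per-key counter as state and emit an updated row after every event."""
    state: Dict[str, int] = defaultdict(int)  # keyed operator state
    for key in events:
        state[key] += 1
        yield key, state[key]  # update record for the result table


# Hypothetical stream of page identifiers, one per click event.
clicks = ["home", "cart", "home", "home"]

# Each emitted record updates one row of the result; applying them in order
# gives the current contents of the dynamic table.
updates = list(continuous_count(clicks))
dynamic_table = dict(updates)
```

Because the operator remembers earlier events, each new event produces an incremental update instead of forcing a recomputation over the whole history.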
For the present workshop, and following the discussion above, we welcome submissions studying scalable online learning and incremental learning on stream processing infrastructures. We also encourage submissions on data stream management, data architectures using stream processing, and Internet of Things (IoT) data streaming. Additionally, we appreciate submissions studying the use of stream processing in innovative architectures.
After the success of the first three editions of this workshop, co-located with IEEE Big Data 2016, 2017 and 2018, this fourth edition will be an excellent opportunity to bring together actors from academia and industry to discuss, explore and define new opportunities and use cases. The workshop will benefit both researchers and practitioners interested in the latest research in real-time and stream processing. It will showcase prototypes and products leveraging big data technologies, models and efficient algorithms for scalable CEP and context detection engines, and new architectures leveraging stream processing.