Introduction

Stream processing and real-time analytics have become one of the most important topics in Big Data. The emergence of new business cases has created a need to develop more robust, more performant, and more intelligent stream processing applications and analytics. Today, we can find use cases in various industries: banking, with real-time fraud detection and real-time credit risk scoring; insurance, with real-time claim quotation and telematics; retail, with real-time marketing; telecom, with adaptive SLAs in the network and real-time customer care; and many others.

Over the last two years, another use of stream and complex event processing has emerged: data management. New architecture patterns have been proposed to address data pipelines and data management within the enterprise. In [11, 12], the authors describe a way to redesign ETL (Extract, Transform, Load) using stream processing. This opened the door to completely redesigning the way data are transported, stored, and used within a Big Data environment by breaking down silos between the EDW and the Big Data lake, as shown by [13].

In the past years, researchers and practitioners in the area of data stream management [1, 2, 3] and Complex Event Processing (CEP) [4, 5, 6] have developed systems to process unbounded streams of data and quickly detect situations of interest.

Nowadays, big data technologies provide a new ecosystem to foster research in this area. Highly scalable distributed stream processors, the convergence of batch and stream engines, and the emergence of state management and stateful stream processing (such as Apache Spark [9] or Apache Flink [10]) open new doors for highly scalable and distributed real-time analytics. Going further, these technologies also provide a solid foundation for real-time analytics algorithms that complement CEP in the use cases required by industry. Finally, the stateful nature of stream processors [14] makes it possible to apply Stream SQL statements directly in the streaming engine and to create dynamic tables [16, 12].

As a result, we encourage submissions studying scalable online learning and incremental learning on stream processing infrastructure. In addition, we encourage submissions on data stream management, data architectures using stream processing, and Internet of Things data streaming. Finally, we encourage submissions studying the use of stream processing in new, innovative architectures.

After the success of the first and second editions, co-located with IEEE Big Data 2016 and 2017, this third edition of the workshop is an excellent opportunity to bring together actors from academia and industry to discuss, explore, and refine new opportunities and use cases in the area. The workshop will benefit both researchers and practitioners interested in the latest research in real-time and stream processing. It will showcase prototypes and products leveraging big data technologies, as well as models and efficient algorithms for scalable complex event processors and context detection engines, and new architectures leveraging stream processing.

REFERENCES

[1] Abadi, Daniel J et al. "The Design of the Borealis Stream Processing Engine." CIDR 4 Jan. 2005: 277-289.
[2] Abadi, Daniel J et al. "Aurora: a new model and architecture for data stream management." The VLDB Journal—The International Journal on Very Large Data Bases 12.2 (2003): 120-139.
[3] Chandrasekaran, Sirish et al. "TelegraphCQ: continuous dataflow processing." Proceedings of the 2003 ACM SIGMOD international conference on Management of data 9 Jun. 2003: 668-668.
[4] Cugola, Gianpaolo, and Alessandro Margara. "Complex event processing with T-REX." Journal of Systems and Software 85.8 (2012): 1709-1728.
[5] Agrawal, Jagrati et al. "Efficient pattern matching over event streams." Proceedings of the 2008 ACM SIGMOD international conference on Management of data 9 Jun. 2008: 147-160.
[6] Brenna, Lars et al. "Cayuga: a high-performance event processing engine." Proceedings of the 2007 ACM SIGMOD international conference on Management of data 11 Jun. 2007: 1100-1102.
[7] Confluent blog post: Event Sourcing, CQRS, Stream Processing and Apache Kafka: What’s the connection?
[8] Confluent blog post: A practical guide to build a stream data platform
[9] Matei Zaharia et al. “Discretized Streams: Fault-Tolerant Streaming Computation at Scale”. Proceedings of the SOSP Conference. 2013.
[10] Paris Carbone et al. “Apache Flink™: Stream and Batch Processing in a Single Engine”. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering. 2015.
[11] Neha Narkhede, ETL is dead, Long Live Streams. December 2016.
[12] Tathagata Das, Real-time Streaming ETL with Structured Streaming in Apache Spark 2.1. January 2017.
[13] Michael Armbrust, Databricks Delta: A Unified Data Management System for Real-time Big Data. October 2017.
[14] Paris Carbone et al. “State Management in Apache Flink™: Consistent Stateful Distributed Stream Processing”. In Proceedings of VLDB 2017.
[16] Fabian Hueske, Continuous Queries on Dynamic Tables. April 2017.

Research Topics

The topics of interest include but are not limited to:

New stream processing architecture for big data.
Complex Event Processing for big data, pattern matching engines for big data.
Scalable real-time decision algorithms.
Scalable stream processing architecture, algorithms or models.
Stream SQL and other continuous query languages on big data frameworks.
Data pipelines & Data management with Streams.
Stream ETL and Real-Time Data Warehouse.
Algorithms for stream mining or incremental mining.
New or innovative architecture pattern leveraging stream processing.
IoT analytics & stream mining.

Keynote 1

Fabian Hueske, Data Artisans

Unified Processing of Static and Streaming Data with SQL on Apache Flink.

SQL is the lingua franca of data processing and everybody working with data knows SQL. While in the past most open-source stream processing frameworks only provided Java or Scala-based APIs, stream processing with SQL has recently been gaining a lot of attention because it makes stream processing accessible to a wider audience and significantly reduces the effort to solve common use cases.

About three years ago, the Apache Flink community started working on adding support for SQL. Today, thousands of continuous SQL queries power production systems in Alibaba, Huawei, Lyft, and Uber. Flink follows the approach of leveraging ANSI SQL syntax and semantics for processing static and streaming data. Unified syntax and semantics are important for various reasons, including existing user expertise, query portability, and the ability to efficiently bootstrap query state or backfill results from recorded data in case of failures. In this talk, I will explain Flink's approach in detail and highlight its benefits. Moreover, I'll discuss the challenges that arise when queries are continuously evaluated on infinite input.

Keynote 2

Sabri Skhiri, EURA NOVA

The challenge of Data Management in the Big Data Era & its underlying Enterprise architecture shift

Most digital transformation programs aim at (1) generating new revenues, (2) selling or operating better, or (3) reducing costs. Data are at the core of these three drivers. However, over the last five years a new range of business and regulatory needs for managing these data has emerged: real-time interaction with users, applications that need to embed predictive models and data consumption in their business logic, self-service data, data lineage, auditing, GDPR in the EU, and many more. As a result, the key question is "how should the enterprise architecture evolve to support these new needs?" and, especially, what kind of data architecture should be put in place.

The keynote will first describe the main data challenges in implementing a digital strategy. We will then see what exactly "a data architecture" is and how we can tackle these challenges. Finally, the “high-level” technical side will give a view on how such a data architecture could be implemented and which products are available on the market.

Programme

The workshop is held on Monday, December 10.

Time	Title	Author(s)
09:00 - 09:05	Workshop Opening
09:05 - 10:00	Workshop Keynote 1: Unified Processing of Static and Streaming Data with SQL on Apache Flink	Fabian Hueske, Data Artisans
10:00 - 10:30	Workshop Keynote 2: The new challenge of Data Management & Enterprise architecture shift it requires	Sabri Skhiri, EURA NOVA
10:30 - 11:00	Coffee Break
11:00 - 11:25	A Scalable and Robust Framework for Data Stream Ingestion	Haruna Isah and Farhana Zulkernine
11:25 - 11:50	Edge Computing architecture to support Real Time Analytic applications - A State-of-the-art within the application area of Smart Factory and Industry 4.0	Sebastian Trinks and Carsten Felden
11:50 - 12:15	Distributed Real Time Link Prediction on Graph Streams	Satya Katragadda, Raju Gottumukkala, Murali Pusala, and Vijay Raghavan
12:15 - 14:00	Lunch Break
14:00 - 14:25	Streaming Pattern Matching with Bounded Delay	Hossein Hamooni and Abdullah Mueen
14:25 - 14:50	Using Information in Access Logs for Large Scale User Identity Linkage	Leila Jalali, Narayanan Krishnamoorthy, and Rahul Biswas
14:50 - 15:15	Streaming Algorithm for Big Data Logistic Regression	Baijian Yang, Mengyao Wang, Zhenzhi Xu, and Tonglin Zhang
15:15 - 15:45	Coffee Break
15:45 - 16:10	Efficient Dynamic Time Warping for Big Data Streams	Rafael M. Martins and Andreas Kerren
16:10 - 16:35	Using Candlestick Charting and Dynamic Time Warping for Data Behavior Modeling and Trend Prediction for MWSN in IoT	Concepcion Sanchez Aleman, Niki Pissinou, Sheila Alemany, and Georges Kamhoua
16:35 - 17:00	A multi-dimensional extension of the Lightweight Temporal Compression Method	Bo Li, Omid Sarbishei, Hosein Nourani, and Tristan Glatard
17:00	Closing Remarks

Information

IMPORTANT DATES

SUBMISSION DEADLINE: October 10, 2018
DECISION NOTIFICATION: November 1, 2018
CAMERA-READY SUBMISSION DEADLINE: November 15, 2018

PUBLICATIONS

Your paper should be written in English and formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (Templates). The length of the paper should not exceed 6 pages.

All accepted papers will be published in the Workshop Proceedings by the IEEE Computer Society Press.

PROGRAM CO-CHAIRS

Sabri Skhiri
EURA NOVA, BE
Albert Bifet
Télécom ParisTech, FR
Alessandro Margara
Politecnico di Milano, IT

PROGRAM COMMITTEE MEMBERS

Amine Ghrab
EURA NOVA, BE
Fabian Hüske
Data Artisans, DE
Fabricio Enembreck
Pontifícia Universidade Católica do Paraná, BR
Guido Salvaneschi
TU Darmstadt, DE
Jian Chen
University of North Alabama, US
José del Campo Ávila
Universidad de Málaga, ES

Nam-Luc Tran
SWIFT, BE
Oscar Romero
UPC Barcelona, ES
Peter Beling
University of Virginia, US
Raju Gottumukkala
University of Louisiana, US
Thomas Peel
EURA NOVA, FR
Vijay Raghavan
University of Louisiana, US

Third Workshop on Real-time & Stream Analytics in Big Data & Stream Data Management

COLOCATED WITH THE 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA