Congratulations Mihhail Matskin!

Published Jan 31, 2022

Mihhail Matskin, your paper “Locality-Aware Workflow Orchestration for Big Data”, written in cooperation with Norwegian colleagues at SINTEF and NTNU, has been selected as Best Conference Paper at the MEDES 2021 conference. The paper was written as part of the Horizon 2020 project DataCloud. Could you tell us a bit about the project and the paper?

“DataCloud is a Horizon 2020 project, which runs between 2021 and 2023 and aims to develop a novel paradigm for Big Data processing over heterogeneous resources on the Computing Continuum. The Computing Continuum federates Cloud services with Edge and Fog computing paradigms. The emergence of the Edge and Fog computing paradigms has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. The Computing Continuum enables new opportunities for supporting Big Data processing, but requires efficient management of heterogeneous and untrusted resources.”

“The core concept of the DataCloud project is Big Data pipelines, whose complete lifecycle is supported by processing capabilities. Our main objective is to develop the ‘DataCloud toolbox’, comprising new languages, methods, infrastructures and software prototypes for discovering, defining, simulating, deploying, and adapting Big Data pipelines on heterogeneous and untrusted resources in a manner that makes the execution of Big Data pipelines traceable, trustable, manageable, analyzable and optimizable. The aim is to lower the technological entry barriers to the incorporation of Big Data pipelines in organizations' business processes and make them accessible to a wider set of stakeholders (such as start-ups and SMEs) regardless of the hardware infrastructure.”

“Work on the project covers a broad spectrum of research including, in particular, discovering value from Dark Data (i.e., data collected but not used), data locality management, graphical Domain-Specific Languages, pipeline simulation, optimisation and adaptation, and blockchain technology for resource management.”

“We have 11 partners in the project from 9 European countries, including universities, research organizations, large companies and SMEs. We are going to evaluate the DataCloud solutions on five business cases provided by the DataCloud business partners, which cover a broad spectrum of Big Data pipeline applications: Smart mobile marketing campaigns, Automatic live sports content annotation, Digital health system, Predicting deformations in ceramics and Analytics of manufacturing assets.”

“The paper that received the best paper award at the MEDES 2021 conference is part of the work on data locality done in the context of the DataCloud project. It considers data locality to reduce the performance penalties from data transfers among remote data centres during data processing. Existing big data processing solutions provide limited support for handling data locality and are inefficient in processing the small and frequent events specific to edge environments. The paper proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront.”

The paper was co-written with researchers from SINTEF and NTNU. How would you describe your cooperation? What was the best part?

“I have been connected to the Norwegian research environment for many years. Before moving to KTH 20 years ago, I worked at NTNU as a professor and then continued as an adjunct professor for a long period. However, it is interesting that our current collaboration started from another source. One of my colleagues from SINTEF, Dr Dumitru Roman, was the opponent for one of my doctoral students. After the defence, we discussed that it would be nice to materialise our mutual research interests via informal joint supervision of our students, at both Master's and doctoral levels. It was not clear if this would work with remote communication, but suddenly (because of COVID-19) it became the only communication channel, and weekly meetings became a routine.”

“We started with Master's student supervision, and it worked smoothly. We have published several joint papers during the last few years, including two journal papers. Also, at that time, the DataCloud project proposal was submitted (SINTEF is the project coordinator) and accepted, receiving 15 out of 15 points in the evaluation. Now our collaboration is more formalised in the framework of the project, and the financing helps us to employ new doctoral students. I think that the main factor of success in our collaboration was motivation and commitment to joint work because of our mutual and complementary research interests.”

“I would also like to take this opportunity to say that, in addition to motivation (which is important), we need more formalised ways to make our collaborations with external partners more sustainable. In particular, a spectrum of affiliation types (not only via employment) could be a good solution.”

What do you think are the most exciting future research directions within your research field?

“There is much exciting research in the field, which is quite broad. For me, the most promising research directions are the automation of Big Data pipeline construction by extracting knowledge from different data sources, and the reuse of accumulated design knowledge.”