First International Workshop on Composable Data Management Systems, 2022
Workshop Venue: Sydney, Australia - Co-located with VLDB 2022
Workshop Date: 9th September 2022
Overview
There has been a steady increase in the number of data management systems (DM systems) over the past decade, driven mainly by the growing diversity of data workloads. In response, several specialized DM systems were built to handle diversity in data types (key-value, relational, graph, document, temporal, geospatial, machine learning features, etc.), in query workloads (analytical, operational, ML training, stream processing, real-time), in storage (tape, disk, flash, in-memory; co-located vs. disaggregated; etc.), and in resource ownership (cloud vs. on-premises). Highly specialized domains such as machine learning and AI also required novel features that traditional DM systems did not offer. The result is a highly fragmented DM systems landscape centered on three pivots: (i) user experience, (ii) the experience of the engineers building DM systems, and (iii) data modeling.
The end-user experience, chiefly the query language and API offered by each DM system, is not uniform across systems. Even the semantics of the same built-in function are not guaranteed to be identical across DM systems. Whereas ANSI SQL standardizes how end users interact with relational DBMSs, there is no standard for designing language support for specialized DM systems, owing to differences in their data models and schema support. This also makes migrating users from one vendor to another non-trivial and time-consuming.
Development teams often operate under very tight timelines to build solutions driven by pressing business needs, e.g., to bring a new product to market or to beat a competitor into a new market. The resulting solutions are custom-built for the specific problem at hand. Hence, a rise in the diversity of use cases leads to a rise in the number of specialized DM systems, which in turn creates manageability and maintenance problems for development teams down the road. Most attempts at unifying and converging such systems are an afterthought at best, and even serious attempts run into challenges stemming from poor architectural choices made earlier (e.g., monolithic system design).
Because of the diversity of the data and the pressure to build DM systems quickly, the data models adopted by different DM systems have little in common. This prevents the engineers building these systems from sharing common components across systems and from offering uniform language and API support. It also contributes to the diversity in user experience.
The central theme of this workshop is best expressed by a question: is there a way to build composable data management systems that facilitate unification and convergence of
Having common or translatable data models, so that the data model of one system can be easily mapped to that of another, is probably the first step in this direction. It also provides an opportunity to standardize the language and the semantics of supported operations. Building modular systems by disaggregating capabilities into components with clear input and output protocols, along the lines of microservices and APIs, can greatly improve productivity when building complex DM systems for a particular use case. Open-sourcing the basic building blocks offers great opportunities to share components and to build 'plug-and-play' systems. For example, a typical DM system requires a query language parser, a query optimizer, a query execution engine and a transaction coordinator. Although the specific implementation of these modules differs from system to system, the core functionality common to all systems can be offered by kernel modules. A specific DM system can then reuse these modules and build on top of them by configuring and customizing them.
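To make the idea of reusable kernel modules more concrete, here is a minimal Python sketch, not a reference design. It assumes a toy pipe-delimited query syntax (`FROM t | WHERE col = val | SELECT a, b`) and an in-memory catalog, and every class and function name in it (`Plan`, `KernelParser`, `RuleBasedOptimizer`, `push_filters_down`, `InMemoryExecutor`, `ComposedSystem`) is hypothetical. Each module implements a small interface and exchanges a shared `Plan` object, so a specialized system can reconfigure or swap one module without touching the others. The transaction coordinator is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Protocol

Row = Dict[str, Any]

@dataclass
class Plan:
    """Hypothetical exchange format between modules: a source table plus ordered operators."""
    table: str
    ops: List[tuple] = field(default_factory=list)

# Module interfaces: the "clear input and output protocols" each component agrees on.
class Parser(Protocol):
    def parse(self, text: str) -> Plan: ...

class Optimizer(Protocol):
    def optimize(self, plan: Plan) -> Plan: ...

class Executor(Protocol):
    def execute(self, plan: Plan, catalog: Dict[str, List[Row]]) -> List[Row]: ...

class KernelParser:
    """Toy parser for the assumed syntax: FROM t | WHERE col = val | SELECT a, b"""
    def parse(self, text: str) -> Plan:
        clauses = [c.strip() for c in text.split("|")]
        plan = Plan(table=clauses[0].split()[1])  # first clause is "FROM <table>"
        for clause in clauses[1:]:
            keyword, rest = clause.split(maxsplit=1)
            if keyword == "WHERE":
                col, val = (s.strip() for s in rest.split("="))
                plan.ops.append(("filter", col, val))
            elif keyword == "SELECT":
                plan.ops.append(("project", [c.strip() for c in rest.split(",")]))
            else:
                raise ValueError(f"unknown clause: {keyword}")
        return plan

Rule = Callable[[Plan], Plan]

def push_filters_down(plan: Plan) -> Plan:
    """Rewrite rule: move a filter ahead of a preceding projection that keeps its column."""
    ops = list(plan.ops)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(ops)):
            prev, cur = ops[i - 1], ops[i]
            if prev[0] == "project" and cur[0] == "filter" and cur[1] in prev[1]:
                ops[i - 1], ops[i] = cur, prev  # equivalent plan, fewer rows to project
                changed = True
    return Plan(plan.table, ops)

class RuleBasedOptimizer:
    """Kernel optimizer; a specific system customizes it by supplying its own rules."""
    def __init__(self, rules: List[Rule]):
        self.rules = rules

    def optimize(self, plan: Plan) -> Plan:
        for rule in self.rules:
            plan = rule(plan)
        return plan

class InMemoryExecutor:
    """Executes filters (string equality) and projections over lists of dict rows."""
    def execute(self, plan: Plan, catalog: Dict[str, List[Row]]) -> List[Row]:
        rows = catalog[plan.table]
        for op in plan.ops:
            if op[0] == "filter":
                _, col, val = op
                rows = [r for r in rows if str(r[col]) == val]
            else:
                rows = [{c: r[c] for c in op[1]} for r in rows]
        return rows

@dataclass
class ComposedSystem:
    """A DM system assembled from independently replaceable modules."""
    parser: Parser
    optimizer: Optimizer
    executor: Executor
    catalog: Dict[str, List[Row]]

    def query(self, text: str) -> List[Row]:
        plan = self.optimizer.optimize(self.parser.parse(text))
        return self.executor.execute(plan, self.catalog)

catalog = {"users": [{"id": 1, "name": "ada", "team": "db"},
                     {"id": 2, "name": "bob", "team": "ml"}]}
db = ComposedSystem(KernelParser(), RuleBasedOptimizer([push_filters_down]),
                    InMemoryExecutor(), catalog)
print(db.query("FROM users | SELECT name, team | WHERE team = db"))
# [{'name': 'ada', 'team': 'db'}]
```

In this sketch the optimizer is customized only through its list of rules. A system targeting a different data model could instead replace `InMemoryExecutor` while keeping the same parser and optimizer, as long as its executor accepts the same `Plan`.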
Moving away from building monolithic systems addresses the problem that "one size does not fit all". However, building specialized DM systems in silos leads to a lack of reusable components across the stack. This workshop offers a forum to discuss alternatives for building specialized DM systems that can be developed in a methodical and mature fashion, with a unified user experience and reusability as the main objectives. We welcome submissions on this broad theme: research papers that present new ideas, vision papers that discuss the future of the area, and industry papers that describe specific use cases.
Topics
- Unified data models that cater to a wide range of data shapes, types and use cases
- Standard language support: support for both deferred-mode execution (SQL) and eager/interactive development (e.g., REPL, host-language execution in Python/PyTorch); uniform operation semantics for interoperability across systems
- Engineering modular DM system components such as the parser, query optimizer, execution engine and transaction coordinator
- Query processing and optimization techniques for composable DM systems
- Query execution techniques for composable DM systems
- Standardization of exchange protocols and APIs for the different modules of DM systems
- Open source efforts for sharing composable DM modules