Second International Workshop on Composable Data Management Systems, 2023

Workshop Venue: Vancouver, Canada - Co-located with VLDB 2023

Workshop Date: 28th August 2023


Invited Talks


Title: Innovating with storage formats to push the limits in a hybrid database
Speaker: Eugene Kogan, Software Architect, SingleStore

Speaker Bio: Eugene's relatively short 22+ years in databases took him from Microsoft SQL Server to Google Spanner, Google BigQuery, Snowflake, and, finally, to a Software Architect role at SingleStore. His biggest interests in the RDBMS field have been semi-structured data processing, data replication, distributed query coordination and execution, and, lately, hybrid system architectures that include transactional, analytical and complex event processing in a single database.

Abstract: Data storage formats, and the ways a query processor interacts with data structures in different storage tiers, are among the main design areas databases use to add value for customers and become more competitive. In this talk I'll explore SingleStore's proprietary storage formats and indexing techniques. I will also discuss how the data structures are used by the query processor, and illustrate that in many cases indexes and mixed column- and row-centric data representation do not lead to query planning complexity. SingleStore's case is interesting because of the variety of requirements for data storage and access methods, its relative maturity, and continued investment. SingleStore is a distributed hybrid transactional and analytical database that efficiently supports queries with low, medium and high selectivity, as well as DML with small, sparse and bulk data updates. I'll discuss the reasoning behind prior design decisions, and report on new unpublished work.



Title: Holistic Extensibility for Integrated Data Analysis Pipelines in DAPHNE
Speaker: Patrick Damme, Postdoctoral Researcher, Technische Universität Berlin

Speaker Bio: Patrick Damme is a postdoctoral researcher in the DAMS Lab research group at TU Berlin and at BIFOLD in Berlin, Germany. His research interests are centered around database systems, machine learning systems, and techniques for making complex analyses of large data volumes efficient, scalable, and simple. He is one of the main contributors to DAPHNE, an open and extensible system infrastructure for integrated data analysis pipelines, which he started working on in his previous position as a postdoc at Graz University of Technology and the co-located Know-Center in Graz, Austria. Patrick received his PhD in computer science from TU Dresden, Germany, where he was part of the Dresden Database Systems Group.

Abstract: Integrated data analysis (IDA) pipelines combining data management and query processing, machine learning, and high-performance computing are becoming increasingly important in practice. Deploying IDA pipelines is cumbersome and quickly leads to overheads and hardware under-utilization due to the suboptimal integration of existing systems. At the same time, today’s hardware challenges lead to increasing specialization in terms of compute and storage devices as well as data representations, which further complicates deployment. To address these issues, in the DAPHNE project, we build an open and extensible system infrastructure for IDA pipelines, aiming for increased productivity and performance. End users define IDA pipelines in a domain-specific language offering a unified experience over relational and linear algebra, augmented by high-level data science primitives. Internally, an IDA pipeline runs through an optimizing compiler and is efficiently executed in local or distributed environments exploiting heterogeneous hardware and computational storage. Researchers may easily extend many aspects of DAPHNE, including custom data representations, operator kernels, optimizer passes, and runtime schedulers, to experiment with new specialized approaches in a full-fledged system. In this talk, I will give an overview of our ongoing work in DAPHNE and on its holistic extensibility in particular.