DuckDB, an open source DB used by Google, Facebook and Airbnb

DuckDB is a SQL OLAP database management system in the making

The release of DuckDB 0.5.0 was recently announced. DuckDB is an analytical database management system (DBMS) that is still under active development and is used by Google, Facebook, and Airbnb.

DuckDB is a high-performance analytical database system. It is designed to be fast, reliable, and easy to use. DuckDB provides a rich dialect of SQL, with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (arrays, structs), and more.

Among its main characteristics, the following stand out:

  • Simple installation
  • Integrated: no server management
  • Single file storage format
  • Fast analytical processing
  • Fast transfer between R/Python and RDBMS
  • No dependence on external state, such as separate configuration files or environment variables
  • Composable interface: fluent SQL programmatic API
  • Fully ACID via MVCC

About DuckDB 0.5.0

Among the new features is "out-of-core" processing, which addresses the problems that arise when the data being processed is larger than the available memory by writing intermediate results to disk.

The new version uses Adaptive Radix Tree (ART) indexes to enforce constraints and speed up query filters. Until now, these indexes were not persistent, which led to problems such as the loss of index information and long reload times for tables with constraints.

ART is, in essence, an attempt to apply vertical and horizontal compression to create compact index structures. Tries are tree-like data structures in which each level of the tree holds information about some part of the data set. They are usually illustrated with character strings.
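The idea of a tree indexed by characters can be sketched in plain Python. Note that this is a simple, uncompressed trie and not an ART: it has none of the compression mentioned above and is not DuckDB's implementation. The class names and the sample words are invented for the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # one child per possible next character
        self.is_key = False  # True if a stored key ends at this node
        self.value = None    # payload for that key, e.g. a row id


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        """Walk down one level per character, creating nodes as needed."""
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True
        node.value = value

    def lookup(self, key):
        """Return the stored value, or None if the key is absent."""
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value if node.is_key else None

    def keys_with_prefix(self, prefix):
        """Return all stored keys that start with the prefix, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [(node, prefix)]
        while stack:
            current, path = stack.pop()
            if current.is_key:
                found.append(path)
            for ch, child in current.children.items():
                stack.append((child, path + ch))
        return sorted(found)


# Index a few strings, using their position as a stand-in for a row id
index = Trie()
for row_id, word in enumerate(["duck", "duckdb", "dune", "sql"]):
    index.insert(word, row_id)

print(index.lookup("duckdb"))
print(index.keys_with_prefix("du"))
```

Keys that share a prefix share the nodes for that prefix, which is why lookups and prefix filters only touch one node per character.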

The project also added join order optimization; choosing a good join order is a common problem in analytical databases.

Hyoun Park, CEO and Chief Analyst at Amalgam Insights, said that DuckDB's differentiation comes from the fact that it is a small application that works within code-based workflows to quickly scan large stores of data.

“DuckDB can often run queries directly on the data with no intermediate processing, which improves processing. From a purely technological point of view, it is somewhat similar to Actian Vector, which also takes a columnar vectorized OLAP query approach, although Actian is designed to fetch data rather than work on a process or load a specific job.”

DuckDB Labs provides advice and support. Co-founder and CEO Hannes Mühleisen, who also co-wrote the code and maintains the project, said he was inspired by SQLite, the serverless OLTP database engine, where he saw an opportunity for a similar approach, but for analytics.

DuckDB is also often used as part of a larger data analytics or management stack. For example, if someone built a custom application that collects data and then wanted to create an SQL interface, they first had to copy the data and move it to another system, which could cause synchronization issues, he explained.

Download and get DuckDB

It is important to mention that the home page clearly states that it should not be used for "large client/server installations for centralized enterprise data storage".

The project is working toward the release of version 1.0, after which it will no longer be possible to make changes. DuckDB is the work of academics at Centrum Wiskunde & Informatica (CWI), the center for mathematics and theoretical computer science in Amsterdam. It is embedded in a host process, so there is no DBMS server software to install, update or maintain.

For example, the DuckDB Python package can run queries directly on data held in Python libraries such as pandas, without importing or copying the data. DuckDB is written in C++ and is free and open source under the MIT license.

You can learn more about it and consult the installation manual at the following link.

