DuckDB 0.6.0 has now been released and includes improvements to disk writing, data loading and more.

The release of DuckDB 0.6.0 has been announced, a version that improves data compression, adds new functions, and brings storage improvements, among other things.

DuckDB combines SQLite properties such as compactness, the ability to embed it as a library, storage of the database in a single file, and a convenient CLI, with tools and optimizations for analytical queries that cover a significant part of the stored data, for example queries that aggregate the entire contents of a table or join several large tables.

Main new features of DuckDB 0.6.0

In this new version, work has continued on improving the storage format. A disk write mode has been implemented in which, when a large data set is loaded in a single transaction, the data is compressed and streamed to the database file without waiting for the COMMIT command that finalizes the transaction.
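
A minimal sketch of such a bulk load from Python, assuming the duckdb package and hypothetical file and table names ("analytics.duckdb", "events", "events.parquet"); the new write mode applies transparently, so no extra API call is needed:

    import duckdb

    # Open (or create) an on-disk database file (hypothetical file name).
    con = duckdb.connect("analytics.duckdb")

    con.execute("BEGIN TRANSACTION")
    # Load a large data set in a single transaction; with the new disk write
    # mode the rows are compressed and streamed to the database file before
    # the final COMMIT.
    con.execute("CREATE TABLE events AS SELECT * FROM read_parquet('events.parquet')")
    con.execute("COMMIT")
    con.close()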

Another notable change in the new version is the added support for parallel loading of data into separate tables, which can significantly increase loading speed on multi-core systems. For example, in the previous version, loading a database with 150 million rows on a 10-core CPU took 91 seconds, while in the new version the same operation takes 17 seconds. There are two modes of parallel loading: one that preserves record order and one that does not.
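
A short sketch of choosing between the two modes from Python, assuming the behavior is controlled by the preserve_insertion_order setting (the table and file names are hypothetical):

    import duckdb

    con = duckdb.connect("analytics.duckdb")

    # Assumed setting name: allow parallel loading without preserving the
    # original row order, trading ordering guarantees for load speed.
    con.execute("SET preserve_insertion_order = false")
    con.execute("CREATE TABLE lineitem AS SELECT * FROM read_parquet('lineitem.parquet')")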

For string compression, the FSST (Fast Static Symbol Table) algorithm is used, which packs values within strings using a common dictionary of frequent substrings. Applying the new algorithm reduced the size of a test database from 761 MB to 251 MB.

To compress floating-point numbers (DOUBLE and FLOAT), the Chimp and Patas algorithms are offered. Compared to the previously used Gorilla algorithm, Chimp provides a higher compression ratio and faster decompression. The Patas algorithm lags behind Chimp in compression ratio, but decompresses significantly faster, at a speed close to that of reading uncompressed data.

It is also worth noting the addition of an experimental ability to load data from CSV files in multiple parallel streams (SET experimental_parallel_csv=true), which significantly reduces the load time of large CSV files. For example, with the option enabled, the load time for a 720 MB CSV file dropped from 3.5 seconds to 0.6 seconds.
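
A minimal sketch of enabling the experimental parallel CSV reader mentioned above, with a hypothetical file and table name:

    import duckdb

    con = duckdb.connect()

    # Enable the experimental multi-threaded CSV reader from this release.
    con.execute("SET experimental_parallel_csv = true")
    # The CSV file is now parsed in multiple parallel streams.
    con.execute("CREATE TABLE measurements AS SELECT * FROM read_csv_auto('measurements.csv')")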

Among the other changes that stand out in this new version:

  • Index creation and index management operations can now be executed in parallel.
  • SQL now allows queries that start with the word "FROM" instead of "SELECT"; in this case, the query is assumed to begin with "SELECT *".
  • Added support for the "COLUMNS" expression in SQL, which lets you apply an operation to multiple columns without duplicating the expression (both features are illustrated in the sketch after this list).
  • Memory consumption has been optimized. On the Linux platform, the jemalloc library is now used for memory management by default. The performance of hash join operations when memory is limited has been significantly improved.
  • Added ".mode duckbox" output mode to the CLI, discarding center columns based on the lines width of the terminal window). With the ".maxrows X" parameter, you can also limit the number of output rows.
  • The CLI provides context-aware input autocompletion (keywords, table names, function names, column names, and file names are completed).
  • The CLI now displays a query progress indicator by default.
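
As a small illustration of the FROM-first syntax and the COLUMNS expression described in the list above, here is a sketch in Python (the table and column names are made up for the example):

    import duckdb

    con = duckdb.connect()
    con.execute("CREATE TABLE obs (temp DOUBLE, hum DOUBLE)")
    con.execute("INSERT INTO obs VALUES (21.5, 0.43), (19.0, 0.51)")

    # A query may now start with FROM; "SELECT *" is implied.
    print(con.execute("FROM obs").fetchall())

    # COLUMNS(*) applies the same aggregate to every column at once.
    print(con.execute("SELECT MIN(COLUMNS(*)) FROM obs").fetchall())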

Finally, if you are interested in knowing more about it, you can check the details in the following link.

