Databricks Introduces Delta Sharing, an Open Source Protocol for Secure Data Sharing

Databricks, the company founded by the original creators of Apache Spark, introduced several innovations for its Unified Analytics Platform at its Data + AI Summit 2021 user conference. Highlights include the launch of a new open source project called "Delta Sharing", which provides an open protocol for secure data sharing between organizations in real time, regardless of the platform on which the data resides.

Delta Sharing is included within the Delta Lake project, a table storage layer that the company released as open source in April 2019. The project has already garnered support from a broad set of data providers, including Nasdaq, Amazon Web Services, Microsoft, Google, and Tableau Software.

Data sharing has become critical in the modern economy as companies seek to securely exchange data with their customers, suppliers, and partners. For example, a retailer may want to publish sales data for its suppliers in real time, or a supplier may want to share inventory in real time. Until now, however, data sharing has been very limited because existing sharing solutions are tied to a single vendor. This creates friction for both data providers and consumers, who naturally run different platforms.

In its announcement, Databricks described Delta Sharing as an open protocol for the secure, real-time exchange of large data sets, one that it says enables secure data sharing across products for the first time. The company is developing Delta Sharing with partners from among the world's leading software and data providers.

Databricks said it hopes to address the inefficiency of the often manual processes that organizations rely on to exchange data with customers, partners and suppliers. Historically, data sharing products have been tied to a single vendor or commercial product, limiting collaboration between organizations using different platforms.

"The main way companies have shared with others is by going through a cumbersome process or using a rigid existing system that everyone must use," said Arsalan Tavakoli, co-founder and senior vice president of field engineering at Databricks.

Bringing together multiple data sources is also a chore. "You can't just give everyone access," he said. "You want access controls, auditing and version control. There is no way to do that today."

Delta Sharing limits vendor dependency and enables a broader and more diverse set of use cases than was previously possible, the company said. Shared data can be used in SQL, visual analysis tools, and programming languages such as Python and R. Delta Sharing also enables organizations to share existing large-scale data sets in the Apache Parquet and Delta Lake formats in real time, without the need to copy them.
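To give a rough idea of what consuming shared data from Python might look like, here is a minimal sketch based on the open source `delta-sharing` Python connector rather than on anything stated in the announcement itself. It assumes the provider hands the recipient a small JSON "profile" file with an endpoint and a bearer token, and that a table is addressed as `<profile>#<share>.<schema>.<table>`. The endpoint, token, share, schema and table names below are hypothetical placeholders.

```python
import json
import tempfile
from pathlib import Path


def write_profile(directory: Path, endpoint: str, bearer_token: str) -> Path:
    """Write a profile file: the JSON credentials a provider gives a recipient."""
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }
    path = directory / "example.share"  # hypothetical file name
    path.write_text(json.dumps(profile, indent=2))
    return path


def table_url(profile_path: Path, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address of a shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(url: str):
    """Load a shared table as a pandas DataFrame.

    Requires `pip install delta-sharing` and a reachable sharing server,
    so the import is done lazily and this function is not called below.
    """
    import delta_sharing

    return delta_sharing.load_as_pandas(url)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical endpoint and token, for illustration only.
        profile_path = write_profile(
            Path(tmp), "https://sharing.example.com/delta-sharing/", "<token>"
        )
        url = table_url(profile_path, "retail_share", "sales", "daily_sales")
        print(url)
        # With a real profile and server, the data could then be read with:
        # df = load_shared_table(url)
```

The point of the sketch is that the recipient needs only a credentials file and a table address, not an account on the provider's platform, which matches the article's description of sharing regardless of where the data resides.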

Delta Sharing is the fifth major open source project launched by Databricks, after Apache Spark, Delta Lake, MLflow for machine learning, and Koalas, which implements the pandas DataFrame application programming interface (API) on Spark. The project is being donated to the Linux Foundation.

Databricks also highlighted "Unity Catalog", a standardized data catalog that is compatible with Delta Sharing. Unity Catalog has a new interface intended to make it easier to discover and manage all of a company's databases, with a complete view of the data across clouds and existing catalogs, as well as in the Databricks Lakehouse platform.

Unity Catalog offers a single security model, based on ANSI SQL, to streamline deployment and standardize governance across clouds. It can also be integrated with existing data catalogs from Alation, Collibra, Privacera and Immuta, so that customers can build on what they already have and establish a centralized, future-proof governance model without high migration costs.


