Databricks Integration

Format-Preserving Encryption for Databricks

Encrypt and decrypt sensitive data in Databricks SQL and Apache Spark with Cyphera UDFs. Works with Unity Catalog, cluster libraries, and notebooks.

What It Does

Cyphera for Databricks adds format-preserving encryption to your Spark environment. Upload the Cyphera JAR as a cluster library or to a Unity Catalog Volume, register the UDFs, then call cyphera_protect to encrypt and cyphera_access to decrypt from SQL or DataFrame code. Each encrypted value carries an embedded Data Protection Header (DPH), so cyphera_access needs no configuration name. Encrypted values keep their original format: SSNs stay SSNs, and phone numbers stay phone numbers.

Quick Example

SQL

SELECT cyphera_protect('ssn', '123-45-6789');
-- → 'T01948-37-2150' (DPH-formatted, format preserved)

SELECT cyphera_access(cyphera_protect('ssn', '123-45-6789'));
-- → '123-45-6789'

Register UDFs in a notebook

# Python
spark._jvm.io.cyphera.databricks.CypheraRegistrar.registerAll(spark._jsparkSession)

// Scala
io.cyphera.databricks.CypheraRegistrar.registerAll(spark)
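
After registration, it can be useful to confirm that both functions are visible to the session before running a workload. The following is a minimal sketch, not part of Cyphera itself: the helper name missing_udfs is hypothetical, and it assumes the registrar exposes the UDFs under the names cyphera_protect and cyphera_access (as used in the SQL example) and that your runtime provides spark.catalog.functionExists (PySpark 3.3 or later).

```python
# Names taken from the SQL example; assumed to be what registerAll registers.
EXPECTED_UDFS = ("cyphera_protect", "cyphera_access")


def missing_udfs(spark, names=EXPECTED_UDFS):
    """Return the function names the session does not recognize.

    `spark` is the notebook's SparkSession. An empty list means every
    expected UDF is visible to the session.
    """
    return [name for name in names if not spark.catalog.functionExists(name)]
```

In a notebook, calling missing_udfs(spark) right after the registration step should return an empty list if registration succeeded under these assumptions.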

How It Works

Build the JAR with Maven and deploy it to your Databricks workspace in one of two ways: upload it as a cluster library so it loads automatically, or place it in a Unity Catalog Volume for more controlled access. Once the JAR is loaded, register the UDFs from any notebook to use them across your SQL and DataFrame workloads.
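
The Quick Example covers SQL only, so here is a sketch of the DataFrame side in Python. It assumes the UDFs are already registered in the session and reaches them through Spark SQL expressions. The helpers protect_expr, access_expr, and protect_column are hypothetical names introduced for illustration, and the two-argument form of cyphera_protect (configuration name, then value) is taken from the SQL example above.

```python
def protect_expr(config: str, column: str) -> str:
    """Build a Spark SQL expression that encrypts `column` using `config`."""
    # Backticks quote the column name; `config` is passed as a string literal.
    return f"cyphera_protect('{config}', `{column}`)"


def access_expr(column: str) -> str:
    """Build a Spark SQL expression that decrypts `column`.

    No configuration name is passed, matching the SQL example.
    """
    return f"cyphera_access(`{column}`)"


def protect_column(df, config: str, column: str):
    """Return `df` with `column` replaced by its encrypted form.

    Assumes the Cyphera UDFs are registered in the session that owns `df`.
    """
    # Imported lazily so the string helpers above work without Spark installed.
    from pyspark.sql import functions as F

    return df.withColumn(column, F.expr(protect_expr(config, column)))
```

For a DataFrame df with an ssn column, protect_column(df, "ssn", "ssn") would return the same rows with that column encrypted, and df.withColumn("ssn", F.expr(access_expr("ssn"))) would reverse it. The string helpers do no escaping, so pass only trusted configuration and column names.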
