Format-Preserving Encryption for Databricks
Encrypt and decrypt sensitive data in Databricks SQL and Apache Spark with Cyphera UDFs. Works with Unity Catalog, cluster libraries, and notebooks.
What It Does
Cyphera for Databricks adds format-preserving encryption to your Spark environment. Upload the Cyphera JAR as a cluster library or to a Unity Catalog Volume, register the UDFs, and then call cyphera_protect to encrypt and cyphera_access to decrypt, from either SQL or DataFrame code. A Data Protection Header (DPH) is embedded in each encrypted value, so cyphera_access needs no configuration name. Encrypted values keep their original format: SSNs stay SSNs, and phone numbers stay phone numbers.
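To make "format-preserving" concrete, here is a toy sketch in Python. It is not Cyphera's algorithm and it is not secure: it only shifts each digit by a key-derived amount and leaves separators untouched, so the output has the same length and layout as the input. The names toy_protect and toy_access are hypothetical, and the sketch does not add a Data Protection Header. Production format-preserving encryption typically relies on standardized modes such as NIST FF1.

```python
import hashlib
import hmac
import string


def _digit_shifts(key: bytes, count: int) -> list:
    """Derive `count` shifts in the range 0-9 from the key using HMAC-SHA256."""
    shifts = []
    counter = 0
    while len(shifts) < count:
        block = hmac.new(key, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        shifts.extend(b % 10 for b in block)  # slightly biased; fine for a toy
        counter += 1
    return shifts[:count]


def _transform(key: bytes, value: str, sign: int) -> str:
    """Shift every ASCII digit forward (sign=+1) or back (sign=-1); keep other characters."""
    positions = [i for i, ch in enumerate(value) if ch in string.digits]
    shifts = _digit_shifts(key, len(positions))
    out = list(value)
    for pos, shift in zip(positions, shifts):
        out[pos] = str((int(value[pos]) + sign * shift) % 10)
    return "".join(out)


def toy_protect(key: bytes, value: str) -> str:
    """Toy stand-in for encryption: digits map to digits, separators stay in place."""
    return _transform(key, value, +1)


def toy_access(key: bytes, value: str) -> str:
    """Inverse of toy_protect for the same key."""
    return _transform(key, value, -1)


if __name__ == "__main__":
    protected = toy_protect(b"demo-key", "123-45-6789")
    print(protected)                           # same NNN-NN-NNNN layout as the input
    print(toy_access(b"demo-key", protected))  # round-trips to the original value
```

Because the result still looks like an SSN, it fits existing column types, length limits, and validation rules, which is the practical benefit the paragraph above describes.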
Quick Example
SQL
SELECT cyphera_protect('ssn', '123-45-6789');  -- → 'T01948-37-2150' (DPH-formatted, format preserved)
SELECT cyphera_access(cyphera_protect('ssn', '123-45-6789'));  -- → '123-45-6789'
Register UDFs in a notebook
# Python
spark._jvm.io.cyphera.databricks.CypheraRegistrar.registerAll(spark._jsparkSession)

// Scala
io.cyphera.databricks.CypheraRegistrar.registerAll(spark)
How It Works
Build the JAR with Maven and deploy it to your Databricks workspace. You have two options: upload as a cluster library for automatic loading, or place it in a Unity Catalog Volume for more controlled access. Once the JAR is available, register the UDFs from any notebook and they become available across your SQL and DataFrame workloads.
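Once the UDFs are registered, DataFrame code can reach them through SQL expressions. The sketch below is one possible way to do that from PySpark. It assumes the first argument of cyphera_protect is a configuration name, as the quick example suggests, and that the UDFs accept column references as well as literals. The helper names (protect_expr, access_expr, protect_column) are hypothetical and not part of Cyphera.

```python
def protect_expr(config_name: str, column: str) -> str:
    """Build the Spark SQL expression that encrypts `column` under `config_name`.

    Both arguments are interpolated directly into SQL, so pass only trusted names.
    """
    return f"cyphera_protect('{config_name}', {column})"


def access_expr(column: str) -> str:
    """Build the Spark SQL expression that decrypts `column`.

    No configuration name is passed, matching the one-argument call in the quick example.
    """
    return f"cyphera_access({column})"


def protect_column(df, config_name: str, column: str):
    """Return `df` with `column` replaced by its encrypted form.

    Needs a Spark session in which the Cyphera UDFs are already registered.
    pyspark is imported lazily so the string helpers work without Spark installed.
    """
    from pyspark.sql import functions as F

    return df.withColumn(column, F.expr(protect_expr(config_name, column)))


if __name__ == "__main__":
    print(protect_expr("ssn", "ssn"))  # cyphera_protect('ssn', ssn)
    print(access_expr("ssn"))          # cyphera_access(ssn)
```

In a notebook this might be used as protect_column(customers_df, "ssn", "ssn"), where customers_df is a hypothetical DataFrame with an ssn column. Building the expression as a string keeps the same call shape in SQL and DataFrame code.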