Databases and Tables in Azure Databricks
It’s fairly simple to work with Databases and Tables in Azure Databricks. Let’s start off by outlining a couple of concepts.
A database in Azure Databricks is a collection of tables and a table is a collection of structured data. Tables in Databricks are equivalent to DataFrames in Apache Spark. This means that:
- You can cache, filter and perform any operations on tables that are supported by DataFrames.
- You can also query tables using the Spark APIs and Spark SQL, as the sketch below shows.
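Here's a minimal sketch of both styles. This assumes you're in a Databricks notebook (where the `spark` SparkSession is already defined); the table name `diamonds` and the `price` column are just placeholders for your own data:

```python
from pyspark.sql import functions as F

# Read a registered table as a DataFrame and use DataFrame operations on it.
df = spark.table("diamonds")
filtered = df.filter(F.col("price") > 1000).cache()
filtered.show(5)

# The very same table can be queried with Spark SQL.
result = spark.sql("SELECT AVG(price) AS avg_price FROM diamonds")
result.show()
```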
There are two types of tables in Databricks:
- Global Tables. These are available across all clusters. In Azure Databricks, global tables are registered to the Hive metastore.
- Local Tables. These are available only on the cluster they were created on and are not registered to the Hive metastore. These are also known as temp tables or views; the sketch below shows how to create both types.
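As a rough sketch, assuming a DataFrame `df` already exists (the names `sales_global` and `sales_temp` are placeholders), the two types might be created like this:

```python
# Global table: persisted to the Hive metastore and visible across clusters.
df.write.saveAsTable("sales_global")

# Local table (temp view): scoped to this SparkSession only and
# not registered in the metastore.
df.createOrReplaceTempView("sales_temp")

# Both can then be queried the same way with Spark SQL.
spark.sql("SELECT COUNT(*) FROM sales_temp").show()
```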
In this blog post, I'm going to do a quick walkthrough of how easy it is to create tables, read them, and then delete them once you're done with them. If you want to follow along, you'll need an Azure subscription and you'll need to create a Databricks instance.
Creating tables
This is pretty easy to do in Databricks. You can either create tables using the UI tool…
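If you prefer code to the UI, here's a minimal sketch of doing it programmatically. The file path and table names below are placeholders, so adjust them to your own data:

```python
# Load some data into a DataFrame (placeholder path; header/schema
# options assume a typical CSV file).
df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .csv("/path/to/data.csv"))

# Persist the DataFrame as a global table in the metastore.
df.write.saveAsTable("my_table")

# Alternatively, create a table directly with SQL.
spark.sql("CREATE TABLE IF NOT EXISTS my_sql_table (id INT, name STRING)")
```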