Announcing the FaunaDB Data Manager

We’re pleased to announce the highly anticipated release of the FaunaDB Data Manager (FDM). The FDM can assist with a variety of import and export tasks, including:

  1. Copying documents, collections, indexes, functions, and roles from one FaunaDB database, at any particular point in time, to another FaunaDB database
  2. Importing and updating data from:
    1. A local directory using JSON or CSV files
    2. An existing FaunaDB database, including at a specific point in time for auditing, version control, or data recovery purposes
    3. A SQL database, such as MySQL or Postgres, accessible over a JDBC connection
    4. An AWS S3 bucket using JSON or CSV files
  3. Exporting and backing up data to: 
    1. A local directory (as JSON files)
  4. Simple ETL (i.e., data formatting): 
    1. Changing a field or column's name and/or data type
    2. Setting a primary field (i.e., Ref column)
    3. Setting the import time of a document
    4. Ignoring fields
  5. Creating a new database pre-filled with demo data, indexes, and roles for testing purposes

Limitations

  • This tool is currently in Preview mode. We advise against writing production code that depends on it, as breaking changes might still be introduced.
  • Child databases are not processed. To process a child database, run the FaunaDB Data Manager with an admin key for that child database.
  • Keys and tokens are not copied. Since the secret for a key or token is only provided on initial creation, it is not possible to recreate existing keys and tokens. You would need to create new keys and tokens in the target database.
  • GraphQL schema metadata is not fully processed. This means that if you import an exported database, or copy one database to another, you need to import an appropriate GraphQL schema into the target database in order to run GraphQL queries.
  • Schema documents have an upper limit of 10,000 entries per type. If a source database contains more than 10,000 collections, indexes, functions, or roles, only the first 10,000 of each type are processed and the remainder are ignored.
  • When exporting a FaunaDB database to the local filesystem, only collections and their associated documents are exported. The schema documents describing collections, indexes, functions, and roles are copied to the file fauna_schema. Currently, that schema file cannot be used during import.
  • FaunaDB imposes collection-naming rules, specifically that collections cannot be named any of the following: events, set, self, documents, or _. Unfortunately, the FaunaDB Data Manager does not have the ability to rename collections during processing. If your CSV, JSON, or JDBC sources have filenames/tables that use these reserved names, processing terminates with an error.
  • While the FaunaDB Data Manager works on Windows, only limited testing has been done on that platform, so you may experience unexpected platform-specific issues. We plan to expand our Windows testing of the FaunaDB Data Manager in future releases.

Prerequisites

The FaunaDB Data Manager requires Java 11 or higher. To find the version of Java currently in use, execute:

java -version

A recent version of Java can be downloaded here.

Installing the FDM

Start by downloading the FDM zip file here.

Extract the zip file, then open your terminal and navigate to the unzipped FDM directory:

cd fdm-1.14

If you are running on Windows, ensure that the JAVA_HOME environment variable points to the folder where the JDK is installed, and add Java to your path:

set JAVA_HOME=C:\Program Files\Java\jdk-13.0.2
set PATH=%PATH%;%JAVA_HOME%\bin

Then run the following command to output the help menu and verify that the FDM is installed properly:

./fdm --help

As you should now see from the example commands printed in your terminal, all import/export methods follow the same basic format:

./fdm -source <arg> -dest <arg>

In the above command, -source defines the data source and -dest defines the destination. The subsequent arguments differ based on the source and destination types.

When reading data from a file, directory, or AWS bucket, the FDM examines each file’s contents and auto-detects whether the file is supported. File extensions such as .json or .csv are ignored; only the file contents are used to determine the type. Files of unsupported types are ignored.
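For reference, here is a minimal sketch of the kind of CSV file the FDM can ingest; the employee.csv file name and its columns are hypothetical. On import, the file's base name (employee) would become the destination collection name:

id,fname,lname,mgr
1,Alice,Smith,3
2,Bob,Jones,3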

Now, let's look at some specific examples!

Creating your demo databases and keys

For this tutorial, we will create two demo databases that we can copy and transform data between.

First, you need to create a new API key by following these steps: 

  1. Log into https://dashboard.fauna.com/
  2. Click on an existing database, or create a new database
  3. Click the “Security” button in the left navigation bar
  4. Click the “New Key” button
  5. Ensure that the “Role” field is set to “Admin”
  6. Enter a name for the key, e.g. “FDM”
  7. Click the “Save” button
  8. Save the displayed secret: it is only displayed once, and you may need to use it multiple times. If you lose it, you will have to create a new key.

Next, to populate a demo database, paste the displayed key secret into the following command, and run it in your terminal from within the FDM directory:

./fdm -demo REPLACE_WITH_SECRET

The output provides you with a key secret that can be used to access the newly created "fdm-demo" database:

Database <fdm-demo> created with secret <fnADnJHGHwACEw1QnhsY-C000000lSyB4Z000000>

Note: the key secrets used in this tutorial have been masked.

Next, let's create our second demo database and generate a secret for it. Visit https://dashboard.fauna.com/ and click "New database". Name your new database "fdm-demo-2" and click the checkbox to "Pre-populate with demo data". Click "Save". You should now be in your second demo database "fdm-demo-2".

Now, click "Security" in the left sidebar. Then, click "New key". Keep the Role option as "Admin", optionally give the key a name such as “key-for-fdm-demo”, and click "Save". Copy the secret displayed on the next screen and save it for other examples in this tutorial.

Note: You can use either an admin or server key; however, the roles from a FaunaDB database accessed via a server key will not be imported, for security reasons.

Copying data between FaunaDB databases

Now, we are ready to copy our data from one demo database to another.

To do so, use the following command, replacing the source key with the "fdm-demo" database secret from the terminal output in the previous section, and the destination key with the "fdm-demo-2" secret that we just generated:

./fdm -source key=fnADnJHGHwACEw1QnhsY-C000000lSyB4Z000000 -dest key=fnADnJW5CcACE6qfPsMOf111111R-l8ez111111

Now, visit https://dashboard.fauna.com/collections/@db/fdm-demo-2 to see that fdm-demo's collections have been copied into the fdm-demo-2 database.

If we had been copying from JSON or CSV files rather than a FaunaDB database, the FDM would have used the file's base name as the destination collection name.

Appending or Updating Documents

Not only can the FDM add new documents to a collection, it can also add to an existing document's event history, providing update functionality. This behavior is controlled by two characteristics that every document possesses: the reference, and a timestamp denoting creation time. The reference and timestamp can be provided to the FDM by the input data and/or the command line argument `-format`. If no timestamp is provided, the current date and time is used. If no reference is provided, the FDM generates a unique reference for the document before inserting it into the destination collection.

If a reference is provided, then three different outcomes are possible:

  1. If a document with the reference does not exist in the collection, then the FDM inserts the document into the destination collection.
  2. If a document with the provided reference does exist in the collection, but no timestamp was provided to the FDM, the default timestamp of “now” is used and a new version of the document is inserted.
  3. If a document with both the reference and timestamp already exists in the collection, the document’s history is modified at the point in time specified by the timestamp.
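For example, if your source data carries its own identifier and creation-time fields, you could map them with `-format`, as in the sketch below. The id and created field names are hypothetical, and we are assuming ref and ts are the supported type names for the reference and timestamp; check the official FDM documentation to confirm:

./fdm -source path=/work/fauna/fdm/load/data1000_basic.json -dest key={database_key} -format "id:ref,created:ts"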

Example of updating a document

Let's return to the two databases from the example above, and change some of the data in the source database. To modify the data, visit a document in the fdm-demo database, for example: https://dashboard.fauna.com/collections/documents-edit/address/100/@db/fdm-demo. Next, modify the data, e.g., by changing the State from WA to NY.

Now run the same command from the last section to update fdm-demo-2 with the most recent data from fdm-demo:

./fdm -source key=fnADnJHGHwACEw1QnhsY-C000000lSyB4Z000000 -dest key=fnADnJW5CcACE6qfPsMOf111111R-l8ez111111

Now visit https://dashboard.fauna.com/collections/address/@db/fdm-demo-2 to see that the data has been updated! Additionally, the source data's document references have been retained.

Example of adding documents to a collection

In this example, we will add new documents to the collection "storehouses". To do this quickly, let's return to fdm-demo and rename the "address" collection to "storehouses" by visiting https://dashboard.fauna.com/collections-edit/address/@db/fdm-demo.

Now, run the last command yet again:

./fdm -source key=fnADnJHGHwACEw1QnhsY-C000000lSyB4Z000000 -dest key=fnADnJW5CcACE6qfPsMOf111111R-l8ez111111

Finally, visit https://dashboard.fauna.com/collections/storehouses/@db/fdm-demo-2 to see that documents with new references have been inserted into the storehouses collection.

Importing data from non-FaunaDB sources

For all of the import examples in this section, we will set the destination as "dryrun". This is useful for checking the data format before actually copying anything over, especially if you're doing any data transformations and need to confirm that you've defined all of your fields properly. Once you feel confident in the import, replace “dryrun” with your destination database secret.

Importing from a local file or directory

To import a file or directory, simply specify the path argument. For example:

./fdm -source path=/work/fauna/fdm/load/ -dest dryrun
./fdm -source path=/work/fauna/fdm/load/data1000_basic.json -dest dryrun

Importing over a JDBC connection

In this case, you first need to download the JDBC driver. Here are handy links for the MySQL and Postgres drivers.

Then, in the example below, replace the value of the "jdbc=" argument with the path to your driver, which is always a jar file. Replace the value of the "driver=" argument with the driver name as chosen by the vendor. Replace the value of the "url=" argument with the connection string. Next, add your username and password. Finally, you can optionally specify a source database and/or table. If any of your JDBC arguments contain an embedded equals sign (i.e., =), use the fdm.props configuration file instead of the command line.

./fdm -source jdbc=/work/fauna/JDBC/mysql-connector-java-8.0.18.jar driver=com.mysql.cj.jdbc.Driver url=jdbc:mysql://localhost:3306/demo user=root password=MY_PASSWORD database=demo table=tab2 -dest dryrun
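If your connection string does contain an embedded equals sign (for example, jdbc:mysql://localhost:3306/demo?serverTimezone=UTC), move the arguments into fdm.props. The sketch below assumes the property names mirror the command-line argument names; check the official FDM documentation for the exact keys:

# Hypothetical fdm.props sketch; keys assumed to mirror the CLI arguments
jdbc=/work/fauna/JDBC/mysql-connector-java-8.0.18.jar
driver=com.mysql.cj.jdbc.Driver
url=jdbc:mysql://localhost:3306/demo?serverTimezone=UTC
user=root
password=MY_PASSWORD
database=demo
table=tab2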

Importing from an AWS S3 bucket

In this case, you just need the path to your AWS bucket and your AWS environment variables, which are likely already set in your environment. Here is an example of what that looks like. The only argument you should need to replace below is the bucket path.

./fdm -source aws=${AWS_SECRET_ACCESS_KEY} id=${AWS_ACCESS_KEY_ID} region=${AWS_DEFAULT_REGION} bucket=//faunadb-work/my_database -dest dryrun
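If those variables are not already set, you can export them in your shell before running the command above; the region shown is just an example:

export AWS_ACCESS_KEY_ID=REPLACE_WITH_KEY_ID
export AWS_SECRET_ACCESS_KEY=REPLACE_WITH_SECRET
export AWS_DEFAULT_REGION=us-east-1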

Transforming data and ETL

In all of the import, export, and copy options described above, we can perform the following transformations:

  1. Change a field or column's name
  2. Change a field or column's data type
  3. Set a primary field (i.e., reference ID)
  4. Set the creation time of a document
  5. Ignore fields

All transformations follow the same format:

<field-name>[->new-name]:<field-type>[(date_format)],...

Please see the official FDM documentation for all of the supported field types.
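For illustration, here is one more sketch of the syntax, renaming a hypothetical created column and typing it as a date with an explicit parsing pattern. We are assuming date is among the supported field types; confirm the exact type names in the documentation:

./fdm -source path=/work/fauna/fdm/load/data1000_basic.json -dest dryrun -format "created->created_at:date(yyyy-MM-dd)"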

Let's return to our demo databases again, this time renaming two of the fields (lname and mgr) and changing the manager field's data type to a string:

./fdm -source key=fnADnJHGHwACEw1QnhsY-C000000lSyB4Z000000 -dest key=fnADnJW5CcACE6qfPsMOf111111R-l8ez111111 -format "lname->last_name,mgr->manager:string"

Now, visit https://dashboard.fauna.com/collections/employee/@db/fdm-demo-2 to see that the two field names have changed, and the manager number has been wrapped in quotation marks to denote that it is now a string.

Copying a FaunaDB database from a specific point in time

You can also optionally define a point in time with the `pit` argument and export that database snapshot to a local folder or another database. For example, try editing the timestamp in the command below to reflect the local date and time at which you initially created the "fdm-demo" database for this tutorial:

./fdm -source key=fnADnJHGHwACEw1QnhsY-C000000lSyB4Z000000 pit="2020-03-17T09:00:00" -dest key=fnADnJW5CcACE6qfPsMOf111111R-l8ez111111

Now visit https://dashboard.fauna.com/collections/employee/@db/fdm-demo-2 to see that all of our changes have been reverted.

The FDM automatically reads the following time/date formats in the `pit` argument:

"dd-MM-yyyy", "yyyy-MM-dd", "MM/dd/yyyy", "yyyy/MM/dd", "yyyyMMddTHHmm", "dd-MM-yyyyTHH:mm", "yyyy-MM-ddTHH:mm", "MM/dd/yyyyTHH:mm", "yyyy/MM/ddTHH:mm", "yyyyMMddTHHmmss", "dd-MM-yyyyTHH:mm:ss", "yyyy-MM-ddTHH:mm:ss", "MM/dd/yyyyTHH:mm:ss", "yyyy/MM/ddTHH:mm:ss"

This is useful for purposes like auditing, version control, and data recovery. To find out more information about this capability, read more about FaunaDB’s temporal features.

Backing up your database as of midnight yesterday

Regularly backing up a database to local storage is a common request. The command below exports the data in JSON format to local storage at a consistent point in time (i.e., midnight yesterday). The command takes three arguments: the key to the database being backed up, yesterday's date, and the path to a directory that should hold the data:

Mac

./fdm -source key={database_key} pit=`date -v-1d +%F` -dest path=/work/backup

Linux

./fdm -source key={database_key} pit=`date -d 'yesterday' '+%F'` -dest path=/work/backup
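To run the Linux variant automatically every night, you could schedule it with cron. The sketch below assumes the FDM is installed at /work/fdm-1.14; note that percent signs must be escaped as \% inside crontab entries:

0 0 * * * cd /work/fdm-1.14 && ./fdm -source key={database_key} pit=$(date -d 'yesterday' '+\%F') -dest path=/work/backup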

Recovering data from a directory

If your documents have been backed up as JSON in a directory, the FDM can easily restore them. First, create a new empty database and a key to access it. Then, execute the command below:

./fdm -source path=/work/backup -dest key={database_key}

If you must restore to an existing database, you might have to adjust the schema and data policies. Currently, the FDM only recovers the collections (with default attributes) and the data documents. The backup does include a file called “fauna_schema” containing the definitions of all collections, indexes, functions, and roles, but as noted above, that file cannot yet be used during import.

Conclusion

We’d love to hear your thoughts on how this feature works for your use case in the #fdm channel in our Community Slack. We will continue to expand the FDM’s import and export functionality in future releases.

With the FDM, you can easily import and export both small and large amounts of data into and out of a FaunaDB database. We also hope that it greatly simplifies the getting-started experience of bringing your own data into FaunaDB.