Clone, share and modify data pipeline JSON definitions

Take advantage of data pipeline JSON definitions


Each pipeline's definition is stored as a JSON object. Pipeline definitions are designed to be human-readable and can be exported by clicking the Export button in the pipeline details view.

A valid definition must contain two top-level nodes: datasets (an object) and operations (an array).

{
  "datasets":{},
  "operations":[]
}

For example, the following definition loads the user table from the existing MyDBConn database connection and applies a filter operation. Pipeline definitions are decoupled from data connections: connection parameters are not stored in definitions and cannot be passed in through them. Each connection must first be added as usual under 'Data connections'.

{
  "datasets": {
    "user": {
      "loc": "user",
      "jdbcDatasourceName": "MyDBConn"
    }
  },
  "operations": [
    {
      "dataset": "user",
      "type": "filter",
      "condition": "enabled = false"
    },
    {
      "dataset": "user",
      "type": "show",
      "count": "5",
      "count_total": false,
      "truncate_values": false
    }
  ]
}

The type of connection is inferred from the jdbcDatasourceName property. For example, if MyDBConn were a DynamoDB connection, the property name would be dynamodbDatasourceName. To find the property name for any type of connection, load a table from that connection and inspect the generated pipeline definition.
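As a sketch, a dataset backed by a DynamoDB connection might look like the following. The connection name MyDynamoConn is illustrative, and this assumes the dataset fields otherwise mirror the JDBC example above:

{
  "datasets": {
    "user": {
      "loc": "user",
      "dynamodbDatasourceName": "MyDynamoConn"
    }
  },
  "operations": []
}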

Use cases

Environment switching
Since data connections are identified by name, you can build a 'test' pipeline against a small amount of data, then create a 'production' pipeline from the same definition by updating the data connection names. This is the recommended workflow: designing against a small test dataset reduces cost and execution time.
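Continuing the earlier example, switching environments is a matter of editing the datasource name in the exported definition before importing it. The connection name MyProdDBConn below is hypothetical; a connection with that exact name must already exist under 'Data connections':

{
  "datasets": {
    "user": {
      "loc": "user",
      "jdbcDatasourceName": "MyProdDBConn"
    }
  },
  "operations": [
    {
      "dataset": "user",
      "type": "filter",
      "condition": "enabled = false"
    }
  ]
}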

Backup
Pipelines can be backed up by exporting their definition. Definitions can be imported via the 'Add Pipeline -> Import definition' option.

Modification
Advanced users can modify a pipeline's definition by hand. Be careful: manual edits can produce an invalid definition.
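For instance, a typical hand-edit is tightening the condition of an existing filter operation before re-importing the definition. This sketch assumes the condition field accepts SQL-style expressions, as the earlier example suggests:

{
  "dataset": "user",
  "type": "filter",
  "condition": "enabled = false AND created_at < '2020-01-01'"
}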

Sharing
Pipeline definitions can be shared. Note that data connections with the same names must already exist in the target environment when importing a definition.

Cloning / Duplication
Rather than manually creating multiple similar pipelines, you can clone a pipeline either by exporting and re-importing its definition or by clicking the 'Duplicate' button in the Dashboard or Project view.