Creating a Data Flow

Create a data flow in a project or folder in Data Integration. A data flow is a visual program that represents the flow of data from source data assets, such as a database or flat file, to target data assets, such as a data lake or data warehouse.

Data Integration includes one default project to get you started. To create other projects or folders, see Projects and Folders.

    1. Open the project or folder in which you want to create the data flow.

      For the steps to open the details page of a project or folder, see Viewing the Details of a Project or Viewing the Details of a Folder.

    2. On the project or folder details page, click Data flows.
    3. In the Data flows section, click Create data flow.

      The designer opens in a tab. On the canvas, the Operators panel and Properties panel are open.

    4. On the Details tab in the Properties panel, enter a name and an optional description for the data flow.

      The identifier is a system-generated value based on the name. You can change the value, but after you create and save the data flow, you can't update the identifier.

    5. (Optional) In the Project or folder field, click Select and choose a different project or folder in which to save the data flow.
    6. Drag data flow operators from the Operators panel onto the canvas to design the data flow.

      To be valid, a data flow must have at least one source operator and one target operator. Although Data Integration supports multiple target operators in a data flow, a target operator can have only one inbound port.

      Tip

      When you use a sort operator, apply the sort operator after you apply other operators. Applying the sort operator immediately before the target operator ensures that the data for the target is inserted in the sort order that you want.

    7. To duplicate a source, target, or expression operator, right-click the operator icon and select Duplicate. Then select the duplicated operator and rename the identifier in the Properties panel.

      If the original operator is connected to other operators, the connections aren't copied to the duplicated operator.

    8. Connect the operators on the canvas:
      • Hover over an operator until the connector (a small circle) appears on the right side of the operator, then drag the connector to the operator that you want to connect to. A connection is valid when a line connects the two operators after you drop the connector.

        Note

        A connection line represents how data flows from one node to the next. Although you can drag a visible connector from one object to another, you can't have more than one inbound connection line to a filter, expression, aggregate, distinct, sort, or target operator.

      • To insert an operator between two connected operators, right-click the connection line and use the Insert menu.

      • To delete a connection, right-click the line and select Delete.

    9. On the Details tab in the Properties panel, configure basic and required properties for each operator.
      • For information about assigning parameters, and viewing system parameters that are available at runtime, see Using Data Flow Parameters.

      • Where applicable, use the Advanced options tab to specify other properties. For information about advanced properties for each operator, see Using Data Flow Operators.

    10. To save the data flow for the first time, click one of the following buttons:
      • Create: Creates and saves the data flow. You can continue to create and edit the data flow in the designer.
      • Create and close: Creates and saves the data flow, closes the designer, and returns you to the Data flows list on the project or folder details page.
    11. Save periodically while you work in the designer by clicking one of the following buttons:
      • Save: Commits changes since the last save. You can continue editing after saving.
      • Save and close: Commits changes, closes the designer, and returns you to the Data flows list on the project or folder details page.
      • Save as: Commits changes (since the last save) and saves to a copy instead of overwriting the current data flow. You can provide a name for the copy and select a different project or folder for the copy, or save the copy in the same project or folder as the current data flow.
    12. Validate the data flow to check for warnings or errors that could cause issues during runtime. In the designer toolbar, click Validate.

      Data Integration displays the Global validation panel. If warnings or errors are found, click an identifier name in the list of issues to bring the operator with that warning or error into focus on the canvas.

    13. When you finish working in the data flow, click Create and close or Save and close.
    To run the data flow, create an integration task. See Creating an Integration Task.
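
The structural rules from step 6 and the note in step 8 can be sketched in code to make them concrete. The following Python example is only an illustration: the DataFlowSketch class, the check_data_flow function, and the operator identifiers are hypothetical names, not part of Data Integration, and the checks cover only the rules stated in this topic, not everything that Validate reports.

```python
from collections import Counter
from dataclasses import dataclass, field

# Operator types that, per the note in step 8, accept at most one inbound connection.
SINGLE_INBOUND_TYPES = {"filter", "expression", "aggregate", "distinct", "sort", "target"}


@dataclass
class DataFlowSketch:
    """Hypothetical in-memory model of a data flow; not a Data Integration type."""
    operators: dict = field(default_factory=dict)    # identifier -> operator type
    connections: list = field(default_factory=list)  # (upstream, downstream) pairs

    def add_operator(self, identifier, op_type):
        self.operators[identifier] = op_type

    def connect(self, upstream, downstream):
        self.connections.append((upstream, downstream))


def check_data_flow(flow):
    """Return a list of issues; an empty list means none of the checked rules is broken."""
    issues = []
    types = set(flow.operators.values())
    if "source" not in types:
        issues.append("data flow needs at least one source operator")
    if "target" not in types:
        issues.append("data flow needs at least one target operator")

    # Count the inbound connection lines that arrive at each operator.
    inbound = Counter(downstream for _, downstream in flow.connections)
    for identifier, count in inbound.items():
        op_type = flow.operators.get(identifier)
        if op_type in SINGLE_INBOUND_TYPES and count > 1:
            issues.append(
                f"{identifier} ({op_type}) has {count} inbound connections; only one is allowed"
            )
    return issues


# Example: two sources wired into the same filter, which breaks the single-inbound rule.
flow = DataFlowSketch()
flow.add_operator("CUSTOMERS", "source")
flow.add_operator("ORDERS", "source")
flow.add_operator("FILTER_1", "filter")
flow.add_operator("TARGET_1", "target")
flow.connect("CUSTOMERS", "FILTER_1")
flow.connect("ORDERS", "FILTER_1")
flow.connect("FILTER_1", "TARGET_1")

for issue in check_data_flow(flow):
    print(issue)
```

In this sketch, the filter is flagged because it receives two inbound lines. Removing the second connection to the filter leaves no issues under these two rules.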
  • Use the oci data-integration data-flow create command and required parameters to create a data flow:

    oci data-integration data-flow create [OPTIONS]

    For a complete list of flags and variable options for CLI commands, see the Command Line Reference.

  • Run the CreateDataFlow operation to create a data flow.
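
As a rough illustration of the CLI option above, the following Python sketch assembles an argument list for the oci data-integration data-flow create command. The flag names (--workspace-id, --name, --identifier, --registry-metadata) and the aggregatorKey field are assumptions to confirm against the Command Line Reference, and the helper function, OCID, and key values are hypothetical placeholders. The sketch only builds and prints the command; it does not run it.

```python
import json
import shlex


def build_create_data_flow_command(workspace_id, name, identifier, aggregator_key):
    """Build the argument list for `oci data-integration data-flow create`.

    The flag names used here are assumptions; check the Command Line Reference.
    """
    # aggregator_key is assumed to be the key of the project or folder
    # in which the data flow is created.
    registry_metadata = json.dumps({"aggregatorKey": aggregator_key})
    return [
        "oci", "data-integration", "data-flow", "create",
        "--workspace-id", workspace_id,
        "--name", name,
        "--identifier", identifier,
        "--registry-metadata", registry_metadata,
    ]


command = build_create_data_flow_command(
    workspace_id="ocid1.disworkspace.oc1..exampleuniqueID",  # placeholder, not a real OCID
    name="Load customers",
    identifier="LOAD_CUSTOMERS",
    aggregator_key="example-project-key",  # placeholder project or folder key
)

# Print a shell-quoted form of the command for review before running it.
print(shlex.join(command))
```

Building the arguments as a list keeps the JSON value for the registry metadata intact as a single argument, which avoids quoting mistakes if you later pass the list to subprocess.run.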