Creating an OpenSearch Pipeline

Create an OpenSearch pipeline to ingest log data into an OpenSearch cluster with Data Prepper.

    1. On the Pipelines list page, select Create pipeline. If you need help finding the list page, see Listing OpenSearch Pipelines.

    2. On the Create Pipeline panel, enter the following information:
      • Pipeline Name: Enter the name of the pipeline.

      • Compartment: From the list, select the compartment in which to create the pipeline.

      • Tags: Apply any tags you want to the pipeline. See Overview of Tagging.

    3. Under YAML, select one of the following Source Type options:
      • Object Storage.

      • Kafka. Selecting this option requires you to provide networking configuration information later in the pipeline creation process.

    4. Select Generate YAML.

      A default schema is generated in the Pipeline YAML box, where you can customize it to meet your needs.

      For Object Storage and source coordination YAML configurations, see Object Storage and Source Coordination YAML.

      For Kafka YAML configurations, see Kafka YAML.

    5. Under Hardware Configuration, enter the following information:
      • Node count: Enter the number of nodes between 1 and 10.

      • OCPUs: Enter the number of OCPUs between 1 and 32.

      • Memory (GB): Enter the amount of memory in gigabytes between 8 and 1024.
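
      The limits above can be checked before you submit the form or call the CLI. The following is a minimal sketch, not part of the service: the HardwareConfig class and validate_hardware function are hypothetical names, and the ranges are assumed to be inclusive at both ends.

```python
from dataclasses import dataclass

# Ranges copied from the Hardware Configuration step.
# Assumption: both ends of each range are inclusive.
NODE_COUNT_RANGE = (1, 10)
OCPU_RANGE = (1, 32)
MEMORY_GB_RANGE = (8, 1024)


@dataclass(frozen=True)
class HardwareConfig:
    """Hypothetical container for the three hardware fields."""
    node_count: int
    ocpu_count: int
    memory_gb: int


def validate_hardware(cfg: HardwareConfig) -> list[str]:
    """Return a list of readable problems; an empty list means all values are in range."""
    checks = [
        ("Node count", cfg.node_count, NODE_COUNT_RANGE),
        ("OCPUs", cfg.ocpu_count, OCPU_RANGE),
        ("Memory (GB)", cfg.memory_gb, MEMORY_GB_RANGE),
    ]
    problems = []
    for label, value, (low, high) in checks:
        if not low <= value <= high:
            problems.append(f"{label} must be between {low} and {high}, got {value}")
    return problems
```

      For example, validate_hardware(HardwareConfig(node_count=2, ocpu_count=4, memory_gb=16)) returns an empty list, while an out-of-range value produces one message per offending field.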

    6. (Kafka source type only) Under Network Configuration, note the following:

      This configuration is only required for the Private OCI Streaming service and Self-managed Kafka. For the Public OCI Streaming service, select none.

      If the source needs a reverse connection, configure the OpenSearch pipeline with a reverse connection so that it can pull data from the data source in your subnet. Reverse connection provisioning is fully automatic.

      Enter the following information:

      • Virtual Cloud Network in <compartment>: From the list, select the virtual cloud network (VCN) you want in the specified compartment.

      • Subnet in <compartment>: From the list, select the subnet of the VCN you want in the specified compartment.

      • Network Security Group: From the list, select the Network Security Group you want.

      • Reverse Connection Endpoints: Enter the IP address of the endpoint you want to use for the reverse connection.

      • Domain Name: Enter the fully qualified domain name (FQDN) of your source. For example:
        streaming.us-phoenix-1.oci.oraclecloud.com:9092
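
      The Reverse Connection Endpoints and Domain Name values can be sanity-checked before you enter them. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the Domain Name value is assumed to take the host:port form shown in the example above.

```python
import ipaddress


def split_domain_name(value: str) -> tuple[str, int]:
    """Split 'host:port' into (host, port); raise ValueError on malformed input.

    Assumption: the Domain Name field uses the host:port form from the example.
    """
    host, sep, port_text = value.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {value!r}")
    port = int(port_text)  # raises ValueError if the port is not numeric
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


def is_ip_address(value: str) -> bool:
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True
```

      These checks only catch formatting mistakes. They do not confirm that the endpoint is reachable from the selected subnet.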

    7. Select Dry Run to validate the provided configurations.

      After the dry run succeeds, continue with pipeline creation.

    8. Select Create.
  • Use the oci opensearch pipeline create command with the required parameters to create a pipeline:

    oci opensearch pipeline create --compartment-id compartment_id --data-prepper-configuration-body yaml --display-name display_name 
    --subnet-id subnet_id --memory-gb memory_gb --node-count node_count --ocpu-count ocpu_count --pipeline-configuration-body yaml [OPTIONS]

    data-prepper-configuration-body is the Data Prepper configuration in YAML format. The command accepts the Data Prepper configuration as a string or within a .yaml file. If you provide the configuration as a string, each new line must be escaped with a backslash (\).

    pipeline-configuration-body is the pipeline configuration in YAML format. The command accepts the pipeline configuration as a string or within a .yaml file. If you provide the configuration as a string, each new line must be escaped with a backslash (\).

    For a complete list of parameters and values for CLI commands, see the CLI Command Reference.
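
    When scripting, the required parameters of the command shown above can be assembled into an argument list. The following is a minimal sketch: build_create_command is a hypothetical helper, it uses only the flags listed in this section, and it does not apply the newline escaping described above to the YAML strings.

```python
def build_create_command(
    *,
    compartment_id: str,
    display_name: str,
    subnet_id: str,
    memory_gb: int,
    node_count: int,
    ocpu_count: int,
    data_prepper_yaml: str,
    pipeline_yaml: str,
) -> list[str]:
    """Assemble the argument list for `oci opensearch pipeline create`.

    Hypothetical helper: it only builds the list and does not run the CLI.
    The YAML strings are passed through unchanged.
    """
    return [
        "oci", "opensearch", "pipeline", "create",
        "--compartment-id", compartment_id,
        "--data-prepper-configuration-body", data_prepper_yaml,
        "--display-name", display_name,
        "--subnet-id", subnet_id,
        "--memory-gb", str(memory_gb),
        "--node-count", str(node_count),
        "--ocpu-count", str(ocpu_count),
        "--pipeline-configuration-body", pipeline_yaml,
    ]
```

    The resulting list can be passed to subprocess.run, which avoids shell quoting problems with multi-line values. Check the CLI Command Reference for any optional parameters your pipeline needs.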

  • Run the CreateOpensearchClusterPipeline operation to create a pipeline.