RAG Tool Object Storage Guidelines for Generative AI Agents

Review the following guidelines to prepare Object Storage data for RAG tools in Generative AI Agents.

General Guidelines

To prepare data for data sources in OCI Generative AI Agents, follow these guidelines:

  • Data Sources: Data for Generative AI Agents must be uploaded as files to an Object Storage bucket.
  • Number of Buckets: Only one bucket is allowed per data source.
  • Supported File Types: Only PDF and txt files are supported.
  • File Size Limit: Each file must be no larger than 100 MB.
  • PDF Contents: PDF files can include images, charts, and reference tables, but these must not exceed 8 MB.
  • Chart Preparation: No special preparation is needed for charts, as long as they're two-dimensional with labeled axes. The model can answer questions about the charts without explicit explanations.
  • Table Preparation: Use reference tables with several rows and columns. For example, the agent can read the table on the limits page.
  • URLs: All the hyperlinks present in the PDF documents are extracted and displayed as hyperlinks in the chat response.
  • Data Not Ready: If your data isn't yet available, create an empty folder for the data source and populate it later. This way, you can ingest data into the source after the folder is populated.
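
The file type and size guidelines above can be checked locally before you upload anything. The following is a minimal sketch, not an official tool: the function names are hypothetical, 100 MB is assumed to mean 100 * 1024 * 1024 bytes, and metadata JSON files are skipped because they aren't source documents.

```python
from pathlib import Path

# Limits taken from the guidelines above.
SUPPORTED_SUFFIXES = {".pdf", ".txt"}
MAX_FILE_BYTES = 100 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_file(name: str, size_bytes: int) -> list[str]:
    """Return the guideline violations for one file (empty list if none)."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"{name}: unsupported file type {suffix!r}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"{name}: larger than 100 MB")
    return problems


def is_metadata_file(name: str) -> bool:
    """True for the metadata JSON files described later in this topic."""
    return name.endswith(".metadata.json") or name == "_metadata_schema.json"


def check_folder(folder: str) -> list[str]:
    """Check every source file under a local folder before uploading it."""
    problems = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and not is_metadata_file(path.name):
            problems.extend(check_file(path.name, path.stat().st_size))
    return problems
```

For example, check_folder("./my-data") returns one message per file that breaks a guideline. The 8 MB limit on PDF contents isn't checked here, because it requires inspecting the PDF itself.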
Getting Access

Set up the following Object Storage permissions before you proceed.

  • User access to Object Storage files
  • Data ingestion job access to Object Storage files for long-running jobs

See Getting Access for the permissions.

Enhancing Responses with Metadata Filtering

The metadata filtering feature aims to improve response quality by using filter conditions that you define, helping the model generate answers relevant to the content scope.

Review the following options to select one or more methods that work best for you.

Method | Location | Usage
Include metadata for all the files in a bucket without mentioning the file names. | Create a _common.metadata.json file at the Object Storage root level. | Use this file for metadata that's common to all files in the bucket. This method helps avoid duplicating metadata across objects.
In one file, create a metadata entry for each file in a bucket and include the file names. | Create an _all.metadata.json file at the Object Storage root level. | Use this method if you have a lot of files and creating one file that includes all the file names is more convenient for you than creating one metadata file per file.
Create a metadata file for each file in a bucket. | Create a <file-name>.metadata.json file for each file, at the file level. | Use this method when metadata differs for each file and there aren't many files to create a metadata file for, or if you're automating the creation of the metadata files.
Add Object Storage metadata headers to each file. | Add the metadata header through each file's Object Storage metadata property. | Use this method if you have few metadata properties to include. We recommend that you use the other methods with JSON files, because files are easier to update and manage, and metadata headers are difficult to update.

For all methods in the preceding table, you must define a metadata schema file called _metadata_schema.json at the Object Storage root level. Here's an example hierarchy of where you save the metadata files.

Hierarchy of metadata files in Object Storage:

    bucket_root/
        _all.metadata.json
        _common.metadata.json
        _metadata_schema.json
        file_0.pdf
        file_0.pdf.metadata.json
        folder_1/
            file_1.pdf
            file_1.pdf.metadata.json
        folder_2/
            file_2.pdf
            file_2.pdf.metadata.json

The following steps show how to format the metadata JSON files using examples.

  1. Create a metadata schema file called _metadata_schema.json and save it at the Object Storage root level. For example:
    {
        "metadataSchema": [
            {
                "name": "field_1",
                "type": "integer"
            },
            {
                "name": "field_2",
                "type": "string"
            },
            {
                "name": "field_3",
                "type": "list_of_string"
            },
            {
                "name": "field_4",
                "type": "double"
            }  
        ]
    }

    Allowed values are integer, string, list_of_string, and double.

    Schema example for integer:
    "publication_year": {
      "type": "integer"         
    }
    Schema example for string:
    "title": {
      "type": "text",
      "fields": {
          "keyword": {
              "type": "keyword"
          },
          "search_as_you_type": {
              "type": "search_as_you_type"
          }
       }
    }
    Schema example for list_of_string:
    "publishers": {
        "type": "text",
        "fields": {
        "keyword": {
            "type": "keyword"
        },
        "search_as_you_type": {
            "type": "search_as_you_type"
        }
      }
    }
    Schema example for double:
    "rating": {
      "type": "double"
    }
  2. (Optional) Create a JSON file called _common.metadata.json for metadata common to all files. For example:
    {
        "metadataAttributes": {
            "field_1": value_1,
            "field_2": value_2,
            "field_3": value_3,
            ......,
            "field_n": value_n
        }
    }
  3. (Optional) Create a JSON file called _all.metadata.json and in this file add metadata for each file in the bucket. For example:
    {
        "folder_1/file_1.pdf" : {
            "metadataAttributes": {
                "field_1": value_1,
                "field_2": value_2,
                "field_3": value_3,
                ......,
                "field_n": value_n
            }
        },
        "folder_2/file_2.pdf": {
            "metadataAttributes": {
                "field_1": value_1,
                "field_2": value_2,
                "field_3": value_3,
                ......,
                "field_n": value_n
            }
        }
    }
  4. (Optional) Create a JSON file called <file-name>.metadata.json for each file in the bucket and add the metadata separately in each file. For example:
    {
        "metadataAttributes": {
            "field_1": value_1,
            "field_2": value_2,
            "field_3": value_3,
            ......,
            "field_n": value_n
        }
    }
    Note

    You can't change or remove the metadata fields after the knowledge base data is ingested. You can add new fields up to the allowed limit. To remove or update a field, re-create the knowledge base.
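
The templates in the preceding steps use placeholders. As an illustration only, the following sketch fills them with made-up values and prints the resulting JSON. The field names are borrowed from the schema examples in step 1, the values are invented, and representing a list_of_string value as a JSON array of strings is an assumption that the steps above don't state explicitly. The helper to_all_metadata is hypothetical.

```python
import json

# Schema in the layout of step 1 (_metadata_schema.json).
schema = {
    "metadataSchema": [
        {"name": "publication_year", "type": "integer"},
        {"name": "title", "type": "string"},
        {"name": "publishers", "type": "list_of_string"},
        {"name": "rating", "type": "double"},
    ]
}

# Per-file metadata in the layout of step 4, for example saved next to
# file_0.pdf as file_0.pdf.metadata.json. All values are made up.
file_metadata = {
    "metadataAttributes": {
        "publication_year": 2024,
        "title": "Getting Started Guide",
        # Assumption: a list_of_string value is a JSON array of strings.
        "publishers": ["Publisher A", "Publisher B"],
        "rating": 4.5,
    }
}


def to_all_metadata(per_file: dict[str, dict]) -> dict:
    """Wrap per-file attributes in the _all.metadata.json layout of step 3."""
    return {name: {"metadataAttributes": attrs} for name, attrs in per_file.items()}


all_metadata = to_all_metadata(
    {"folder_1/file_1.pdf": file_metadata["metadataAttributes"]}
)

print(json.dumps(schema, indent=4))
print(json.dumps(file_metadata, indent=4))
print(json.dumps(all_metadata, indent=4))
```

Generating the files from one Python structure keeps the field names in the schema and in the metadata files consistent, which matters because fields can't be changed after ingestion.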

Description | Limit
Maximum number of entries in _all.metadata.json | 10,000
Maximum number of metadata fields that can be specified for each file | 20
Maximum number of items in a list_of_string type | 10
Maximum length of individual item in a list_of_string type | 50
Maximum length of a metadata key in characters | 25
Maximum length of metadata value in characters | 50
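
The limits above can be checked before ingestion. The following is a rough sketch under stated assumptions, not the service's own validation: the function name is hypothetical, the value length limit is applied only to string values, and a field that isn't declared in the schema is treated as an error.

```python
# Limits copied from the table above.
MAX_FIELDS_PER_FILE = 20
MAX_LIST_ITEMS = 10
MAX_LIST_ITEM_LENGTH = 50
MAX_KEY_LENGTH = 25
MAX_VALUE_LENGTH = 50


def type_matches(value, expected: str) -> bool:
    """Check a Python value against one of the four allowed schema types."""
    if expected == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if expected == "double":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "string":
        return isinstance(value, str)
    if expected == "list_of_string":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    return False


def validate_attributes(attrs: dict, schema_types: dict[str, str]) -> list[str]:
    """Return limit or type violations for one file's metadataAttributes."""
    errors = []
    if len(attrs) > MAX_FIELDS_PER_FILE:
        errors.append(f"more than {MAX_FIELDS_PER_FILE} metadata fields")
    for key, value in attrs.items():
        if len(key) > MAX_KEY_LENGTH:
            errors.append(f"{key}: key longer than {MAX_KEY_LENGTH} characters")
        expected = schema_types.get(key)
        if expected is None:
            # Assumption: every field must be declared in _metadata_schema.json.
            errors.append(f"{key}: not declared in the schema")
        elif not type_matches(value, expected):
            errors.append(f"{key}: value is not of type {expected}")
        elif expected == "list_of_string":
            if len(value) > MAX_LIST_ITEMS:
                errors.append(f"{key}: more than {MAX_LIST_ITEMS} list items")
            if any(len(item) > MAX_LIST_ITEM_LENGTH for item in value):
                errors.append(f"{key}: list item longer than {MAX_LIST_ITEM_LENGTH} characters")
        elif expected == "string" and len(value) > MAX_VALUE_LENGTH:
            errors.append(f"{key}: value longer than {MAX_VALUE_LENGTH} characters")
    return errors
```

Here schema_types is a name-to-type mapping built from the metadataSchema list, for example {"publication_year": "integer", "publishers": "list_of_string"}.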
Adding Metadata to an Object Storage Metadata Header
Create an Object Storage bucket and upload source files for RAG responses in OCI Generative AI Agents. Optionally, add metadata headers to each file for metadata filtering.
  1. In the navigation bar of the Console, select a region that hosts Generative AI Agents, for example, US Midwest (Chicago). If you don't know which region to select, see Regions with Generative AI Agents.
  2. Open the navigation menu and select Storage. Under Object Storage & Archive Storage, select Buckets.
  3. Select the compartment in which you want to create a bucket or the compartment that contains the bucket that you want to use. You must already have the following permission to add Object Storage resources to this compartment.
    allow group <your-group-name> to manage object-family in compartment <compartment-with-bucket>
  4. To create a bucket, follow these steps:
    1. Select Create Bucket.
    2. Enter a name unique to your region for the bucket.
    3. For other fields, select the Learn more links and then select options that apply to your data. Also see Creating an Object Storage Bucket.
    4. Select Create.
      By default, a new bucket is private. You can change the visibility of a bucket after you create it.
  5. Select the name of the bucket that you want to use.
  6. On the bucket details page, under Objects, select Upload.
  7. (Optional) Select Show Optional Headers and Metadata and then select and enter the following values.
    • Type: Metadata
    • Name: gaas-metadata-filtering-field-<metadata-name>
    • Value: <metadata-value>
    Important

    For the metadata filtering to work, you must use the prefix gaas-metadata-filtering-field- for the metadata Name.

    Object Storage then prepends opc-meta- to the metadata name, so the header is displayed as opc-meta-gaas-metadata-filtering-field-<metadata-name>.

    For example, to add metadata with the name publication_year, add a metadata header with the name gaas-metadata-filtering-field-publication_year. When you get the details for this file, the metadata name is displayed as opc-meta-gaas-metadata-filtering-field-publication_year.

    For list values, use the following format:

    _LIST_OF_STRING_|list_value_1|list_value_2, where _LIST_OF_STRING_ is fixed, and each list item is separated by a pipe '|' character. This format is decoded as a list of values: {list_value_1, list_value_2}

  8. Add one or more files for the data source and select Upload.
    Note

    • If you added metadata in step 7, this metadata applies to all the files that you upload in this step.

    • You can't update the metadata property of existing objects. Instead, you can copy a file, add new metadata to that copy, and then delete the old file. To add or update metadata on an existing file, use the steps in Assigning a Custom URL to a Citation as a guideline and apply them to metadata filtering.

    • You can add filters to your chat conversation with an agent using the metadata filtering after the knowledge base data from Object Storage and its metadata are ingested. To learn about adding filters, see step 11 in Chatting with Agents in Generative AI Agents. You can also view details of metadata values after you ingest the data in a knowledge base. See the Metadata resource in Getting a Knowledge Base's Details in Generative AI Agents.
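
The header naming rule and the list format from step 7 can be sketched as small helpers. This is an illustration with hypothetical function names, not part of any OCI SDK. It only builds and parses strings and doesn't call Object Storage. Rejecting list items that contain a pipe character is an assumption, because the format shown above has no way to escape one.

```python
FILTER_PREFIX = "gaas-metadata-filtering-field-"
OPC_META_PREFIX = "opc-meta-"  # prepended by Object Storage, not by you
LIST_MARKER = "_LIST_OF_STRING_"


def header_name(metadata_name: str) -> str:
    """Name to enter in the Console for a metadata filtering field."""
    return FILTER_PREFIX + metadata_name


def stored_header_name(metadata_name: str) -> str:
    """Name displayed when you get the object's details."""
    return OPC_META_PREFIX + header_name(metadata_name)


def encode_list(values: list[str]) -> str:
    """Encode list items as _LIST_OF_STRING_|item_1|item_2."""
    for value in values:
        if "|" in value:
            # Assumption: the pipe separator can't appear inside an item.
            raise ValueError(f"list item contains '|': {value!r}")
    return "|".join([LIST_MARKER, *values])


def decode_value(raw: str):
    """Return a list for list-formatted values, or the raw string otherwise."""
    if raw == LIST_MARKER or raw.startswith(LIST_MARKER + "|"):
        return raw.split("|")[1:]
    return raw
```

For example, header_name("publication_year") gives the name to type in the Name field, and encode_list(["list_value_1", "list_value_2"]) gives the text to type in the Value field for a list.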
Adding Data with Custom URL to an Object Storage Bucket
Create an Object Storage bucket and upload source files for RAG responses in OCI Generative AI Agents. Optionally, add a custom URL to each file for citation.
  1. In the navigation bar of the Console, select a region that hosts Generative AI Agents, for example, US Midwest (Chicago). If you don't know which region to select, see Regions with Generative AI Agents.
  2. Open the navigation menu and select Storage. Under Object Storage & Archive Storage, select Buckets.
  3. Select the compartment in which you want to create a bucket or the compartment that contains the bucket that you want to use. You must already have the following permission to add Object Storage resources to this compartment.
    allow group <your-group-name> to manage object-family in compartment <compartment-with-bucket>
  4. To create a bucket, follow these steps:
    1. Select Create Bucket.
    2. Enter a name unique to your region for the bucket.
    3. For other fields, select the Learn more links and then select options that apply to your data. Also see Creating an Object Storage Bucket.
    4. Select Create.
      By default, a new bucket is private. You can change the visibility of a bucket after you create it.
  5. Select the name of the bucket that you want to use.
  6. On the bucket details page, under Objects, select Upload.
  7. (Optional) Select Show Optional Headers and Metadata and then select and enter the following values.
    • Type: Metadata
    • Name: customized_url_source
    • Value: <Custom-URL-for-the-file>
    Important

    For the citation link override to work, you must use Name: customized_url_source.
  8. Add one or more files for the data source and select Upload.
    Note

    If you added the customized_url_source metadata in step 7, this custom URL applies to all the files that you upload in this step. You can't update the metadata property of existing objects. Instead, you can copy a file, add new metadata to that copy, and then delete the old file. To add or update a file with the customized_url_source metadata by using the OCI CLI, see Assigning a Custom URL to a Citation.
Note

Beta Customers:

If you created a knowledge base in the Beta phase, you might need to delete and re-create the data source for the URL handling feature to work.

Assigning a Custom URL to a Citation
When an agent uses RAG for its responses, you can get citations. By default, the citations point to the Object Storage location where the files are stored. To cite a URL instead of the file's storage location, add a custom URL to the metadata object for that file.

This topic shows how to add or update the metadata object through OCI CLI.

  1. Start OCI CLI in an environment or in Cloud Shell. We recommend that you try it in Cloud Shell first to become familiar with the commands.
  2. Get the object name for the file that you want to add a custom URL to:
    oci os object list --bucket-name <the-bucket-name> \
    --file <the-file-name>
    Example output:
    "data": [
        {
          "archival-state": null,
          "etag": "xxx",
          "md5": "xxx==",
          "name": "<the-object-name>",
          "size": 1117630,
          "storage-tier": "Standard",
          "time-created": "2025-03-12T22:21:26.991000+00:00",
          "time-modified": "2025-03-12T22:38:10.217000+00:00"
        },
    Other objects are listed similarly after this comma.

    You can also find the object name in the Console. In the bucket details page, select the Actions menu (Actions Menu) for the object, select View Object Details and copy the name.

    Note

    If a file is in a folder, then the file name and its object name differ. For example, for a file named file1.pdf, its object name could be folder1/file1.pdf. Otherwise, the file name and its object name are the same.
  3. Download the file into the current working directory.

    To add or update a file's metadata object, you replace the file with the same file that has a new metadata object. That's why you're copying the file into the current working directory first.

    oci os object get \
    --bucket-name <the-bucket-name> \
    --file <the-file-name> \
    --name <the-object-name>
  4. Find the metadata object values for the current file.
    oci os object head \
    --bucket-name <the-bucket-name> \
    --name <the-object-name>
    Example output:
    {
     some data
    
      "opc-client-request-id": "xxx",
      "opc-meta-key1": "value1",
      "opc-meta-key2": "value2",
      "opc-request-id": "xxx",
     ...
    }
    

    This example shows that the metadata object value is '{"key1":"value1","key2":"value2"}'. The metadata name is saved with a prefix of opc-meta-, but you don't have to add this prefix when you add the metadata name in the next steps. This prefix is added automatically to each metadata name.

  5. Replace the file that's in Object Storage with the same file that's in the current working directory, and add a new metadata object.

    To keep the current metadata and add the custom URL, pass the existing names and values along with the "customized_url_source":"<the-custom-url>" pair in the metadata object:

    oci os object put \
    --bucket-name <the-bucket-name> \
    --file <the-file-name> \
    --name <the-object-name> \
    --force --metadata \
    '{"customized_url_source":"<the-custom-url>",
    "<existing-metadata-name-1>":"<existing-metadata-value-1>",
    "<existing-metadata-name-2>":"<existing-metadata-value-2>"}'

    For example, to keep the metadata names and values displayed in the step 4 example:

    oci os object put \
    --bucket-name <the-bucket-name> \
    --file <the-file-name> \
    --name <the-object-name> \
    --force --metadata \
    '{"customized_url_source":"<the-custom-url>",
    "key1":"value1",
    "key2":"value2"}'

    To replace the existing metadata object so that it includes only the custom URL, run the following command:

    oci os object put \
    --bucket-name <the-bucket-name> \
    --file <the-file-name> \
    --name <the-object-name> \
    --force --metadata '{"customized_url_source":"<the-custom-url>"}'
  6. Ensure that the metadata object for the custom URL is replaced.
    oci os object head \
    --bucket-name <the-bucket-name> \
    --name <the-object-name>
    Example output:
    {
     some data
    
      "opc-meta-customized_url_source": "some-new-link",
     ...
    }
    
Important

  • The metadata object that overrides the default citation must have the name customized_url_source.
  • You can have only one metadata object with the name customized_url_source.
  • Each customized_url_source can have only one URL.
  • The commands in step 5 work for both adding and updating the metadata object, because they replace the current metadata object's value.
  • Ensure that you pass the values for the --metadata object with the format shown in the commands in step 5.
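
Steps 4 and 5 can be combined so that existing metadata isn't lost when you add the custom URL. The following sketch builds the value for --metadata from the output of step 4. It's an illustration with a hypothetical function name, and it assumes that the head output has already been parsed into a flat dictionary shaped like the step 4 example. It doesn't run the CLI.

```python
import json

OPC_META_PREFIX = "opc-meta-"


def merged_metadata(head_output: dict, custom_url: str) -> str:
    """Build the --metadata JSON: existing user metadata plus the custom URL."""
    # Keep only user-defined metadata and drop the opc-meta- prefix,
    # because the prefix is added again automatically on upload.
    metadata = {
        key[len(OPC_META_PREFIX):]: value
        for key, value in head_output.items()
        if key.startswith(OPC_META_PREFIX)
    }
    # A single customized_url_source entry overrides the default citation.
    metadata["customized_url_source"] = custom_url
    return json.dumps(metadata)


# Values from the step 4 example output; the URL is a placeholder.
head_output = {
    "opc-client-request-id": "xxx",
    "opc-meta-key1": "value1",
    "opc-meta-key2": "value2",
    "opc-request-id": "xxx",
}
print(merged_metadata(head_output, "https://example.com/docs/file1"))
```

The printed string can be passed in single quotes as the --metadata value in step 5. If the object already has a customized_url_source entry, the new URL replaces it.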