Adding Metadata to Object Storage Files for Search Filtering
You can add metadata to Object Storage files before syncing them to a vector store. Metadata helps improve retrieval by letting semantic and hybrid searches filter results by relevant attributes.
For example, you can add metadata such as publication year, title, topic, department, product area, or document type. After the files are synced, those metadata fields can be used to narrow search results to a specific content scope.
Metadata is optional. However, if you want metadata to be available for search filtering, add or update it before you perform data sync. Metadata that you add or change after a sync isn’t included in the vector store until you run data sync again.
This topic applies to vector stores that sync unstructured data from Object Storage.
How Metadata Works
Metadata is defined as key-value pairs. To use metadata with Object Storage files, you first define the metadata fields in a schema file. Then, you associate files in the bucket with metadata values.
For all Object Storage metadata methods, you must create a metadata schema file named _metadata_schema.json at the root level of the Object Storage bucket. The schema defines the metadata keys that the service can expect and the value type for each key.
If the _metadata_schema.json file doesn’t exist, metadata isn’t processed for any file in the bucket.
Each metadata field has a name and a type. Supported metadata types are:
- integer
- string
- list_of_string
- double
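You can check the schema locally before you upload it. The following Python sketch parses schema content in the format used in this topic and rejects any type other than the four listed above. The helper name `load_schema` is hypothetical, and treating a repeated field name as an error is an assumption of this sketch, not a documented rule.

```python
import json

# The four supported metadata types listed in this topic.
SUPPORTED_TYPES = {"integer", "string", "list_of_string", "double"}


def load_schema(text: str) -> dict[str, str]:
    """Parse _metadata_schema.json content into a {field name: type} map.

    Raises ValueError for an unsupported type or (assumption) a repeated name.
    """
    doc = json.loads(text)
    fields: dict[str, str] = {}
    for entry in doc["metadataSchema"]:
        name, field_type = entry["name"], entry["type"]
        if field_type not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported type {field_type!r} for field {name!r}")
        if name in fields:
            raise ValueError(f"duplicate field name {name!r}")
        fields[name] = field_type
    return fields
```

For the schema example in this topic, `load_schema` returns a mapping from each field name (such as `publication_year`) to its declared type (such as `integer`).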
Workflow Overview
Use the following workflow to prepare metadata before syncing files to a vector store:
- In a text editor, create a metadata schema file named _metadata_schema.json.
- Define the metadata fields and value types in JSON format.
- Upload _metadata_schema.json to the root level of the Object Storage bucket that contains the files to sync.
- Select how to associate metadata values with files:
  - Apply common metadata to all files in the bucket.
  - Define metadata for several files in one JSON file.
  - Define metadata in a separate JSON file for each data file.
  - Add metadata by using Object Storage metadata properties.
- Upload the metadata files to the correct location in the Object Storage bucket.
- Perform data sync for the vector store.
Metadata Schema Example
Create a metadata schema file named _metadata_schema.json and save it at the root level of the Object Storage bucket.
{
"metadataSchema": [
{
"name": "publication_year",
"type": "integer"
},
{
"name": "title",
"type": "string"
},
{
"name": "topic",
"type": "list_of_string"
},
{
"name": "rating",
"type": "double"
}
]
}
The metadata names that you use in metadata files must match the names defined in the schema.
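To catch mismatches before a sync, you can compare each metadataAttributes object with the schema. This Python sketch hard-codes the field names and types from the schema example above, then reports keys that the schema doesn’t define and values whose JSON type doesn’t fit the declared type. The helper names are hypothetical, and accepting whole numbers for a double field is an assumption.

```python
# Field names and types copied from the _metadata_schema.json example.
SCHEMA = {
    "publication_year": "integer",
    "title": "string",
    "topic": "list_of_string",
    "rating": "double",
}


def _matches(value, field_type: str) -> bool:
    """Return True if a JSON-decoded value fits the declared metadata type."""
    if isinstance(value, bool):  # bool is a subclass of int in Python; reject it
        return False
    if field_type == "integer":
        return isinstance(value, int)
    if field_type == "string":
        return isinstance(value, str)
    if field_type == "list_of_string":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    if field_type == "double":
        # Assumption: whole numbers such as 4 are acceptable for a double field.
        return isinstance(value, (int, float))
    return False


def schema_errors(attributes: dict, schema: dict[str, str]) -> list[str]:
    """List problems with a metadataAttributes object relative to the schema."""
    errors = []
    for name, value in attributes.items():
        if name not in schema:
            errors.append(f"{name}: not defined in the schema")
        elif not _matches(value, schema[name]):
            errors.append(f"{name}: expected {schema[name]}, got {type(value).__name__}")
    return errors
```

For example, `schema_errors({"publication_year": "2020", "author": "Kim"}, SCHEMA)` returns `['publication_year: expected integer, got str', 'author: not defined in the schema']`.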
Metadata Methods for Object Storage Files
The following table describes the supported methods for adding metadata to files in Object Storage, including where to create each metadata file or header and when to use each method.
| Method | File name and location | When to use |
|---|---|---|
| Define the metadata schema | Create _metadata_schema.json at the root level of the Object Storage bucket. | Required for all Object Storage metadata file methods. The schema defines the supported metadata keys and value types. |
| Apply common metadata to all files in a bucket | Create _common.metadata.json at the root level of the Object Storage bucket. | Use when the same metadata applies to all files in the bucket. This method avoids duplicating metadata across files. |
| Define metadata for several files in one JSON file | Create _all.metadata.json at the root level of the Object Storage bucket. | Use when you have many files and prefer to manage metadata for all of them in one JSON file instead of creating one metadata file per data file. |
| Define metadata for one file | Create <file-name>.metadata.json at the same level as the corresponding data file. The <file-name> value must match the full name of the data file, including its extension. | Use when metadata differs by file and you have a small number of files, or when you automate metadata file creation. |
| Add metadata as Object Storage headers | Add metadata by using each file’s Object Storage metadata properties. | Use only when you have a small number of metadata properties. JSON metadata files are recommended because they’re easier to update and manage. |
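If you script against a bucket listing, you may want to separate data files from the metadata files described in the table. This Python sketch classifies object names using only the names and locations given in the table. `classify_object` is a hypothetical helper, and it can’t detect metadata added as Object Storage headers, because that metadata doesn’t appear in object names.

```python
# Object names with a fixed meaning at the root level of the bucket.
ROOT_METADATA_FILES = {
    "_metadata_schema.json": "schema",
    "_common.metadata.json": "common metadata",
    "_all.metadata.json": "metadata for several files",
}
SIDECAR_SUFFIX = ".metadata.json"


def classify_object(object_name: str) -> str:
    """Classify an object name by its role in metadata handling."""
    # Root-level names have no folder prefix, so an exact match is enough.
    if object_name in ROOT_METADATA_FILES:
        return ROOT_METADATA_FILES[object_name]
    if object_name.endswith(SIDECAR_SUFFIX):
        return "metadata for one file"
    return "data file"
```

The root-level check runs first because _common.metadata.json and _all.metadata.json also end with .metadata.json.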
Metadata File Location Example
The following example shows where to save metadata files in an Object Storage bucket.
bucket_root/
  _metadata_schema.json
  _common.metadata.json
  _all.metadata.json
  file_0.pdf
  file_0.pdf.metadata.json
  folder_1/
    file_1.pdf
    file_1.pdf.metadata.json
  folder_2/
    file_2.pdf
    file_2.pdf.metadata.json
For file-specific metadata, the metadata file must be saved at the same level as the corresponding data file.
For example, if the data file is saved as:
folder_1/file_1.pdf
the metadata file must be saved as:
folder_1/file_1.pdf.metadata.json
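Because a per-file metadata file must be saved at the same level as its data file, a metadata file uploaded to the wrong folder has no matching data file next to it. This Python sketch derives the expected metadata object name and flags per-file metadata objects without a matching data object. Both function names are hypothetical, and the input is assumed to be a plain list of object names, for example from a bucket listing.

```python
SUFFIX = ".metadata.json"
ROOT_FILES = {"_common.metadata.json", "_all.metadata.json"}


def metadata_object_name(data_object_name: str) -> str:
    """Name of the per-file metadata object: same folder, full name + suffix."""
    return data_object_name + SUFFIX


def orphaned_metadata(object_names: list[str]) -> list[str]:
    """Per-file metadata objects whose data file isn't at the same level."""
    names = set(object_names)
    return sorted(
        name
        for name in names
        if name.endswith(SUFFIX)
        and name not in ROOT_FILES  # root-level files aren't per-file metadata
        and name[: -len(SUFFIX)] not in names
    )
```

For example, in a listing that contains folder_1/file_1.pdf and a root-level file_1.pdf.metadata.json, `orphaned_metadata` returns `['file_1.pdf.metadata.json']`, because the metadata file isn’t in folder_1.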
Metadata JSON File Examples
Common Metadata for All Files
Create _common.metadata.json at the root level of the bucket to apply the same metadata to all files in the bucket.
Example:
{
"metadataAttributes": {
"publication_year": 2020,
"topic": [
"cooking",
"health",
"gardening"
],
"rating": 3.3
}
}
Metadata for Several Files
Create _all.metadata.json at the root level of the bucket to define metadata for several files in one JSON file.
Example:
{
"folder_1/file_1.pdf": {
"metadataAttributes": {
"publication_year": 2020,
"title": "Healthy Cooking Guide",
"topic": [
"cooking",
"health"
],
"rating": 4.5
}
},
"folder_2/file_2.pdf": {
"metadataAttributes": {
"publication_year": 2022,
"title": "Gardening Basics",
"topic": [
"gardening"
],
"rating": 4.0
}
}
}
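If you keep metadata records outside the bucket, you might generate _all.metadata.json from them instead of editing it by hand. This Python sketch serializes a mapping of object names to attributes in the structure shown above and enforces the 10,000-entry limit from the Metadata Limits table. `build_all_metadata` is a hypothetical helper name.

```python
import json

MAX_ALL_METADATA_ENTRIES = 10_000  # limit from the Metadata Limits table


def build_all_metadata(per_file: dict[str, dict]) -> str:
    """Serialize {object name: metadata attributes} as _all.metadata.json content.

    Keys are object names relative to the bucket root, such as "folder_1/file_1.pdf".
    """
    if len(per_file) > MAX_ALL_METADATA_ENTRIES:
        raise ValueError(
            f"{len(per_file)} entries exceeds the {MAX_ALL_METADATA_ENTRIES}-entry limit"
        )
    # Wrap each file's attributes in the "metadataAttributes" key.
    document = {
        name: {"metadataAttributes": attributes}
        for name, attributes in per_file.items()
    }
    return json.dumps(document, indent=2)
```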
Metadata for One File
Create <file-name>.metadata.json at the same level as the corresponding data file.
For example, to define metadata for file_1.pdf, create a file named file_1.pdf.metadata.json.
Example:
{
"metadataAttributes": {
"publication_year": 2020,
"title": "Healthy Cooking Guide",
"topic": [
"cooking",
"health"
],
"rating": 4.5
}
}
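The methods table notes that per-file metadata suits cases where you automate metadata file creation. This Python sketch assumes you stage files in a local directory that mirrors the bucket layout before uploading, and writes each <file-name>.metadata.json next to its data file. The staging approach and `write_sidecar_files` are assumptions, and the upload step isn’t shown.

```python
import json
from pathlib import Path


def write_sidecar_files(staging_root: Path, per_file: dict[str, dict]) -> list[Path]:
    """Write <file-name>.metadata.json next to each data file in a local bucket copy.

    per_file maps object names relative to the bucket root (for example
    "folder_1/file_1.pdf") to that file's metadata attributes.
    """
    written = []
    for object_name, attributes in per_file.items():
        data_file = staging_root / object_name
        if not data_file.is_file():
            raise FileNotFoundError(f"no data file at {data_file}")
        # Same folder as the data file; full file name plus ".metadata.json".
        sidecar = data_file.with_name(data_file.name + ".metadata.json")
        sidecar.write_text(json.dumps({"metadataAttributes": attributes}, indent=2))
        written.append(sidecar)
    return written
```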
Metadata Limits
The following limits apply to metadata used for search filtering.
| Description | Limit |
|---|---|
| Maximum number of entries in _all.metadata.json | 10,000 |
| Maximum number of metadata fields that can be specified for each file | 20 |
| Maximum number of items in a list_of_string type | 10 |
| Maximum length of each item in a list_of_string type | 50 characters |
| Maximum length of a metadata key | 25 characters |
| Maximum length of a metadata value | 50 characters |
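You can check metadata against these limits before you sync. This Python sketch tests one file’s metadataAttributes object against the limits in the table. The constant and function names are hypothetical, list items are assumed to be strings, and applying the 50-character value limit only to string values (not to integers or doubles) is an assumption, because the table doesn’t say how that limit applies to numbers.

```python
# Limits from the Metadata Limits table.
MAX_FIELDS_PER_FILE = 20
MAX_KEY_LENGTH = 25
MAX_VALUE_LENGTH = 50
MAX_LIST_ITEMS = 10
MAX_LIST_ITEM_LENGTH = 50


def limit_violations(attributes: dict) -> list[str]:
    """Check one file's metadataAttributes against the documented limits."""
    problems = []
    if len(attributes) > MAX_FIELDS_PER_FILE:
        problems.append(f"{len(attributes)} fields (max {MAX_FIELDS_PER_FILE})")
    for key, value in attributes.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key {key!r} longer than {MAX_KEY_LENGTH} characters")
        if isinstance(value, list):  # list_of_string value; items assumed to be str
            if len(value) > MAX_LIST_ITEMS:
                problems.append(f"{key}: {len(value)} items (max {MAX_LIST_ITEMS})")
            for item in value:
                if len(item) > MAX_LIST_ITEM_LENGTH:
                    problems.append(f"{key}: item longer than {MAX_LIST_ITEM_LENGTH} characters")
        elif isinstance(value, str) and len(value) > MAX_VALUE_LENGTH:
            # Assumption: the value-length limit is checked on string values only.
            problems.append(f"{key}: value longer than {MAX_VALUE_LENGTH} characters")
    return problems
```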
Using Metadata with Data Sync
Add the metadata schema and metadata files before you sync data to the vector store.
After the files are synced, the metadata is available for search filtering. If you add or change metadata after syncing, perform data sync again so that the updated metadata is included in the vector store.
To sync files from Object Storage, see Sync Data to a Vector Store.