Monitor and Troubleshoot Synthetic Data Generation

When generating large amounts of data across many tables, Select AI splits synthetic data generation tasks into smaller chunks and runs tasks in parallel. The status of each chunk is tracked in the SYNTHETIC_DATA$<operation_id>_STATUS table.

Synthetic data generation operations are logged in the tables DBA_LOAD_OPERATIONS and USER_LOAD_OPERATIONS. Use these tables to monitor the DBMS_CLOUD_AI.GENERATE_SYNTHETIC_DATA operation. See Track DBMS_CLOUD Load Operations for more details.

After running synthetic data generation in a given session, you can get the latest <operation_id> from USER_LOAD_OPERATION using the following:
SELECT max(id) FROM user_load_operations;

To view the synthetic data operations running in a different session, use the DBA_LOAD_OPERATIONS view.

View Status of Synthetic Data Generation

The status table of synthetic data generation operation shows the progress of each table and its corresponding chunk. The STATUS_TABLE column in USER_LOAD_OPERATIONS or DBA_LOAD_OPERATIONS shows the status table name. The table name is SYNTHETIC_DATA$<operation_id>_STATUS and it has following columns:

Name Datatype Description

ID

NUMBER

Unique identifier of the record.

NAME

VARCHAR2

Qualified name of the table, such as "ADB_USER"."EMPLOYEES"

BYTES

NUMBER

Desired number of records for this data generation task

ROWS_LOADED

NUMBER

Actual number of records generated.

CHECKSUM

VARCHAR2

Starting value for the primary key during this data generation task.

LAST_MODIFIED

TIMESTAMP WITH TIME ZONE

Timestamp indicating when the record was last modified.

STATUS

VARCHAR2

Status of the data generation task. The valid values are:
  • COMPLETED
  • FAILED
  • PENDING
  • SKIPPED

ERROR_CODE

NUMBER

Error code, if the data generation task fails.

ERROR_MESSAGE

VARCHAR2

Error message provided if the task fails.

END_TIME

TIMESTAMP WITH TIME ZONE

Timestamp marking the end of the data generation task.
Example: Check the Number of Records Generated for Each Table

To check the number of records generated for each table, issue the following:

SELECT name, SUM(rows_loaded) FROM synthetic_data$<operation_id>_status group by name;

Query ROWS_LOADED to confirm how many number of rows are loaded for each chunk, and SUM(ROWS_LOADED) for rows for each table.

BEGIN
    DBMS_CLOUD_AI.GENERATE_SYNTHETIC_DATA(
        profile_name => 'GENAI',
        object_list => '[{"owner": "ADB_USER", "name": "DIRECTOR","record_count":150},
                         {"owner": "ADB_USER", "name": "MOVIE_ACTOR","record_count":300},
                         {"owner": "ADB_USER", "name": "CLASSES", "additional_prompt":"all in fall semester","record_count":5},
                         {"owner": "ADB_USER", "name": "ACTOR","record_count":220},
                         {"owner": "ADB_USER", "name": "MOVIE","record_count":50}]'
    );
END;
/
 
 
-- Check loaded rows for each chunk
SQL> SELECT name, rows_loaded FROM synthetic_data$141_status order by name;
NAME                           ROWS_LOADED
------------------------------------------
"ADB_USER"."ACTOR"                        188
"ADB_USER"."ACTOR"                         32
"ADB_USER"."CLASSES"                        5
"ADB_USER"."DIRECTOR"                     150
"ADB_USER"."MOVIE"                         50
"ADB_USER"."MOVIE_ACTOR"                   38
"ADB_USER"."MOVIE_ACTOR"                  114
"ADB_USER"."MOVIE_ACTOR"                  148
 
 
 
-- Check loaded rows for each table
SQL> SELECT name, SUM(rows_loaded) FROM synthetic_data$141_status group by name;
 
NAME                      SUM(ROWS_LOADED)
------------------------------------------
"ADB_USER"."DIRECTOR"                     150
"ADB_USER"."MOVIE_ACTOR"                  300
"ADB_USER"."CLASSES"                        5
"ADB_USER"."ACTOR"                        220
"ADB_USER"."MOVIE"                         50