Monitor a Storage Server Grid
You can monitor the aggregated status, performance, and other details for all the storage servers in a storage server grid on the Storage Server grid details page.
To go to the Storage Server grid details page:
- Go to the Exadata infrastructure details page.
- In the Storage Server grid section, click the storage server grid display name.
The Storage Server grid details page is displayed.
The Storage Server grid details page displays charts that provide a visual representation of key storage performance statistics. Here are some specific scenarios where the Exadata storage-specific performance statistics can be leveraged:
- Uneven workload on cells or disks: If a storage server or disk is performing more work than the other storage servers or disks, it may cause performance issues. The storage statistics such as IOPS, throughput, and queue time include cell server statistics, which break down IOPS, throughput, and latencies by type of I/O (read or write) and size of I/O (small or large). Analyzing the metrics in the Performance section helps you spot outliers when comparing storage servers or disks.
- Noisy neighbor: In an Exadata Database machine where several databases have been consolidated, it's important to identify the databases that could be consuming a significant amount of the I/O bandwidth, thus impacting other databases. You can obtain a holistic view of I/O distribution across databases and other such details in the Top consumers section.
- Configuration differences: The configuration differences across the storage servers could potentially contribute to performance issues. The configuration issues could be differences in flash cache or flash log sizes, or differences in the number of cell disks or grid disks in use. You can monitor the configuration details in the Configuration section.
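As an illustration of the uneven workload scenario, the following is a minimal sketch of how outliers could be spotted once per-server IOPS values have been read from the charts or exported. The function name `find_outliers`, the 1.5x-median threshold, and the sample values are all assumptions made for illustration; they are not part of the service.

```python
from statistics import median


def find_outliers(iops_by_server, threshold=1.5):
    """Return the servers whose IOPS exceed `threshold` times the grid median.

    `iops_by_server` maps a storage server name to its IOPS value.
    The threshold is an arbitrary illustrative choice, not a documented limit.
    """
    grid_median = median(iops_by_server.values())
    if grid_median == 0:
        # No activity on a typical server, so there is nothing to compare against.
        return []
    return sorted(
        name
        for name, iops in iops_by_server.items()
        if iops > threshold * grid_median
    )


# Hypothetical sample values, not real measurements.
sample = {"cell01": 4100, "cell02": 3900, "cell03": 9800, "cell04": 4000}
print(find_outliers(sample))
```

Comparing against the median rather than the mean keeps a single busy server from shifting the baseline it is compared with.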
The performance tuning methodology for Exadata Database machines does not change with the introduction of Exadata storage server metrics. You should first look at DB time and find the top consumers of DB time to address performance issues. Only when it has been determined that there may be I/O issues should you start looking at the storage server metrics. The storage server metrics are meant to complement, not replace, existing tools and functionalities such as Performance Hub, ADDM, and SQL Monitoring.
On the Storage Server grid details page, you can:
- Click View connectors to view the connectors used to connect to the storage servers in the storage server grid. In the View connectors panel, click the display name of the connector to go to the External connector details page and view connector details. For information, see View Storage Server Connector Details.
- Click Add tags to add tags to the storage server grid. For information, see Working with Resource Tags.
- View Storage Server grid information, which includes details such as the OCID of the storage server grid, the name of the Exadata Infrastructure, compartment, and the status of the storage servers in the storage server grid. In the Storage Server grid information section, you can also perform tasks such as going to the associated Exadata infrastructure details page, monitoring the total number of open alarms and the number of alarms by severity for the storage servers, and monitoring storage server alerts. Note that the alarms are configured in the Oracle Cloud Infrastructure Monitoring service. The alerts represent events of importance occurring within the storage server and typically indicate that the storage server is compromised or in danger of failure.
- Click the Tags tab to add, view, edit, or remove tags. For information, see Working with Resource Tags.
- Monitor the storage servers in the storage server grid in the Summary section for the time period selected in the Time period drop-down list:
- Monitoring status timeline: Displays the monitoring status of the storage servers during the selected period of time.
- Performance: Displays the number of I/O operations in queue and the response time to I/O read and write operations. The information in these charts is categorized by flash disks and hard disks.
- Exadata Storage Server resource usage: Displays the total CPU, Storage allocated, Memory, and IOPS usage for all the storage servers in the storage server grid.
- I/O activity: Displays the I/O performance metrics for the storage servers categorized by flash disks and hard disks. The charts provide an overview of:
- The average I/O utilization across flash disks and hard disks.
- The total I/O requests across flash disks and hard disks, categorized by small read, small write, large read, and large write operations.
- The total I/O throughput across flash disks and hard disks, categorized by small read, small write, large read, and large write operations.
- Capacity: Displays the capacity details of the different storage types available in the storage servers. The charts provide an overview of:
- The capacity information for ASM disk groups.
- The total available capacity for flash disks and hard disks and information on how the capacity is allocated.
- The flash cache space usage information for keep objects, non-keep objects, and unused space.
The Summary section is displayed by default on the Storage Server grid details page. However, you can click one of the other options on the left pane under Resources to perform the following tasks:
- Performance: Monitor the performance of the storage servers by various parameters for the time period selected in the Time period drop-down list. The metrics displayed in this section can be used to view historical trends and verify that the I/O operations are not exceeding the limits of the current hardware configuration. They are also useful in determining the root cause of performance issues pertaining to storage-related bottlenecks.
You can click the icon adjacent to the name of a performance metric chart to hide the chart and only view the charts that are of interest.
The Performance section has the following tabs:
- Flash and hard disk: Displays the Average per disk data for various I/O performance metrics for flash and hard disks. You can select the Total option in the Show drop-down list to view the total performance information across all disks for some of these metrics. In addition, you can select the Show small and large requests check box to further categorize the data in the graphs by small and large I/O requests, and select the Show maximum disk limit check box to view the maximum disk limit.
The following I/O performance metric charts on this tab are categorized by Flash disk and Hard disk:
- Average I/O latency (ms/request): The average time taken in milliseconds to perform an I/O request.
- Average I/O utilization (%): The average I/O utilization.
- Average I/O requests (IOPS) or Total I/O requests (IOPS): The average or total number of I/O requests per second.
- Average I/O throughput (MB/s) or Total I/O throughput (MB/s): The average or total data that's read or written per second.
- Average I/O queue or Total I/O queue: The average or total number of pending I/O requests.
- Flash cache: Displays cache performance metric charts in which values are averaged across all the cell disks in the storage server grid.
The following metric charts are categorized by Small requests and Large requests:
- Read requests (IOPS): The number of I/O read requests to flash cache.
- Read throughput (MB/s): The I/O throughput reading blocks from flash cache.
- Read hit ratio (%) for small requests: The flash cache read hit ratio for small requests. If there's no activity, then the default value for hit rate is 100%.
- Read throughput redirected to disk (MB/s) for large requests: The I/O throughput reading scan data from disk, because not all the data is available in flash cache.
In addition to the performance metric charts for Small requests and Large requests, the following metric charts are displayed on the Flash cache tab:
- Write requests (IOPS): The I/O requests per second that write data into flash cache. You can click the options available in the legend to only view I/O write requests by first write, overwrite, disk write, or population write.
- Keep overwrite (MB/s): The megabytes per second pushed out of flash cache because of the space limit for keep objects.
- Read keep hit ratio (%): The flash cache read hit ratio for keep objects. If there's no activity, then the default value for hit rate is 100%.
- Read keep hits and misses: The flash cache read hits and misses for keep objects. You can click the options available in the legend to only view keep misses or keep hits.
- Flash log: Displays the following flash log performance metric charts in which values are averaged across all the cell disks in the storage server grid:
- Efficiency (%): The smart flash logging efficiency expressed as a percentage.
- Skipped writes: The number of flash log write operations that were skipped. You can click the options available in the legend to only view the number of writes that were skipped due to the lack of available buffer, slow disk, or the size of the data being greater than available space.
- Actual outliers: The number of redo write operations exceeding the outlier threshold.
- Writes serviced: The number of write operations that were serviced.
- Prevented outliers: The number of redo write operations prevented from exceeding the outlier threshold.
- Write throughput (MB/s): The smart flash log write throughput.
- CPU: Displays the following CPU metric charts:
- CPU utilization (%): The percentage of CPU utilization in the storage servers.
- Recent SQLs with most CPU activity: The list of SQL statements with the most CPU activity.
- Top consumers: View information pertaining to the top consumers across all storage servers in the storage server grid. The metrics displayed in this section can be used to find resource-intensive databases that may be degrading performance for other databases that use the same shared storage hardware. The ability to pinpoint these resources allows administrators to isolate the activity to the hardware or review operations on the databases to reduce I/O throughput.
The Top consumers section has the following tabs:
- Summary: Displays a summary of the top databases associated with each storage server and bar charts that display the number of requests, throughput, and IORM queue time. In addition, this tab displays the following usage metric charts:
- I/O utilization (%): The top consumer databases by I/O utilization, which is categorized by flash disk and hard disk utilization.
- I/O requests and throughput: The top consumer databases by I/O requests and throughput.
- I/O service time per request - Hard disk: The top consumer databases by I/O time per request for hard disks, categorized by I/O response time and IORM wait time.
- I/O service time per request - Flash disk: The top consumer databases by I/O time per request for flash disks, categorized by I/O response time and IORM wait time.
- Details: Displays the details of all the top consumer databases categorized by Hard disk and Flash disk. You can select a particular database in the Database drop-down list to view information for an individual top consumer database. You can also click a single database in the legend to view details for a single top consumer database.
- Average IORM wait time: The top consumer databases for average IORM wait time per request.
- I/O: The top consumer databases by I/O utilization, I/O requests, and I/O throughput.
In addition to the top consumer database charts categorized by Hard disk and Flash disk, the IORM objective chart is displayed, which provides a historical view of the IORM objective or the optimization mode for IORM.
- Flash cache: Displays the flash cache space usage metrics for the top consumer databases.
- Current flash cache space usage for top 5 databases: The current flash cache space usage details for the top five consumer databases. You can click the options available in the legend to only view certain flash cache space usage details such as Current size or Soft limit in the chart.
- Historical flash cache space usage: The historical statistics specific to flash cache space usage by top consumer databases. This chart displays usage information for all databases by default. However, you can select an option in the Database drop-down list to view information for an individual top consumer database.
- Storage Servers: Monitor the lifecycle state, status, and resource usage details such as storage utilization and allocation. You can monitor a fine-grained performance summary of the flash disks and hard disks in the storage server grid and the storage server alerts, which represent events of importance occurring within the storage servers. In addition, the I/O activity breakdown displays I/O activity and disk utilization by flash disk and hard disk across all storage servers. By default, all I/O activity is displayed. However, you can select a specific I/O performance metric in the Column group drop-down list to filter the information. For example, if you select the I/O latency option, the I/O latency for small read, small write, large read, and large write operations is displayed.
The information provided in this section enables you to narrow the scope when investigating performance issues with storage servers and pinpoint I/O hot spots that are isolated to a single storage server. It also helps you infer whether intermittent slowness is the result of issues with a single storage server that is causing average latency to degrade for the environment.
You can click the icon adjacent to Flash disk or Hard disk to view the individual storage servers and monitor I/O activity at the storage server level.
- Disks: Monitor the I/O activity breakdown and disk utilization for flash disks and hard disks. By default, all I/O activity is displayed. However, you can select a specific I/O performance metric in the Column group drop-down list to filter the information.
The information provided in this section enables you to further reduce the scope from storage servers to individual components of the storage server, and check whether a single disk is close to failure and causing multiple issues for operations within the storage server.
Click the icon adjacent to Flash disk or Hard disk to view the individual disks and monitor I/O activity at the individual disk level.
- Configuration: View the configuration information for the storage servers.
- Storage server: View the storage server details on the following tabs:
- Storage server model: Displays the model, CPU cores, and memory of the storage servers.
- Storage server version: Displays the cell, RPM, release, and kernel versions of the storage servers.
- IORM objective: Displays the IORM objective value or the optimization mode for the storage servers. For information on valid IORM objective values, see About IORM Objective.
- Storage: View the total storage information, which includes flash cache and flash log usage, and the number of hard disks, flash disks, and grid disks. Click the icon adjacent to Total to view the storage information for individual storage servers.
- Disks: View disk configuration information categorized by Cell disks and Grid disks. View cell disk and grid disk information, which includes the number of disks, disk type, disk size, and total size.
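When interpreting the flash cache read hit ratio charts described in the Performance section, keep in mind that a period with no activity is reported as 100%. A minimal sketch of that rule, assuming the ratio is derived from hit and miss counts; the function name `read_hit_ratio` and its inputs are hypothetical and do not describe how the service computes the value:

```python
def read_hit_ratio(hits, misses):
    """Return the read hit ratio as a percentage.

    With no read activity (hits + misses == 0), the ratio defaults to 100%,
    matching the default described for the hit ratio charts.
    """
    total = hits + misses
    if total == 0:
        return 100.0
    return 100.0 * hits / total


print(read_hit_ratio(0, 0))    # idle period
print(read_hit_ratio(90, 10))  # active period
```

A flat 100% line on these charts can therefore indicate an idle interval rather than a perfectly effective cache, so it is worth checking the read requests chart for the same interval.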