Hospital DB with 2 node RAC
| Question ID: 4058 | |
| Created By: | 2010-JUL-07 04:42:48 [Muddu5640] |
| Updated By: | 2010-JUL-07 21:05:43 [Vitaliy] |
| Status: | Open |
| Severity: | Normal |
| Read Only: | No |
|
11546
2010-JUL-07 04:42:48
Hello, I have recently joined a company that supports the entire IT infrastructure of a hospital. We have training, QA and prod environments. Oracle 11g is the database and RHEL 5.2 is the OS platform. It is a 2 node RAC configured with Veritas Cluster as the front end app. The DB server is connected to SAN-EMC cluster storage. I would like some advice on the best way to monitor these databases, plus some good tips and tricks to use as a daily checklist, so that the servers are always up and running without compromising on performance. We are not allowed to install OEM, and hence have no OEM or Grid Control GUI to monitor the DBs... :-( Please advise! Thanks, Mohammed
|
11550
2010-JUL-07 21:04:03
OEM / Grid control is what you really need here ... It will monitor the critical aspects of the database and clusterware. It's really difficult to rewrite OEM yourself especially just because someone decided OEM cannot be installed in this environment (WHY NOT?).

If however you are inclined to roll your own then I would suggest the following events/things to monitor (get ready to become very familiar with ksh|bash/awk/sed/sqlplus -- you'll need it):

## Mandatory Infrastructure Events ##
- RMAN backup logs
- Database Instance Alert Logs for corruption and errors
- Clusterware Logs
- Critical DB Services UP/DOWN (tnsping + sqlplus to them)
- Node Up/Down
- Database UP/DOWN
- Wait events in ASH
- Wait events in AWR
- Wait events in v$session_wait
- Space issues in Tablespaces, OS level, ASM Groups

## Mandatory Application (in the Database) Events ##
- SQL not using BIND variables
- HIGH Sort usage
- Table fragmentation
- Fast growing segments
- High REDO usage
- High UNDO usage
- Runaway processes/sessions
- Reaching MAX sessions/procs
- Segments with Next extent too big
- Segments Reaching MAX extents
- Unauthorized sessions connecting to production
- Invalid Objects

## Optional if you are responsible for end-to-end solution ##
- Periodically hit user facing entry point (i.e. HTTP app)
- Middle-tier node UP/DOWN
- Application server logs
- HTTP server logs

These are some of the things that come to mind right away looking at it from a distance ...

HTH,
- Vitaliy
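
To give a feel for what "rolling your own" looks like, here is a rough bash sketch of two items from the infrastructure list: scanning an instance alert log for ORA- errors, and flagging OS-level filesystems that are nearly full. The function names, the 90% default threshold, and the idea of feeding `df -P` output through a filter are all my own illustrative assumptions, not anything Oracle ships. The alert log location depends on your install, so it is passed in as an argument.

```shell
#!/usr/bin/env bash
# Sketch only: two home-grown checks (alert log errors, OS-level space).
# Function names and the default threshold are hypothetical choices.

# Print alert-log lines that contain an ORA- error code.
# $1 = path to the instance alert log (location varies by installation)
# Exit status: 0 if errors were found, 1 if none, 2 if the file is unreadable.
scan_alert_log() {
    local logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    # -n prefixes each match with its line number in the log
    grep -n 'ORA-[0-9]\{1,\}' "$logfile"
}

# Read `df -P` output on stdin and print "mountpoint usage%" for every
# filesystem at or above the threshold.
# $1 = usage threshold in percent (defaults to 90)
# Assumes mount points contain no spaces.
report_full_filesystems() {
    local threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {                 # skip the df header row
            pct = $5
            sub(/%/, "", pct)    # "95%" -> "95"
            if (pct + 0 >= t) print $6, $5
        }'
}
```

From cron you might call `scan_alert_log /path/to/alert_SID.log` and `df -P | report_full_filesystems 85`, then mail yourself whatever comes back. A real version would also need to remember how far it read last time so the same ORA- errors are not reported on every run.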
[edited by: Vitaliy at 21:05 (CST) on Jul. 07, 2010]