溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊(cè)×
其他方式登錄
點(diǎn)擊 登錄注冊(cè) 即表示同意《億速云用戶服務(wù)條款》

oracle IO性能問題故障診斷案例

發(fā)布時(shí)間:2020-08-08 09:55:07 來源:ITPUB博客 閱讀:200 作者:達(dá)芬奇的夢(mèng) 欄目:關(guān)系型數(shù)據(jù)庫(kù)

 

一業(yè)務(wù)系統(tǒng)在白天業(yè)務(wù)時(shí)間出現(xiàn)了嚴(yán)重了IO性能問題,下面是下午業(yè)務(wù)高峰時(shí)間(3-5)的awr報(bào)告
oracle IO性能問題故障診斷案例

從等待事件來看主要都是與IO相關(guān)
oracle IO性能問題故障診斷案例

oracle IO性能問題故障診斷案例

oracle IO性能問題故障診斷案例

從上面可以看到除了幾個(gè)語句的邏輯讀很高,其實(shí)物理不是很高,每秒產(chǎn)生的重做日志以及物理讀也不高.

檢查磁盤IO

rx6600-1:[/]#sar -d 1 10

HP-UX rx6600-1 B.11.23 U ia64    07/15/14

16:18:45   device   %busy   avque   r+w/s  blks/s  avwait  avserv
16:18:46  c39t0d3  100.00    0.50      18    1130    0.00  132.11
          c41t0d3   83.50    0.50       6     450    0.00  290.78
16:18:47   c3t0d0    0.99    0.50       2      63    0.00    7.12
          c39t0d3   91.09    0.50      10     982    0.00  115.53
          c41t0d3  100.00    0.50      12     586    0.00  291.67
16:18:48   c3t0d0    3.03    0.50       2      32    0.00   15.93
          c39t0d3  100.00    0.50       9    1034    0.00  139.76
          c41t0d3   92.93    0.50       7     388    0.00  310.07
16:18:49   c3t0d0    2.00    0.50       4      64    0.00   19.59
          c39t0d3  100.00    0.50      12    1088    0.00  127.33
          c41t0d3   86.00    0.50       8     416    0.00  251.32
16:18:50   c3t0d0    1.01    0.50       1       2    0.00    8.99
          c39t0d3  100.00    0.50      16     954    0.00  117.10
          c41t0d3  100.00    0.50       9     614    0.00  295.52
16:18:51   c3t0d0    0.99    0.50       1       8    0.00   10.60
          c39t0d3   93.07    0.50      17     913    0.00  110.59
          c41t0d3  100.00    0.50       9     350    0.00  326.92
16:18:52  c39t0d3  100.00    0.50      21    1168    0.00  127.22
          c41t0d3   88.00    0.50      11     544    0.00  252.08
16:18:53   c3t0d0    2.02    0.50       3      48    0.00   18.51
          c39t0d3   88.89    0.50      19    1164    0.00   98.25
          c41t0d3  100.00    0.50      11     630    0.00  324.39
16:18:54   c3t0d0    3.00    0.50       3      20    0.00   12.39
          c39t0d3   95.00    0.50      20     954    0.00  131.90
          c41t0d3   81.00    0.50       9     610    0.00  289.05
16:18:55   c3t0d0    9.00    0.50      11     134    0.00    8.62
          c39t0d3  100.00    0.50      19    1090    0.00  137.20
          c41t0d3  100.00    0.50      11     512    0.00  327.16

Average   c39t0d3   99.50    0.50      16    1048    0.00  123.38
Average   c41t0d3  100.00    0.50       9     510    0.00  296.44
Average    c3t0d0    2.20    0.50       3      37    0.00   12.28
rx6600-1:[/]#sar -d 1 10

HP-UX rx6600-1 B.11.23 U ia64    07/15/14

16:20:04   device   %busy   avque   r+w/s  blks/s  avwait  avserv
16:20:05   c3t0d0    1.00    0.50       1      16    0.00    8.33
          c39t0d3   98.00    0.50      16     928    0.00  114.86
          c41t0d3   98.00    0.50      10     684    0.00  266.43
16:20:06   c3t0d0    1.98    0.50       4      81    0.00    8.57
          c39t0d3   93.07    0.50      19    1251    0.00  128.81
          c41t0d3   91.09    0.50       6     475    0.00  365.83
16:20:07   c3t0d0    2.00    0.50       3      48    0.00    5.87
          c39t0d3   98.00    0.50      23    1216    0.00  113.66
          c41t0d3   98.00    0.50       8     576    0.00  307.92
16:20:08   c3t0d0    1.00    0.50       2      32    0.00    5.36
          c39t0d3  100.00    0.50      21    1132    0.00  118.47
          c41t0d3  100.00    0.50       7     592    0.00  300.71
16:20:09   c3t0d0    6.00    0.58      13     194    2.22   26.05
          c39t0d3   89.00    0.50      17    1152    0.00  123.54
          c41t0d3   87.00    0.50       8     512    0.00  298.26
16:20:10   c3t0d0    3.00    0.50       6      96    0.00   22.78
          c39t0d3   85.00    0.50      17    1136    0.00  114.79
          c41t0d3   98.00    0.50       9     592    0.00  252.52
16:20:11   c3t0d0    1.00    0.50       1       2    0.00    8.04
          c39t0d3  100.00    0.50      17    1216    0.00  138.04
          c41t0d3  100.00    0.50      12     672    0.00  291.69
16:20:12   c3t0d0    2.00    0.50       3      34    0.00    9.24
          c39t0d3   99.00    0.50      16    1024    0.00  122.11
          c41t0d3   88.00    0.50       9     476    0.00  299.79
16:20:13  c39t0d3   91.00    0.50      18    1024    0.00  111.77
          c41t0d3   92.00    0.50       3     384    0.00  396.25
16:20:14  c39t0d3   99.00    0.50      17     892    0.00  132.15
          c41t0d3  100.00    0.50      10     608    0.00  233.54

Average    c3t0d0    1.80    0.53       3      50    0.87   17.64
Average   c39t0d3   96.00    0.50      18    1097    0.00  121.54
Average   c41t0d3  100.00    0.50       8     557    0.00  290.35

在業(yè)務(wù)人員下班后重啟的雙機(jī)軟件,但在啟動(dòng)數(shù)據(jù)庫(kù)時(shí)停在了Completed redo application這一步

SQL> startup
ORACLE instance started.

Total System Global Area 1.0318E+10 bytes
Fixed Size                  2073176 bytes
Variable Size            3238006184 bytes
Database Buffers         7063207936 bytes
Redo Buffers               14700544 bytes
Database mounted.

從alert.log文件中可以看到如下信息:

Tue Jul 15 22:23:29 2014
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =61
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
  processes                = 500
  sessions                 = 555
  __shared_pool_size       = 3154116608
  __large_pool_size        = 16777216
  __java_pool_size         = 33554432
  __streams_pool_size      = 33554432
  sga_target               = 10317987840
  control_files            = /sx_data/ORCL/control01.ctl, /sx_data/ORCL/control02.ctl, /sx_data/ORCL/control03.ctl
  db_block_size            = 8192
  __db_cache_size          = 7063207936
  compatible               = 10.2.0.3.0
  log_archive_dest_1       = LOCATION=/sx_data/arch_ORCL/
  log_archive_format       = %t_%s_%r.dbf
  db_file_multiblock_read_count= 16
  db_recovery_file_dest    = /oracle/flash_recovery_area
  db_recovery_file_dest_size= 2147483648
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  undo_retention           = 39600
  fast_start_parallel_rollback= FALSE
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = 
  dispatchers              = (PROTOCOL=TCP) (SERVICE=ORCLXDB)
  local_listener           = ORCL
  job_queue_processes      = 10
  background_dump_dest     = /oracle/admin/ORCL/bdump
  user_dump_dest           = /oracle/admin/ORCL/udump
  core_dump_dest           = /oracle/admin/ORCL/cdump
  audit_file_dest          = /oracle/admin/ORCL/adump
  db_name                  = ORCL
  open_cursors             = 2000
  optimizer_index_cost_adj = 20
  optimizer_index_caching  = 90
  pga_aggregate_target     = 2576351232
PMON started with pid=2, OS id=13613
PSP0 started with pid=3, OS id=13615
MMAN started with pid=4, OS id=13617
DBW0 started with pid=5, OS id=13619
LGWR started with pid=6, OS id=13621
CKPT started with pid=7, OS id=13623
SMON started with pid=8, OS id=13625
RECO started with pid=9, OS id=13627
CJQ0 started with pid=10, OS id=13629
MMON started with pid=11, OS id=13631
Tue Jul 15 22:23:30 2014
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=12, OS id=13635
Tue Jul 15 22:23:30 2014
starting up 1 shared server(s) ...
Tue Jul 15 22:23:31 2014
ALTER DATABASE   MOUNT
Tue Jul 15 22:23:39 2014
Setting recovery target incarnation to 2
Tue Jul 15 22:23:42 2014
Successful mount of redo thread 1, with mount id 1380841571
Tue Jul 15 22:23:42 2014
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Tue Jul 15 22:23:42 2014
ALTER DATABASE OPEN
Tue Jul 15 22:23:47 2014
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Tue Jul 15 22:23:50 2014
Started redo scan
Tue Jul 15 22:23:52 2014
Completed redo scan
 336597 redo blocks read, 78835 data blocks need recovery
Tue Jul 15 22:23:52 2014
Started redo application at
 Thread 1: logseq 2270, block 29
Tue Jul 15 22:23:53 2014
Recovery of Online Redo Log: Thread 1 Group 4 Seq 2270 Reading mem 0
  Mem# 0: /sx_data/ORCL/redo04.log
Tue Jul 15 22:23:58 2014
Completed redo application

一直停在Completed redo application這,而這時(shí)的等待事件是checkpoint complete開始以為是并行恢復(fù)慢造成的,就查詢了v$transaction,v$fast_start_transactions但視圖中并沒有進(jìn)行恢復(fù)操作的事務(wù)存在.后來咨詢了老熊,老熊說檢查一下IO情況看是不是存儲(chǔ)出問題了,如是再次檢查存儲(chǔ)IO性能:

rx6600-1:[/]#sar 1 10

HP-UX rx6600-1 B.11.23 U ia64    07/15/14

22:36:41    %usr    %sys    %wio   %idle
22:36:42       2       2      12      84
22:36:43       1       0      12      87
22:36:44       0       0      17      83
22:36:45       0       0      13      87
22:36:46       0       1      12      87
22:36:47       2       1      13      84
22:36:48       1       1      16      82
22:36:49       0       0      12      88
22:36:50       0       0      12      88
22:36:51       0       0      22      78

Average        1       0      14      85

從上面可以看到現(xiàn)在實(shí)際上并沒有業(yè)務(wù)在跑居然還存在IO等待這是不正常的

rx6600-2:[/]#bdf
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol3     983040  422504  556176   43% /
/dev/vg00/lvol1    1835008  135048 1686776    7% /stand
/dev/vg00/lvol8    8912896 8535352  374824   96% /var
/dev/vg00/lvol7    7962624 2762312 5159704   35% /usr
/dev/vg00/lvol4     524288   83192  437784   16% /tmp
/dev/vg00/tmplv    2064384   93512 1847942    5% /oratmp
/dev/vg00/orasoft  10256384 3144652 6668390   32% /orasoft
/dev/vg00/oracle   20480000 5480497 14062042   28% /oracle
/dev/vg00/lvol6    9076736 5206384 3840128   58% /opt
/dev/vg00/lvol5     131072   25472  104824   20% /home
/dev/cwjcvg/cwjc_datalv
                   414973952 134188114 263239842   34% /cwjc_data
/dev/sxvg/sx_datalv
                   624689152 298665485 305654147   49% /sx_data

rx6600-2:[/]#time dd if=/dev/zero of=/var/test bs=8k count=100000

下面對(duì)小機(jī)自身的磁盤進(jìn)行IO測(cè)試寫800M的數(shù)據(jù)只要12秒左右

msgcnt 2 vxfs: mesg 001: vx_nospace - /dev/vg00/lvol8 file system full (1 block extent)
I/O error 
47185+0 records in
47184+1 records out

real       11.7
user        0.0
sys         0.8

但是對(duì)EMC存儲(chǔ)進(jìn)行IO測(cè)試寫800M的數(shù)據(jù)只要30多分還沒有完成

rx6600-2:[/]#time dd if=/dev/zero of=/sx_data/test bs=8k count=100000
711856+0 records in
711855+0 records out

real    30:58.4
user        0.5
sys        13.0

這明顯的是存儲(chǔ)出了問題,后面得知管理人員早上10點(diǎn)多發(fā)現(xiàn)了存儲(chǔ)有一個(gè)磁盤損壞了,存儲(chǔ)做的raid 5,有熱備盤.而且還有上百G的數(shù)據(jù)進(jìn)行存儲(chǔ)級(jí)的同步.與出現(xiàn)性能問題的時(shí)間一至。

到此問題原因找到了解決起來也就簡(jiǎn)單了.幸虧問題解決了,第二天有大領(lǐng)導(dǎo)來檢查要不就.......哈哈

向AI問一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI