DBA的操作系統(tǒng)內(nèi)核參數(shù)有哪些

發(fā)布時(shí)間：2021-11-05 16:48:57 來源：億速云閱讀：113 作者：柒染欄目：建站服務(wù)器

這篇文章將為大家詳細(xì)講解有關(guān)DBA的操作系統(tǒng)內(nèi)核參數(shù)有哪些，文章內(nèi)容質(zhì)量較高，因此小編分享給大家做個(gè)參考，希望大家閱讀完這篇文章后對(duì)相關(guān)知識(shí)有一定的了解。

DBA不可不知的操作系統(tǒng)內(nèi)核參數(shù)

背景

操作系統(tǒng)為了適應(yīng)更多的硬件環(huán)境，許多初始的設(shè)置值，寬容度都很高。

如果不經(jīng)調(diào)整，這些值可能無法適應(yīng)HPC，或者硬件稍好些的環(huán)境。

無法發(fā)揮更好的硬件性能，甚至可能影響某些應(yīng)用軟件的使用，特別是數(shù)據(jù)庫。

數(shù)據(jù)庫關(guān)心的OS內(nèi)核參數(shù)

512GB 內(nèi)存為例

參數(shù)

fs.aio-max-nr

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

aio-nr & aio-max-nr:    
.  
aio-nr is the running total of the number of events specified on the    
io_setup system call for all currently active aio contexts.    
.  
If aio-nr reaches aio-max-nr then io_setup will fail with EAGAIN.    
.  
Note that raising aio-max-nr does not result in the pre-allocation or re-sizing    
of any kernel data structures.    
.  
aio-nr & aio-max-nr:    
.  
aio-nr shows the current system-wide number of asynchronous io requests.    
.  
aio-max-nr allows you to change the maximum value aio-nr can grow to.

推薦設(shè)置

fs.aio-max-nr = 1xxxxxx  
.  
PostgreSQL, Greenplum 均未使用io_setup創(chuàng)建aio contexts. 無需設(shè)置。    
如果Oracle數(shù)據(jù)庫，要使用aio的話，需要設(shè)置它。    
設(shè)置它也沒什么壞處，如果將來需要適應(yīng)異步IO，可以不需要重新修改這個(gè)設(shè)置。

參數(shù)

fs.file-max

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

file-max & file-nr:    
.  
The value in file-max denotes the maximum number of file handles that the Linux kernel will allocate.   
.  
When you get lots of error messages about running out of file handles,   
you might want to increase this limit.    
.  
Historically, the kernel was able to allocate file handles dynamically,   
but not to free them again.     
.  
The three values in file-nr denote :      
the number of allocated file handles ,     
the number of allocated but unused file handles ,     
the maximum number of file handles.     
.  
Linux 2.6 always reports 0 as the number of free    
file handles -- this is not an error, it just means that the    
number of allocated file handles exactly matches the number of    
used file handles.    
.  
Attempts to allocate more file descriptors than file-max are reported with printk,   
look for "VFS: file-max limit <number> reached".

推薦設(shè)置

fs.file-max = 7xxxxxxx  
.  
PostgreSQL 有一套自己管理的VFS，真正打開的FD與內(nèi)核管理的文件打開關(guān)閉有一套映射的機(jī)制，所以真實(shí)情況不需要使用那么多的file handlers。     
max_files_per_process 參數(shù)。     
假設(shè)1GB內(nèi)存支撐100個(gè)連接，每個(gè)連接打開1000個(gè)文件，那么一個(gè)PG實(shí)例需要打開10萬個(gè)文件，一臺(tái)機(jī)器按512G內(nèi)存來算可以跑500個(gè)PG實(shí)例，則需要5000萬個(gè)file handler。     
以上設(shè)置綽綽有余。

參數(shù)

kernel.core_pattern

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

core_pattern:    
.  
core_pattern is used to specify a core dumpfile pattern name.    
. max length 128 characters; default value is "core"    
. core_pattern is used as a pattern template for the output filename;    
  certain string patterns (beginning with '%') are substituted with    
  their actual values.    
. backward compatibility with core_uses_pid:    
        If core_pattern does not include "%p" (default does not)    
        and core_uses_pid is set, then .PID will be appended to    
        the filename.    
. corename format specifiers:    
        %<NUL>  '%' is dropped    
        %%      output one '%'    
        %p      pid    
        %P      global pid (init PID namespace)    
        %i      tid    
        %I      global tid (init PID namespace)    
        %u      uid    
        %g      gid    
        %d      dump mode, matches PR_SET_DUMPABLE and    
                /proc/sys/fs/suid_dumpable    
        %s      signal number    
        %t      UNIX time of dump    
        %h      hostname    
        %e      executable filename (may be shortened)    
        %E      executable path    
        %<OTHER> both are dropped    
. If the first character of the pattern is a '|', the kernel will treat    
  the rest of the pattern as a command to run.  The core dump will be    
  written to the standard input of that program instead of to a file.

推薦設(shè)置

kernel.core_pattern = /xxx/core_%e_%u_%t_%s.%p    
.  
這個(gè)目錄要777的權(quán)限，如果它是個(gè)軟鏈，則真實(shí)目錄需要777的權(quán)限  
mkdir /xxx  
chmod 777 /xxx  
留足夠的空間

參數(shù)

kernel.sem

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

kernel.sem = 4096 2147483647 2147483646 512000    
.  
4096 每組多少信號(hào)量 (>=17, PostgreSQL 每16個(gè)進(jìn)程一組, 每組需要17個(gè)信號(hào)量) ,     
2147483647 總共多少信號(hào)量 (2^31-1 , 且大于4096*512000 ) ,     
2147483646 每個(gè)semop()調(diào)用支持多少操作 (2^31-1),     
512000 多少組信號(hào)量 (假設(shè)每GB支持100個(gè)連接, 512GB支持51200個(gè)連接, 加上其他進(jìn)程, > 51200*2/16 綽綽有余)     
.  
# sysctl -w kernel.sem="4096 2147483647 2147483646 512000"    
.  
# ipcs -s -l    
  ------ Semaphore Limits --------    
max number of arrays = 512000    
max semaphores per array = 4096    
max semaphores system wide = 2147483647    
max ops per semop call = 2147483646    
semaphore max value = 32767

推薦設(shè)置

kernel.sem = 4096 2147483647 2147483646 512000    
.  
4096可能能夠適合更多的場景, 所以大點(diǎn)無妨，關(guān)鍵是512000 arrays也夠了。

參數(shù)

kernel.shmall = 107374182    
kernel.shmmax = 274877906944    
kernel.shmmni = 819200

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

假設(shè)主機(jī)內(nèi)存 512GB    
.  
shmmax 單個(gè)共享內(nèi)存段最大 256GB (主機(jī)內(nèi)存的一半，單位字節(jié))      
shmall 所有共享內(nèi)存段加起來最大 (主機(jī)內(nèi)存的80%，單位PAGE)      
shmmni 一共允許創(chuàng)建819200個(gè)共享內(nèi)存段 (每個(gè)數(shù)據(jù)庫啟動(dòng)需要2個(gè)共享內(nèi)存段。  將來允許動(dòng)態(tài)創(chuàng)建共享內(nèi)存段，可能需求量更大)     
.  
# getconf PAGE_SIZE    
4096

推薦設(shè)置

kernel.shmall = 107374182    
kernel.shmmax = 274877906944    
kernel.shmmni = 819200    
.  
9.2以及以前的版本，數(shù)據(jù)庫啟動(dòng)時(shí)，對(duì)共享內(nèi)存段的內(nèi)存需求非常大，需要考慮以下幾點(diǎn)  
Connections:	(1800 + 270 * max_locks_per_transaction) * max_connections  
Autovacuum workers:	(1800 + 270 * max_locks_per_transaction) * autovacuum_max_workers  
Prepared transactions:	(770 + 270 * max_locks_per_transaction) * max_prepared_transactions  
Shared disk buffers:	(block_size + 208) * shared_buffers  
WAL buffers:	(wal_block_size + 8) * wal_buffers  
Fixed space requirements:	770 kB  
.  
以上建議參數(shù)根據(jù)9.2以前的版本設(shè)置，后期的版本同樣適用。

參數(shù)

net.core.netdev_max_backlog

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

netdev_max_backlog    
  ------------------    
Maximum number  of  packets,  queued  on  the  INPUT  side,    
when the interface receives packets faster than kernel can process them.

推薦設(shè)置

net.core.netdev_max_backlog=1xxxx    
.  
INPUT鏈表越長，處理耗費(fèi)越大，如果用了iptables管理的話，需要加大這個(gè)值。

參數(shù)

net.core.rmem_default  
net.core.rmem_max  
net.core.wmem_default  
net.core.wmem_max

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

rmem_default    
  ------------    
The default setting of the socket receive buffer in bytes.    
.  
rmem_max    
  --------    
The maximum receive socket buffer size in bytes.    
.  
wmem_default    
  ------------    
The default setting (in bytes) of the socket send buffer.    
.  
wmem_max    
  --------    
The maximum send socket buffer size in bytes.

推薦設(shè)置

net.core.rmem_default = 262144    
net.core.rmem_max = 4194304    
net.core.wmem_default = 262144    
net.core.wmem_max = 4194304

參數(shù)

net.core.somaxconn

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

somaxconn - INTEGER    
        Limit of socket listen() backlog, known in userspace as SOMAXCONN.    
        Defaults to 128.    
	See also tcp_max_syn_backlog for additional tuning for TCP sockets.

推薦設(shè)置

net.core.somaxconn=4xxx

參數(shù)

net.ipv4.tcp_max_syn_backlog

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

tcp_max_syn_backlog - INTEGER    
        Maximal number of remembered connection requests, which have not    
        received an acknowledgment from connecting client.    
        The minimal value is 128 for low memory machines, and it will    
        increase in proportion to the memory of machine.    
        If server suffers from overload, try increasing this number.

推薦設(shè)置

net.ipv4.tcp_max_syn_backlog=4xxx    
pgpool-II 使用了這個(gè)值，用于將超過num_init_child以外的連接queue。     
所以這個(gè)值決定了有多少連接可以在隊(duì)列里面等待。

10.

參數(shù)

net.ipv4.tcp_keepalive_intvl=20    
net.ipv4.tcp_keepalive_probes=3    
net.ipv4.tcp_keepalive_time=60

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

tcp_keepalive_time - INTEGER    
        How often TCP sends out keepalive messages when keepalive is enabled.    
        Default: 2hours.    
.  
tcp_keepalive_probes - INTEGER    
        How many keepalive probes TCP sends out, until it decides that the    
        connection is broken. Default value: 9.    
.  
tcp_keepalive_intvl - INTEGER    
        How frequently the probes are send out. Multiplied by    
        tcp_keepalive_probes it is time to kill not responding connection,    
        after probes started. Default value: 75sec i.e. connection    
        will be aborted after ~11 minutes of retries.

推薦設(shè)置

net.ipv4.tcp_keepalive_intvl=20    
net.ipv4.tcp_keepalive_probes=3    
net.ipv4.tcp_keepalive_time=60    
.  
連接空閑60秒后, 每隔20秒發(fā)心跳包, 嘗試3次心跳包沒有響應(yīng)，關(guān)閉連接。 從開始空閑，到關(guān)閉連接總共歷時(shí)120秒。

11.

參數(shù)

net.ipv4.tcp_mem=8388608 12582912 16777216

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

tcp_mem - vector of 3 INTEGERs: min, pressure, max    
單位 page    
        min: below this number of pages TCP is not bothered about its    
        memory appetite.    
.  
        pressure: when amount of memory allocated by TCP exceeds this number    
        of pages, TCP moderates its memory consumption and enters memory    
        pressure mode, which is exited when memory consumption falls    
        under "min".    
.  
        max: number of pages allowed for queueing by all TCP sockets.    
.  
        Defaults are calculated at boot time from amount of available    
        memory.    
64GB 內(nèi)存，自動(dòng)計(jì)算的值是這樣的    
net.ipv4.tcp_mem = 1539615      2052821 3079230    
.  
512GB 內(nèi)存，自動(dòng)計(jì)算得到的值是這樣的    
net.ipv4.tcp_mem = 49621632     66162176        99243264    
.  
這個(gè)參數(shù)讓操作系統(tǒng)啟動(dòng)時(shí)自動(dòng)計(jì)算，問題也不大

推薦設(shè)置

net.ipv4.tcp_mem=8388608 12582912 16777216    
.  
這個(gè)參數(shù)讓操作系統(tǒng)啟動(dòng)時(shí)自動(dòng)計(jì)算，問題也不大

12.

參數(shù)

net.ipv4.tcp_fin_timeout

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

tcp_fin_timeout - INTEGER    
        The length of time an orphaned (no longer referenced by any    
        application) connection will remain in the FIN_WAIT_2 state    
        before it is aborted at the local end.  While a perfectly    
        valid "receive only" state for an un-orphaned connection, an    
        orphaned connection in FIN_WAIT_2 state could otherwise wait    
        forever for the remote to close its end of the connection.    
        Cf. tcp_max_orphans    
        Default: 60 seconds

推薦設(shè)置

net.ipv4.tcp_fin_timeout=5    
.  
加快僵尸連接回收速度

13.

參數(shù)

net.ipv4.tcp_synack_retries

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

tcp_synack_retries - INTEGER    
        Number of times SYNACKs for a passive TCP connection attempt will    
        be retransmitted. Should not be higher than 255. Default value    
        is 5, which corresponds to 31seconds till the last retransmission    
        with the current initial RTO of 1second. With this the final timeout    
        for a passive TCP connection will happen after 63seconds.

推薦設(shè)置

net.ipv4.tcp_synack_retries=2    
.  
縮短tcp syncack超時(shí)時(shí)間

14.

參數(shù)

net.ipv4.tcp_syncookies

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

tcp_syncookies - BOOLEAN    
        Only valid when the kernel was compiled with CONFIG_SYN_COOKIES    
        Send out syncookies when the syn backlog queue of a socket    
        overflows. This is to prevent against the common 'SYN flood attack'    
        Default: 1    
.  
        Note, that syncookies is fallback facility.    
        It MUST NOT be used to help highly loaded servers to stand    
        against legal connection rate. If you see SYN flood warnings    
        in your logs, but investigation shows that they occur    
        because of overload with legal connections, you should tune    
        another parameters until this warning disappear.    
        See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.    
.  
        syncookies seriously violate TCP protocol, do not allow    
        to use TCP extensions, can result in serious degradation    
        of some services (f.e. SMTP relaying), visible not by you,    
        but your clients and relays, contacting you. While you see    
        SYN flood warnings in logs not being really flooded, your server    
        is seriously misconfigured.    
.  
        If you want to test which effects syncookies have to your    
        network connections you can set this knob to 2 to enable    
        unconditionally generation of syncookies.

推薦設(shè)置

net.ipv4.tcp_syncookies=1    
.  
防止syn flood攻擊

15.

參數(shù)

net.ipv4.tcp_timestamps

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

tcp_timestamps - BOOLEAN    
        Enable timestamps as defined in RFC1323.

推薦設(shè)置

net.ipv4.tcp_timestamps=1    
.  
tcp_timestamps 是 tcp 協(xié)議中的一個(gè)擴(kuò)展項(xiàng)，通過時(shí)間戳的方式來檢測過來的包以防止 PAWS(Protect Against Wrapped  Sequence numbers)，可以提高 tcp 的性能。

16.

參數(shù)

net.ipv4.tcp_tw_recycle  
net.ipv4.tcp_tw_reuse  
net.ipv4.tcp_max_tw_buckets

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

tcp_tw_recycle - BOOLEAN    
        Enable fast recycling TIME-WAIT sockets. Default value is 0.    
        It should not be changed without advice/request of technical    
        experts.    
.  
tcp_tw_reuse - BOOLEAN    
        Allow to reuse TIME-WAIT sockets for new connections when it is    
        safe from protocol viewpoint. Default value is 0.    
        It should not be changed without advice/request of technical    
        experts.    
.  
tcp_max_tw_buckets - INTEGER  
        Maximal number of timewait sockets held by system simultaneously.  
        If this number is exceeded time-wait socket is immediately destroyed  
        and warning is printed.   
	This limit exists only to prevent simple DoS attacks,   
	you _must_ not lower the limit artificially,   
        but rather increase it (probably, after increasing installed memory),    
        if network conditions require more than default value.

推薦設(shè)置

net.ipv4.tcp_tw_recycle=0    
net.ipv4.tcp_tw_reuse=1    
net.ipv4.tcp_max_tw_buckets = 2xxxxx    
.  
net.ipv4.tcp_tw_recycle和net.ipv4.tcp_timestamps不建議同時(shí)開啟

17.

參數(shù)

net.ipv4.tcp_rmem  
net.ipv4.tcp_wmem

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

tcp_wmem - vector of 3 INTEGERs: min, default, max    
        min: Amount of memory reserved for send buffers for TCP sockets.    
        Each TCP socket has rights to use it due to fact of its birth.    
        Default: 1 page    
.  
        default: initial size of send buffer used by TCP sockets.  This    
        value overrides net.core.wmem_default used by other protocols.    
        It is usually lower than net.core.wmem_default.    
        Default: 16K    
.  
        max: Maximal amount of memory allowed for automatically tuned    
        send buffers for TCP sockets. This value does not override    
        net.core.wmem_max.  Calling setsockopt() with SO_SNDBUF disables    
        automatic tuning of that socket's send buffer size, in which case    
        this value is ignored.    
        Default: between 64K and 4MB, depending on RAM size.    
.  
tcp_rmem - vector of 3 INTEGERs: min, default, max    
        min: Minimal size of receive buffer used by TCP sockets.    
        It is guaranteed to each TCP socket, even under moderate memory    
        pressure.    
        Default: 1 page    
.  
        default: initial size of receive buffer used by TCP sockets.    
        This value overrides net.core.rmem_default used by other protocols.    
        Default: 87380 bytes. This value results in window of 65535 with    
        default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit    
        less for default tcp_app_win. See below about these variables.    
.  
        max: maximal size of receive buffer allowed for automatically    
        selected receiver buffers for TCP socket. This value does not override    
        net.core.rmem_max.  Calling setsockopt() with SO_RCVBUF disables    
        automatic tuning of that socket's receive buffer size, in which    
        case this value is ignored.    
        Default: between 87380B and 6MB, depending on RAM size.

推薦設(shè)置

net.ipv4.tcp_rmem=8192 87380 16777216    
net.ipv4.tcp_wmem=8192 65536 16777216    
.  
許多數(shù)據(jù)庫的推薦設(shè)置，提高網(wǎng)絡(luò)性能

18.

參數(shù)

net.nf_conntrack_max  
net.netfilter.nf_conntrack_max

支持系統(tǒng)

CentOS 6

參數(shù)解釋

nf_conntrack_max - INTEGER    
        Size of connection tracking table.    
	Default value is nf_conntrack_buckets value * 4.

推薦設(shè)置

net.nf_conntrack_max=1xxxxxx    
net.netfilter.nf_conntrack_max=1xxxxxx

19.

參數(shù)

vm.dirty_background_bytes   
vm.dirty_expire_centisecs   
vm.dirty_ratio   
vm.dirty_writeback_centisecs

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

==============================================================    
.  
dirty_background_bytes    
.  
Contains the amount of dirty memory at which the background kernel    
flusher threads will start writeback.    
.  
Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only    
one of them may be specified at a time. When one sysctl is written it is    
immediately taken into account to evaluate the dirty memory limits and the    
other appears as 0 when read.    
.  
==============================================================    
.  
dirty_background_ratio    
.  
Contains, as a percentage of total system memory, the number of pages at which    
the background kernel flusher threads will start writing out dirty data.    
.  
==============================================================    
.  
dirty_bytes    
.  
Contains the amount of dirty memory at which a process generating disk writes    
will itself start writeback.    
.  
Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be    
specified at a time. When one sysctl is written it is immediately taken into    
account to evaluate the dirty memory limits and the other appears as 0 when    
read.    
.  
Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any    
value lower than this limit will be ignored and the old configuration will be    
retained.    
.  
==============================================================    
.  
dirty_expire_centisecs    
.  
This tunable is used to define when dirty data is old enough to be eligible    
for writeout by the kernel flusher threads.  It is expressed in 100'ths    
of a second.  Data which has been dirty in-memory for longer than this    
interval will be written out next time a flusher thread wakes up.    
.  
==============================================================    
.  
dirty_ratio    
.  
Contains, as a percentage of total system memory, the number of pages at which    
a process which is generating disk writes will itself start writing out dirty    
data.    
.  
==============================================================    
.  
dirty_writeback_centisecs    
.  
The kernel flusher threads will periodically wake up and write `old' data    
out to disk.  This tunable expresses the interval between those wakeups, in    
100'ths of a second.    
.  
Setting this to zero disables periodic writeback altogether.    
.  
==============================================================

推薦設(shè)置

vm.dirty_background_bytes = 4096000000    
vm.dirty_expire_centisecs = 6000    
vm.dirty_ratio = 80    
vm.dirty_writeback_centisecs = 50    
.  
減少數(shù)據(jù)庫進(jìn)程刷臟頁的頻率，dirty_background_bytes根據(jù)實(shí)際IOPS能力以及內(nèi)存大小設(shè)置

20.

參數(shù)

vm.extra_free_kbytes

支持系統(tǒng)

CentOS 6

參數(shù)解釋

extra_free_kbytes    
.  
This parameter tells the VM to keep extra free memory   
between the threshold where background reclaim (kswapd) kicks in,   
and the threshold where direct reclaim (by allocating processes) kicks in.    
.  
This is useful for workloads that require low latency memory allocations    
and have a bounded burstiness in memory allocations,   
for example a realtime application that receives and transmits network traffic    
(causing in-kernel memory allocations) with a maximum total message burst    
size of 200MB may need 200MB of extra free memory to avoid direct reclaim    
related latencies.    
.  
目標(biāo)是盡量讓后臺(tái)進(jìn)程回收內(nèi)存，比用戶進(jìn)程提早多少kbytes回收，因此用戶進(jìn)程可以快速分配內(nèi)存。

推薦設(shè)置

vm.extra_free_kbytes=4xxxxxx

21.

參數(shù)

vm.min_free_kbytes

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

min_free_kbytes:    
.  
This is used to force the Linux VM to keep a minimum number    
of kilobytes free.  The VM uses this number to compute a    
watermark[WMARK_MIN] value for each lowmem zone in the system.    
Each lowmem zone gets a number of reserved free pages based    
proportionally on its size.    
.  
Some minimal amount of memory is needed to satisfy PF_MEMALLOC    
allocations; if you set this to lower than 1024KB, your system will    
become subtly broken, and prone to deadlock under high loads.    
.  
Setting this too high will OOM your machine instantly.

推薦設(shè)置

vm.min_free_kbytes = 2xxxxxx     # vm.min_free_kbytes 建議每32G內(nèi)存分配1G vm.min_free_kbytes
.  
防止在高負(fù)載時(shí)系統(tǒng)無響應(yīng)，減少內(nèi)存分配死鎖概率。

22.

參數(shù)

vm.mmap_min_addr

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

mmap_min_addr    
.  
This file indicates the amount of address space  which a user process will    
be restricted from mmapping.  Since kernel null dereference bugs could    
accidentally operate based on the information in the first couple of pages    
of memory userspace processes should not be allowed to write to them.  By    
default this value is set to 0 and no protections will be enforced by the    
security module.  Setting this value to something like 64k will allow the    
vast majority of applications to work correctly and provide defense in depth    
against future potential kernel bugs.

推薦設(shè)置

vm.mmap_min_addr=6xxxx    
.  
防止內(nèi)核隱藏的BUG導(dǎo)致的問題

23.

參數(shù)

vm.overcommit_memory   
vm.overcommit_ratio

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

==============================================================    
.  
overcommit_kbytes:    
.  
When overcommit_memory is set to 2, the committed address space is not    
permitted to exceed swap plus this amount of physical RAM. See below.    
.  
Note: overcommit_kbytes is the counterpart of overcommit_ratio. Only one    
of them may be specified at a time. Setting one disables the other (which    
then appears as 0 when read).    
.  
==============================================================    
.  
overcommit_memory:    
.  
This value contains a flag that enables memory overcommitment.    
.  
When this flag is 0,   
the kernel attempts to estimate the amount    
of free memory left when userspace requests more memory.    
.  
When this flag is 1,   
the kernel pretends there is always enough memory until it actually runs out.    
.  
When this flag is 2,   
the kernel uses a "never overcommit"    
policy that attempts to prevent any overcommit of memory.    
Note that user_reserve_kbytes affects this policy.    
.  
This feature can be very useful because there are a lot of    
programs that malloc() huge amounts of memory "just-in-case"    
and don't use much of it.    
.  
The default value is 0.    
.  
See Documentation/vm/overcommit-accounting and    
security/commoncap.c::cap_vm_enough_memory() for more information.    
.  
==============================================================    
.  
overcommit_ratio:    
.  
When overcommit_memory is set to 2,   
the committed address space is not permitted to exceed   
      swap + this percentage of physical RAM.    
See above.    
.  
==============================================================

推薦設(shè)置

vm.overcommit_memory = 0    
vm.overcommit_ratio = 90    
.  
vm.overcommit_memory = 0 時(shí) vm.overcommit_ratio可以不設(shè)置

24.

參數(shù)

vm.swappiness

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

swappiness    
.  
This control is used to define how aggressive the kernel will swap    
memory pages.    
Higher values will increase agressiveness, lower values    
decrease the amount of swap.    
.  
The default value is 60.

推薦設(shè)置

vm.swappiness = 0

25.

參數(shù)

vm.zone_reclaim_mode

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

zone_reclaim_mode:    
.  
Zone_reclaim_mode allows someone to set more or less aggressive approaches to    
reclaim memory when a zone runs out of memory. If it is set to zero then no    
zone reclaim occurs. Allocations will be satisfied from other zones / nodes    
in the system.    
.  
This is value ORed together of    
.  
1       = Zone reclaim on    
2       = Zone reclaim writes dirty pages out    
4       = Zone reclaim swaps pages    
.  
zone_reclaim_mode is disabled by default.  For file servers or workloads    
that benefit from having their data cached, zone_reclaim_mode should be    
left disabled as the caching effect is likely to be more important than    
data locality.    
.  
zone_reclaim may be enabled if it's known that the workload is partitioned    
such that each partition fits within a NUMA node and that accessing remote    
memory would cause a measurable performance reduction.  The page allocator    
will then reclaim easily reusable pages (those page cache pages that are    
currently not used) before allocating off node pages.    
.  
Allowing zone reclaim to write out pages stops processes that are    
writing large amounts of data from dirtying pages on other nodes. Zone    
reclaim will write out dirty pages if a zone fills up and so effectively    
throttle the process. This may decrease the performance of a single process    
since it cannot use all of system memory to buffer the outgoing writes    
anymore but it preserve the memory on other nodes so that the performance    
of other processes running on other nodes will not be affected.    
.  
Allowing regular swap effectively restricts allocations to the local    
node unless explicitly overridden by memory policies or cpuset    
configurations.

推薦設(shè)置

vm.zone_reclaim_mode=0    
.  
不使用NUMA

26.

參數(shù)

net.ipv4.ip_local_port_range

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

ip_local_port_range - 2 INTEGERS  
        Defines the local port range that is used by TCP and UDP to  
        choose the local port. The first number is the first, the  
        second the last local port number. The default values are  
        32768 and 61000 respectively.  
.  
ip_local_reserved_ports - list of comma separated ranges  
        Specify the ports which are reserved for known third-party  
        applications. These ports will not be used by automatic port  
        assignments (e.g. when calling connect() or bind() with port  
        number 0). Explicit port allocation behavior is unchanged.  
.  
        The format used for both input and output is a comma separated  
        list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and  
        10). Writing to the file will clear all previously reserved  
        ports and update the current list with the one given in the  
        input.  
.  
        Note that ip_local_port_range and ip_local_reserved_ports  
        settings are independent and both are considered by the kernel  
        when determining which ports are available for automatic port  
        assignments.  
.  
        You can reserve ports which are not in the current  
        ip_local_port_range, e.g.:  
.  
        $ cat /proc/sys/net/ipv4/ip_local_port_range  
        32000   61000  
        $ cat /proc/sys/net/ipv4/ip_local_reserved_ports  
        8080,9148  
.  
        although this is redundant. However such a setting is useful  
        if later the port range is changed to a value that will  
        include the reserved ports.  
.  
        Default: Empty

推薦設(shè)置

net.ipv4.ip_local_port_range=40000 65535    
.  
限制本地動(dòng)態(tài)端口分配范圍，防止占用監(jiān)聽端口。

27.

參數(shù)

  vm.nr_hugepages

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

==============================================================  
nr_hugepages  
Change the minimum size of the hugepage pool.  
See Documentation/vm/hugetlbpage.txt  
==============================================================  
nr_overcommit_hugepages  
Change the maximum size of the hugepage pool. The maximum is  
nr_hugepages + nr_overcommit_hugepages.  
See Documentation/vm/hugetlbpage.txt  
.  
The output of "cat /proc/meminfo" will include lines like:  
......  
HugePages_Total: vvv  
HugePages_Free:  www  
HugePages_Rsvd:  xxx  
HugePages_Surp:  yyy  
Hugepagesize:    zzz kB  
.  
where:  
HugePages_Total is the size of the pool of huge pages.  
HugePages_Free  is the number of huge pages in the pool that are not yet  
                allocated.  
HugePages_Rsvd  is short for "reserved," and is the number of huge pages for  
                which a commitment to allocate from the pool has been made,  
                but no allocation has yet been made.  Reserved huge pages  
                guarantee that an application will be able to allocate a  
                huge page from the pool of huge pages at fault time.  
HugePages_Surp  is short for "surplus," and is the number of huge pages in  
                the pool above the value in /proc/sys/vm/nr_hugepages. The  
                maximum number of surplus huge pages is controlled by  
                /proc/sys/vm/nr_overcommit_hugepages.  
.  
/proc/filesystems should also show a filesystem of type "hugetlbfs" configured  
in the kernel.  
.  
/proc/sys/vm/nr_hugepages indicates the current number of "persistent" huge  
pages in the kernel's huge page pool.  "Persistent" huge pages will be  
returned to the huge page pool when freed by a task.  A user with root  
privileges can dynamically allocate more or free some persistent huge pages  
by increasing or decreasing the value of 'nr_hugepages'.

推薦設(shè)置

如果要使用PostgreSQL的huge page，建議設(shè)置它。    
大于數(shù)據(jù)庫需要的共享內(nèi)存即可。

28.

參數(shù)

  fs.nr_open

支持系統(tǒng)

CentOS 6, 7

參數(shù)解釋

nr_open:
This denotes the maximum number of file-handles a process can
allocate. Default value is 1024*1024 (1048576) which should be
enough for most machines. Actual limit depends on RLIMIT_NOFILE
resource limit.
它還影響security/limits.conf 的文件句柄限制，單個(gè)進(jìn)程的打開句柄不能大于fs.nr_open，所以要加大文件句柄限制，首先要加大nr_open

推薦設(shè)置

對(duì)于有很多對(duì)象（表、視圖、索引、序列、物化視圖等）的PostgreSQL數(shù)據(jù)庫，建議設(shè)置為2000萬，
例如fs.nr_open=20480000

數(shù)據(jù)庫關(guān)心的資源限制

1. 通過/etc/security/limits.conf設(shè)置，或者ulimit設(shè)置

2. 通過/proc/$pid/limits查看當(dāng)前進(jìn)程的設(shè)置

#        - core - limits the core file size (KB)  
#        - memlock - max locked-in-memory address space (KB)  
#        - nofile - max number of open files  建議設(shè)置為1000萬 , 但是必須設(shè)置sysctl, fs.nr_open大于它，否則會(huì)導(dǎo)致系統(tǒng)無法登陸。
#        - nproc - max number of processes  
以上四個(gè)是非常關(guān)心的配置  
....  
#        - data - max data size (KB)  
#        - fsize - maximum filesize (KB)  
#        - rss - max resident set size (KB)  
#        - stack - max stack size (KB)  
#        - cpu - max CPU time (MIN)  
#        - as - address space limit (KB)  
#        - maxlogins - max number of logins for this user  
#        - maxsyslogins - max number of logins on the system  
#        - priority - the priority to run user process with  
#        - locks - max number of file locks the user can hold  
#        - sigpending - max number of pending signals  
#        - msgqueue - max memory used by POSIX message queues (bytes)  
#        - nice - max nice priority allowed to raise to values: [-20, 19]  
#        - rtprio - max realtime priority

數(shù)據(jù)庫關(guān)心的IO調(diào)度規(guī)則

1. 目前操作系統(tǒng)支持的IO調(diào)度策略包括cfq, deadline, noop 等。

/kernel-doc-xxx/Documentation/block  
-r--r--r-- 1 root root   674 Apr  8 16:33 00-INDEX  
-r--r--r-- 1 root root 55006 Apr  8 16:33 biodoc.txt  
-r--r--r-- 1 root root   618 Apr  8 16:33 capability.txt  
-r--r--r-- 1 root root 12791 Apr  8 16:33 cfq-iosched.txt  
-r--r--r-- 1 root root 13815 Apr  8 16:33 data-integrity.txt  
-r--r--r-- 1 root root  2841 Apr  8 16:33 deadline-iosched.txt  
-r--r--r-- 1 root root  4713 Apr  8 16:33 ioprio.txt  
-r--r--r-- 1 root root  2535 Apr  8 16:33 null_blk.txt  
-r--r--r-- 1 root root  4896 Apr  8 16:33 queue-sysfs.txt  
-r--r--r-- 1 root root  2075 Apr  8 16:33 request.txt  
-r--r--r-- 1 root root  3272 Apr  8 16:33 stat.txt  
-r--r--r-- 1 root root  1414 Apr  8 16:33 switching-sched.txt  
-r--r--r-- 1 root root  3916 Apr  8 16:33 writeback_cache_control.txt

如果你要詳細(xì)了解這些調(diào)度策略的規(guī)則，可以查看WIKI或者看內(nèi)核文檔。

從這里可以看到它的調(diào)度策略

cat /sys/block/vdb/queue/scheduler   
noop [deadline] cfq

修改

echo deadline > /sys/block/hda/queue/scheduler

或者修改啟動(dòng)參數(shù)

grub.conf  
elevator=deadline

從很多測試結(jié)果來看，數(shù)據(jù)庫使用deadline調(diào)度，性能會(huì)更穩(wěn)定一些。

關(guān)于DBA的操作系統(tǒng)內(nèi)核參數(shù)有哪些就分享到這里了，希望以上內(nèi)容可以對(duì)大家有一定的幫助，可以學(xué)到更多知識(shí)。如果覺得文章不錯(cuò)，可以把它分享出去讓更多的人看到。

向AI問一下細(xì)節(jié)

DBA的操作系統(tǒng)內(nèi)核參數(shù)有哪些

DBA不可不知的操作系統(tǒng)內(nèi)核參數(shù)

背景

數(shù)據(jù)庫關(guān)心的OS內(nèi)核參數(shù)

數(shù)據(jù)庫關(guān)心的資源限制

數(shù)據(jù)庫關(guān)心的IO調(diào)度規(guī)則

猜你喜歡

最新資訊

相關(guān)推薦

相關(guān)標(biāo)簽