What limitations does Docker impose on the JVM?

Published: 2021-10-23 16:25:35 | Source: 億速云 | Author: 柒染 | Column: Cloud Computing

This article looks at the limitations Docker imposes on the JVM. Many people are not familiar with them, so the main points are summarized below.

Let's start with a well-known limitation: running commands such as jmap against a Java application inside Docker often fails with:

Can't attach to the process: ptrace(PTRACE_ATTACH, ..).

This happens because tools such as jstack and jmap are implemented in one of two ways:

The Attach mechanism (also known as VirtualMachine.attach()), which talks to the target JVM's Attach Listener thread over a socket.
The Serviceability Agent (SA), which is also a form of attach and, on Linux, relies on the ptrace system call.

Since version 1.10, Docker's default seccomp profile has disabled ptrace, so operations that go through the SA, such as jmap -heap, fail. The Docker documentation offers two workarounds:

Add the capability explicitly with --cap-add=SYS_PTRACE: docker run --cap-add=SYS_PTRACE ...
Disable seccomp, or add ptrace to the allowed list: docker run --security-opt seccomp:unconfined ...

Beyond this limitation, a while ago I was browsing the JDK Bug System and came across this bug: JDK-8140793

getAvailableProcessors may incorrectly report the number of cpus in Docker container

The bug reports that when Java runs inside a Docker container, the number of CPUs it detects may be wrong.

Docker is built on cgroups and namespaces. Cgroups are a Linux kernel feature that limits and isolates the resource usage of processes (CPU, memory, disk I/O, network, and so on), so my guess was that the JVM did not read the limits Docker imposes through cgroups at runtime.

The bug's status is RESOLVED, so I kept digging and found an article on the official blog: "Java SE support for Docker CPU and memory limits". It links JDK-8140793 (the incorrect CPU count in Docker), JDK-8170888 (improved handling of Docker memory limits), and JDK-8146115 (container detection and use of the container's resource configuration).

The article notes that in Java SE 8u121 and earlier, the CPU count and memory size the JVM reads are not restricted by cgroups. Why does that matter? As far as I know, when certain flags are not set explicitly, the JVM derives their defaults from the values it reads. For example, if -XX:ParallelGCThreads and -XX:CICompilerCount are not given, the JVM computes them from the detected CPU count. The Parallel GC thread count is computed in runtime\vm_version.cpp (the code below is from OpenJDK 1.8 b120):

if (FLAG_IS_DEFAULT(ParallelGCThreads)) {
    assert(ParallelGCThreads == 0, "Default ParallelGCThreads is not 0");
    // For very large machines, there are diminishing returns
    // for large numbers of worker threads.  Instead of
    // hogging the whole system, use a fraction of the workers for every
    // processor after the first 8.  For example, on a 72 cpu machine
    // and a chosen fraction of 5/8
    // use 8 + (72 - 8) * (5/8) == 48 worker threads.
    unsigned int ncpus = (unsigned int) os::active_processor_count();
    return (ncpus <= switch_pt) ?
           ncpus :
          (switch_pt + ((ncpus - switch_pt) * num) / den);
  } else {
    return ParallelGCThreads;
  }
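
To make the effect of a wrong CPU count concrete, here is a minimal stand-alone sketch of the same heuristic. The function name is hypothetical, and the defaults for switch_pt, num and den (8 and 5/8) are assumptions taken from the example in the comment above, not from the call site, which the excerpt does not show.

```cpp
#include <cassert>

// Hypothetical stand-alone version of the heuristic quoted above.
// switch_pt, num and den mirror the names in the HotSpot snippet; the
// default values (8 and 5/8) are assumed from the comment's example.
unsigned int default_parallel_gc_threads(unsigned int ncpus,
                                         unsigned int switch_pt = 8,
                                         unsigned int num = 5,
                                         unsigned int den = 8) {
    // Up to switch_pt CPUs: one worker per CPU.
    // Beyond that: only num/den of each additional CPU adds a worker.
    return (ncpus <= switch_pt)
               ? ncpus
               : switch_pt + ((ncpus - switch_pt) * num) / den;
}
```

Under these assumptions, a JVM that sees all 72 CPUs of a host would start 48 GC workers even if its container is limited to 2 CPUs, whereas a JVM that sees the limit would start 2.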

Following the call into os::active_processor_count(), which obtains the CPU count (Linux implementation in os_linux.cpp):

int os::active_processor_count() {
  // Linux doesn't yet have a (official) notion of processor sets,
  // so just return the number of online processors.
  int online_cpus = ::sysconf(_SC_NPROCESSORS_ONLN);
  assert(online_cpus > 0 && online_cpus <= processor_count(), "sanity check");
  return online_cpus;
}

It does indeed read the physical machine's CPU count via ::sysconf(_SC_NPROCESSORS_ONLN), so the GC thread count is computed from the host's CPUs rather than the container's limit. The number of JIT compiler threads suffers from the same problem.

Memory is affected in the same way. When flags such as -Xmx (MaxHeapSize) and -Xms (InitialHeapSize) are not set explicitly, the JVM derives defaults from the amount of memory it detects on the machine:

void Arguments::set_heap_size() {
  if (!FLAG_IS_DEFAULT(DefaultMaxRAMFraction)) {
    // Deprecated flag
    FLAG_SET_CMDLINE(uintx, MaxRAMFraction, DefaultMaxRAMFraction);
  }

  const julong phys_mem =
    FLAG_IS_DEFAULT(MaxRAM) ? MIN2(os::physical_memory(), (julong)MaxRAM)
                            : (julong)MaxRAM;

  // If the maximum heap size has not been set with -Xmx,
  // then set it as fraction of the size of physical memory,
  // respecting the maximum and minimum sizes of the heap.
  if (FLAG_IS_DEFAULT(MaxHeapSize)) {
    julong reasonable_max = phys_mem / MaxRAMFraction;

    if (phys_mem <= MaxHeapSize * MinRAMFraction) {
      // Small physical memory, so use a minimum fraction of it for the heap
      reasonable_max = phys_mem / MinRAMFraction;
    }
    ...
  }
}
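
The visible part of this logic can be modelled in a few lines. This is only a rough sketch: the function name is hypothetical, the fraction defaults (4 and 2) are assumed to match the JDK 8 defaults for MaxRAMFraction and MinRAMFraction, the built-in MaxHeapSize default is passed in as a parameter rather than guessed, and the elided remainder of the method (which adjusts the value further) is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the visible part of Arguments::set_heap_size().
// Assumptions: max_ram_fraction = 4 and min_ram_fraction = 2 (JDK 8 defaults);
// default_max_heap stands in for the built-in MaxHeapSize default.
std::uint64_t reasonable_max_heap(std::uint64_t phys_mem,
                                  std::uint64_t default_max_heap,
                                  std::uint64_t max_ram_fraction = 4,
                                  std::uint64_t min_ram_fraction = 2) {
    if (phys_mem <= default_max_heap * min_ram_fraction) {
        // Small machine: give the heap a larger share of memory.
        return phys_mem / min_ram_fraction;
    }
    // Otherwise use the regular fraction of detected memory.
    return phys_mem / max_ram_fraction;
}
```

With these assumed defaults, a JVM that detects 64 GiB of host memory would pick a maximum heap of about 16 GiB, regardless of how small the container's memory limit is.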

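Here os::physical_memory() also returns the host's physical memory. Inside Docker this can lead to a range of failures, for example the process being killed by the OOM killer once the heap grows past the container's memory limit.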

So with older JDK 8 builds, leaving these flags unset can cause some strange problems. From JDK-8146115 I learned that this enhancement to Docker support was implemented in JDK 10, where container support is controlled by -XX:+UseContainerSupport, and that it has been backported to newer JDK 8 releases (releases after JDK 8u131; the -XX:+UseContainerSupport backport itself shipped in 8u191).

I downloaded a newer OpenJDK 8 and, reading through the source, found that Oracle has indeed handled this.

The original os::active_processor_count() has become:

// Determine the active processor count from one of
// three different sources:
//
// 1. User option -XX:ActiveProcessorCount
// 2. kernel os calls (sched_getaffinity or sysconf(_SC_NPROCESSORS_ONLN)
// 3. extracted from cgroup cpu subsystem (shares and quotas)
//
// Option 1, if specified, will always override.
// If the cgroup subsystem is active and configured, we
// will return the min of the cgroup and option 2 results.
// This is required since tools, such as numactl, that
// alter cpu affinity do not update cgroup subsystem
// cpuset configuration files.
int os::active_processor_count() {
  // User has overridden the number of active processors
  if (ActiveProcessorCount > 0) {
    if (PrintActiveCpus) {
      tty->print_cr("active_processor_count: "
                    "active processor count set by user : %d",
                    ActiveProcessorCount);
    }
    return ActiveProcessorCount;
  }

  int active_cpus;
  if (OSContainer::is_containerized()) {
    active_cpus = OSContainer::active_processor_count();
    if (PrintActiveCpus) {
      tty->print_cr("active_processor_count: determined by OSContainer: %d",
                     active_cpus);
    }
  } else {
    active_cpus = os::Linux::active_processor_count();
  }

  return active_cpus;
}

The logic is clear: if -XX:ActiveProcessorCount is set, its value is used. Otherwise OSContainer::is_containerized() is consulted to decide whether the JVM is running in a container:

inline bool OSContainer::is_containerized() {
  assert(_is_initialized, "OSContainer not initialized");
  return _is_containerized;
}

_is_containerized is set when Threads::create_vm calls OSContainer::init(), which checks whether the VM is running inside a container (the full method is too long to reproduce here):

/* init
 *
 * Initialize the container support and determine if
 * we are running under cgroup control.
 */
void OSContainer::init() {
  int mountid;
  int parentid;
  int major;
  int minor;
  FILE *mntinfo = NULL;
  FILE *cgroup = NULL;
  char buf[MAXPATHLEN+1];
  char tmproot[MAXPATHLEN+1];
  char tmpmount[MAXPATHLEN+1];
  char tmpbase[MAXPATHLEN+1];
  char *p;
  jlong mem_limit;

  assert(!_is_initialized, "Initializing OSContainer more than once");

  _is_initialized = true;
  _is_containerized = false;

  _unlimited_memory = (LONG_MAX / os::vm_page_size()) * os::vm_page_size();

  if (PrintContainerInfo) {
    tty->print_cr("OSContainer::init: Initializing Container Support");
  }
  if (!UseContainerSupport) {
    if (PrintContainerInfo) {
      tty->print_cr("Container Support not enabled");
    }
    return;
  }

  ...........

  _is_containerized = true;

}

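The method runs a series of checks, such as whether UseContainerSupport is enabled and whether /proc/self/mountinfo and /proc/self/cgroup are readable. If it concludes that the JVM is running in a container, OSContainer::active_processor_count() is then used to obtain the CPU count allowed by the container: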

/* active_processor_count
 *
 * Calculate an appropriate number of active processors for the
 * VM to use based on these three inputs.
 *
 * cpu affinity
 * cgroup cpu quota & cpu period
 * cgroup cpu shares
 *
 * Algorithm:
 *
 * Determine the number of available CPUs from sched_getaffinity
 *
 * If user specified a quota (quota != -1), calculate the number of
 * required CPUs by dividing quota by period.
 *
 * If shares are in effect (shares != -1), calculate the number
 * of CPUs required for the shares by dividing the share value
 * by PER_CPU_SHARES.
 *
 * All results of division are rounded up to the next whole number.
 *
 * If neither shares or quotas have been specified, return the
 * number of active processors in the system.
 *
 * If both shares and quotas have been specified, the results are
 * based on the flag PreferContainerQuotaForCPUCount.  If true,
 * return the quota value.  If false return the smallest value
 * between shares or quotas.
 *
 * If shares and/or quotas have been specified, the resulting number
 * returned will never exceed the number of active processors.
 *
 * return:
 *    number of CPUs
 */
int OSContainer::active_processor_count() {
  int quota_count = 0, share_count = 0;
  int cpu_count, limit_count;
  int result;

  cpu_count = limit_count = os::Linux::active_processor_count();
  int quota  = cpu_quota();
  int period = cpu_period();
  int share  = cpu_shares();
...........
}
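
Since the body of the method is elided, the following is a minimal sketch of the algorithm as the comment describes it, not the HotSpot code itself. The function names are hypothetical, -1 is used for "not set" as in the comment, the value 1024 for PER_CPU_SHARES is an assumption, and prefer_quota stands in for the PreferContainerQuotaForCPUCount flag (its default of true here is also an assumption).

```cpp
#include <algorithm>
#include <cassert>

// Assumed value of PER_CPU_SHARES; not shown in the excerpt.
constexpr int kPerCpuShares = 1024;

// Integer division rounded up, for non-negative a and positive b.
int ceil_div(int a, int b) { return (a + b - 1) / b; }

// cpu_count: CPUs visible to the process (e.g. from sched_getaffinity).
// quota, period, shares: cgroup values, -1 meaning "not set".
int container_cpu_count(int cpu_count, int quota, int period, int shares,
                        bool prefer_quota = true) {
    int quota_count = 0;
    int share_count = 0;
    if (quota > -1 && period > 0) quota_count = ceil_div(quota, period);
    if (shares > -1) share_count = ceil_div(shares, kPerCpuShares);

    int limit_count = cpu_count;  // neither set: fall back to visible CPUs
    if (quota_count != 0 && share_count != 0) {
        limit_count = prefer_quota ? quota_count
                                   : std::min(quota_count, share_count);
    } else if (quota_count != 0) {
        limit_count = quota_count;
    } else if (share_count != 0) {
        limit_count = share_count;
    }
    // The result never exceeds the number of visible CPUs.
    return std::min(cpu_count, limit_count);
}
```

For instance, on an 8-CPU host a quota of 150000 with a period of 100000 (1.5 CPUs) rounds up to 2 in this sketch.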

As the comment explains, the count is now derived from the cgroup cpu quota and cpu period and from the cgroup cpu shares, which Docker lets you set with options such as --cpu-period and --cpu-quota.

Memory is handled in a similar way. If -Xmx is not specified, you can enable the two flags -XX:+UnlockExperimentalVMOptions and -XX:+UseCGroupMemoryLimitForHeap so that the JVM uses the Linux cgroup configuration to determine the maximum Java heap size.

The Arguments::set_heap_size() method:

void Arguments::set_heap_size() {
  if (!FLAG_IS_DEFAULT(DefaultMaxRAMFraction)) {
    // Deprecated flag
    FLAG_SET_CMDLINE(uintx, MaxRAMFraction, DefaultMaxRAMFraction);
  }

  julong phys_mem =
    FLAG_IS_DEFAULT(MaxRAM) ? MIN2(os::physical_memory(), (julong)MaxRAM)
                            : (julong)MaxRAM;

  // Experimental support for CGroup memory limits
  if (UseCGroupMemoryLimitForHeap) {
    // This is a rough indicator that a CGroup limit may be in force
    // for this process
    const char* lim_file = "/sys/fs/cgroup/memory/memory.limit_in_bytes";
    FILE *fp = fopen(lim_file, "r");
    if (fp != NULL) {
      julong cgroup_max = 0;
      int ret = fscanf(fp, JULONG_FORMAT, &cgroup_max);
      if (ret == 1 && cgroup_max > 0) {
        // If unlimited, cgroup_max will be a very large, but unspecified
        // value, so use initial phys_mem as a limit
        if (PrintGCDetails && Verbose) {
          // Cannot use gclog_or_tty yet.
          tty->print_cr("Setting phys_mem to the min of cgroup limit ("
                        JULONG_FORMAT "MB) and initial phys_mem ("
                        JULONG_FORMAT "MB)", cgroup_max/M, phys_mem/M);
        }
        phys_mem = MIN2(cgroup_max, phys_mem);
      } else {
        warning("Unable to read/parse cgroup memory limit from %s: %s",
                lim_file, errno != 0 ? strerror(errno) : "unknown error");
      }
      fclose(fp);
    } else {
      warning("Unable to open cgroup memory limit file %s (%s)", lim_file, strerror(errno));
    }
  }
....................
}
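
The core of this experimental path is "read one number, then take the minimum". The sketch below isolates that step so it can be tried without a cgroup file system. The helper name is hypothetical, and it takes an input stream standing in for /sys/fs/cgroup/memory/memory.limit_in_bytes instead of opening the file itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>

// Hypothetical helper mirroring the min(cgroup limit, physical memory) step.
// limit_file stands in for the opened memory.limit_in_bytes file.
std::uint64_t effective_phys_mem(std::istream& limit_file,
                                 std::uint64_t phys_mem) {
    std::uint64_t cgroup_max = 0;
    if (limit_file >> cgroup_max && cgroup_max > 0) {
        // An "unlimited" cgroup reports a very large value, so the
        // minimum falls back to the host's physical memory.
        return std::min(cgroup_max, phys_mem);
    }
    // Unreadable or unparsable content: keep the host value.
    return phys_mem;
}

// Example use (values are illustrative):
//   std::istringstream limit("1073741824\n");         // 1 GiB limit
//   auto mem = effective_phys_mem(limit, 64ULL << 30); // host has 64 GiB
```

Passing a stream rather than a path is a choice made for this sketch only; it keeps the example self-contained and easy to exercise with in-memory input.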

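With this flag, the JVM caps the value it obtained from os::physical_memory() at the memory limit read from the cgroup file system (memory.limit_in_bytes) before sizing the heap. Note, however, that the comment says "Experimental support", so at this point the feature is still experimental and probably not fully mature.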

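All in all, running Java in Docker comes with quite a few pitfalls and limitations, and a single poorly chosen setting can cause puzzling problems. The safest approach is to set the JVM flags explicitly to match the Docker configuration, which avoids most of these issues. If problems remain, consider upgrading to a newer JDK 8 update release. If upgrading is too costly, another option is to load an external library that intercepts and adjusts the values the JVM reads.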
