MapReduce Service (MRS): Executor Process Crash Causes Stage Retry
Question
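When running Spark jobs over large data volumes (for example, the 100 TB TPC-DS test suite), an Executor is occasionally lost, which causes the Stage to be retried. The Executor log contains the message "Executor 532 is lost rpc with driver,but is still alive, going to kill it", which indicates that the Executor was lost because of a JVM crash.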
The key JVM crash error log is as follows:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (sharedRuntime.cpp:834), pid=241075, tid=140476258551552
#  fatal error: exception happened outside interpreter, nmethods and vtable stubs at pc 0x00007fcda9eb8eb1
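To check many Executor logs for the same symptom, the two signatures quoted above can be searched for programmatically. The following is a minimal sketch, not an MRS tool: the function name `scan_log` and the sample lines are hypothetical, and it assumes the log text matches the wording quoted in this section.

```python
import re

# Message quoted in this section; the executor id varies between runs.
LOST_EXECUTOR = re.compile(r"Executor (\d+) is lost rpc with driver")
# Header line of the JVM fatal error report quoted in this section.
JVM_FATAL = "A fatal error has been detected by the Java Runtime Environment"


def scan_log(lines):
    """Return (lost_executor_ids, jvm_crash_seen) for an iterable of log lines."""
    lost_ids = []
    jvm_crash = False
    for line in lines:
        match = LOST_EXECUTOR.search(line)
        if match:
            lost_ids.append(int(match.group(1)))
        if JVM_FATAL in line:
            jvm_crash = True
    return lost_ids, jvm_crash


# Hypothetical sample built from the lines quoted above.
sample = [
    "Executor 532 is lost rpc with driver,but is still alive, going to kill it",
    "# A fatal error has been detected by the Java Runtime Environment:",
    "#  Internal Error (sharedRuntime.cpp:834), pid=241075, tid=140476258551552",
]
print(scan_log(sample))
```

In practice the lines would come from an opened log file rather than an in-memory list. A log that contains both signatures is consistent with the crash pattern described here.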