2008. 12. 2. 12:31

A short report on a BSOD with STOP error 0x9C. Pinning down the actual root cause does not look easy.


[Environment]
Windows 2000 Server SP4


[Symptom]
Analysis of a kernel memory dump taken after a system crash
STOP Error : 0x9C


[Cause]
When the processor detects an unrecoverable hardware error (such as a memory parity error or a cache error), it raises Interrupt 18 (Machine Check Exception) to report the error to the operating system.

For details on the Machine Check Architecture, see the Intel Pentium Pro Family Developer's Manual - Volume 3: Operating System Writer's Manual. In general, this error can be triggered by the following:

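 1. System bus errors
 2. Memory errors, which may include parity or ECC (Error Correction Code) problems
 3. Cache errors in the processor or in hardware
 4. TLB (Translation Lookaside Buffer) errors in the processor
 5. Other hardware problems detected by CPU-vendor-specific mechanisms
 6. Hardware problems detected by vendor-specific mechanisms
 7. Overclocking of the processor or bus
 8. Power supply noise, overloaded power strips, or overvoltage
 9. Overheating caused by a failed cooler
 10. Damaged memory, or memory of an incompatible type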



[Resolution]
Contact the hardware manufacturer and run diagnostics on the CPU, RAM, motherboard, and other hardware.


[Analysis Results]
BugCheck 9C, {1, 0, b2000000, 13270}
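
On a processor with MCA, the four parameters in this line are the bank number, the address of the MCA_EXCEPTION structure, and the high and low 32 bits of that bank's MCi_STATUS MSR. The following is a minimal sketch of splitting the line and rebuilding the 64-bit status value. The names `McaBugcheck` and `parse_bugcheck_9c` are hypothetical helpers for illustration, not part of any debugger API.

```python
import re
from typing import NamedTuple


class McaBugcheck(NamedTuple):
    """Hypothetical container for the MCA-style 0x9C parameters."""
    bank: int            # parameter 1: MCA bank number
    exception_addr: int  # parameter 2: address of MCA_EXCEPTION
    status: int          # parameters 3 and 4 joined into 64-bit MCi_STATUS


def parse_bugcheck_9c(line: str) -> McaBugcheck:
    """Parse a 'BugCheck 9C, {a, b, c, d}' line (all values hexadecimal)."""
    match = re.search(r"BugCheck\s+9C,\s*\{([^}]*)\}", line, re.IGNORECASE)
    if match is None:
        raise ValueError("not a BugCheck 9C line")
    args = [int(token.strip(), 16) for token in match.group(1).split(",")]
    if len(args) != 4:
        raise ValueError("expected exactly four parameters")
    bank, addr, status_hi, status_lo = args
    # High 32 bits go in the upper half, low 32 bits in the lower half.
    return McaBugcheck(bank, addr, (status_hi << 32) | status_lo)


info = parse_bugcheck_9c("BugCheck 9C, {1, 0, b2000000, 13270}")
print(f"bank={info.bank} "
      f"MCA_EXCEPTION={info.exception_addr:#x} "
      f"MCi_STATUS={info.status:#018x}")
```

Parameter 2 is 0 in this dump, which suggests that the bug check did not supply an address for an MCA_EXCEPTION record.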

2: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

MACHINE_CHECK_EXCEPTION (9c)
A fatal Machine Check Exception has occurred.
KeBugCheckEx parameters;
    x86 Processors
        If the processor has ONLY MCE feature available (For example Intel
        Pentium), the parameters are:
        1 - Low  32 bits of P5_MC_TYPE MSR
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of P5_MC_ADDR MSR
        4 - Low  32 bits of P5_MC_ADDR MSR
        If the processor also has MCA feature available (For example Intel
        Pentium Pro), the parameters are:
        1 - Bank number
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
        4 - Low  32 bits of MCi_STATUS MSR for the MCA bank that had the error

Arguments:
Arg1: 00000001
Arg2: 00000000
Arg3: b2000000
Arg4: 00013270

Debugging Details:
------------------

   NOTE:  This is a hardware error.  This error was reported by the CPU
   via Interrupt 18.  This analysis will provide more information about
   the specific error.  Please contact the manufacturer for additional
   information about this error and troubleshooting assistance.

   This error is documented in the following publication:

      - IA-32 Intel(r) Architecture Software Developer's Manual
        Volume 3: System Programming Guide

   Bit Mask:

       MA                           Model Specific       MCA
    O  ID      Other Information      Error Code     Error Code
   VV  SDP ___________|____________ _______|_______ _______|______
   AEUECRC|                        |               |              |
   LRCNVVC|                        |               |              |
   ^^^^^^^|                        |               |              |
      6         5         4         3         2         1
   3210987654321098765432109876543210987654321098765432109876543210
   ----------------------------------------------------------------
   1011001000000000000000000000000000000000000000000000000101110101


VAL   - MCi_STATUS register is valid
        Indicates that the information contained within the IA32_MCi_STATUS
        register is valid.  When this flag is set, the processor follows the
        rules given for the OVER flag in the IA32_MCi_STATUS register when
        overwriting previously valid entries.  The processor sets the VAL
        flag and software is responsible for clearing it.

UC    - Error Uncorrected
        Indicates that the processor did not or was not able to correct the
        error condition.  When clear, this flag indicates that the processor
        was able to correct the error condition.

EN    - Error Enabled
        Indicates that the error was enabled by the associated EEj bit of the
        IA32_MCi_CTL register.

PCC   - Processor Context Corrupt
        Indicates that the state of the processor might have been corrupted
        by the error condition detected and that reliable restarting of the
        processor may not be possible.

MEMHIRERR - Memory Hierarchy Error   {TT}CACHE{LL}_{RRRR}_ERR
        These errors match the format 0000 0001 RRRR TTLL

   Concatenated Error Code:
   --------------------------
   _VAL_UC_EN_PCC_MEMHIRERR_75

   This error code can be reported back to the manufacturer.
   They may be able to provide additional information based upon
   this error.  All questions regarding STOP 0x9C should be
   directed to the hardware manufacturer.
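
The decoding that !analyze performs above can be approximated by hand. The following is a rough sketch that takes the 64-bit value from the bit mask line (0xB200000000000175), lists the flags that are set, and expands the low 16 bits using the memory hierarchy format 0000 0001 RRRR TTLL. The flag positions and the TT/LL/RRRR tables are written from my reading of the Intel manual and should be checked against it. The function names are hypothetical.

```python
# Flag bit positions in IA32_MCi_STATUS (assumed from the Intel manual).
STATUS_FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
                59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Sub-field encodings of the memory hierarchy error code (assumed).
TT = {0: "I", 1: "D", 2: "G"}               # instruction, data, generic
LL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}   # cache level, LG = generic
RRRR = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
        5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}


def decode_flags(status: int) -> list:
    """Return the names of the status flags that are set, highest bit first."""
    return [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
            if (status >> bit) & 1]


def decode_memory_hierarchy(code: int):
    """Expand a 16-bit MCA error code of the form 0000 0001 RRRR TTLL.

    Returns None when the code does not match that format.
    """
    if (code >> 8) != 0x01:
        return None
    rrrr = (code >> 4) & 0xF
    tt = (code >> 2) & 0x3
    ll = code & 0x3
    if rrrr not in RRRR or tt not in TT:
        return None
    return f"{TT[tt]}CACHE{LL[ll]}_{RRRR[rrrr]}_ERR"


STATUS = 0xB200000000000175       # value shown in the bit mask line
mca_code = STATUS & 0xFFFF        # bits 15:0 hold the MCA error code

print("_" + "_".join(decode_flags(STATUS)))   # _VAL_UC_EN_PCC
print(f"{mca_code:#06x}")                     # 0x0175
print(decode_memory_hierarchy(mca_code))      # DCACHEL1_EVICT_ERR
```

Under these assumed tables, the code 0x0175 reads as a level-1 data cache eviction error. Note that the low 32 bits in the bit mask line (0x00000175) differ from Arg4 (00013270), so the sketch decodes the bit mask value only.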


2: kd> dt _MCA_EXCEPTION 00013270
hal!_MCA_EXCEPTION
   +0x000 VersionNumber    : ?? // Version number of this record type
   +0x004 ExceptionType    : ?? // MCA or MCE
   +0x008 TimeStamp        : _LARGE_INTEGER // exception recording timestamp
   +0x010 ProcessorNumber  : ?? // processor number
   +0x018 u                : __unnamed
Memory read error 00000185


2: kd> !mca
MCE: Enabled, Cycle Address: 0x0000000000000000, Type: 0x0000000000000000

MCA: Enabled, Banks 0, Control Reg: Not Supported, Machine Check: None.
Bank  Error  Control Register     Status Register
CP F/M/S Manufacturer  MHz Update Signature Features
 0 6,10,4 GenuineIntel  899 0000000100000000 00002fff
 1 6,10,4 GenuineIntel  900 0000000100000000 00002fff
 2 6,10,4 GenuineIntel  900>0000000100000000<00002fff
 3 6,10,4 GenuineIntel  900 0000000100000000 00002fff


[References]
Bug Check 0x9C: MACHINE_CHECK_EXCEPTION
http://msdn.microsoft.com/en-us/library/ms795775.aspx

Bug Check Codes
http://msdn.microsoft.com/en-gb/library/ms789516.aspx

Machine Check Exception Handling for a Pentium Pro Processor
http://msdn.microsoft.com/en-us/library/aa501703.aspx

"0x0000009C (0x00000004, 0x00000000, 0xb2000000, 0x00020151)" Stop error message
http://support.microsoft.com/?id=329284



Author: Lai Go / Date: 2008.12.02
