Troubleshooting SQL Cluster Install

I spent the past couple of days fighting a really frustrating SQL 2000 installation in a clustered environment, and we finally got it working. It turns out that there is a rare bug where SQL setup won't accept a product key. This ended up being our solution: https://support.microsoft.com/default.aspx?scid=kb;en-us;555496. We had to apply it on both nodes.

Check out all of the steps that we performed to come to this conclusion.  These steps are a great start to troubleshooting SQL 2000 cluster installations:

S:Subjective

============

Problem Description

==============================

1. SQL installation is failing on a two-node cluster.

2. We are able to install SQL fine on the cluster for one of the nodes (STLMOMNODE01), but whenever we try to pull the second node (STLMOMNODE02) into the cluster it fails.

Expected Deliverables (What,When)

===================================

1. Install the SQL cluster successfully on both nodes.

Error Message

==============================

NA

O:Objective

===========

Environment Information

==============================

First Server:

SQL Server 2000 Enterprise Edition

Windows 2003 Enterprise SP1

2 Node Cluster.

Is the Server Down : False

Multiple Instance

No Replication Configuration

=============

Troubleshooting / Research

==============================

1. Collected the latest logs. From the sqlstp.log on Node1 (STLMOMNODE01) we have:

15:55:47 Setup is performing required operations on cluster nodes. This may take a few minutes...

15:55:47 C:\DOCUME~1\SRVMOM\LOCALS~1\Temp\SqlSetup\Bin\remsetup.exe C:\WINNT\remsetup.ini

15:56:21 Process Exit Code: (2) The system cannot find the file specified.

15:56:21 Begin Action : GetRemsetupRetCode

15:56:21 Installation return status on STLMOMNODE02 : 2

15:56:21 End Action : GetRemsetupRetCode

#### SQL Server Remote Setup - Start Time 11/15/05 15:55:47 ####

CThreadPool::RunUntilCompleteHlpr create thread, index=0

CThread::Run thread [0xa4] created for execution.

CThread::Process [0xa4]

CThreadPool::RunUntilCompleteHlpr start thread [0xa4],index=0

Script file copied to '\\STLMOMNODE02\ADMIN$\STLMOMNODE02_MSSQLSERVER.iss' successfully.

Installing remote service (STLMOMNODE02)...

Running '\\STLMOMNODE01\D$\ENTERP~1\x86\setup\setupsql.exe k=ClSec k=Rm k=Cl -SMS -s -f "\\STLMOMNODE01\D$\ENTERP~1\x86\setup\setup.ins" -f1 \\STLMOMNODE02\ADMIN$\STLMOMNODE02_MSSQLSERVER.iss -f2 "\\STLMOMNODE02\admin$\setup.log" -e "stpsilnt._ex" -x "C:\"' (STLMOMNODE02) ...

CRemoteProcess::RunUntilComplete [0xa4] exit code: 2

Remote process exit code was '2' (STLMOMNODE02).

CThreadPool::RunUntilCompleteHlpr WaitForMultipleObjects returned: 0

CThreadPool::RunUntilCompleteHlpr signaled thread [0xa4]

Thread [0xa4] exit code: [0x2]

CThreadPool::RunUntilComplete returned 2

CThreadPool::RunUntilComplete execution level=1, need execution: 0

One or more errors occurred while running the remote/unattended setups.

Disconnecting from remote machine (STLMOMNODE02)...

Service removed successfully.

Remote files removed successfully.

#### SQL Server Remote Setup - Stop Time 11/15/05 15:56:21 ####
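
Scanning logs like the ones above by eye is error-prone, so a small script can help. The following is a minimal Python sketch, not part of the original case notes, that pulls non-zero exit codes out of sqlstp.log text. The function name `find_failures` and both regular expressions are assumptions inferred only from the lines quoted above; other setup builds may format these lines differently.

```python
import re

# Sample lines copied from the sqlstp.log excerpt above.
SAMPLE_LOG = r"""15:55:47 Setup is performing required operations on cluster nodes. This may take a few minutes...
15:55:47 C:\DOCUME~1\SRVMOM\LOCALS~1\Temp\SqlSetup\Bin\remsetup.exe C:\WINNT\remsetup.ini
15:56:21 Process Exit Code: (2) The system cannot find the file specified.
15:56:21 Begin Action : GetRemsetupRetCode
15:56:21 Installation return status on STLMOMNODE02 : 2
15:56:21 End Action : GetRemsetupRetCode
"""

# Assumed line formats, inferred from the excerpt only.
EXIT_CODE = re.compile(r"^(\d{2}:\d{2}:\d{2}) Process Exit Code: \((\d+)\)\s*(.*)$")
NODE_STATUS = re.compile(r"^(\d{2}:\d{2}:\d{2}) Installation return status on (\S+) : (\d+)$")


def find_failures(log_text):
    """Return (time, source, code, message) tuples for every non-zero code."""
    failures = []
    for line in log_text.splitlines():
        line = line.strip()
        match = EXIT_CODE.match(line)
        if match:
            if int(match.group(2)) != 0:
                failures.append(
                    (match.group(1), "process", int(match.group(2)), match.group(3))
                )
            continue
        match = NODE_STATUS.match(line)
        if match and int(match.group(3)) != 0:
            # The source is the node name reported by setup.
            failures.append((match.group(1), match.group(2), int(match.group(3)), ""))
    return failures


if __name__ == "__main__":
    for failure in find_failures(SAMPLE_LOG):
        print(failure)
```

Run against the excerpt, this reports exit code 2 for the remsetup.exe process and the same status for STLMOMNODE02. Code 2 is the generic Windows "file not found" error, so in a case like this one the message text alone may not point at the real cause.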

2. From the sqlstp(n).log on Node2 (STLMOMNODE02, the bad node where the installation is failing) we have this:

15:56:18 begin ShowDialogsUpdateMask

15:56:18 nFullMask = 0xb73c0ff, nCurrent = 0x8, nDirection = 1

15:56:18 Updated Dialog Mask: 0x107fc0cf, Disable Back = 0x1

15:56:18 Dialog 0x8 returned: 1

15:56:18 End Action ShowDialogsHlpr

15:56:18 ShowDialogsGetDialog returned: nCurrent=0x40,index=6

15:56:18 Begin Action ShowDialogsHlpr: 0x40

15:56:18 Begin Action: DialogShowSdCDKey

15:56:18 digpid size : 256

15:56:18 [DlgCDKey]

15:56:18 End Action DialogShowSdCDKey

15:56:18 End Action ShowDialogs

15:56:18 Action CleanUpInstall:

15:56:18 StatsGenerate returned: 2

15:56:18 StatsGenerate (0x0,0x20,0xf00000,0x100,1033,0,0x0,0x2000000a,0,0,0

15:56:18 StatsGenerate -1,SRVMOM)

15:56:18 Installation Failed.
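
The useful clue in an excerpt like this is the last action that setup began before it gave up. As a hedged illustration, the Python sketch below walks the log text and reports that action. The helper name `last_action_before_failure` and the "Begin Action" pattern are assumptions based solely on the lines quoted in this post, and the sample is abridged from the excerpt above.

```python
import re

# Abridged sample taken from the Node2 sqlstp(n).log excerpt above.
SAMPLE_LOG = """15:56:18 Begin Action ShowDialogsHlpr: 0x40
15:56:18 Begin Action: DialogShowSdCDKey
15:56:18 digpid size : 256
15:56:18 [DlgCDKey]
15:56:18 End Action DialogShowSdCDKey
15:56:18 End Action ShowDialogs
15:56:18 Action CleanUpInstall:
15:56:18 StatsGenerate returned: 2
15:56:18 Installation Failed.
"""

# Assumed format: the logs quoted here show "Begin Action Name",
# "Begin Action: Name" and "Begin Action : Name".
BEGIN_ACTION = re.compile(r"^\d{2}:\d{2}:\d{2} Begin Action\s*:?\s*([A-Za-z]\w*)")


def last_action_before_failure(log_text):
    """Return the name of the last action begun before 'Installation Failed.'.

    Returns None when the log contains no failure line, or when no action
    was begun before the failure.
    """
    last_action = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.endswith("Installation Failed."):
            return last_action
        match = BEGIN_ACTION.match(line)
        if match:
            last_action = match.group(1)
    return None


if __name__ == "__main__":
    print(last_action_before_failure(SAMPLE_LOG))
```

On the sample above the result is the CD key dialog action, which matches where the silent installation stops in the log.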

3. Stopped the SNMP and antivirus services and tried to pull STLMOMNODE02 back into the cluster, but it failed again at the exact same point, with the same errors in the log files on both nodes.

4. Other troubleshooting steps we tried:

<> Checked the MDAC version; no mismatches.

<> Shared the setup folder and gave everyone full access to it, then started the installation. It still failed with the same error message in the log files.

<> Confirmed that the account we were logged in with is an administrator on both nodes.

<> Checked for any special characters in resource names.

<> Checked for duplicate resource names for SQL and cluster resources. They are all different.

<> Checked the network priority in Cluster Admin (private, then public). That looked good too.

<> We stopped all the non-essential services on both nodes and rebooted the box. This resulted in the Cluster service not coming up on STLMOMNODE02, because the Network Connection service was still stuck starting. We rebooted the box again, but with nothing positive: the Network Connection service still did not come up.

5. After stopping as many non-essential services as possible, we tried the installation again from STLMOMNODE01, but it failed again with the exact same error message in the logs.

6. At this point we decided to completely uninstall SQL from both STLMOMNODE01 and STLMOMNODE02. We did that and removed any leftover traces from the registry.

7. Got rid of the earlier resource name and created a new group, moved the disk resources to this new group, and failed the group over to STLMOMNODE02.

8. Now we ran the SQL setup from STLMOMNODE02 (the bad node), and one strange thing we found was that setup on STLMOMNODE02 asked for the CD key for the SQL installation. This was really strange, as SQL generally does not ask for the CD key on a cluster. When we entered the CD key, setup gave us the error message "Unable to validate product key".

9. Found a case at https://support.microsoft.com/default.aspx?scid=kb;en-us;555496 and went ahead and created the following value under [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager]:

"SafeDllSearchMode"=dword:00000000 (newly added value)

Doing this, we were able to get past the issue with the CD key.

10. So the CD key was the culprit in our case. While we were installing from STLMOMNODE01, SQL was being installed on STLMOMNODE02 as a silent installation and was erroring out at the point where it checked the CD key:

15:56:18 [DlgCDKey]

15:56:18 End Action DialogShowSdCDKey

15:56:18 End Action ShowDialogs

15:56:18 Action CleanUpInstall:

15:56:18 StatsGenerate returned: 2

15:56:18 StatsGenerate (0x0,0x20,0xf00000,0x100,1033,0,0x0,0x2000000a,0,0,0

15:56:18 StatsGenerate -1,SRVMOM)

15:56:18 Installation Failed.

11. After adding "SafeDllSearchMode"=dword:00000000 we were able to run the setup without any problems, and we pulled the other node into the cluster as well.

12. Installed another named instance on the cluster. It went fine.

13. Installed SP3 for the default instance; it also went through.
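
Since the registry change from step 9 had to be made on both nodes, it may be convenient to keep it as a .reg file. The Python sketch below only builds the text of such a file; it does not touch the registry. The helper name `build_reg_file` is hypothetical, and the key path and value are exactly the ones quoted in step 9, so check them against the KB article before importing anything.

```python
# Key path as quoted in step 9 of the case notes.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager"


def build_reg_file(key_path, value_name, dword_value):
    """Return .reg file text that sets one DWORD value under key_path."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("DWORD values must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",  # standard .reg header
        "",
        "[{}]".format(key_path),
        '"{}"=dword:{:08x}'.format(value_name, dword_value),
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Prints the file body; saving and importing it on each node is left
    # to the administrator.
    print(build_reg_file(KEY_PATH, "SafeDllSearchMode", 0))
```

Generating the text instead of writing to the registry directly keeps the sketch safe to run anywhere and leaves the actual import as a deliberate manual step on each node.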