HPC Cluster manager is not starting up

Attuchirayil, Ajay 21 Reputation points
2024-03-27T06:05:25.9966667+00:00

Hi,

We have set up a Windows HPC cluster environment for running batch jobs in our Production and Non-Prod environments.

Setup:

HPC installation file used: HPCPack2016Update3-Full-Refresh-v6450.zip

A few details about the cluster are below:

  • Windows Server 2016
  • Single Head node configuration
  • Network Topology 5
  • 48 Compute Nodes added to the cluster
  • 5 databases set up on a remote DB server

Issue:

When we try to start HPC Cluster Manager, it does not come up. Normally we open 'services.msc' and manually start 'HPC Diagnostic services', which resolves the issue, but that is not helping now.

The event logs show the error below:

[Store] Unable to connect to database, will retry. Exception: System.Data.SqlClient.SqlException (0x80131904): Login failed for user 'AMER(Server_Name)$'.

at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, DbConnectionPool pool, String accessToken, Boolean applyTransientFaultHandling, SqlAuthenticationProviderManager sqlAuthProviderManager)

at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)

at System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)

at System.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)

at System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)

at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)

at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)

at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)

at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)

at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)

at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)

at System.Data.SqlClient.SqlConnection.Open()

at Microsoft.Hpc.Diagnostics.Store.DiagnosticsStore.<GetFirstTimeDatabaseConnection>d__23.MoveNext()

ClientConnectionId:36337e63-273d-49c4-8ea5-fbbf387a5513

Error Number:18456,State:1,Class:14

Any suggestions or pointers would be helpful.

Azure HPC Cache
An Azure service that provides file caching for high-performance computing.

1 answer

  1. KarishmaTiwari-MSFT 20,312 Reputation points Microsoft Employee
    2024-03-29T17:04:41.09+00:00

    @Attuchirayil, Ajay This issue is related to HPC Pack. To confirm, you are not using Azure HPC Cache, right?

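    • The login failure (error 18456) suggests that ‘AMER(Server_Name)$’ lacks the necessary permissions to access the database. The trailing $ indicates a computer account, most likely that of the head node. Ensure that this account has a login on the SQL Server instance and appropriate permissions (read/write) on the HPC databases.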
    • Check the network configuration between the HPC head node and the database server.
    • Ensure that firewalls or network security rules are not blocking communication.
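
As a quick way to rule the network in or out, a minimal sketch along the following lines could be run from the head node. It only tests whether a TCP connection to the database server can be opened; it does not test the SQL login itself. The function name `can_reach` is made up for illustration, and port 1433 is only the SQL Server default, so a named instance or custom port would need a different value. Since error 18456 is returned by SQL Server itself, the head node was presumably reaching the instance when the event was logged, so a successful check here would point back toward the login and permissions.

```python
import socket


def can_reach(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Assumption: 1433 is the SQL Server default port; adjust it if the
    instance listens elsewhere. This checks reachability only, not login.
    """
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_reach("<db-server-name>")` with the real database server name returns `False` when a firewall or routing problem blocks the port, and `True` when the port is open.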
