Archive for the ‘Windows 2008 R2’ Category

SCVMM 2008 R2 – Error 2912 – An internal error has occurred trying to contact an agent

March 20, 2013

I love these “unknown” errors, but as we all know you cannot handle every error condition in application code. Therefore I would like to share a possible resolution here; you will need to confirm whether it applies to your environment.

I faced this error, did some research and found out that it can also be related to BITS (Background Intelligent Transfer Service). BITS is used heavily by VMM for file transfers.

Error (2912)
An internal error has occurred trying to contact an agent on the vmmmserver.yourdomain.com server.
(Unknown error (0x80041001))

Recommended Action
Ensure the agent is installed and running. Ensure the WS-Management service is installed and running, then restart the agent.

Possible fix:

import-module BitsTransfer
Get-BitsTransfer -AllUsers

Check the output for any BITS jobs that are not owned by "NT AUTHORITY\SYSTEM" and have the status "Suspended".

To delete the failed/corrupt BITS jobs, run:
import-module BitsTransfer
$AllJobs = Get-BitsTransfer -AllUsers
Remove-BitsTransfer -BitsJob $AllJobs

You will receive an "Access Denied" error, but this is OK, as SYSTEM-owned jobs cannot be deleted this way:

Remove-BitsTransfer : Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED))
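If you prefer to remove only the suspect jobs instead of piping every job to Remove-BitsTransfer, here is a small sketch along those lines (adjust it to your environment before using it); it selects only suspended jobs that are not owned by SYSTEM:

Import-Module BitsTransfer
# select only suspended BITS jobs that are not owned by NT AUTHORITY\SYSTEM
$SuspectJobs = Get-BitsTransfer -AllUsers | Where-Object {
    $_.JobState -eq "Suspended" -and $_.OwnerAccount -ne "NT AUTHORITY\SYSTEM"
}
# cancel/remove just those jobs
$SuspectJobs | Remove-BitsTransfer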


List of known issues for Background Intelligent Transfer Service (BITS)
http://support.microsoft.com/kb/331716/en-us

NetApp – New Windows MPIO (V3.5) available

November 8, 2011

NetApp has released a new version of their MPIO DSM (version 3.5), which includes several fixes and simplifies the deployment for NetApp-connected Windows systems.

The Data ONTAP DSM 3.5 for Windows MPIO includes the following new features and changes:

– Data ONTAP operating in Cluster-Mode is now supported, starting with version 8.1. Note the following about Cluster-Mode support:
  – Asymmetric logical unit access (ALUA) is required for all Fibre Channel (FC) paths and for all iSCSI paths to Cluster-Mode LUNs.
  – Mixed FC and iSCSI paths to the same Cluster-Mode LUN are supported.

– New Windows PowerShell cmdlets are available to manage the DSM. The cmdlets replace the dsmcli commands, which are deprecated starting in DSM 3.5. The dsmcli commands will be removed in a future release.
– The Windows Host Utilities are no longer required. The Windows Host Utilities components that enable you to configure Hyper-V systems (mbralign.exe and LinuxGuestConfig.iso) are now included with the DSM. While no longer required, installing the Windows Host Utilities on the same host as the DSM is still supported.
– Hyper-V guests running Red Hat Enterprise Linux (RHEL) are now supported. The Interoperability Matrix lists the specific versions supported.
– The number of reboots required to install or upgrade the DSM is reduced. For example, when you install Windows hotfixes, you can wait to reboot the host until after you install or upgrade the DSM.
– The timeout values set by the Data ONTAP DSM are updated based on ongoing testing.
– The options that you use in the graphical user interface (GUI) to specify preferred paths for the Round Robin with Subset policy have changed. You now use the Set Preferred and Clear Preferred options to specify preferred paths. The Set Active and Set Passive options are no longer available for Round Robin with Subset. Note: These changes do not alter how the Round Robin with Subset policy works. Round Robin with Subset is still an "active/active" policy that enables you to specify preferred and non-preferred paths. The changes align the GUI terminology with how the policy works.

Release Notes can be found here: https://now.netapp.com/NOW/knowledge/docs/mpio/win/reldsm35/pdfs/rnote.pdf

NetApp MPIO Version 3.5 for Windows Systems can be found here:

https://now.netapp.com/NOW/download/software/mpio_win/3.5/

If you deploy the NetApp MPIO DSM V3.5 on a fully patched Windows 2008 R2 SP1 system, you will get an "error" during setup if you haven't installed KB2522766 and KB2528357:

KB2522766 – The MPIO driver fails over all paths incorrectly when a transient single failure occurs in Windows Server 2008 or in Windows Server 2008 R2
http://support.microsoft.com/kb/2522766/

KB2528357 – Nonpaged pool leak when you disable and enable some storage controllers in Windows 7 or in Windows Server 2008 R2
http://support.microsoft.com/kb/2528357

Installing KB2522766 requires a reboot, and installing KB2528357 requires a reboot as well.

After the two reboots you can now install MPIO DSM 3.5.

In my case there is already an “older” version of the NetApp MPIO DSM installed, which is automatically detected and upgraded by the setup.

Note: In my case this is a Hyper-V server, therefore I also install the Hyper-V guest utilities, which are one of the new features mentioned above in the feature list.

……Finally Finished 🙂

After a third reboot you can now proceed with your SAN configuration.

Please stay tuned for more details around the new utilities and especially around the new PowerShell cmdlets for NetApp, called the "PowerShell Toolkit Version 1.6".

Powershell Toolkit 1.6

http://communities.netapp.com/community/interfaces_and_tools/data_ontap_powershell_toolkit/data_ontap_powershell_toolkit_downloads
http://communities.netapp.com/community/interfaces_and_tools/data_ontap_powershell_toolkit?view=documents

Stay tuned…. 😉

Regards

Ramazan

How to configure NIC Teaming with HP Proliant and Cisco or Procurve Switch Infrastructure?

August 6, 2011

Very often I need to configure NIC teams on HP hardware against Cisco or ProCurve networking infrastructure, therefore I would like to share a general overview of the HP NIC teaming capabilities and teaming algorithms, and especially how to configure the Cisco or ProCurve switches.

My personally preferred teaming mode is 802.3ad, as it provides the best combination of redundancy and throughput and is the current industry standard, well understood by enterprise switches.

HP generally provides the following NIC teaming capabilities and algorithms:

1.1 Network Fault Tolerance (NFT) only – Network Fault Tolerance (NFT) is the foundation of HP ProLiant Network Adapter Teaming. In NFT mode, from two to eight teamed ports are teamed together to operate as a single virtual network adapter. However, only one teamed port—the primary teamed port—is used for both transmit and receive communication with the server. The remaining adapters are considered to be stand-by (or secondary adapters) and are referred to as non-primary teamed ports. Non-primary teamed ports remain idle unless the primary teamed port fails. All teamed ports may transmit and receive heartbeats, including non-primary adapters.

Fault tolerance (NFT) is the one feature found in every other team type; every team type is built on the NFT foundation.

1.2 Network Fault Tolerance (NFT) with Preference Order – Network Fault Tolerance Only with Preference Order is identical in almost every way to NFT with the only difference being that this team type allows the SA to prioritize the order in which teamed ports should be the primary teamed port. This ability is important in environments where one or more teamed ports are more preferred than other ports in the same team. The need for ranking certain teamed ports better than others can be a result of unequal speeds, better adapter capabilities (for example, higher receive/transmit descriptors or buffers, interrupt coalescence, and so on), or preference for the team’s primary port to be located on a specific switch.

1.3 Transmit Load Balancing (TLB) with Fault Tolerance – Transmit Load Balancing with Fault Tolerance (TLB) is a team type that allows the server to load balance its transmit traffic. TLB is switch independent and supports switch fault tolerance by allowing the teamed ports to be connected to more than one switch in the same LAN. With TLB, traffic received by the server is not load balanced. The primary teamed port is responsible for receiving all traffic destined for the server. In case of a failure of the primary teamed port, the NFT mechanism ensures connectivity to the server is preserved by selecting another teamed port to assume the role.

1.4 Transmit Load Balancing (TLB) with Fault Tolerance and Preference Order – Transmit Load Balancing with Fault Tolerance and Preference Order is identical in almost every way to TLB with the only difference being that this team type allows the SA to prioritize the order in which teamed ports should be the primary teamed port. This ability is important in environments where one or more teamed ports are more preferred than other ports in the same team. The need for ranking certain teamed ports higher than others can be a result of unequal speeds, better adapter capabilities (for example, higher receive/transmit descriptors or buffers, interrupt coalescence, and so on), or preference for the team’s primary port to be located on a specific switch.

1.5 Switch-assisted Load Balancing (SLB) with Fault Tolerance – Switch-assisted Load Balancing with Fault Tolerance (SLB) is a team type that allows full transmit and receive load balancing. SLB requires the use of a switch that supports some form of Port Trunking (for example, EtherChannel, MultiLink Trunking, and so on). SLB does not support switch redundancy because all ports in a team must be connected to the same switch. SLB is similar to the 802.3ad Dynamic team type.

1.6 802.3ad Dynamic with Fault Tolerance – 802.3ad Dynamic with Fault Tolerance is identical to SLB except that the switch must support the IEEE 802.3ad dynamic configuration protocol called Link Aggregation Control Protocol (LACP). In addition, the switch port, to which the teamed ports are connected, must have LACP enabled. The main benefit of 802.3ad Dynamic is that an SA will not have to manually configure the switch. 802.3ad Dynamic is a standard feature of HP ProLiant Network Adapter Teaming.

1.7 Automatic (both) – The Automatic team type is not really an individual team type. Automatic teams decide whether to operate as an NFT, or a TLB team, or as an 802.3ad Dynamic team. If all teamed ports are connected to a switch that supports the IEEE 802.3ad Link Aggregation Protocol (LACP) and all teamed ports are able to negotiate 802.3ad operation with the switch, then the team will choose to operate as an 802.3ad Dynamic team. However, if the switch does not support LACP or if any ports in the team do not have successful LACP negotiation with the switch, the team will choose to operate as a TLB team. As network and server configurations change, the Automatic team type ensures that HP ProLiant servers intelligently choose between TLB and 802.3ad Dynamic to minimize server reconfiguration.

2. Load Balancing Algorithm

All load-balancing team types (TLB, SLB, and 802.3ad Dynamic) load balance transmitted frames. There is a fundamental decision that must be made when determining load balancing mechanisms: whether or not to preserve frame order.

Frame order preservation is important for several reasons: to prevent frame retransmission because frames arrive out of order, and to prevent performance-decreasing frame reordering within OS protocol stacks. To avoid frames being transmitted out of order when communicating with a target network device, the team’s load-balancing algorithm assigns “outbound conversations” to a particular teamed port. In other words, if frame order preservation is desired, outbound load balancing by the team should be performed on a conversation-by-conversation basis rather than on a frame-by-frame basis. To accomplish this, the load-balancing device (either a team or a switch) needs information to identify conversations. Destination MAC address, Destination IP address, and TCP Connection are used to identify conversations.

It is very important to understand the differences between the load-balancing methods when deploying HP ProLiant Network Adapter Teaming in an environment that requires load balancing of routed Layer 3 traffic. Because the methods use conversations to load balance, the resulting traffic may not be distributed equally across all ports in the team. The benefits of maintaining frame order outweigh the lack of perfect traffic distribution across the team's member ports. Implementers of HP ProLiant Network Adapter Teaming can choose the appropriate load-balancing method via the NCU.


2.1 TLB Automatic method

Automatic is a load-balancing method that is designed to preserve frame ordering.

This method will load balance outbound traffic based on the highest layer of information in the frame. For instance, if a frame has a TCP header with TCP port values, the frame will be load balanced by TCP connection (see “TLB TCP Connection method” below). If the frame has an IP header with an IP address but no TCP header, then the frame is load balanced by destination IP address (see “TLB Destination IP Address method” below). If the frame does not have an IP header, the frame is load balanced by destination MAC address (see “TLB Destination MAC Address method” below).

2.2 TLB TCP Connection method

TCP Connection is also a load-balancing method that is designed to preserve frame ordering.

This method will load balance outbound traffic based on the TCP port information in the frame’s TCP header. This load-balancing method combines the TCP source and destination ports to identify the TCP conversation. Combining these values, the algorithm can identify individual TCP conversations (even multiple conversations between the team and one other network device). The algorithm used to choose which teamed port to use per TCP conversation is similar to the algorithms used in the “TLB Destination IP Address method” and “TLB Destination MAC Address method” sections below.

If this method is chosen, and the frame has an IP header with an IP address but no TCP header, then the frame is load balanced by destination IP address (see “TLB Destination IP Address method” below). If the frame does not have an IP header, the frame is load balanced by destination MAC address (see “TLB Destination MAC Address method” below).

2.3 TLB Destination IP Address method

Destination IP Address is a load-balancing method that will attempt to preserve frame ordering.

This method makes load-balancing decisions based on the destination IP address of the frame being transmitted by the teaming driver. The frame’s destination IP address belongs to the network device that will ultimately receive the frame. The team utilizes the last three bits of the destination IP address to assign the frame to a port for transmission.

Because IP addresses are in decimal format, it is necessary to convert them to binary format. For example, an IP address of 1.2.3.4 (dotted decimal) would be 00000001.00000010.00000011.00000100 in binary format. The teaming driver only uses the last three bits (100) of the least significant byte (00000100 = 4) of the IP address. Utilizing these three bits, the teaming driver consecutively assigns destination IP addresses to each functional network port in its team, starting with 000 being assigned to network port 1, 001 being assigned to network port 2, and so on. Of course, how the IP addresses are assigned depends on the number of network ports in the TLB team and how many of those ports are in a functional state (see Table 4-4).
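As a rough illustration of this idea (a simplified sketch, not HP's exact assignment table), the following PowerShell snippet extracts the last three bits of a destination IP address and maps them onto the ports of a hypothetical four-port team:

# simplified illustration of the "last three bits" load-balancing idea (not the exact HP algorithm)
$destinationIp   = "1.2.3.4"
$functionalPorts = 4                                    # assumed number of working teamed ports

$lastOctet     = [int]($destinationIp.Split('.')[3])    # least significant byte, e.g. 4
$lastThreeBits = $lastOctet -band 7                     # keep only the last three bits (0..7)
$assignedPort  = ($lastThreeBits % $functionalPorts) + 1

"Destination $destinationIp -> last three bits $lastThreeBits -> teamed port $assignedPort"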



3. How to configure 802.3ad with Cisco and HP Procurve?

3.1 Configuration of Cisco Switch with 2 network ports

Switch#conf ter
Switch(config)#interface <PORT1> (e.g. Gi3/1)
Switch(config-if)#switchport mode access
Switch(config-if)#spanning-tree portfast
Switch(config-if)#channel-group <ID> mode active
Switch(config-if)#interface <PORT2> (e.g. Gi3/2)
Switch(config-if)#switchport mode access
Switch(config-if)#spanning-tree portfast
Switch(config-if)#channel-group <ID> mode active

Note: use the same channel-group <ID> (a number from 1-48) on both ports; the matching Port-channel interface is created automatically.

3.2 Configuration of HP Procurve with 2 network ports

PROCURVE-Core1#conf ter
PROCURVE-Core1# trunk <PORT1>-<PORT2> (e.g. C1-C2) Trk<ID> (e.g. Trk99) LACP
PROCURVE-Core1# vlan <VLANID>
PROCURVE-Core1# untagged Trk<ID> (a.e. Trk99)
PROCURVE-Core1# show lacp
PROCURVE-Core1# show log lacp

E.g.: how do you add additional ports to an existing HP trunk?

Note: In this example I add ports D5 and D6 to an already configured trunk with trunk ID 70 (using the same trunk command shown above, now listing the additional ports). In total this gives a 4-port NIC team with ports C23, C24, D5 and D6 based on LACP.

3.3 Configuration of HP NIC

NOTE: "Automatic" can also be used, as the teaming drivers will automatically detect and negotiate the best teaming method with the switch => 802.3ad Dynamic with Fault Tolerance.

3.4 Resulting "show running-config" (Cisco example)

interface GigabitEthernet3/1
description SERVERNAME-NIC1
switchport access vlan <VLANID>
switchport mode access
spanning-tree portfast
channel-group 60 mode active

interface GigabitEthernet3/2
description SERVERNAME-NIC2
switchport access vlan <VLANID>
switchport mode access
spanning-tree portfast
channel-group 60 mode active

interface Port-channel60
description SERVERNAME-TEAM1
switchport
switchport access vlan <VLANID>
switchport mode access

3.5 Resulting "show interface status" (Cisco example)

Gi3/1 SERVERNAME-NIC1 connected 10 a-full a-1000
Gi3/2 SERVERNAME-NIC2 connected 10 a-full a-1000
Po60 SERVERNAME-TEAM1 connected 10 a-full a-1000

Note: In this example the network ports Gi3/1 and Gi3/2 are bound to a new port channel (Po60), which is created automatically and assigned to VLAN 10.

4. References

http://www.cisco.com/en/US/tech/tk389/tk213/technologies_configuration_example09186a008089a821.shtml

http://www.cisco.com/application/pdf/paws/98469/ios_etherchannel.pdf

http://cdn.procurve.com/training/Manuals/2900-MCG-Jan08-11-PortTrunk.pdf

IMPORTANT: In NIC teaming scenarios with HP hardware, especially when Hyper-V is involved, it is important to follow the installation guide from the NIC manufacturer; in the case of the HP NCU it is essential to strictly follow this installation order:

1. Install OS + patches
2. Install Hyper-V role
3. Install NCU (Network Configuration Utility) (included in Proliant Support Pack, current version 8.70)

You can find more detailed steps in the HP reference guide here:

Using HP ProLiant Network Teaming Software with Microsoft® Windows® Server 2008 (R2) Hyper-V  (4th Edition)
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01663264/c01663264.pdf

Note: Please be aware that most of this blog post is cross-referenced from HP and Cisco networking documentation.

Stay tuned…. 😉

Regards

Ramazan

How to use Powershell in Failover Clustering – Part 1

August 5, 2011

PowerShell is a really powerful tool for any Windows administrator whenever tasks need to be automated and/or repeated several times.

Here I would like to start a blog series on how to use PowerShell in Windows Server Failover Cluster (WSFC) environments.

Let’s start with some basic cmdlets to get the status of clustered groups, resources and cluster core resources, move groups, and so on. Later we will do some advanced operations with PowerShell.

The WSFC PowerShell (CLI) interface is available everywhere the WSFC feature is installed:

  • Windows Server 2008 R2 (SP1)
    • Full
    • Core (not installed by default)
  • Microsoft Hyper-V Server 2008 R2 (SP1)
  • Remote Server Administration Tools (RSAT) for Windows 7 (SP1)

First of all, you must import the "FailoverClusters" module to get the cmdlets. To get a list of all available modules you can use:

PS:\ Get-Module -ListAvailable

To import the module, use the Import-Module cmdlet:

PS:\ Import-Module FailoverClusters

YUHUU 🙂 We now have the full set of Failover Cluster cmdlets available; let's have a look at what they are:

PS:\ Get-Command -Module FailoverClusters   (or: Get-Command | findstr Cluster)

As you can see, there is a cmdlet available for most of the GUI actions, sometimes more than one. In total there are 69 Failover Cluster specific cmdlets, which gives you many *creative* ways to use them.
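If you want to double-check that count on your own system, a quick sketch (the exact number may differ depending on OS version and installed updates):

PS:\ (Get-Command -Module FailoverClusters).Count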

NOTE: A really helpful feature of PowerShell is the Get-Help cmdlet:
Built-in help:
Get-Help <cmdlet> -Full
Examples:
Get-Help <cmdlet> -Examples
Online help (an Internet connection is required):
Get-Help <cmdlet> -Online

Let’s start with some easy cmdlets to do some basic operations:

How to get a list of all clustered groups via Powershell?

PS:\ Get-ClusterGroup

As you can see, I have some resources offline. Let's bring them online.

How to bring online a Cluster Group via Powershell?

PS:\ Start-ClusterGroup "GROUPNAME"
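If several groups are offline, you can also bring them all online in one go by piping the cmdlets together (a quick sketch):

PS:\ Get-ClusterGroup | Where-Object { $_.State -eq "Offline" } | Start-ClusterGroup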

Now, let's move a group to a different node to balance the workloads in my *test* cluster 🙂

How to move a cluster group via Powershell?

PS:\ Move-ClusterGroup "GROUPNAME" -Node "NODENAME"
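If you need to empty a node completely (for example before patching it), the same cmdlets can simply be piped together. A small sketch that moves every group currently owned by "NODENAME" to other nodes chosen by the cluster:

PS:\ Get-ClusterNode -Name "NODENAME" | Get-ClusterGroup | Move-ClusterGroup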

How to get a list of all clustered resources via Powershell?

PS:\ Get-ClusterResource


How to get a list of all clustered groups from a node via Powershell?

PS:\ Get-ClusterNode -Name "NODENAME" | Get-ClusterGroup

How to get a list of all clustered resources within a cluster group via Powershell?

PS:\  Get-ClusterGroup "GROUPNAME" | Get-ClusterResource


How to get more parameters from a clustered disk (resource)?

PS:\ Get-ClusterResource "Cluster Disk 1" | Get-ClusterParameter


How can I test/validate my cluster via Powershell?

Since Windows Server 2008 R2 you can also validate your cluster from the PowerShell CLI:

PS:\ Get-Help Test-Cluster

E.g.: PS:\ Test-Cluster -Node Node1,Node2
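On a production cluster you usually do not want to re-run the disruptive storage tests. As far as I know this can be done with the -Ignore parameter (please verify the exact parameter name with Get-Help Test-Cluster on your build):

PS:\ Test-Cluster -Node Node1,Node2 -Ignore "Storage"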

A full list of the available Test-Cluster scenarios, including which of them can be included or excluded, can be found in the built-in help (Get-Help Test-Cluster -Full).

Hope this has sparked your interest in PowerShell for Failover Clustering. Stay tuned for more, or just start playing with PowerShell….. 😉

You can find some additional references here:

Mapping Cluster.exe Commands to Windows PowerShell Cmdlets for Failover Clusters
http://technet.microsoft.com/en-us/library/ee619744(WS.10).aspx

PowerShell Quick Reference
http://www.microsoft.com/downloads/details.aspx?FamilyId=DF8ED469-9007-401C-85E7-46649A32D0E0&displaylang=en

Clustering with PowerShell
http://technet.microsoft.com/en-us/library/ee619751(WS.10).aspx

PowerShell for Failover Clustering: Finding the Drive Letter
http://blogs.msdn.com/b/clustering/archive/2009/10/16/9908325.aspx

PowerShell for Failover Clustering: Understanding Error Codes
http://blogs.msdn.com/b/clustering/archive/2010/04/28/10003627.aspx

PowerShell for Failover Clustering: Frequently Asked Questions
http://blogs.msdn.com/b/clustering/archive/2009/05/23/9636665.aspx

Regards

Ramazan Can

Post SP1 Hotfixes for Windows 2008 R2 SP1 with Failover Clustering and Hyper-V

August 3, 2011

In addition to my previous blog post with the list of recommended hotfixes for SCVMM, Hyper-V and Failover Clustering, I currently maintain a personal list of all post-SP1 hotfixes (recommended/required/regular….), with a focus on Failover Clustering and Hyper-V:

https://ramazancan.wordpress.com/2010/11/21/recommended-hotfixes-scvmm-hyper-v/

KB2528357 – Nonpaged pool leak when you disable and enable some storage controllers in Windows 7 or in Windows Server 2008 R2
http://support.microsoft.com/kb/2528357

KB2568088 – Virtual machine does not start on a computer that has an AMD CPU that supports the AVX feature and that is running Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2568088

KB2547551 – Hyper-V Export function consumes all available memory in Windows Server 2008 or in Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2547551

KB2550569 – "0x20001" Stop error when you start a Linux VM in Windows Server 2008 R2 SP1
http://support.microsoft.com/kb/2550569

KB2504962 – Dynamic Memory allocation in a Virtual Machine does not change although there is available memory on the host
http://support.microsoft.com/kb/2504962

KB2534356 – Some CPU cores are parked while other active CPU cores have a heavy workload in Windows Server 2008 R2 SP1
http://support.microsoft.com/kb/2534356

KB2496034 – Cluster service stops when an error occurs in the registry replication process of a failover cluster in Windows Server 2008 R2 (SP1) or in Windows Server 2008
http://support.microsoft.com/kb/2496034

KB2549448 – Cluster service still uses the default time-out value after you configure the regroup time-out setting in Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2549448

KB2549472 – Cluster node cannot rejoin the cluster after the node is restarted or removed from the cluster in Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2549472

KB2446607 – You cannot enable BitLocker on a disk volume in Windows Server 2008 R2 (SP1) if the computer is a failover cluster node
http://support.microsoft.com/kb/2446607

KB2462576 – The NFS share cannot be brought online in Windows Server 2008 R2 (SP1) when you try to create the NFS share as a cluster resource on a third-party storage disk
http://support.microsoft.com/kb/2462576

KB2485543 – You cannot access or mount a Windows Server 2008 R2 (SP1) based NFS share after a failover if the NFS share uses Krb5 or Krb5i authentication
http://support.microsoft.com/kb/2485543

KB2545685 – Recommended hotfixes and updates for Windows Server 2008 R2 SP1 Failover Clusters
http://support.microsoft.com/kb/2545685

KB2531907 – Validate SCSI Device Vital Product Data (VPD) test fails after you install Windows Server 2008 R2 SP1
http://support.microsoft.com/kb/2531907

KB2494162 – The Cluster service stops unexpectedly on a Windows Server 2008 R2 (SP1) failover cluster node when you perform multiple backup operations in parallel on a cluster shared volume
http://support.microsoft.com/kb/2494162

KB2552040 – A Windows Server 2008 R2 (SP1) failover cluster loses quorum when an asymmetric communication failure occurs
http://support.microsoft.com/kb/2552040

KB2494036 – A hotfix is available to let you configure a cluster node that does not have quorum votes in Windows Server 2008 and in Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2494036

KB2496089 – The Hyper-V Virtual Machine Management service stops responding intermittently when the service is stopped in Windows Server 2008 R2
http://support.microsoft.com/kb/2496089

KB2521220 – "0x0000001E" Stop error when you perform disk I/O-intensive operations on dynamic disks in Windows Server 2008 or in Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2521220

KB2512715 – Validate Operating System Installation Option test may identify Windows Server 2008 R2 Server Core installation type incorrectly in Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2512715

KB2501763 – Read-only pass-through disk after you add the disk to a highly available VM in a Windows Server 2008 R2 SP1 failover cluster
http://support.microsoft.com/kb/2501763

KB2520235 – "0x0000009E" Stop error when you add an extra storage disk to a failover cluster in Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2520235

KB2529956 – Windows Server 2008 R2 (SP1) installation may hang if more than 64 logical processors are active
http://support.microsoft.com/kb/2529956

KB2545227 – Event ID 10 is logged in the Application log after you install Service Pack 1 for Windows 7 or Windows Server 2008 R2
http://support.microsoft.com/kb/2545227

KB2517329 – Performance decreases in Windows Server 2008 R2 (SP1) when the Hyper-V role is installed on a computer that uses Intel Westmere or Sandy Bridge processors
http://support.microsoft.com/kb/2517329/en-us

KB2532917 – Hyper-V Virtual Machines Exhibit Slow Startup and Shutdown
http://support.microsoft.com/kb/2532917

KB2494016 – Stop error 0x0000007a occurs on a virtual machine that is running on a Windows Server 2008 R2-based failover cluster with a cluster shared volume, and the state of the CSV is switched to redirected access
http://support.microsoft.com/kb/2494016/en-us

KB2263829 – The network connection of a running Hyper-V virtual machine may be lost under heavy outgoing network traffic on a computer that is running Windows Server 2008 R2 SP1
http://support.microsoft.com/kb/2263829/en-us

KB2485986 – An update is available for Hyper-V Best Practices Analyzer for Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2485986/en-us

KB2521348 – A virtual machine online backup fails in Windows Server 2008 R2 (SP1) when the SAN policy is set to "Offline All"
http://support.microsoft.com/kb/2521348

KB2519736 – Stop error message in Windows Server 2008 R2 SP1 or in Windows 7 SP1: "STOP: 0x0000007F"
http://support.microsoft.com/kb/2519736

KB980915 – A long time delay occurs when you reconnect an IPSec connection from a computer that is running Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, or Windows Server 2008 R2 (SP1) (in IPSEC scenarios)
http://support.microsoft.com/kb/980915

RemoteFX specific:

KB2501816 – "Display driver stopped responding but has recovered" error in a Windows 7 SP1-based VM that has a RemoteFX video adapter
http://support.microsoft.com/kb/2501816

KB2505030 – A RemoteFX VM does not start, and you receive the error "Microsoft Synthetic 3D Display Controller: Failed to Power on" when you try to add the RemoteFX 3D Video adapter from Hyper-V settings
http://support.microsoft.com/kb/2505030

KB2506391 – In Windows Server 2008 R2 SP1, RemoteFX virtual machines cannot start, and you receive an error: "Failed to Power on with Error ‘Insufficient system resources exist to complete the requested service’"
http://support.microsoft.com/kb/2506391

KB2505694 – Error message when you try to start a RemoteFX-enabled virtual machine: "Microsoft Synthetic 3D Display Controller : Failed to Power on with Error ‘Insufficient system resources exist to complete the requested service’"
http://support.microsoft.com/kb/2505694

KB2506417 – New and existing RemoteFX-enabled virtual machines do not start on a domain controller that is running the Remote Desktop Virtualization Host service in Windows Server 2008 R2 SP1
http://support.microsoft.com/kb/2506417

KB2506434 – In Windows Server 2008 R2 SP1, RemoteFX VMs cannot start if the Hyper-V server display adapter has been changed
http://support.microsoft.com/kb/2506434

KB2519946 – Timeout Detection and Recovery (TDR) randomly occurs in a virtual machine that uses the RemoteFX feature in Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2519946

KB2523676 – GPU is not accessed leads to some VMs that use the RemoteFX feature to not start in Windows Server 2008 R2 SP1
http://support.microsoft.com/kb/2523676

KB2533362 – Hyper-V settings hang after installing RemoteFX on Windows 2008 R2 SP1
http://support.microsoft.com/kb/2533362

MPIO specific:

KB2277904 – You cannot access an MPIO-controlled storage device in Windows Server 2008 R2 (SP1) after you send the "IOCTL_MPIO_PASS_THROUGH_PATH_DIRECT" control code that has an invalid MPIO path ID
http://support.microsoft.com/kb/2277904

KB2406705 – Some I/O requests to a storage device fail on a fault-tolerant system that is running Windows Server 2008 or Windows Server 2008 R2 (SP1) when you perform a surprise removal of one path to the storage device
http://support.microsoft.com/kb/2406705

KB2460971 – MPIO failover fails on a computer that is running Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2460971

KB2511962 – "0x000000D1" Stop error occurs in the Mpio.sys driver in Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2511962

KB2522766 – The MPIO driver fails over all paths incorrectly when a transient single failure occurs in Windows Server 2008 or in Windows Server 2008 R2 (SP1)
http://support.microsoft.com/kb/2522766/

Thanks to Alex K. for the MPIO part….. 😉

Please let me know if I have missed any KB here.
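If you want to quickly check which of these hotfixes are already installed on a host, here is a small sketch (adjust the KB list to the updates that are relevant for your environment; note that Get-HotFix may not list every type of update):

# KB numbers to check; adjust this list to the hotfixes relevant for your hosts
$kbList = "KB2528357", "KB2568088", "KB2547551", "KB2550569", "KB2522766"

# collect the IDs of the hotfixes that are currently installed
$installed = Get-HotFix | ForEach-Object { $_.HotFixID }

foreach ($kb in $kbList) {
    if ($installed -contains $kb) { "$kb is installed" }
    else { "$kb is MISSING" }
}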

PS: KB2496089 is already included in SP1 –> Hotfixes and Security Updates included in Windows 7 and Windows Server 2008 R2 Service Pack 1.xls

Stay tuned….. 🙂

Regards

Ramazan

Failover Cluster Network Design with Hyper-V – How many NICs are required?

July 25, 2011

Failover Clustering reliability and stability are also "strongly" dependent on the underlying network design and *drivers*, but that's another story…. Let's focus here on the design part.

Since there is no longer a hard requirement for a dedicated "HB" (heartbeat) network, as there was in Windows 2003 based clusters (MSCS), there is some uncertainty around the network design for Failover Clustering based on Windows 2008, especially when virtualization workloads are involved 🙂

Cluster intra-communication (heartbeat traffic) now goes over every cluster network by default, unless you disable a network for cluster usage, as is typically done for iSCSI:


NOTE: It is a well-known best practice to disable cluster communication on iSCSI networks; they should be dedicated to iSCSI traffic only!

The "golden" rule here: for a "general" Failover Cluster, Microsoft recommends having at minimum 2 redundant network "paths" between the cluster nodes. But often you will want more than the recommended minimum, because you want additional redundancy (and/or performance) in your network connectivity (e.g. NIC teaming), or you will use features like Hyper-V (CSV, Live Migration) which bring their own network requirements.

Depending on the workloads running on top of Failover Clustering, the number of required physical NICs can grow fast. For example, in a Hyper-V failover cluster using Live Migration and iSCSI for VM guests, the recommended minimum is roughly 4 physical NICs; more are of course required when using NIC teaming technologies for redundancy and/or performance objectives.

Here are a few example scenarios and the "minimum recommended" number of physical NIC ports per cluster node:

Scenario 1:

Failover Cluster with 2 Nodes and Hyper-V (1 x  Virtual Switch dedicated) in use without LM/CSV


=> min. 3 physical NICs are recommended => 2 Cluster Networks are automatically discovered and added to Cluster

Scenario 2:

Failover Cluster with 2 Nodes and Hyper-V (1 x Virtual Switch dedicated) in use with LM/CSV


=> min. 4 physical NICs are recommended => 3 Cluster Networks are automatically discovered and added to Cluster

Scenario 3:

Failover Cluster with 2 Nodes and Hyper-V (1 x Virtual Switch dedicated) in use with LM/CSV and ISCSI at host


=> min. 5 physical NICs are recommended (see note below for ISCSI) => 4 Cluster Networks are automatically discovered and added to Cluster

Scenario 4:

Failover Cluster with 2 Nodes and Hyper-V (2 x Virtual Switch dedicated) in use with LM/CSV and ISCSI at host and guest


=> min. 6 physical NICs are recommended (see note for ISCSI) => 4 Cluster Networks are automatically discovered and added to Cluster

Scenario 5:

Failover Cluster with 2 Nodes and Hyper-V (3 x Virtual Switch dedicated) in use with LM/CSV and ISCSI at host and guest


=> min. 7 physical NICs are recommended (see note for ISCSI) => 4 Cluster Networks are automatically discovered and added to Cluster

NOTE: For iSCSI it is recommended to have a minimum of 2 physical network paths for redundancy (availability) purposes. NIC TEAMING IS NOT SUPPORTED HERE; MPIO or MCS must be used for reliability and availability. As a best practice you should disable "cluster communication" on the iSCSI interfaces!

Of course, when you additionally use techniques like NIC teaming for networks such as management, Hyper-V switches or CSV, the number of required physical NICs grows accordingly.

Generally, the cluster service ("NetFT", the network fault-tolerant driver) will automatically discover each network based on its subnet and add it to the cluster as a cluster network. iSCSI networks should generally be disabled for cluster usage (cluster communication).
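The cluster usage of a network can also be changed from PowerShell via the cluster network's Role property; a quick sketch (the network name "iSCSI" is just an example from my environment):

PS:\ Import-Module FailoverClusters
PS:\ Get-ClusterNetwork
PS:\ (Get-ClusterNetwork "iSCSI").Role = 0

Role 0 means no cluster communication on that network, 1 means cluster communication only, and 3 means cluster and client traffic.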

Further official guidance around network design in Failover Clustering environments can be found here:

Network in a Failover Cluster
http://technet.microsoft.com/en-us/library/cc773427(WS.10).aspx

Network adapter teaming and server clustering
http://support.microsoft.com/kb/254101

Hyper-V: Live Migration Network Configuration Guide
http://technet.microsoft.com/en-us/library/ff428137(WS.10).aspx

Requirements for Using Cluster Shared Volumes in a Failover Cluster in Windows Server 2008 R2
http://technet.microsoft.com/en-us/library/ff182358(WS.10).aspx

Designating a Preferred Network for Cluster Shared Volumes Communication
http://technet.microsoft.com/en-us/library/ff182335(WS.10).aspx

Appendix A: Failover Cluster Requirements
http://technet.microsoft.com/en-us/library/dd197454(WS.10).aspx

Cluster Network Connectivity Events
http://technet.microsoft.com/en-us/library/dd337811(WS.10).aspx

Understanding Networking with Hyper-V
http://www.microsoft.com/download/en/details.aspx?amp;displaylang=en&displaylang=en&id=9843

Achieving High Availability for Hyper-V
http://technet.microsoft.com/en-us/magazine/2008.10.higha.aspx

Windows Server 2008 Failover Clusters: Networking (Part 1-4)
http://blogs.technet.com/askcore/archive/2010/02/12/windows-server-2008-failover-clusters-networking-part-1.aspx
http://blogs.technet.com/askcore/archive/2010/02/22/windows-server-2008-failover-clusters-networking-part-2.aspx
http://blogs.technet.com/askcore/archive/2010/02/25/windows-server-2008-failover-clusters-networking-part-3.aspx
http://blogs.technet.com/askcore/archive/2010/04/15/windows-server-2008-failover-clusters-networking-part-4.aspx 

Description of what to consider when you deploy Windows Server 2008 failover cluster nodes on different, routed subnets
http://support.microsoft.com/kb/947048

Stay tuned…. 😉

Regards

Ramazan

Running Domain Controller on top of Hyper-V and Failover Cluster?

July 15, 2011

This question is currently a much-discussed topic in my customer scenarios, therefore I would like to cover the important points around virtualized DCs, especially when Failover Clustering is involved.

Hyper-V:

Virtualizing the DC role is generally supported, as long as you have understood the caveats. In production environments you should NOT use the snapshot/save state features for DCs, especially in multi-DC deployments but also with a single DC. The reason this matters even in single-DC environments is that domain members update their computer account password frequently, and that password no longer matches when you apply a previous snapshot (please see KB175468 around the machine account password). Of course, there are some workarounds, but from my perspective none of them are suitable for production environments.

If you read the articles below and are aware of exactly what to watch out for, then "yes, you can" use these features in lab scenarios, e.g. if you snapshot all domain members at the same time or reset the computer account password after applying an earlier DC snapshot. But GENERALLY YOU SHOULD NEVER USE THE SNAPSHOT/SAVE STATE FUNCTIONS IN PRODUCTION for DC role(s)!

So when running a domain controller within a Hyper-V virtual machine do NOT use:

1. Save states OR,
2. Virtual machine snapshots

In Hyper-V deployments there are some general "considerations" which need to be taken into account when deploying virtualized domain controllers; here are some great articles which cover this in detail and also give some guidelines:

Running Domain Controllers in Hyper-V
http://technet.microsoft.com/en-us/library/virtual_active_directory_domain_controller_virtualization_hyperv(WS.10).aspx

Things to consider when you host Active Directory domain controllers in virtual hosting environments
http://support.microsoft.com/kb/888794/en-us

The Domain Controller Dilemma
http://blogs.msdn.com/b/virtual_pc_guy/archive/2008/11/24/the-domain-controller-dilemma.aspx

Problems with virtual machines and domain membership
http://blogs.msdn.com/b/virtual_pc_guy/archive/2006/03/28/561508.aspx

Hyper-V and Domain Controllers – Demo Tips and Tricks
http://blogs.msdn.com/b/virtual_pc_guy/archive/2009/11/20/hyper-v-and-domain-controllers-demo-tips-and-tricks.aspx

Effects of machine account replication on a domain
http://support.microsoft.com/kb/175468

Running Domain Controllers within Virtual Server 2005
http://www.microsoft.com/downloads/details.aspx?FamilyId=64DB845D-F7A3-4209-8ED2-E261A117FC6B&displaylang=en

Failover Cluster:

Especially in Failover Cluster environments it is a "best practice" and recommended to have at least one physical or virtual DC available outside of the cluster, as the cluster service requires DC communication when it starts (VCO/CNO).

Check out the following blog post from my MVP colleague Lai Yoong Seng (MVP Virtual Machine), which discusses the issues that arise when putting your DCs on top of a failover cluster:

http://www.ms4u.info/2011/05/why-you-should-not-running-domain.html

We call this the "Henne und Ei Problem" in German, which translates to the same idea: a "chicken and egg" problem 🙂

Windows 2003 MSCS:

Determining Domain Controller Access for Server Clusters (Windows 2003)
http://technet.microsoft.com/en-us/library/cc779512(WS.10).aspx

Active Directory, DNS and Domain Controllers (Windows 2003)
http://technet.microsoft.com/en-us/library/cc775654(WS.10).aspx

Cluster Networking Requirements (Windows 2003)
http://technet.microsoft.com/es-es/library/cc783193(WS.10).aspx

Stay tuned…. 😉

Regards

Ramazan

How to generate and correctly interpret Failover Cluster Log

July 2, 2011

Very often it is necessary to generate the cluster log when the event log information is not enough to nail down the root cause, or to get a better understanding of what is happening under the hood of your failover cluster.
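On Windows Server 2008 R2 the cluster log can be generated directly from PowerShell with Get-ClusterLog; a quick sketch (the destination folder and node name are just examples):

PS:\ Import-Module FailoverClusters
PS:\ Get-ClusterLog -Destination C:\Temp
PS:\ Get-ClusterLog -Node NODE1 -TimeSpan 15

The first command writes one cluster.log per node into C:\Temp; the second generates only the last 15 minutes of the log for a single node.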

Here are some useful articles and techniques on how to generate cluster logs and, of course, how to correctly read and interpret them:

Anatomy of a Cluster Log Entry
http://technet.microsoft.com/en-us/library/cc962179.aspx

Techniques for Tracking the Source of a Problem
http://technet.microsoft.com/en-us/library/cc962185.aspx

Interpreting the Cluster Log
http://technet.microsoft.com/en-us/library/cc961673.aspx

Cluster Log Basics
http://technet.microsoft.com/en-us/library/cc962184.aspx

Understanding the Cluster Debug Log in 2008
http://blogs.technet.com/b/askcore/archive/2010/04/13/understanding-the-cluster-debug-log-in-2008.aspx

Windows Server 2008 and R2 Cluster Log Appears to Be Missing Gaps of Data
http://blogs.technet.com/b/thbrown/archive/2010/07/31/windows-server-2008-and-r2-cluster-log-and-missing-gaps-of-data.aspx

View Events and Logs for a Failover Cluster
http://technet.microsoft.com/en-us/library/cc772342.aspx

TechNet Webcast: Failover Cluster Validation and Troubleshooting with Windows Server 2008 (Level 300)
https://msevents.microsoft.com/CUI/EventDetail.aspx?culture=en-US&EventID=1032364832&CountryCode=US

Failover Clustering: Pro Troubleshooting in Windows Server 2008 (PPT)
http://media.ch9.ms/teched/na/2011/ppt/WSV309.pptx
http://ecn.channel9.msdn.com/o9/te/NorthAmerica/2010/pptx/WSV314.pptx

How to create the cluster.log in Windows Server 2008 Failover Clustering
http://blogs.msdn.com/b/clustering/archive/2008/09/24/8962934.aspx

DBA 101: Collecting and Interpreting Failover Cluster Logs
http://blogs.msdn.com/b/joesack/archive/2010/02/09/dba-101-collecting-and-interpreting-failover-cluster-logs.aspx

Introduction to Cluster Diagnostics and Verification Tool for Exchange Administrators
http://technet.microsoft.com/en-us/library/aa996161(EXCHG.65).aspx

How to turn on cluster logging in Microsoft Cluster Server (W2K3)
http://support.microsoft.com/kb/168801

The meaning of state codes in the Cluster log
http://support.microsoft.com/kb/286052

Failover Cluster Troubleshooting
http://msdn.microsoft.com/en-us/library/ms189117.aspx

Troubleshooting Cluster Logs 101 – Why did the resources failover to the other node?
http://blogs.technet.com/b/askcore/archive/2008/02/06/troubleshooting-cluster-logs-101-why-did-the-resources-failover-to-the-other-node.aspx

Hope this helps you understand more of what is going on under the hood of your failover cluster. Especially in root cause analysis (RCA), test and/or proof-of-concept scenarios it is really helpful to be able to read and interpret the cluster logs.

Stay tuned 😉

Best Regards

Ramazan

Deploying Microsoft Windows Server 2008 Networking with Cisco

June 6, 2011

Cisco has published a really useful whitepaper around Windows networking deployments on Cisco gear, providing great technical detail on the enhanced capabilities of the new networking features in the 2008/Vista stack.

Techniques like Teredo, ISATAP, NAT, SMBv2 and IPv6 are discussed in detail; multi-site cluster DNS is also covered:


Deploying Microsoft Windows Server 2008 and Vista on a Cisco Network
http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/App_Networking/extmsftw2k8vistacisco.pdf

Stay tuned…. 😉

Regards

Ramazan

New Hyper-V Performance Blog

June 5, 2011

Tony Voellm has left Microsoft and his previous role as Hyper-V Performance Lead. He gave great insights into Hyper-V performance monitoring and shared really deep technical knowledge on Hyper-V performance topics on his "All topics fundamental" blog:

http://blogs.msdn.com/b/tvoellm/

BUT we can (hopefully) still get his really useful articles on his new blog:

http://perfguy.blogspot.com/

Especially his first post, around the question of which hypervisor provides better performance (Hyper-V or VMware), makes me confident that we will read more from Tony in the future 😉

http://perfguy.blogspot.com/2011/03/which-is-faster-windows-hyper-v-or.html

Stay tuned…

Regards

Ramazan