Monitoring, what if…
In this blog I will show you that even though you monitor your servers and services, you might still miss events and end up with angry users. The blog is called “Monitoring, what if…” because what if you monitor everything, but your monitoring tool doesn’t tell you how things are related?
In this example we have a simple remote desktop running on a Windows Server 2008 R2 machine.
A user connects to it and starts to work. The connection is made using the name MyDesktop, which is a DNS A record.
The recorded video shows what is happening at many organizations these days. Monitoring tools ping servers by FQDN or IP address and check whether a service is running. Everything works fine and users connect, but then at a certain moment a service goes down. Functionality for new users logging on to the network stops working, yet on the monitoring side only a minor alert is raised stating that a service has stopped (not in the video, but in a screenshot later on). See the impact on the users in the video.
The video shows that the user can connect fine as long as the DNS server is up and running.
In the video I don’t monitor the DNS service, although in practice you would of course monitor it as well.
The remote desktop server is being monitored: a ping verifies its existence, and a simple PowerShell command checks whether the Terminal Services service is up and running. If both are okay, we assume the server is okay and the users can work just fine.
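To make the blind spot concrete, here is a minimal Python sketch of that kind of check. It is not the tooling used in the video. The service name TermService, the function names and the way PowerShell is called are my own assumptions for illustration, and Get-Service as written queries the local machine only. Note that nothing in it ever asks whether the name MyDesktop still resolves.

```python
import platform
import subprocess
from typing import Callable


def ping(host: str, timeout_s: int = 2) -> bool:
    """Send a single echo request using the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def service_running(name: str) -> bool:
    """Ask PowerShell for the status of a local Windows service.

    Assumption: runs on the monitored Windows host itself.
    """
    cmd = ["powershell", "-NoProfile", "-Command",
           f"(Get-Service -Name '{name}').Status"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.stdout.strip() == "Running"


def server_looks_healthy(
    host: str,
    service: str,
    ping_check: Callable[[str], bool] = ping,
    service_check: Callable[[str], bool] = service_running,
) -> bool:
    """The naive verdict: reachable plus service running means 'all good'."""
    return ping_check(host) and service_check(service)
```

A call such as server_looks_healthy("10.0.0.5", "TermService") (the IP address is made up) keeps returning True while DNS is down, which is exactly the situation in the video.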
After we stop the DNS server, we get a message that the service has stopped.
I haven’t seen many monitoring tools that will raise an alert and say “Hey, your DNS service has stopped, did you think about the impact on the remote desktop service?”
In most monitoring tools all these metrics and events stand on their own, so you as an administrator need to understand the chain and realize that DNS being down means no new user can reach their desktop.
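What I am missing could be sketched like this in Python: a small dependency map that is walked whenever something goes down, so the alert names what else is affected. The map is hypothetical and only there for illustration. Your own chain will look different, and the service names and functions are my invention, not something from a real monitoring product.

```python
from collections import deque
from typing import Dict, List

# Hypothetical dependency map: each service lists what it needs to function.
DEPENDS_ON: Dict[str, List[str]] = {
    "remote-desktop": ["dns", "terminal-services", "active-directory"],
    "active-directory": ["dns"],
    "file-services": ["active-directory"],
    "terminal-services": [],
    "dns": [],
}


def impacted_by(failed: str, depends_on: Dict[str, List[str]]) -> List[str]:
    """Return every service that directly or indirectly depends on `failed`."""
    # Invert the map: for each component, who relies on it?
    dependents: Dict[str, List[str]] = {}
    for service, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, []).append(service)

    # Breadth-first walk up the chain, starting at the failed component.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(failed)
    return sorted(seen)


def alert_text(failed: str, depends_on: Dict[str, List[str]]) -> str:
    """Build an alert that puts the failure in context."""
    impacted = impacted_by(failed, depends_on)
    if not impacted:
        return f"{failed} is down; no dependent services are known."
    return f"{failed} is down; this also affects: {', '.join(impacted)}."


print(alert_text("dns", DEPENDS_ON))
```

With this map, a stopped DNS service is no longer a minor, isolated alert: the message also lists the remote desktop and everything else that sits on top of DNS.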
Can it get worse?
Sure it can. In this example I only show a DNS server with issues, but the chain of components needed to provide users with an application or a desktop is far more complex: DNS, DHCP, AD, file services, firewalls, hypervisors, storage, networking, saturation and so on. There are so many things that can go wrong, and that are not linked to each other in most tools, that it’s a bit scary to think about.
We put our trust in those tools to alert us and warn us that users are having issues. If we have to solve the puzzle ourselves before we know what the impact is, we’re too late: users will already be standing at our desk telling us they have issues.
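One way to hear about it before the users do is to let a check follow the same path a user takes: resolve the name first, then connect to the desktop. Below is a rough Python sketch under my own assumptions. The function names are made up, and port 3389 is simply the default Remote Desktop port, so adjust it if your setup differs.

```python
import socket
from typing import Callable, Tuple


def resolve(name: str) -> str:
    """Look the name up the same way a client would (DNS, hosts file, ...)."""
    return socket.gethostbyname(name)


def tcp_connect(address: str, port: int, timeout_s: float = 3.0) -> None:
    """Open and immediately close a TCP connection; raises OSError on failure."""
    with socket.create_connection((address, port), timeout=timeout_s):
        pass


def probe_desktop(
    name: str,
    port: int = 3389,
    resolver: Callable[[str], str] = resolve,
    connector: Callable[[str, int], None] = tcp_connect,
) -> Tuple[bool, str]:
    """Walk the user's path and report the first step that fails."""
    try:
        address = resolver(name)
    except OSError as exc:
        return False, f"name resolution failed for {name}: {exc}"
    try:
        connector(address, port)
    except OSError as exc:
        return False, f"{name} resolved to {address} but port {port} is unreachable: {exc}"
    return True, f"{name} resolved to {address} and port {port} accepts connections"
```

Calling probe_desktop("MyDesktop") while the DNS server is stopped would fail at the first step once any locally cached lookup has expired, even though the ping by IP address and the service check are still green.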
It’s just a short blog to show in a very basic way what can go wrong even though you monitor everything. You monitor everything, but the information is not presented to you in the right context, and therefore you cannot act accordingly. Hopefully this short blog has shown you why end-to-end monitoring is important, and why we can’t carry on with the 90s-style tooling we used to use.