Our infrastructure comprises high-availability clusters of different machines, with varied operating systems and applications, spread across multiple continents.
An effective monitoring system is crucial for ensuring maximum uptime. Today, a web services company typically manages hundreds of servers, each running a large number of services. Manually checking every service on even one server around the clock is extremely difficult; doing so across many servers is humanly impossible.
Companies that lack a good monitoring system, or worse, have none at all, suffer longer downtimes and face a greater risk of damage from service disruptions. An undetected minor issue can rapidly escalate into a major one, increasing the damage caused.
Our monitoring systems and tools give our system administrators a comprehensive view of the health of our globally distributed infrastructure. We monitor a large number of parameters related to the health of our servers and the individual services that run on them.