Last updated on April 11, 2024
I’ve been meaning to contribute to open source for some time, though I didn’t have anything particular in mind to publish. However, I’ve been dealing with varying degrees of integration complexity in some of my application environments at the office. Inevitably, someone would scratch their head for several minutes trying to figure out why something failed, only to realize that the cause was an external dependency. A simple tool that provides a holistic view of the system could have saved them some pain. The idea could then be extended to other environments: shared development, integration, UAT, QA, and Production. A single page giving an overview of the health of a specified environment would be very helpful.
Application Monitoring
There are several things to consider when monitoring an application. This is not meant to be an exhaustive list, but here are a few items:
- application state
- resource utilization
- proactive reporting of application errors
- module/component status
- external dependency status (databases, services, etc.)
The monitor should be proactive, not reactive: it should give warning early enough, before something goes wrong. This includes monitoring the application’s dependencies as well as the application itself. There are tools that monitor logs and report when they detect various errors in a log file. Although this is better than nothing, it’s too reactive. If you’re lucky enough to have Nagios in-house, that is better. In most cases, however, it is configured only for Production and, if you’re lucky, QA. For other environments, Nagios may be overkill.
Health Monitoring
The purpose of my health monitor tool is to provide a simple framework to quickly set up and actively check the overall health of any application.
The health monitoring tool is built on the following open source frameworks:
- Quartz Scheduler – used to schedule recurring checks to verify your site, components/modules, and dependencies
- Apache HttpComponents – used to retrieve resources (JSON/HTML) from HTTP/HTTPS servers
- Grizzly NIO framework – used as an embedded web container to host the status page (see above screenshot)
- Jackson – used to parse JSON resources and configuration
- FreeMarker – used for web/email templates
- JavaMail – used to send failure notifications
It can proactively check your application and its dependencies with predefined service checks:
- com.jstrgames.monitor.svc.impl.HttpService – makes a non-SSL web call
- com.jstrgames.monitor.svc.impl.HttpsService – makes an SSL web call
- com.jstrgames.monitor.svc.impl.SimpleJmxService – makes simple remote JMX calls
- com.jstrgames.monitor.svc.impl.SocketService – makes a client socket call
It can then verify these services with predefined rules:
- com.jstrgames.monitor.rule.HttpResponseCode – checks whether the expected status code was returned
- com.jstrgames.monitor.rule.HttpResponseBody – checks whether the body contains the expected values
- com.jstrgames.monitor.rule.JsonResponse – checks whether the specified JSON key has the expected value
- com.jstrgames.monitor.rule.SimpleJmxResult – checks whether the specified JMX attribute has the expected value
All of this is configured in a simple JSON configuration file.
Example: Checking HTTPS site
The use case is to verify, every 5 minutes, that the phrases “Google Search” and “I’m Feeling Lucky” appear on the google.com page:
[{
    "servicename": "HTTPS Example (SSL)",
    "classname": "com.jstrgames.monitor.svc.impl.HttpsService",
    "schedule": "0 0/5 * * * ?",
    "hostname": "www.google.com",
    "port": 443,
    "uri": "/",
    "rules": [{
        "classname": "com.jstrgames.monitor.rule.HttpResponseCode",
        "condition": "equals",
        "expected": 200
    },{
        "classname": "com.jstrgames.monitor.rule.HttpResponseBody",
        "condition": "contains",
        "expected": "Google Search"
    },{
        "classname": "com.jstrgames.monitor.rule.HttpResponseBody",
        "condition": "contains",
        "expected": "I'm Feeling Lucky"
    }]
}]
Here, the class com.jstrgames.monitor.svc.impl.HttpsService is used to check www.google.com on port 443. Once the resource has been retrieved, three rules are executed:
- verify that the HTTP response code is 200
- verify that the HTTP response body contains the phrase “Google Search”
- verify that the HTTP response body contains the phrase “I’m Feeling Lucky”
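To make these three checks concrete, here is a minimal sketch of how such rules could be evaluated against a retrieved response. It is not the tool's actual implementation: the `RuleSketch`, `Response`, and `Rule` classes, the `target` field, and the `passes` and `allPass` methods are hypothetical names invented for illustration, and only the `equals` and `contains` conditions from the configuration above are handled.

```java
import java.util.List;

// Hypothetical sketch; none of these names come from the tool itself.
class RuleSketch {

    /** A retrieved HTTP response, reduced to the two parts the rules inspect. */
    static final class Response {
        final int statusCode;
        final String body;

        Response(int statusCode, String body) {
            this.statusCode = statusCode;
            this.body = body;
        }
    }

    /** One rule entry: which part to inspect, how to compare, and what to expect. */
    static final class Rule {
        final String target;    // "code" or "body" (assumed stand-in for the rule classname)
        final String condition; // "equals" or "contains"
        final String expected;

        Rule(String target, String condition, String expected) {
            this.target = target;
            this.condition = condition;
            this.expected = expected;
        }

        boolean passes(Response response) {
            String actual = target.equals("code")
                    ? String.valueOf(response.statusCode)
                    : response.body;
            switch (condition) {
                case "equals":
                    return actual.equals(expected);
                case "contains":
                    return actual.contains(expected);
                default:
                    throw new IllegalArgumentException("Unsupported condition: " + condition);
            }
        }
    }

    /** The service check succeeds only if every rule passes. */
    static boolean allPass(List<Rule> rules, Response response) {
        return rules.stream().allMatch(rule -> rule.passes(response));
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("code", "equals", "200"),
                new Rule("body", "contains", "Google Search"),
                new Rule("body", "contains", "I'm Feeling Lucky"));

        Response complete = new Response(200, "<html>Google Search ... I'm Feeling Lucky</html>");
        Response missingText = new Response(200, "<html>Google Search</html>");

        System.out.println(allPass(rules, complete));    // true
        System.out.println(allPass(rules, missingText)); // false
    }
}
```

Treating every rule as a small predicate over the response keeps the overall result a simple "all must pass" conjunction, which matches the behaviour described for this example.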
If any of the above checks fails, the service is considered to be in “FAIL” status. If the tool fails to connect to the service, the service is in “ERROR” status. The tool will also send out a notification, and the user will see similar content in their inbox, although, granted, not as pretty.
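The difference between the two statuses can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the `StatusSketch` class, the `Status` enum (including the `PASS` name, which is not mentioned above), and the `check` method are hypothetical, and any exception thrown while fetching the resource is treated as a failed connection.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical sketch; the names below are assumptions, not the tool's API.
class StatusSketch {

    enum Status { PASS, FAIL, ERROR }

    /**
     * Fetches the resource, then applies the rules to it.
     * ERROR means the resource could not be retrieved at all;
     * FAIL means it was retrieved but at least one rule did not hold.
     */
    static Status check(Callable<String> fetch, Predicate<String> rules) {
        String body;
        try {
            body = fetch.call();
        } catch (Exception e) {
            return Status.ERROR;
        }
        return rules.test(body) ? Status.PASS : Status.FAIL;
    }

    public static void main(String[] args) {
        Predicate<String> containsPhrase = body -> body.contains("Google Search");

        System.out.println(check(() -> "Google Search", containsPhrase));  // PASS
        System.out.println(check(() -> "something else", containsPhrase)); // FAIL
        System.out.println(check(() -> {
            throw new IOException("connection refused");
        }, containsPhrase));                                               // ERROR
    }
}
```

Separating "could not reach the service" from "reached it, but the answer was wrong" makes it easier to tell an outage in a dependency apart from a misbehaving application.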
For more details, please go to my GitHub repository. I will be updating its wiki page soon.