Wednesday, September 20, 2017

Prometheus JMX Exporter Configuration For WildFly 10

I'm currently looking at WildFly JMX MBeans, particularly for exposing their attributes as metrics via the Prometheus JMX Exporter.

Here's the JMX Exporter configuration I've tentatively come up with. Let me know in the comments if I've forgotten something or did something wrong.

In short, this JMX Exporter config should collect the metrics for:

* Deployment (EARs)
* Subdeployment (WARs inside of EARs)
* JMS Topics and Queues
* Datasources
* Transactions
* Web Subsystem (i.e. Undertow)
* Servlets
* EJBs (Stateless, Stateful, Singleton, and Message Driven)

It will also store server paths under the "wildfly_config" metric family, with a label whose name identifies the path (such as "jboss_home_dir" or "jboss_server_config_dir") and whose value is the path itself.
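For example, a scrape would then include samples along these lines (the path values here are invented for illustration):

```
wildfly_config{jboss_home_dir="/opt/wildfly"} 1.0
wildfly_config{jboss_server_config_dir="/opt/wildfly/standalone/configuration"} 1.0
```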

And of course the default behavior of the JMX Exporter is to also emit metrics for JVM statistics (such as heap and non-heap memory usage), so we get those "for free".
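For instance, the exporter's built-in JVM collectors produce samples like these (values invented):

```
jvm_memory_bytes_used{area="heap"} 1.23456789E8
jvm_memory_bytes_used{area="nonheap"} 7.654321E7
jvm_threads_current 83.0
```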

The config is as follows:


---
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:

  # CONFIG

  - pattern: "^jboss.as<path=(.+)><>path:(.+)"
    attrNameSnakeCase: true
    name: wildfly_config
    labels:
      $1: $2
    value: 1

  # DEPLOYMENTS

  - pattern: "^jboss.as<deployment=(.+), subdeployment=(.+), subsystem=undertow><>(.+_sessions|session_.+):"
    attrNameSnakeCase: true
    name: wildfly_deployment_$3
    type: GAUGE
    labels:
      name: $1
      subdeployment: $2

  - pattern: "^jboss.as<deployment=(.+), subsystem=undertow><>(.+_sessions|session_.+):"
    attrNameSnakeCase: true
    name: wildfly_deployment_$2
    type: GAUGE
    labels:
      name: $1

  # MESSAGING

  - pattern: "^jboss.as<subsystem=messaging-activemq, server=(.+), jms-(queue|topic)=(.+)><>(.+):"
    attrNameSnakeCase: true
    name: wildfly_messaging_$4
    type: GAUGE
    labels:
      server: $1
      $2: $3

  # DATASOURCES

  - pattern: "^jboss.as<subsystem=datasources, (?:xa-)*data-source=(.+), statistics=(.+)><>(.+):"
    attrNameSnakeCase: true
    name: wildfly_datasource_$2_$3
    type: GAUGE
    labels:
      name: $1

  # TRANSACTIONS

  - pattern: "^jboss.as<subsystem=transactions><>number_of_(.+):"
    attrNameSnakeCase: true
    name: wildfly_transactions_$1
    type: GAUGE

  # WEB SUBSYSTEM

  - pattern: "^jboss.as<subsystem=undertow, server=(.+), (https?)-listener=(.+)><>(bytes_.+|error_count|processing_time|request_count):"
    attrNameSnakeCase: true
    name: wildfly_undertow_$4
    type: GAUGE
    labels:
      server: $1
      listener: $3
      protocol: $2

  # SERVLET

  - pattern: "^jboss.as<deployment=(.+), subdeployment=(.+), subsystem=undertow, servlet=(.+)><>(.+_time|.+_count):"
    attrNameSnakeCase: true
    name: wildfly_servlet_$4
    type: GAUGE
    labels:
      name: $3
      deployment: $1
      subdeployment: $2

  - pattern: "^jboss.as<deployment=(.+), subsystem=undertow, servlet=(.+)><>(.+_time|.+_count):"
    attrNameSnakeCase: true
    name: wildfly_servlet_$3
    type: GAUGE
    labels:
      name: $2
      deployment: $1

  # EJB

  - pattern: "^jboss.as<deployment=(.+), subdeployment=(.+), subsystem=ejb3, (stateless-session|stateful-session|message-driven|singleton)-bean=(.+)><>(.+):"
    attrNameSnakeCase: true
    name: wildfly_ejb_$5
    type: GAUGE
    labels:
      name: $4
      type: $3
      deployment: $1
      subdeployment: $2

  - pattern: "^jboss.as<deployment=(.+), subsystem=ejb3, (stateless-session|stateful-session|message-driven|singleton)-bean=(.+)><>(.+):"
    attrNameSnakeCase: true
    name: wildfly_ejb_$4
    type: GAUGE
    labels:
      name: $3
      type: $2
      deployment: $1
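To sanity-check what these rules produce, the regex rewriting can be simulated outside the exporter. Here is an illustrative sketch in Python (the sample MBean string and its value are invented; the real exporter constructs these strings internally from the MBean name and attribute):

```python
import re

# Invented sample of the "domain<properties><>attribute: value" string the
# JMX Exporter matches rules against (after attrNameSnakeCase is applied).
sample = "jboss.as<subsystem=transactions><>number_of_aborted_transactions: 5"

# The TRANSACTIONS rule from the config above.
pattern = re.compile(r"^jboss.as<subsystem=transactions><>number_of_(.+):")

match = pattern.match(sample)
if match:
    # "$1" in the rule's name corresponds to the first capture group.
    metric_name = "wildfly_transactions_" + match.group(1)
    print(metric_name)  # wildfly_transactions_aborted_transactions
```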

Friday, September 1, 2017

Hawkular Alerts with Jaeger / OpenTracing

Two recent blogs discuss how OpenTracing instrumentation can be used to collect application metrics.

A further interesting integration can be the addition of Hawkular Alerts to the environment.

As the previous blog and demo discuss, Hawkular Alerts is a generic, federated alerts system that can trigger events, alerts, and notifications from different, independent systems such as Prometheus, ElasticSearch, and Kafka.

Here we can combine the two. Let's follow the directions for the OpenTracing demo (using the Jaeger implementation) and add Hawkular Alerts.

What this can show is OpenTracing application metrics triggering alerts when (as in this example) OpenTracing spans encounter a larger-than-expected error rate.

(Note: these instructions assume you are using Kubernetes / Minikube - see the Hawkular OpenTracing blogs linked above for more details on these instructions)

1. START KUBERNETES

Here we start minikube giving it enough resources to run all of the pods necessary for this demo. We also start up a browser pointing to the Kubernetes dashboard, so you can follow the progress of the remaining instructions.
  • minikube start --cpus 4 --memory 8192
  • minikube dashboard
2. DEPLOY PROMETHEUS
  • kubectl create -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/bundle.yaml
  • kubectl create -f https://raw.githubusercontent.com/objectiser/opentracing-prometheus-example/master/prometheus-kubernetes.yml
    • (Note: The above command might not work depending on your version - if you get errors, download a copy of prometheus-kubernetes.yml and edit it, changing “v1alpha1” to “v1”)
3. DEPLOY JAEGER
  • kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-kubernetes/master/all-in-one/jaeger-all-in-one-template.yml
The following will build and deploy the Jaeger example code that will produce the OpenTracing data for the demo:
  • mkdir -p ${HOME}/opentracing ; cd ${HOME}/opentracing
  • git clone git@github.com:objectiser/opentracing-prometheus-example.git
  • cd opentracing-prometheus-example/simple
  • eval $(minikube docker-env)
  • mvn clean install docker:build
  • kubectl create -f services-kubernetes.yml
    • (Note: The above command might not work depending on your version - if you get errors, edit services-kubernetes.yml, changing “v1alpha1” to “v1”)
4. DEPLOY HAWKULAR-ALERTS AND CREATE ALERT TRIGGER

The following will deploy Hawkular Alerts and create the trigger definition that will trigger an alert when the Jaeger OpenTracing data indicates an error rate that is over 30%:
  • kubectl create -f https://raw.githubusercontent.com/hawkular/hawkular-alerts/master/dist/hawkular-alerts-k8s.yml
  • Use “minikube service hawkular-alerts --url” to determine the Hawkular Alerts URL and point your browser to the path “/hawkular/alerts/ui” at that URL (i.e. http://host:port/hawkular/alerts/ui).
  • From the browser page running the Hawkular Alerts UI, enter a tenant name in the top right text field (“my-organization” for example) and click the “Change” button.
  • Navigate to the “Triggers” page (found in the left-hand nav menu).
  • Click the kabob menu icon at the top and select “New Trigger”.
  • In the text area, enter the following to define a new trigger that will trigger alerts when the Prometheus query shows that there is a 30% error rate or greater in the accountmgr or ordermgr servers:
  •  {
       "trigger":{
          "id":"jaeger-prom-trigger",
          "name":"High Error Rate",
          "description":"Data indicates high error rate",
          "severity":"HIGH",
          "enabled":true,
          "autoDisable":false,
          "tags":{
             "prometheus":"Test"
          },
          "context":{
             "prometheus.url":"http://prometheus:9090"
          }
       },
       "conditions":[
          {
             "type":"EXTERNAL",
             "alerterId":"prometheus",
             "dataId":"prometheus-test",
             "expression":"(sum(increase(span_count{error=\"true\",span_kind=\"server\"}[1m])) without (pod,instance,job,namespace,endpoint,transaction,error,operation,span_kind) / sum(increase(span_count{span_kind=\"server\"}[1m])) without (pod,instance,job,namespace,endpoint,transaction,error,operation,span_kind)) > 0.3"
          }
       ]
    }
  • Now navigate back to the “Dashboard” page (again via the left-hand nav menu). From this Dashboard page, look for alerts when they are triggered. We'll next start generating the data that will trigger these alerts.
5. GENERATE SOME SAMPLE OPENTRACING APPLICATION DATA
  • export ORDERMGR=$(minikube service ordermgr --url)
  • ${HOME}/opentracing/opentracing-prometheus-example/simple/genorders.sh
Once the data starts to be collected, you will see alerts in the Hawkular Alerts UI as error rates exceed 30% over the past minute (as per the Prometheus query).

If you look at the alerts information in the Hawkular Alerts UI, you’ll see the conditions that triggered the alerts, such as:
Time: 2017-09-01 17:41:17 -0400
External[prometheus]: prometheus-test[Event [tenantId=my-organization, id=1a81471d-340d-4dba-abe9-5b991326dc80, ctime=1504302077288, category=prometheus, dataId=prometheus-test, dataSource=_none_, text=[1.504302077286E9, 0.3333333333333333], context={service=ordermgr, version=0.0.1}, tags={}, trigger=null]] matches [(sum(increase(span_count{error="true",span_kind="server"}[1m])) without (pod,instance,job,namespace,endpoint,transaction,error,operation,span_kind) / sum(increase(span_count{span_kind="server"}[1m])) without (pod,instance,job,namespace,endpoint,transaction,error,operation,span_kind)) > 0.3]
Notice the “ordermgr” service (version "0.0.1") had an error rate of 0.3333 (33%) which caused the alert since it is above the allowed 30% threshold.

At this point, the Hawkular Alerts UI provides the ability for system admins to log notes about the issue, acknowledge the alert, and mark the alert resolved if the underlying issue has been fixed. These lifecycle functions (also available as REST operations) are just part of the value add of Hawkular Alerts.

You could do more complex things such as only trigger this alert if this Prometheus query generated results AND some other condition was true (say, ElasticSearch logs match a particular pattern, or if a Kafka topic had certain data). This demo merely scratches the surface, but does show how Hawkular Alerts can be used to work with OpenTracing to provide additional capabilities that may be found useful by system administrators and IT support personnel.

Friday, August 11, 2017

Hawkular Alerts with Prometheus, ElasticSearch, Kafka

Federated Alerts

Hawkular Alerts aims to be a federated alerting system. That is to say, it can fire alerts and send notifications that are triggered by data coming from a number of third-party external systems.
Thus, Hawkular Alerts is more than just an alerting system for use with Hawkular Metrics. In fact, Hawkular Alerts can be used independently of Hawkular Metrics. This means you do not even have to be using Hawkular Metrics to take advantage of the functionality provided by Hawkular Alerts.
This is a key differentiator between Hawkular Alerts and other alerting systems. Most alerting systems only alert on data coming from their respective storage systems (e.g. the Prometheus Alert Engine alerts only on Prometheus data). Hawkular Alerts, on the other hand, can trigger alerts based on data from various systems.

Alerts vs. Events

Before we begin, a quick clarification is in order. When it is said that Hawkular Alerts fires an "alert" it means some data came into Hawkular Alerts that matched some conditions which triggered the creation of an alert in Hawkular Alerts backend storage (which can then trigger additional actions such as sending emails or calling a webhook). An "alert" typically refers to a problem that has been detected, and someone should take action to fix it. An alert has a lifecycle attached to it - alerts are opened, then acknowledged by some user who will hopefully fix the problem, then resolved when the problem can be considered closed.
However, there can be conditions that occur that do not represent problems but nevertheless are events you want recorded. There is no lifecycle associated with events and no additional actions are triggered by events, but "events" are fired by Hawkular Alerts in the same general manner as "alerts" are.
In this document, when it is said that Hawkular Alerts can fire "alerts" based on data coming from external third-party systems such as Prometheus, ElasticSearch, and Kafka, this also means events can be fired as well as alerts. What this means is you can record any event (not just a "problem", aka "alert") that can be gleaned from this data coming from external third-party systems.
See alerting philosophy for more.

Demo

There is a recorded demo found here that illustrates what this document describes. After you read this document, you should watch the demo to gain further clarity on what is being explained. The demo is the multiple-sources example, which you can run yourself, found here (note: at the time of writing, this example is only found in the next branch, to be merged into master soon).

Prometheus

Hawkular Alerts can take the results of Prometheus metric queries and use the queried data for triggers that can fire alerts.
This Hawkular Alerts trigger will fire an alert (and send an email) when a Prometheus metric indicates our store’s inventory of widgets is consistently low (as defined by the Prometheus query you see in the "expression" field of the condition):
"trigger":{
   "id": "low-stock-prometheus-trigger",
   "name": "Low Stock",
   "description": "The number of widgets in stock is consistently low.",
   "severity": "MEDIUM",
   "enabled": true,
   "tags": {
      "prometheus": "Prometheus"
   },
   "actions":[
      {
      "actionPlugin": "email",
      "actionId": "email-notify-owner"
      }
   ]
},
"conditions":[
   {
      "type": "EXTERNAL",
      "alerterId": "prometheus",
      "dataId": "prometheus-dataid",
      "expression": "rate(products_in_inventory{product=\"widget\"}[30s])<2 class="pl-pds" span="" style="box-sizing: border-box; color: #032f62;">"
   }
 ]

Integration with Prometheus Alert Engine

As a side note, though not demonstrated in the example, Hawkular Alerts also has an integration with Prometheus' own Alert Engine. This means alerts generated by Prometheus itself can be forwarded to Hawkular Alerts which can, in turn, use them for additional processing, perhaps combining them with data that is unavailable to Prometheus in order to fire other alerts. For example, Hawkular Alerts can take Prometheus alerts as input and feed them into other conditions that trigger on the Prometheus alert along with ElasticSearch logs.

ElasticSearch

Hawkular Alerts can examine logs stored in ElasticSearch and trigger alerts based on patterns that match within the ElasticSearch log messages.
This Hawkular Alerts trigger will fire an alert (and send an email) when ElasticSearch logs indicate sales are being lost due to inventory being out of stock of items (as defined by the condition which looks for a log category of "FATAL", which happens to mean a lost sale in the case of the store’s logs). Notice dampening is enabled on this trigger - the alert will only fire after the logs indicate lost sales 3 times in a row.
"trigger":{
   "id": "lost-sale-elasticsearch-trigger",
   "name": "Lost Sale",
   "description": "A sale was lost due to inventory out of stock.",
   "severity": "CRITICAL",
   "enabled": true,
   "tags": {
      "Elasticsearch": "Localhost instance"
   },
   "context": {
      "timestamp": "@timestamp",
      "filter": "{\"match\":{\"category\":\"inventory\"}}",
      "interval": "10s",
      "index": "store",
      "mapping": "level:category,@timestamp:ctime,message:text,category:dataId,index:tags"
   },
   "actions":[
      {
      "actionPlugin": "email",
      "actionId": "email-notify-owner"
      }
   ]
},
"dampenings": [
   {
      "triggerMode": "FIRING",
      "type":"STRICT",
      "evalTrueSetting": 3
   }
],
"conditions":[
   {
      "type": "EVENT",
      "dataId": "inventory",
      "expression": "category == 'FATAL'"
   }
]
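For illustration, a hypothetical document in the "store" index that this trigger's filter and mapping would pick up might look like the following (the field names come from the filter and mapping strings above; the values are invented):

```json
{
  "@timestamp": "2017-08-11T12:00:00Z",
  "level": "FATAL",
  "category": "inventory",
  "message": "Lost sale: widget out of stock"
}
```

Per the mapping, "category" ("inventory") becomes the event's dataId and "level" ("FATAL") becomes its category, which is what the EVENT condition's expression tests.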

Kafka

Hawkular Alerts can examine data retrieved from Kafka message streams and trigger alerts based on that Kafka data.
This Hawkular Alerts trigger will fire an alert when data over a Kafka topic indicates a large purchase was made to fill the store’s inventory (as defined by the condition which evaluates to true when any number over 17 is received on the Kafka topic):
"trigger":{
   "id": "large-inventory-purchase-kafka-trigger",
   "name": "Large Inventory Purchase",
   "description": "A large purchase was made to restock inventory.",
   "severity": "LOW",
   "enabled": true,
   "tags": {
      "Kafka": "Localhost instance"
   },
   "context": {
      "topic": "store",
      "kafka.bootstrap.servers": "localhost:9092",
      "kafka.group.id": "hawkular-alerting"
   },
   "actions":[ ]
},
"conditions":[
   {
      "type": "THRESHOLD",
      "dataId": "store",
      "operator": "GT",
      "threshold": 17
   }
]

But, Wait! There’s More!

The above only mentions the different ways Hawkular Alerts retrieves data for use in determining what alerts to fire. What is not covered here is the fact that Hawkular Alerts can stream data in the other direction as well - Hawkular Alerts can send alert and event data to things like an ElasticSearch server or a Kafka broker. There are additional examples (mentioned below) that demonstrate this capability.
The point is Hawkular Alerts should be seen as a shared, common alerting engine usable by multiple third-party systems, acting as both a consumer and a producer - a consumer of data from external third-party systems (which is used to fire alerts and events) and a producer that sends notifications of alerts and events to external third-party systems.

More Examples

Take a look at the Hawkular Alerts examples for more examples on using external systems as data to be used for triggering alerts. (note: at the time of writing, some examples are currently in the next branch such as the Kafka ones).

Tuesday, August 1, 2017

Hawkular Alerts 2.0 UI WIP

Hawkular Alerts 2.0 UI

A quick 10-minute demo has been published to illustrate the progress that was made by the hAlerts team on the new UI.

This is a work-in-progress, and things will change, but the UI is actually functional now.

It is best the video is viewed in full screen mode. The video link is: https://www.youtube.com/watch?v=bb9SaJudPlU



Monday, November 21, 2016

Hawkular OpenShift Demo - Running Outside OpenShift

Below is a quick 8 minute demo of the Hawkular OpenShift Agent.

For more information, see: https://github.com/hawkular/hawkular-openshift-agent


Monday, November 14, 2016

Hawkular OpenShift Agent - First Demo

Below is a quick 10 minute demo of the Hawkular OpenShift Agent.

For more information, see: https://github.com/hawkular/hawkular-openshift-agent




Thursday, October 20, 2016

Hawkular OpenShift Agent is Born

A new Hawkular agent has been published on github.com - Hawkular OpenShift Agent.

It is implemented in Go and the main use case for which it was created is to be able to collect metrics from OpenShift pods. The idea is you run Hawkular OpenShift Agent (HOSA) on an OpenShift node and HOSA will listen for pods to come up and down on the node. As pods come online, the pods will tell the agent what (if any) metrics should be collected. As pods go down, the agent will stop collecting metrics from all endpoints running on that pod.

Today, only Prometheus endpoints (using either the binary or text protocol) can be scraped, with Jolokia endpoints next on the list to be implemented. So HOSA will be able to support collecting metrics from either type of endpoint in the near future.

For more information - how to build and configure it - refer to the Hawkular OpenShift Agent README.

Monday, October 17, 2016

Pulling in a Go Dependency From a Fork, Branch, or Github PR using Glide

While writing a Go app, I decided to use Glide as the dependency management system (I tried Godep first, but even on the first day of using it, my dependencies were getting screwed up, lost, builds would mysteriously break - so I decided to switch to Glide, which seems much better).

I was using the Hawkular Go Client library because I needed to write metric data to Hawkular Metrics. So in my glide.yaml, I had this:
- package: github.com/hawkular/hawkular-client-go
  subpackages:
  - metrics
This simply tells Glide that I want to use the latest master of the client library (I'm not using a versioned library yet. I guess I should start doing that).

Anyway, I needed to add a feature to the Hawkular Go Client. So I forked the git repository, created a branch in my fork where I implemented the new feature, and submitted a Github pull request from my own branch. Rather than wait for the PR to be merged, I wanted Glide to pull in my own branch in my forked repo so I could immediately begin using the new feature. It was as simple as adding three lines to my glide.yaml and running "glide update":
- package: github.com/hawkular/hawkular-client-go
  repo: git@github.com:jmazzitelli/hawkular-client-go.git
  vcs: git
  ref: issue-8

  subpackages:
  - metrics
This tells Glide that the Hawkular Go client package is now located at a different repository (my fork located at github.com) under a branch called "issue-8".

Running "glide update" pulled in the Hawkuar Go client from my fork's branch and placed it in my vendors/ directory. I can now start using my new feature in my Go app without waiting for the PR to be merged. Once the PR is merged, I can remove those three lines, "glide update" again, and things should be back to normal.


Wednesday, October 5, 2016

Installing Open Shift Origin and Go for a Development Environment

The Hawkular team is doing some work to develop a Hawkular Agent for running within an Open Shift environment. Since the Go Programming Language ("Go") seems to be the language du jour and many things within the Open Shift infrastructure are developed in Go, the first attempt at this new Hawkular Agent will also be implemented in Go.

Because of this, I needed to get a development environment up and running that included both Open Shift and Go. This blog is simply my notes on how I did this.

INSTALL "GO"

Go isn’t necessary to install and run Open Shift, but because I want to write a Go application, I need it.

INSTALL CORE GO SYSTEM

* Create the directory where Go is to be installed

   mkdir $HOME/bin/go-install (this is where Go will be installed)

* Create the Go workspace (GOPATH will end up pointing to here)

   mkdir $HOME/source/go
   mkdir $HOME/source/go/bin
   mkdir $HOME/source/go/pkg
   mkdir $HOME/source/go/src

* Download go package from https://golang.org/dl/

   cd /tmp
   wget https://storage.googleapis.com/golang/go1.7.1.linux-amd64.tar.gz

* Unpack the tar in the $HOME/bin/go-install - should end up inside a subdirectory "go"

   cd $HOME/bin/go-install
   tar xzvf /tmp/go*.tar.gz

* Set up your shell environment so you can run Go
** Put the following inside .bashrc

   export GOROOT=${HOME}/bin/go-install/go
   export PATH=${PATH}:${GOROOT}/bin
   export GOPATH=${HOME}/source/go

* Make sure Go is working before going on - run “go version” to confirm. I like to log completely out and then back in again so my .bashrc gets loaded for all my shells.

INSTALL ADDITIONAL GO TOOLS

* Download and install guru

   cd $HOME/bin/go-install/go/bin
   go get golang.org/x/tools/cmd/guru
   go build golang.org/x/tools/cmd/guru


** The above puts the "guru" executable in your current directory (which should be $HOME/bin/go-install/go/bin)

* Download and install gocode

   cd $HOME/bin/go-install/go/bin
   go get github.com/nsf/gocode
   go build github.com/nsf/gocode


** The above puts the "gocode" executable in your current directory (which should be $HOME/bin/go-install/go/bin)

* Download and install godef

   cd $HOME/bin/go-install/go/bin
   go get github.com/rogpeppe/godef
   go build github.com/rogpeppe/godef


** The above puts the "godef" executable in your current directory (which should be $HOME/bin/go-install/go/bin)

* Download and install godep

   cd $HOME/bin/go-install/go/bin
   go get github.com/tools/godep
   go build github.com/tools/godep


** The above puts the "godep" executable in your current directory (which should be $HOME/bin/go-install/go/bin)

* If you use Eclipse, install GoClipse:
** Plugin Site: http://goclipse.github.io/releases/
** Make sure the GoClipse configuration knows where your Go installation is along with your guru, gocode, and godef executables. See the Go configuration settings in Eclipse preferences.

INSTALL OPEN SHIFT

These instructions will install Open Shift in a virtual machine, so you need to get Vagrant and VirtualBox first. And of course Vagrant needs Ruby 2, so you need that before anything else. Then you install the Open Shift image.

Note that you may have to change your BIOS settings to enable virtualization. To avoid a VERR_VMX_MSR_ALL_VMX_DISABLED error, I had to do this on my Lenovo laptop in the Security->Virtualization section of the BIOS settings.

These notes follow the instructions found at https://www.openshift.org/vm

(October 5, 2016: Note that the instructions there tell you to not use Vagrant 1.8.5 nor VirtualBox 5.1 - unfortunately, they tell you this all the way at the bottom after the instructions. So if you start at the top and work your way down, you will be frustrated beyond belief until you decide to skip all the way to the bottom and realize this. I emailed the folks maintaining that page to put those notices at the top so people can see these warnings before they start downloading the wrong stuff.)

INSTALL RUBY

If you already have Ruby 2+ installed, you can skip this. This will install RVM and then use that to install Ruby 2. If you skip this step, make sure you have a Ruby 2 installation available before going on to installing and running Vagrant.

* Download and install RVM and Ruby
** You need to accept the GPG key for RVM and install RVM

   gpg2 --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
   curl -L https://get.rvm.io | bash -s stable
   rvm autolibs packages

** Now install Ruby 2

   rvm install 2.3.0
   rvm use --default 2.3.0
   ruby -v

INSTALL VAGRANT

* Download and install Vagrant .rpm
** https://releases.hashicorp.com/vagrant/

   cd /tmp
   wget https://releases.hashicorp.com/vagrant/1.8.4/vagrant_1.8.4_x86_64.rpm

   rpm -i vagrant_*.rpm
   vagrant version

INSTALL VIRTUAL BOX

* Download and install Virtual Box
** https://www.virtualbox.org/wiki/Downloads
** https://www.virtualbox.org/wiki/Download_Old_Builds
** You must grab it from the Oracle yum repo - we first must add that repo to our system:

   sudo wget -P /etc/yum.repos.d http://download.virtualbox.org/virtualbox/rpm/fedora/virtualbox.repo

** If anything goes wrong during the install, you'll want to update your system and reboot - so you may want to do this now:

    dnf update

** Make sure the kernel version is expected - these should output the same version string - otherwise, reboot

   rpm -qa kernel | sort -V | tail -n 1

   uname -r

** Install VirtualBox now

   dnf install VirtualBox-5.0

I ran into some problems when I tried to run VirtualBox before I realized I needed to download VirtualBox from the Oracle yum repo. If you download from, say, RPMFusion, these instructions may not work (they did not for me). If you run into problems, you might have to install some additional packages via dnf (such as kernel-devel, kernel-headers, dkms) and perform some additional magic which I do not know - which explains why I received a Dreadful grade on my O.W.L. exam.

INSTALL THE OPEN SHIFT IMAGE

* Go to an empty directory where you want to prepare your Vagrantfile

   mkdir ${HOME}/openshift

   cd ${HOME}/openshift

* Create a Vagrant file that initializes the OpenShift image.

   vagrant init openshift/origin-all-in-one

* Run Open Shift inside a VM within VirtualBox
 
   vagrant up --provider=virtualbox

At this point you should have an OpenShift environment running. If any errors occur, a log message should tell you how you can proceed.

You will want to download the Open Shift command line client "oc". Rather than regurgitate the instructions here, simply log into Open Shift using the default "admin" user (password "admin") and go to https://10.2.2.2:8443/console/command-line and follow the instructions there to download, install, and use "oc".

Note that you can SSH into your VM by executing "vagrant ssh"

The Open Shift self-signed certificate found at "/var/lib/origin/openshift.local.config/master/ca.crt" can be used to authenticate clients.

UPGRADE OPEN SHIFT

Should you wish to upgrade the Open Shift VM in the future, the following steps should do it.

* Go to the directory where you ran the VM:

   cd ${HOME}/openshift

* Update the image

   vagrant box update --box openshift/origin-all-in-one

* Destroy and re-create the VM environment

   vagrant destroy --force
   vagrant up --provider=virtualbox

REMOVE OPEN SHIFT

Should you wish to remove the Open Shift VM in the future, the following steps should do it.

* Go to the directory where your Vagrantfile is

   cd ${HOME}/openshift

* Remove Open Shift VM

   vagrant halt
   vagrant destroy --force
   vagrant box remove --force openshift/origin-all-in-one

Tuesday, July 12, 2016

Collecting Prometheus Data and Storing in Hawkular

My last blog entry talked about how to collect JMX data via Jolokia and store that data in Hawkular. Another relatively unknown feature similar to that, which I will now describe in this blog entry, is the ability to collect Prometheus metric data and store that in Hawkular.

Prometheus is itself a metric collection and storage system. However, Prometheus data endpoints (that is, endpoints that emit Prometheus metric data) follow a specific format for the emitted metric data. The Hawkular WildFly Agent has the ability to parse this Prometheus metric data format and push that metric data for storage into Hawkular which can then, of course, be used by Hawkular and its clients (for the purposes of graphing the metric data, alerting on the metric data, etc.).
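That format looks like the following - an invented sample in the standard Prometheus text exposition format (real metric names at a given endpoint will differ):

```
# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{handler="/metrics",code="200"} 1027
```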

I will explain how you can quickly get Hawkular to collect metric data from any Prometheus endpoint and store that data into Hawkular.

First, you need a Prometheus endpoint that emits metric data! For this simple demo, I simply ran the Prometheus Server which itself emits metrics. I did this via docker:

1. Make sure your docker daemon is running: sudo docker daemon
2. Run the Prometheus Server docker image: sudo docker run -p 9090:9090 prom/prometheus

That's it! You now have a Prometheus Server running and listening on port 9090. To see the metrics it emits about itself, go to http://localhost:9090/metrics. We'll be asking the Hawkular WildFly Agent to collect these metrics and push them into a Hawkular Server.

Second, you need to run a Hawkular Server. I won't go into details here on how to do that. Suffice it to say, either build or download a Hawkular Server distribution and run it (if it is not a developer build, make sure you run your external Cassandra Server prior to starting your Hawkular Server - e.g. sudo docker run -p 9042:9042 cassandra).

Now you want to run a Hawkular WildFly Agent to collect some of that Prometheus metric data and store it in the Hawkular Server. In this demo, I'll be running the Swarm Agent, which is simply a Hawkular WildFly Agent packaged in a single jar that you can run as a standalone app. However, its agent subsystem configuration is the same as if you were running the agent in a full WildFly Server so the configuration I'll be describing can be used no matter how you have deployed your agent.

Rather than rely on the default configuration file that comes with the Swarm Agent I extracted the default configuration file and edited it as I describe below.

I deleted all the "-dmr" related configuration settings (metric-dmr, resource-type-dmr, remote-dmr, etc.). I want my agent to only collect data from my Prometheus endpoint, so there is no need for all that DMR metadata. (NOTE: the Swarm Agent configuration file does already support Prometheus out-of-box. I will skip that for this demo - I want to explicitly explain the parts of the agent configuration that are needed in the configuration file.)

The Prometheus portion of the agent configuration is very small. Here it is:
<managed-servers>
  <remote-prometheus name="My Prometheus Endpoint"
                     enabled="true"
                     interval="30"
                     time-units="seconds"
                     metric-tags="feed=%FeedId"
                     metric-id-template="%FeedId_%MetricName"
                     url="http://127.0.0.1:9090/metrics"/>
</managed-servers>
That's it. <remote-prometheus> tells the agent where the Prometheus endpoint is and how often to pull the metric data from it. Every metric emitted by that Prometheus endpoint will be collected and stored in Hawkular.

Notice I can associate my managed server with metric tags (remote-prometheus is one type of "managed server"). For every metric collected for this remote-prometheus managed server, those tags will be added to the metric in the Hawkular Server (specifically in the Hawkular Metrics component); all collected metrics get these same tags. In addition, any labels associated with the emitted Prometheus metrics (Prometheus metric data can carry name/value pairs, which Prometheus calls labels) will be added as Hawkular tags. Similarly, the ID used to store the metrics in the Hawkular Metrics component can be customized. Both metric-tags and metric-id-template are optional. You can also place those attributes on individual metric definitions (which I describe below), which is most useful if you have specific tags you want added only to metrics of a specific metric type and not to all of the metrics collected for your managed server.
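As a concrete (and purely hypothetical) illustration, suppose the agent's feed ID is "my-feed". With the metric-tags and metric-id-template values shown above, the Prometheus metric http_requests_total would be stored roughly like this:

```
metric-tags="feed=%FeedId"               ->  Hawkular tag: feed=my-feed
metric-id-template="%FeedId_%MetricName" ->  Hawkular ID:  my-feed_http_requests_total
```

Any labels on the Prometheus metric itself (e.g. code="200") would become additional Hawkular tags alongside the feed tag.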

If you wish to have the agent only collect a subset of the metrics emitted by that endpoint, then you must tell the agent which metrics you want collected. You do this via metric sets:
<metric-set-prometheus name="My Prometheus Metrics">
  <metric-prometheus name="http_requests_total" />
  <metric-prometheus name="go_memstats_heap_alloc_bytes" />
  <metric-prometheus name="process_resident_memory_bytes" />
</metric-set-prometheus>
Once you have defined your metric-set-prometheus entries, you reference them in the metric-sets attribute on your <remote-prometheus> element (e.g. metric-sets="My Prometheus Metrics").
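Putting the two pieces together, a <remote-prometheus> restricted to that metric set would look something like this (a sketch built from the attributes shown above; the metric set itself is defined elsewhere in the subsystem configuration):

```
<managed-servers>
  <remote-prometheus name="My Prometheus Endpoint"
                     enabled="true"
                     interval="30"
                     time-units="seconds"
                     metric-sets="My Prometheus Metrics"
                     url="http://127.0.0.1:9090/metrics"/>
</managed-servers>
```

With metric-sets present, only the three named metrics are collected; without it, the agent collects everything the endpoint emits.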

OK, now that I've got my Swarm Agent configuration in place (call it "agent.xml" for simplicity), I can run the agent and point it to my configuration file: 
java -jar hawkular-swarm-agent-dist-*-swarm.jar agent.xml
At this point I have my agent running alongside my Hawkular Server and the Prometheus Server which is the source of our metric data. The agent is collecting information from the Prometheus endpoint and pushing the collected data to the Hawkular Server.

In order to visualize this collected data, I'm using the experimental developer tool HawkFX. This is simply a browser that lets you see what data is in Hawkular Inventory as well as Hawkular Metrics. When I log in, I can see all the metric data that comes directly from the Prometheus endpoint.



I can select a metric to see its graph:



If I were to have configured my agent to only collect a subset of metrics (as I had shown earlier), I would see only those metrics I asked it to collect - all the other metrics emitted by the Prometheus endpoint are ignored:

What this all shows is that you can use Hawkular WildFly Agent to collect metric data from a Prometheus endpoint and store that data inside Hawkular.

Collecting JMX Data and Storing in Hawkular

The Hawkular WildFly Agent has the ability to not only monitor WildFly Servers but also JMX MBean Servers via Jolokia (and also Prometheus endpoints, but that's for another blog - let's focus on Jolokia-enabled JMX MBean Servers for now).

What this means is if you have a Jolokia-enabled server, you can collect JMX data from it and store that data in Hawkular. This includes both metric data as well as resource information.

This blog will attempt to quickly show how this is done.

First, you need a Jolokia-enabled server! For this demo, here's the quick steps I did to get this running on my box:

1. I downloaded WildFly 10.0.0.Final and unzipped it.
2. I downloaded the latest Jolokia .war file and copied it to my new WildFly Server's standalone/deployments directory.
3. I started my WildFly Server and bound it to some IP address that I dedicated to it (in this case, it was simply a loopback IP that I dedicated to this WildFly Server):

bin/standalone.sh -b 127.0.0.5 -bmanagement=127.0.0.5

At this point, I now have a server with some JMX data exposed over the Jolokia endpoint.
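As a quick sanity check that the Jolokia war deployed correctly, you can hit Jolokia's version endpoint (the context path matches the deployed .war file name - jolokia-war-1.3.3 in my case - so adjust it for your version):

```
curl http://127.0.0.5:8080/jolokia-war-1.3.3/version
```

It should respond with a small JSON document describing the Jolokia agent and protocol version.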

Second, I need to run a Hawkular Server. I won't go into details here on how to do that. Suffice it to say, either build or download a Hawkular Server distribution and run it (if it is not a developer build, make sure you run your external Cassandra Server prior to starting your Hawkular Server - e.g. sudo docker run -p 9042:9042 cassandra).

Now I want to run a Hawkular WildFly Agent to collect some of that JMX data from my WildFly Server and store it in Hawkular Server. In this demo, I'll be running the Swarm Agent, which is simply a Hawkular WildFly Agent packaged in a single jar that you can run as a standalone app. However, its agent subsystem configuration is the same as if you were running the agent in a full WildFly Server so the configuration I'll be describing can be used no matter how you have deployed your agent.

Rather than rely on the default configuration file that comes with the Swarm Agent (which is designed to collect and store data from a WildFly Server's DMR management endpoint, not Jolokia) I extracted the default configuration file and edited it as I describe below.

I deleted all the "-dmr" related configuration settings (metric-dmr, resource-type-dmr, remote-dmr, etc). I want my agent to only collect data from my Jolokia endpoint, so no need to define all these DMR metadata.

I then added metadata that describes the JMX data I want to collect. For example, I collect availability metrics (to tell me if an MBean is available or not) and gauge metrics (to graph things like used memory). I also collect resource properties that some MBeans expose as JMX attributes. I assign these to different resources by defining resource metadata which point to specific JMX MBean ObjectNames. I then define the details of my Jolokia-enabled WildFly Server in a <remote-jmx> so my agent knows where my Jolokia-enabled WildFly Server is.

Some example configuration is:
<avail-jmx name="Memory Pool Avail"
           interval="30"
           time-units="seconds"
           attribute="Valid"
           up-regex="[tT].*"/>
This defines an availability metric that says: for the attribute "Valid", if its value matches the regex "[tT].*", consider its availability UP (note this regex matches the string "true", case-insensitively); otherwise it is DOWN. We will attach this availability metric to a resource below.

Here is an example of a gauge metric:
<metric-jmx name="Pool Used Memory"
            interval="1"
            time-units="minutes"
            metric-units="bytes"
            attribute="Usage#used"/>
Notice the data can come from a sub-attribute of a composite value (the "used" value within the composite attribute "Usage").
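This "attribute#subattribute" syntax parallels the way Jolokia itself lets you read inside a composite value with an inner path. For example, reading just the "used" entry of a memory pool's "Usage" attribute directly over HTTP would look something like this (the context path assumes the Jolokia war deployed as jolokia-war-1.3.3, and "PS Old Gen" is one of the pools on my JVM - yours may differ):

```
curl "http://127.0.0.5:8080/jolokia-war-1.3.3/read/java.lang:type=MemoryPool,name=PS%20Old%20Gen/Usage/used"
```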

You group availability metrics and numeric metrics in metric sets (avail-set-jmx and metric-set-jmx respectively) and then associate those metric sets to specific resource types. Resource types define resources that you want to monitor (resources in JMX are simply MBeans identified with ObjectNames). For example, in my demo, I want to monitor my Memory Pools. So I create a resource type definition that describe the Memory Pools:
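For reference, here is a sketch of what those two sets might look like, wrapping the metrics defined above (the set names MemoryPoolMetricsJMX and MemoryPoolAvailsJMX are my own; they are what the resource type's metric-sets and avail-sets attributes refer to):

```
<metric-set-jmx name="MemoryPoolMetricsJMX">
  <metric-jmx name="Pool Used Memory"
              interval="1"
              time-units="minutes"
              metric-units="bytes"
              attribute="Usage#used"/>
</metric-set-jmx>
<avail-set-jmx name="MemoryPoolAvailsJMX">
  <avail-jmx name="Memory Pool Avail"
             interval="30"
             time-units="seconds"
             attribute="Valid"
             up-regex="[tT].*"/>
</avail-set-jmx>
```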
<resource-type-jmx name="Memory Pool MBean"
         parents="Runtime MBean"
         resource-name-template="%type% %name%"
         object-name="java.lang:type=MemoryPool,name=*"
         metric-sets="MemoryPoolMetricsJMX"
         avail-sets="MemoryPoolAvailsJMX">
  <resource-config-jmx name="Type"
                       attribute="Type"/>
</resource-type-jmx>
Here you can see my resource type "Memory Pool MBean" refers to all resources that match the JMX query "java.lang:type=MemoryPool,name=*". For all the resources that match that query, I associate with them the availability and numeric metrics defined in the sets mentioned in the metric-sets and avail-sets attributes. I also want a resource configuration property collected for each resource - "Type". Each Memory Pool MBean has a Type attribute that we want to collect and store. Notice also that all of these resources are to be considered children of the parent resource whose resource type name is "Runtime MBean" (which I defined elsewhere in my configuration).

Once all of my metadata is configured (the agent is set to collect all the configuration properties, availability metrics, and numeric metrics of all the JMX MBeans I care about), I configure the agent with the Jolokia-enabled endpoint - that is, how to connect to it and what I want the agent to monitor there:
<remote-jmx name="My Remote JMX"
  enabled="true"
  resource-type-sets="MainJMX,MemoryPoolJMX"
  metric-tags="server=%ManagedServerName,host=127.0.0.5"
  metric-id-template="%ResourceName_%MetricTypeName"
  url="http://127.0.0.5:8080/jolokia-war-1.3.3"/>
Here I configure the URL endpoint of my WildFly Server's Jolokia war. I then tell it what resource types I want to monitor in that Jolokia endpoint (I've grouped all my resource types into two different resource type sets called MainJMX and MemoryPoolJMX). The grouping is all up to you - if you want one big resource type set, that's fine. For metrics, you can have one big availability metric set and one big numeric metric set - it doesn't matter how you organize your sets.

One final note before I run the agent. Notice I can associate my managed server with metric tags (remote-jmx is one type of "managed server"). For every metric collected for this remote-jmx managed server, those tags will be added to the metric in the Hawkular Server (specifically in the Hawkular Metrics component). So for the "Pool Used Memory" metric I defined earlier, when that metric is stored in the Hawkular Server it will be tagged with "server" and "host", where the values of those tags are the name of my managed server (%ManagedServerName is replaced with "My Remote JMX") and "127.0.0.5" respectively. All metrics get these same tags. Similarly, the ID used to store the metrics in the Hawkular Metrics component can be customized, though this is a rarely used feature and you will probably never need it. Both metric-tags and metric-id-template are optional. You can also place those attributes on individual metrics, which is most useful if you have specific tags you want added only to metrics of a specific metric type and not to all of the metrics collected for your managed server.

OK, now that I've got my Swarm Agent configuration in place (call it "agent.xml" for simplicity), I can run the agent and point it to my configuration file:
java -jar hawkular-swarm-agent-dist-*-swarm.jar agent.xml
At this point I have my agent running alongside my Hawkular Server and the WildFly Server enabled with Jolokia. The agent is collecting information from Jolokia and pushing the collected data to the Hawkular Server.

In order to visualize this collected data, I'm using the experimental developer tool HawkFX. This is simply a browser that lets you see what data is in Hawkular Inventory as well as Hawkular Metrics. When I log in, I can see all the resources stored in Hawkular that come directly from Jolokia - these resources represent the different JMX MBeans we asked the agent to monitor:



You can see "My Remote JMX Runtime MBean" is my parent resource; it has one availability metric ("VM Avail"), three numeric metrics, and six child resources (the Memory Pool resources described above when we added them to the configuration).

You can drill down and see the metrics associated with the children as well. For example, the Memory Pool for the PS Old Gen has a metric "Pool Used Memory" that we can graph (the metric data was pulled from Jolokia, stored in Hawkular Metrics, and then graphed by HawkFX as you see here):



Finally, you can use HawkFx to confirm that the resource configuration properties were successfully collected from Jolokia and stored into Hawkular. For example, here you can see the "Type" property we configured earlier - the type of this memory pool is "HEAP".

What this all shows is that you can use Hawkular WildFly Agent to collect resource information and metric data from JMX over a Jolokia endpoint and store that data inside Hawkular.