Scope

After more than 20 years in the computer business, we still lack consistent performance monitoring across operating systems: each system deploys its own kind of monitoring and data collection. UNIX systems stay somewhat closer to each other, since they are all POSIX systems and follow similar industry standards, such as those of The Open Group.


The main idea behind SDR is to deploy a number of lightweight recorders that collect and store various operating system metrics, such as CPU, memory, disk and network usage, over a long period of time without disrupting production. The recording process is deliberately simple: each recorder samples at a fixed interval and stores several performance metrics as a time series. The recorded data is stored as plain ASCII, so it is easy to consolidate and to access from any third-party system.
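As an illustration of the sampling idea only, here is a minimal Python sketch of a recorder loop. It is not one of the SDR recorders: the names record and read_load_average are hypothetical, the load average is just a convenient example metric, and the colon-separated line layout is borrowed from the output format described later in this document.

```python
import os
import time


def read_load_average():
    """Example metric source (hypothetical): 1, 5 and 15 minute load averages.

    os.getloadavg() is available on Unix-like systems only.
    """
    return list(os.getloadavg())


def record(read_metrics, interval_seconds, count, out,
           clock=time.time, sleep=time.sleep):
    """Take `count` samples, one every `interval_seconds`, writing one line each.

    Each line is the Unix timestamp followed by the metric values,
    separated by colons. `clock` and `sleep` are injectable so the loop
    can be exercised without waiting in real time.
    """
    for i in range(count):
        values = read_metrics()
        fields = [str(int(clock()))] + [str(v) for v in values]
        out.write(":".join(fields) + "\n")
        if i < count - 1:
            sleep(interval_seconds)
```

Calling record(read_load_average, 60, 5, sys.stdout) would print five samples, one per minute. A real recorder would run until stopped rather than for a fixed count.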

Another idea is to make SDR available across many operating systems, with a data collection process that is standard between them: (Open)Solaris, Linux, BSD. In this sense SDR acts like a black box in which a large number of application and operating system metrics are stored for later analysis.


SDR

SDR can help when the budget is limited and deployment time is an important factor for your site. You don't want to spend a lot of time setting up an expensive monitoring system built on a relational database and a complicated reporting module that takes time to set up and learn.

Raw data

To keep things simple, SDR makes all collected metrics available as variables measured sequentially in time, called time series. Observations collected at fixed sampling intervals form a historical time series. To give any person or application access to this volume of data, the historical time series are stored on commodity disk drives, compressed but in text format.


Time series let us understand what has happened in the past and look into the future using various statistical models. In addition, access to these historical time series helps us build a simple capacity planning model for our application or site.
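To make the capacity planning idea concrete, here is a minimal Python sketch that fits a straight line to a historical series and estimates when a limit would be reached. The linear trend is an assumption chosen for simplicity, not the model SDR itself uses, and the function names are hypothetical.

```python
def fit_linear_trend(samples):
    """Least-squares line through (timestamp, value) pairs.

    Returns (slope, intercept) so that value ~= slope * timestamp + intercept.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return slope, intercept


def time_to_reach(samples, limit):
    """Timestamp at which the fitted line crosses `limit`.

    Returns None when the trend is flat or decreasing, because the
    limit is then never reached under this model.
    """
    slope, intercept = fit_linear_trend(samples)
    if slope <= 0:
        return None
    return (limit - intercept) / slope
```

For example, with disk usage samples taken as (Unix time, percent used), time_to_reach(samples, 100) gives a rough estimate of when the disk would fill up if the current trend continued.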

Agentless or agent-based data collection?

Certain monitoring systems use agentless recording: a system that runs on a centralized machine and executes operating system commands or custom probes via SSH or RSH. One example is HP SiteScope.

In contrast, SDR tries to stay simple and clear: list and define all the recorders used to collect performance data, and store that data somewhere anyone can easily access it. Each recorder must be installed on every system we plan to collect data from.


Main points about current Recording Operation Mode:

Probe Definition

A recorder is defined as a lightweight probe, developed in KSH, Perl or even C, that can extract operating system metrics directly through the Kernel Statistics interface, if available. The probe can also interact directly with a userland process to obtain the required metrics. This should happen without creating additional load on, or otherwise affecting, the production environment being measured.


Each recorder should be able to access operating system interfaces without calling additional utilities, and should display its data in the following form:


timestamp : metric1 : metric2: ... : metricN

where timestamp is Unix time (POSIX time) and metric1 through metricN are the values collected from the OS or application.
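The record layout above can be sketched in Python as a pair of helpers that build and parse one line. This is an illustration, not SDR code: the names format_record and parse_record are hypothetical, and we assume that the separator is a single colon, that whitespace around it is not significant, and that all metric values are numeric.

```python
import time

DELIMITER = ":"  # separator shown in the record layout above


def format_record(metrics, timestamp=None):
    """Build one raw-data line: Unix timestamp followed by the metric values."""
    if timestamp is None:
        timestamp = time.time()  # Unix (POSIX) time in seconds
    fields = [str(int(timestamp))] + [str(m) for m in metrics]
    return DELIMITER.join(fields)


def parse_record(line):
    """Split a raw-data line into (timestamp, [metric values as floats]).

    Whitespace around the separator is ignored (an assumption).
    """
    fields = [f.strip() for f in line.strip().split(DELIMITER)]
    timestamp = int(fields[0])
    metrics = [float(f) for f in fields[1:]]
    return timestamp, metrics
```

For instance, format_record([12.5, 2048, 0.75], timestamp=1274054400) yields the line 1274054400:12.5:2048:0.75, and parse_record turns that line back into the timestamp and a list of floats.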

Data analysis

The next step after data collection is to gather all servers' data in a centralized place, from which we can start analysing and estimating the capacity in use. To that end, SDR offers a way to gather the raw data from each server and keep it safe over a long period of time. Built on top of RRD, the high-performance data logging and graphing system, SDR can generate reports in a matter of minutes for all kinds of time-series data collected by the recording module. In addition, coupled with the PDQ analytic solver, SDR can be used to model a given workload and predict future growth.


Design

The System Data Recorder is organized in two main parts: the collection side, which handles recording data on each system, and the reporting side, where we permanently store the data, generate simple reports and graphs, and perform the analysis. Some configurations can use only the recording part, without the reporting side at all.


The data recorder module consists of many simple utilities, developed in Korn shell, Perl or C, that extract different telemetry from the operating system. Some recorders also gather their data from various processes, directly using OS or third-party utilities. We try to keep the number of dependencies for our main recorders to a minimum.


There are five recorders that should be installed and deployed on every system, plus optional recorders required only in certain cases: JVM monitoring, or dedicated hardware platforms such as Niagara-based servers (CMT). SDR was mainly developed around the (Open)Solaris operating environment because of its powerful observability capabilities and robust features. SDR is currently being ported to FreeBSD and Red Hat operating systems.


If your system uses some form of virtualization, the recorders operate from the global level. If the virtualization type includes domains or Xen technology, the recorders are deployed in all of these systems.



Recorded data:

There are five main recorders: sysrec, netrec, nicrec, zonerec and cpurec. Each of these recorders is a simple Perl or Ksh utility running as a separate process, lightweight and designed not to hog your system when it is under high utilisation. Some others are developed in C to reduce the footprint as much as possible. In addition, we have corerec and jvmrec, recorders that should be deployed on systems based on the CMT architecture or running a Java Virtual Machine.


Each recorder is managed by SMF, the (Open)Solaris Service Management Facility, which keeps the recorders running by restarting them automatically if one fails or exits unexpectedly. On other operating systems, where we don't have a similar framework, we use the standard approach of starting and monitoring daemon processes with a watchdog. SMF has a number of advantages, such as dependency checking, which is easy to implement: the recorders should not start if the local filesystem is not mounted or the network interfaces are not present.
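The watchdog approach used on systems without SMF can be sketched as a small supervising loop. This is a simplified illustration and not the watchdog shipped with SDR: the function name and its parameters are hypothetical, and the restart count is bounded here only so the example terminates, whereas a real watchdog would normally keep supervising indefinitely.

```python
import subprocess
import time


def run_with_watchdog(command, max_restarts=3, delay_seconds=1.0):
    """Start `command` and restart it each time it exits.

    Stops after the initial start plus `max_restarts` restarts and
    returns (number_of_starts, last_exit_code).
    """
    starts = 0
    while True:
        proc = subprocess.Popen(command)  # launch the recorder process
        starts += 1
        exit_code = proc.wait()           # block until it exits or fails
        if starts > max_restarts:
            return starts, exit_code
        time.sleep(delay_seconds)         # brief pause before restarting
```

The pause between restarts avoids a tight loop when a recorder fails immediately on startup, for example because a filesystem is not yet mounted.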



Each recorder writes its data to a file called the raw output file. Every night we rotate this file with the logadm utility and compress it. This keeps the stored data small and easy to transport to our reporting system. The majority of collectors record raw data directly in RRD format, easy to import into the Round-Robin Database system, the final place where the data is stored. The default retention time is 1 year, but this can easily be changed.
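The nightly rotate-and-compress step can be illustrated with a short Python sketch. SDR itself uses logadm for this; the code below only mimics the idea, the function name and the suffix convention are hypothetical, and it ignores the problem of a recorder that still holds the old file open.

```python
import gzip
import os
import shutil


def rotate_and_compress(raw_path, suffix):
    """Rename the raw output file, gzip the renamed copy, start a fresh file.

    Returns the path of the compressed file, e.g. "sysrec.raw.2010-05-17.gz"
    for raw_path="sysrec.raw" and suffix="2010-05-17".
    """
    rotated = f"{raw_path}.{suffix}"
    os.rename(raw_path, rotated)

    compressed = rotated + ".gz"
    with open(rotated, "rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the data, do not load it whole
    os.remove(rotated)

    # Recreate an empty raw file so new samples have somewhere to go.
    open(raw_path, "a").close()
    return compressed
```

Because the raw data is plain text, the compressed files remain readable with standard tools such as gzip, which keeps them accessible to any application.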



Last updated: 2010-05-17

