The System Data Recorder (SDR) is organized into two main parts:
the collection side, which records the data from each system,
and the reporting side, where we permanently store the data,
generate simple reports and graphs, and perform the analysis.
In some configurations the recording part can be used on its own,
without the reporting side.
The data recorder module consists of many simple utilities, written in
Korn shell, Perl or C, which extract different telemetry from the operating
system. Some recorders also gather their data from various processes,
directly using OS or third-party utilities. We try to keep the number of
dependencies for our main recorders to a minimum.
There are 5 recorders, which should be installed and
deployed on any system, and optional recorders required only
for certain cases: JVM monitoring or dedicated hardware
platforms such as Niagara-based CMT servers. SDR was mainly
developed around the (Open)Solaris operating environment, because of its
powerful observability capabilities and robust features. However, SDR
is currently being ported to the FreeBSD and Red Hat operating systems.
If your system uses some form of virtualization, the recorders
operate from the global level. If the virtualization type includes
domains or Xen technology, the recorders are deployed in all of these systems.
Recorded data:
- System CPU, Mem, Disk and Net utilisation, Queuing statistics
- Virtualization: Zone, VM statistics
- CPU statistics: cross calls, system and user time
- CMT core utilisation: T1, T2
- Network: UDP, IP and TCP/IP statistics for each zone
- NIC statistics
- Java Virtual Machine: Garbage Collector statistics
There are five main recorders: sysrec, netrec, nicrec, zonerec
and cpurec. Each of these recorders is a simple Perl or Ksh utility
running as a separate process, lightweight and designed not to become
a resource hog when the system is under high utilisation. Some others
are developed in C to reduce the footprint as much as possible.
Additionally, we have corerec and jvmrec, recorders which should be
deployed on systems which are based on the CMT architecture or run
a Java Virtual Machine.
On Solaris-based systems each recorder is managed by SMF,
the Service Management Facility, which keeps the recorders running and
restarts them automatically if one fails or exits unexpectedly.
On other operating systems, where we don't have a similar framework,
we use a simple watchdog.
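The watchdog idea can be illustrated with a minimal sketch. This is not the
actual SDR watchdog: the function name supervise and the parameters
max_restarts and delay are hypothetical, and the restart count is bounded
only so that the example terminates.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=1.0):
    """Run cmd and start it again each time it exits.

    Illustrative only: a recorder is expected to run forever, so any
    exit is treated as unexpected. At most max_restarts restarts are
    attempted. Returns the exit code of every run, in order.
    """
    exit_codes = []
    while True:
        # subprocess.call blocks until the child process exits.
        exit_codes.append(subprocess.call(cmd))
        if len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(delay)  # brief pause before restarting


if __name__ == "__main__" and len(sys.argv) > 1:
    # The recorder command line is taken from the arguments (hypothetical usage).
    print(supervise(sys.argv[1:]))
```

A production watchdog would normally also log each restart and back off if
the recorder keeps failing immediately.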
Each recorder outputs its data to a file, called the raw output file.
Every night we rotate this file using the logadm utility
and compress it. This way we make sure the stored data is small and easy
to transport to our reporting system. The majority of collectors record
raw data directly in RRD format, which is easy to import into the
Round-Robin Database system, the final place where the data is stored.
The default retention time for stored data is 1 year,
but this can easily be changed.
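As a rough illustration of the raw output and its nightly compression, the
sketch below assumes that "RRD format" means the colon-separated
timestamp:value[:value...] syntax accepted by rrdtool update. That is an
assumption, not something confirmed here. The helper names format_sample
and compress_file are hypothetical, and the real rotation is done by logadm,
not by this code.

```python
import gzip
import shutil
import time


def format_sample(values, timestamp=None):
    """Return one raw-output line: '<epoch seconds>:<v1>:<v2>...'.

    Assumed layout, modelled on the argument syntax of 'rrdtool update'.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return ":".join([str(ts)] + [str(v) for v in values])


def compress_file(path):
    """Gzip a rotated file into path + '.gz' and return the new name.

    The original file is left in place; removing it is up to the caller.
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

For example, format_sample([1.5, 20, 0], timestamp=1200000000) produces the
line 1200000000:1.5:20:0, which could be appended to a raw output file.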