SDR - Frequently Asked Questions

This is the FAQ section for SDR

General

Recording

Reporting

Development: recording, reporting



General

  1. Why SDR?
    Recording
    • Human experience behind the performance analysis process
    • Free of right-to-use (RTU) fees
    • Open source license
    • Very simple
    • Control over raw data in a matter of minutes
    • Raw data as time series, which makes statistical modeling easy: trends, seasonal variations
    • Very simple to change, add or remove whatever you need
    • Easy to combine with other modules, for example PDQ
    • Simple to educate your IT staff

    Reporting
    • Human experience behind the performance analysis process
    • Free of right-to-use (RTU) fees
    • Open source license
    • Very simple
    • Not an analytics package
    • Statistical models in R and the analytical solver PDQ help you examine your data and forecast future behavior
    • You don't have to click through dozens of options and links to get what you need
    • Simple and to the point, so it should save you time in front of the computer
    • Simple to educate your IT staff
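
    Because the raw data is a plain time series, a trend can be estimated with very little code. The sketch below is only an illustration in Python; SDR itself relies on R and PDQ for modeling. The function name linear_trend and the sample values are made up for the example.

```python
def linear_trend(samples):
    """Fit y = slope * t + intercept by ordinary least squares.

    samples is a list of (timestamp, value) pairs, for example
    (epoch seconds, CPU utilization in percent).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Sums of squares and cross products around the means.
    s_tt = sum((t - mean_t) ** 2 for t, _ in samples)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    if s_tt == 0:
        raise ValueError("all timestamps are identical")
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical samples: utilization growing steadily over time.
    data = [(0, 10.0), (60, 12.0), (120, 14.0), (180, 16.0)]
    slope, intercept = linear_trend(data)
    print("growth per second: %.4f, starting level: %.2f" % (slope, intercept))
```

    A positive slope indicates growing demand. Seasonal variations need more than a straight line, which is where a statistical package such as R comes in.
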
  2. But we already have dozens of such systems, for example Orca...

    The main idea of SDR is to combine operating system metrics (CPU, memory, disk and network I/O), expressed in terms of utilization and queuing, and to couple these with RRD, R and PDQ for statistical analysis and prediction. SDR places great importance on the collected raw data, which is simply stored on commodity disk drives (SATA, SAS) without the need for a relational database management system.

  3. How about SAR? I could easily use it instead of SDR. What's the difference?

    SDR gives you a number of lightweight recorders, each built for a specific purpose: monitoring CPU, disk I/O, network I/O or some application. At some point you will want to add or modify certain metrics in your recordings. With SAR you cannot do that unless you are a kernel developer or you plan to enhance the tool yourself. SDR, on the other hand, can be modified and enhanced in a matter of minutes.

    For example, let's say you plan to monitor the number of file descriptors per process. You can't use SAR for that, so you would most likely need to write your own probe or script, or use another standard OS utility. Instead, you can simply use procrec and, if you are not happy with the results, change the recorder as you please.
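
    To give an idea of how small such a probe can be, here is a minimal sketch in Python that counts the open file descriptors of one process. It is not procrec and does not show how procrec works; it assumes a Linux system where each open descriptor appears as an entry under /proc/<pid>/fd, and the function name count_open_fds is made up for the example.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of open file descriptors of a process.

    Assumes the Linux proc layout: every open descriptor of process
    <pid> is an entry in <proc_root>/<pid>/fd. Reading the directory of
    another user's process usually requires root privileges.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    return len(os.listdir(fd_dir))


if __name__ == "__main__":
    # Report the descriptor count of this very process as "pid:count".
    print("%d:%d" % (os.getpid(), count_open_fds(os.getpid())))
```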

    SAR remains a very useful monitoring tool, and it ships with SDR by default for Linux-based operating systems. SDR can use SAR if needed.

  4. How about HP SiteScope, IBM Tivoli ITCAM, or others?

    SDR tries to stay simple, follow GCaP methodologies and help you build a capacity plan for your IT infrastructure. The majority of current commercial IT performance monitoring solutions are large and complex. They include application performance monitoring and end-to-end monitoring modules along with event alerting. This makes them good for certain goals and bad for others. You need to dedicate a lot of time to learning and administering such systems. SDR focuses on performance monitoring, analysis and forecasting. In the end you should have a simple capacity planning model for your site or application, with minimal time and investment.

  5. How about open source monitoring systems such as Zenoss, Ganglia or Munin?

    They are all good at their job. Some started as network monitoring solutions, some measure clusters and grids, and the majority show plots of every kind of data. SDR tries to establish a logic between the data it collects and to follow a path towards capacity planning. As mentioned, SDR is a proof of concept built around GCaP methodologies.

  6. How about collectd, the system statistics collection daemon?

    collectd is a fine plugin-based system performance collector. It supports several operating system platforms and, through output plugins, writes data in different formats such as RRD and CSV. The collectd daemon combines different techniques to fetch the performance metrics, for example the CPU utilization on Linux or Solaris:

    "Why is the CPU usage split up in so many files? Can I change that? The short answer is: That is because otherwise backwards compatibility would be impossible and you would have to re-create your files from scratch regularly. And, "no". The long answer and explanation of the short answer is: collectd runs on a variety of operating systems. Each operating system has its own method for accounting CPU states, memory consumption, swap usage, and so on. If all these data sources were in one data set, every new supported operating system or any addition to an already supported operating system would mean that we need to modify the data set. ..." (FAQ, collectd.org)

    Some points of comparison between SDR and collectd:

    • Language: SDR is based on a dynamic scripting language. We want a few recorders, not dozens of plugins! There are 4 main recorders, responsible for overall system performance, CPU, disk and network, plus some additional recorders for other jobs: HTTP workloads, JVM workloads, per-process statistics, ... We use Perl5, a mature language with a serious development environment and a vast number of modules available on CPAN.

    • Interface: each recorder must be a command-line utility, simple and easy to use for long-term trending or temporary recordings. Each recorder should be simple to start and stop, without any dependencies on network protocols or on other operating system or third-party utilities. Each recorder must use OS interfaces, like Solaris' kstat or Linux's proc, to fetch performance metrics and must store this data on disk in an output format that is easy for the Reporting module to use. The recording process consists of one or more recorders which run at a fixed time interval. The interval between samples is defined in seconds. SDR supports sub-second time intervals!

    • Simplicity: each recorder must have a manual page and a simple help option which describes the metrics collected from the OS or applications. A recorder should not contain complex logic to support many operating system interfaces within the same recorder! Do one thing at a time, but do it well!

    • Modularity: we believe one tool cannot do all jobs! So we have assigned different tasks to different recorders, making it easy to update one recorder without breaking the others.

    • Reporting: SDR tries to deliver a compact, ready-to-use interface for reporting all collected data, intended for capacity planning and performance analysis!

    • GCaP: SDR is built on principles from the Guerrilla Capacity Planning class by Prof. Dr. Neil Gunther of Performance Dynamics.
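
    As an illustration of the Interface point above, a recorder can be reduced to a loop that samples a metric source at a fixed interval and writes one line per sample. This is only a sketch in Python, not one of the SDR recorders (those are written in Perl). The names record and load_averages are made up for the example, and the colon-separated line with a leading epoch timestamp is modeled on the sysrec output shown elsewhere in this FAQ.

```python
import os
import sys
import time


def record(sample, interval, count, out=sys.stdout,
           sleep=time.sleep, clock=time.time):
    """Call sample() `count` times, waiting `interval` seconds in between.

    Each sample becomes one line: the epoch timestamp followed by the
    metric values, separated by colons. `interval` may be a float, so
    sub-second intervals such as 0.5 are accepted.
    """
    for i in range(count):
        values = sample()
        # Whole-second timestamp, as in the sysrec lines in this FAQ.
        fields = ["%d" % clock()] + ["%.2f" % v for v in values]
        out.write(":".join(fields) + "\n")
        out.flush()
        if i < count - 1:
            sleep(interval)


def load_averages():
    """Example metric source: the 1, 5 and 15 minute load averages."""
    return os.getloadavg()


if __name__ == "__main__":
    # Three samples, half a second apart.
    record(load_averages, 0.5, 3)
```

    Keeping the metric source separate from the loop is one way to keep each recorder small: a new recorder only needs a new sample function.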

  7. Is SDR an agentless or an agent-based system?

    It is an agent-based system. Leaving aside the marketing nonsense around this topic, agentless systems are indeed much simpler to set up, but they are often confusing, because they don't list and define which probes are sent to each target, and more complex, because they rely entirely on the network to send and receive data. In the majority of cases agentless systems use operating system utilities, like mpstat and vmstat, to collect various metrics from each target. There is nothing innovative in such an approach. SDR, on the other hand, has recorders which report utilization and saturation along with all the other OS metrics. SDR also tries to stay simple and keep control over the collected raw data with minimal resources. Periodically the raw data is synced to a master site.

    You can experiment with SDR recorders and use them directly as custom probes in an agentless system, if you wish. SDR recorders have been designed to operate no matter what the reporting side is.

  8. Which performance monitoring and analysis software should I use?

    It depends. There is no such thing as one tool for all jobs. Try to list and clearly define your goals: raw data access, performance metrics for each monitored server, application performance monitoring, end-to-end response time, whether you really need a capacity planning setup for your site, etc. Consult your team members and your system or service manager, and select a number of software packages to evaluate. Then select the one that best fits your organization.

  9. Why do you always talk about GCaP?

    Because it is about the human experience behind the performance analysis process, not a specific software package! The majority of IT companies simply buy software to replace people or their competence. That's wrong, and GCaP really helps you understand this and correct it. In addition, GCaP has all the pieces needed to understand performance analysis and to help you build a capacity plan for a site or a specific application, without using any software package or selling you one!

  10. In SDR, what does the R stand for: Recording or Reporting?

    System-Data-Recorder, so it means Recording. However, you can have custom-made packages for your installation in which all recorded data can be displayed and lots of reports are available. The main idea of SDR is to combine the recording side with a light reporting module based on RRD and Perl.

  11. Is SDR similar to Sensor Data Repository?

    No. Sensor Data Repository is used by ipmitool, and in the ipmitool context S-D-R means something else entirely. Here SDR stands for System Data Recorder!

  12. Can I install SDR under BSD or Linux-based operating systems?

    SDR consists of two main parts: Recording and Reporting. You can run one without the other; however, many installations require both. At the moment SDR Recording requires Solaris due to its tight integration with the KSTAT interface. Work is under way to port the SDR Recording module to FreeBSD/OpenBSD/NetBSD and Red Hat. SDR Reporting can be installed under any POSIX-based operating system: Solaris, FreeBSD or Red Hat. Feel free to contribute code for your operating system if you would like to speed up the process of supporting more operating systems.

  13. Why so many recorders? Can't you use only one?

    The recording part of SDR includes several recorders, each designed to collect data from a particular part of your systems: CPU, memory, disk and network, plus additional recorders for other jobs: JVM, Solaris Zones, applications, etc. Instead of having one or two general recorders, we designed 4 main recorders which can be easily maintained and ported, and others specialized for other purposes. Simplicity was the main criterion! For more information see FAQ 52.

  14. All recorders seem to be simple Perl or Ksh scripts. Why?

    Simplicity was one of the main reasons. The KSTAT interface in Solaris can be accessed from a Perl or C program. Brendan Gregg, the author of sysperfstat, inspired me to keep using the same approach: KSTAT scripts. When I was not able to obtain the information from KSTAT, I used a simple Ksh script calling basic OS utilities. This last part needs improvement; zonerec and jvmrec are examples. The main goal is to use as few utilities as possible and gather all data from OS interfaces.

  15. SDR Team: who, when and how?

    We are a small group, and the majority of us have day jobs. We do most of the work at night and during public, summer or winter holidays. There is a full-time developer implementing webrec, the response time analyzer, in Java. We will do our best to reply to your emails quickly!

  16. SDR Usage: commercial or non-commercial?

    www.systemdatarecorder.org is a research group focusing on performance analysis, visualization and capacity planning. We are researching and building new tools which should help people and corporations in:

    • analyzing and visualizing workloads
    • analyzing infrastructure usage: web, app, database servers, storage, networks
    • consolidating and saving energy

    systemdatarecorder.org is a not-for-profit project, so you can easily use our work for commercial or non-commercial purposes.

    If you are interested in getting support or ordering a specific SDR module, there is a commercial company which can offer such services: SystemDataRecorder Oy Finland!

  17. We are a small/medium/large ISV and we would like to sell and support SDR. Is it possible?

    For sure. Visit the copyrights section and read the documentation carefully.

    Recording Module: You can easily reuse all our recorders as custom monitoring probes along with your own software, or simply add them to your solution. Recorders are developed as GPL-based software!

    Reporting Module: You can easily build your own reporting module based on the source code. You can add all our reporting tools and enhance them as you please. If you would like to receive help or support, or to have custom enhancements made, you can contact System Data Recorder Oy, which sells commercial support and customization for SDR!

    Please send us an email if you found SDR useful. Thank you!

  18. We are a small company and we would like to sell and support SDR. Is it possible?

    For sure. See the previous answer.

  19. I would like to help the SDR project by coding certain recorders or reporting utilities. Do you need help?

    Yes, we would like to receive help! Thank you! Please take a look at Section 100 SDR Development and check the following list of open issues:

    Please contact us if you would like to be assigned to other tasks!

  20. I still believe that a complete monitoring system from a big vendor will help my organization, and that we will see results within a couple of weeks.

    As already mentioned, SDR tries to stay simple and follow GCaP methodologies. In addition, SDR focuses on a simple performance monitoring strategy and a very simple and flexible reporting module based on Perl5, RRDtool, R and PDQ. This approach makes things easy to learn and teach, and faster to implement and customize.

    There is no such thing as one tool for all jobs, so we strongly believe that SDR should focus only on performance monitoring and analysis: no event monitoring, no alert management, and none of the extra complexity of a relational database system, which would require extensive programming and maintenance.

    We are open and flexible to learn and work with anyone. We believe in open source projects and in the idea that recording must be made simple and easy for any SysAdmin or System Manager, no matter if they use Linux or UNIX. We are not interested in building a large GUI or a complex UI which requires extensive programming and effort. Instead we like simple, fast interfaces. We don't want you to waste your time clicking and building reports or developing large templates with SDR. Instead, SDR will already offer you the needed data. When you need to adjust or change things, SDR will let you do that by scripting.


Recording

  1. What is Server Infrastructure Monitoring?

    Part of Infrastructure Monitoring, this includes all aspects of server performance monitoring, focusing on the physical/virtual server and the operating system running on it:

    • The requirements and the definition of all monitoring points

    • The operating system monitoring points:

      • Overall CPU Utilization

      • Per CPU utilization and additional stats: rate of kernel mutexes, rate of context switches, CPU Percentage servicing interrupts

      • Memory Utilization and additional stats: total size of used memory, total size of used swap space

      • Disk IO Utilization: reads + writes across all disks

      • Per disk utilization and additional stats

      • Network IO Utilization: reads + writes across all NICs

      • Per NIC utilization and additional stats: packets that were dropped/sec, collisions, errors

      • Network Protocol Stats: TCP, UDP, IP

      • Virtualization stats: containers and virtual machines

      • Per process stats: owner, state, nice value, priority of the process, number of lightweight processes, number of open file descriptors

      • Java Virtual Machine garbage collection statistics: survivor S0/S1 utilization, old space utilization, permanent space utilization, number of young generation GC events

      • Additional monitors: platform-specific monitors, for example SPARC processor utilization
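
    As an example of one monitoring point from the list above, overall CPU utilization on Linux can be derived from two readings of the aggregate cpu line in /proc/stat. The sketch below is written in Python for illustration and is not SDR code; the function names are made up, and counting iowait as idle time is a choice made by this sketch.

```python
def parse_cpu_line(line):
    """Return up to the first eight counters of the aggregate 'cpu' line.

    The counters are user, nice, system, idle, iowait, irq, softirq and
    steal, in clock ticks. Guest time is left out because the kernel
    already accounts for it inside user time.
    """
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("not the aggregate cpu line: %r" % line)
    return [int(x) for x in parts[1:9]]


def cpu_utilization(before, after):
    """Percentage of non-idle time between two 'cpu' lines."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = sum(deltas[3:5])  # idle + iowait (assumption of this sketch)
    return 100.0 * (total - idle) / total


if __name__ == "__main__":
    import time

    def read_cpu_line():
        # The aggregate line is the first line of /proc/stat.
        with open("/proc/stat") as f:
            return f.readline()

    start = read_cpu_line()
    time.sleep(1)
    print("%.2f" % cpu_utilization(start, read_cpu_line()))
```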

  2. Does it mean SDR is designed for server infrastructure monitoring only?

    The more recorders, the better!

    By default, SDR collects data from different operating systems running on physical or virtual servers. The list of currently supported operating systems contains Linux-based systems, like Ubuntu LTS and Red Hat Enterprise Linux, and Solaris systems. This was the original idea behind SDR. In addition we are trying to expand this list with other operating systems (MacOSX, FreeBSD, Windows), but since we are a small group of people this takes time.

    On the other hand, the recording module can easily be ported to retrieve data from other devices or computers. For example, we gather data from a weather device every 5 minutes and store it on our reporting server. So feel free to port or develop new recorders for your devices following our principles!

  3. So SDR only supports Linux and Solaris systems, right?

    At this moment, yes! We are trying to add support for other operating systems as well: FreeBSD, MacOSX or Windows!

  4. Which operating systems does the SDR Recording Module support at this moment?

    As of SDR Recording 0.73.x, this is the list of supported operating systems:

    • Linux 2.6+ based kernels:

      • CentOS 5.x x64

      • CentOS 6.x x64

      • Red Hat Enterprise Linux 5.x x64

      • Red Hat Enterprise Linux 6.x x64

      • Ubuntu 10.x LTS x64

      • Ubuntu 11.x LTS x64

    • Solaris 8,9 x64 and SPARC (partial)

    • Solaris 10,11 x64 and SPARC

    Note: At this moment SDR does not support VMware ESX Server!

  5. Why is zonerec under Solaris based on the prstat utility?

    To gather data from the various Solaris zones, the KSTAT interface should be used. Currently there is an open effort to improve this. Meanwhile prstat can be used to obtain data for each zone. Extended Process Accounting can also be used to obtain information about each process running on the physical machine. However, at this moment I'm looking into new ways to improve this.

  6. Solaris sysrec does not correctly report memory utilization

    Make sure you use at least SDR 0.70 which includes updates related to sysrec and ZFS.

  7. Sysrec consumes a lot of CPU on a very old Sun Ultra10 workstation. What's going on?

    SDR mainly uses the Perl language to fetch and parse system statistics. This keeps all recorders simple to read, understand and maintain. We have also introduced certain recorders as native binaries, for example nicrec, to experiment and see the benefits. In general, a Perl application has a different footprint than a native application developed in, say, C. If the hardware is older still, with a very low CPU frequency, the Perl application will add a bit of overhead compared with the native application.

    On systems with a low CPU frequency, like Sun Ultra5 or Ultra10 hardware, certain recorders will have a high footprint when executed, for example, every second. sysrec is one of them. Below is a short description of such a case:

      System Configuration: Sun Microsystems  sun4u Sun Blade 100 (UltraSPARC-IIe)
      502 MHz SUNW,UltraSPARC-IIe
    
      $ sysrec 1
       PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
      3990 sdr        11M 8760K sleep   49    0   0:02:18 5.7% sysrec/1
      3874 sdr      9216K 8520K run     54    0   0:09:37 3.7% prstat/1
      3992 sdr      4608K 3984K cpu0    59    0   0:01:30 3.1% prstat/1
       127 root     6496K 4016K sleep   59    0   0:01:35 1.0% nscd/35
    
      $ sysrec 2
       PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
      3874 sdr      9216K 8520K sleep   25    0   0:08:17 5.3% prstat/1
      3984 sdr        11M 8760K sleep   59    0   0:00:01 2.8% sysrec/1
       127 root     6496K 4016K sleep   59    0   0:01:09 0.9% nscd/35
      3987 sdr      4096K 3456K cpu0    59    0   0:00:00 0.4% prstat/1
    
      $ sysrec 3
       PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
      3874 sdr      9216K 8520K run     19    0   0:08:14 4.9% prstat/1
      3981 sdr        11M 8760K sleep   59    0   0:00:01 2.1% sysrec/1
      3983 sdr      4096K 3472K cpu0    54    0   0:00:00 1.4% prstat/1
       127 root     6496K 4016K sleep   59    0   0:01:09 0.9% nscd/35
    
      A very busy system:
    
      $ sysrec 1
       PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
      4000 sdr      4944K 2520K run      0    4   0:01:01  84% perl/1
      3990 sdr        11M 8760K sleep   49    0   0:02:27 5.7% sysrec/1
      4008 sdr      4096K 3456K cpu0    59    0   0:00:01 3.9% prstat/1
       127 root     6496K 4016K sleep   59    0   0:01:36 0.4% nscd/35
    
      [...]
      1303199130:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:4.00:0.00:0.98:0.53:0.38
      1303199131:99.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:3.00:1.00:0.98:0.53:0.38
      1303199132:100.00:86.52:0.00:0.00:1.00:0.00:0.00:0.00:96.00:4.00:0.00:0.98:0.53:0.38
      1303199133:98.99:86.52:0.00:0.00:0.00:0.00:0.00:0.00:95.99:3.00:1.01:0.98:0.54:0.38
      1303199134:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:4.00:0.00:0.98:0.54:0.38
      1303199135:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:97.00:3.00:0.00:0.98:0.54:0.38
      1303199136:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:4.00:0.00:0.98:0.54:0.38
      1303199137:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:4.00:0.00:0.99:0.54:0.38
      1303199138:99.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:3.00:1.00:0.99:0.54:0.38
      1303199139:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:97.00:3.00:0.00:0.99:0.54:0.38
      [...]
      

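    The colon-separated lines at the end of the listing above are raw sysrec samples: an epoch timestamp followed by numeric fields. A minimal Python sketch for reading such a line is shown below. The names given to the first eight fields are an assumption borrowed from the sysperfstat header (utilisation, then saturation) that appears later in this answer; this FAQ does not document the sysrec field list, so the remaining fields are returned unnamed.

```python
# Assumed names for the first eight fields, borrowed from the
# sysperfstat header. This is not an official sysrec field list.
ASSUMED_FIELDS = (
    "cpu_util", "mem_util", "disk_util", "net_util",
    "cpu_sat", "mem_sat", "disk_sat", "net_sat",
)


def parse_sample(line):
    """Split one raw line into a timestamp, named fields and the rest."""
    parts = line.strip().split(":")
    timestamp = int(parts[0])
    values = [float(x) for x in parts[1:]]
    named = dict(zip(ASSUMED_FIELDS, values))
    extra = values[len(ASSUMED_FIELDS):]  # fields this sketch cannot name
    return timestamp, named, extra


if __name__ == "__main__":
    raw = ("1303199133:98.99:86.52:0.00:0.00:0.00:0.00:0.00:0.00:"
           "95.99:3.00:1.01:0.98:0.54:0.38")
    ts, named, extra = parse_sample(raw)
    print(ts, named["cpu_util"], extra)
```
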
    To analyze sysrec's footprint we will turn to sysperfstat, the original version of sysrec, and check its footprint on a Sun hardware workstation: an Ultra 10 running Solaris 10 U7. We will profile the sysperfstat and sysrec recorders and compare their footprints!

    sysperfstat, the original by Brendan Gregg

      Initial State: System CPU 99% Utilisation
    
       PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
      4018 sdr      4944K 2272K run     38    4   0:03:47  94% perl/1
      4021 sdr      4096K 3472K cpu0    59    0   0:00:10 4.9% prstat/1
       127 root     6496K 4016K sleep   59    0   0:01:37 0.3% nscd/35
       625 noaccess  199M  115M sleep   59    0   0:04:29 0.3% java/18
      3960 sdr      7512K 5536K sleep   59    0   0:00:04 0.1% sshd/1
       217 root     3144K 1624K sleep  100    -   0:00:10 0.0% xntpd/1
     NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU                             
        12 sdr      9728K   19M   0.9%   0:05:59  99%
        27 root       45M   51M   2.5%   0:02:58 0.3%
         1 noaccess  114M  115M   5.6%   0:04:29 0.3%
         1 smmsp    1560K 6768K   0.3%   0:00:00 0.0%
         2 daemon   1824K 6072K   0.3%   0:00:00 0.0%
    Total: 43 processes, 190 lwps, load averages: 1.07, 0.61, 0.37
    
      $ ./sysperfstat 1 2
                ------ Utilisation ------     ------ Saturation ------
        Time    %CPU   %Mem  %Disk   %Net     CPU    Mem   Disk    Net
    11:08:56   66.92  31.26   3.17   0.75    0.30   0.01   0.26   0.00
    11:08:57  100.00  31.28   0.00   0.00    0.00   0.00   0.00   0.00
    
      sysperfstat uses OS's Perl version:
    
      Total Elapsed Time = 1.584845 Seconds
      User+System Time = 0.444845 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     29.0   0.129  0.129    751   0.0002 0.0002  Sun::Solaris::Kstat::_Stat::FETCH
     24.7   0.110  0.110      2   0.0550 0.0550  Sun::Solaris::Kstat::update
     8.99   0.040  0.040      1   0.0400 0.0400  Sun::Solaris::Kstat::new
     6.74   0.030  0.030      3   0.0100 0.0099  vars::BEGIN
     4.50   0.020  0.150      1   0.0200 0.1500  main::discover_net
     4.50   0.020  0.109      3   0.0066 0.0362  Sun::Solaris::Kstat::BEGIN
     2.25   0.010  0.118      2   0.0049 0.0591  main::BEGIN
     2.25   0.010  0.010      2   0.0049 0.0048  DynaLoader::BEGIN
     2.25   0.010  0.010      4   0.0025 0.0025  strict::unimport
     2.25   0.010  0.020      6   0.0016 0.0033  AutoLoader::BEGIN
     2.25   0.010  0.010      2   0.0049 0.0048  main::fetch_cpu
     0.00   0.000  0.000      1   0.0000 0.0000  Config::launcher
     0.00       - -0.000      1        -      -  DynaLoader::dl_load_file
     0.00       - -0.000      1        -      -  version::(bool
     0.00       - -0.000      1        -      -  version::(cmp
    
    Total Elapsed Time = 1.774845 Seconds
      User+System Time = 0.444845 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     24.5   0.109  0.109    751   0.0001 0.0001  Sun::Solaris::Kstat::_Stat::FETCH
     22.4   0.100  0.100      2   0.0500 0.0500  Sun::Solaris::Kstat::update
     11.2   0.050  0.150      1   0.0500 0.1500  main::discover_net
     8.99   0.040  0.109      3   0.0132 0.0362  Sun::Solaris::Kstat::BEGIN
     6.74   0.030  0.030      1   0.0300 0.0300  Sun::Solaris::Kstat::new
     6.74   0.030  0.030      3   0.0100 0.0099  vars::BEGIN
     2.25   0.010  0.118      2   0.0049 0.0591  main::BEGIN
     2.25   0.010  0.010      2   0.0049 0.0048  DynaLoader::BEGIN
     2.25   0.010  0.030      7   0.0014 0.0042  Config::FETCH
     0.00   0.000  0.000      1   0.0000 0.0000  Config::launcher
     0.00       - -0.000      1        -      -  DynaLoader::dl_load_file
     0.00       - -0.000      1        -      -  version::(bool
     0.00       - -0.000      1        -      -  version::(cmp
     0.00       - -0.000      1        -      -  Config::TIEHASH
     0.00       - -0.000      1        -      -  Config::import
    
    
      sysperfstat uses SDR's Perl version:
    
      Total Elapsed Time = 1.654845 Seconds
      User+System Time = 0.444845 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     24.5   0.109  0.109    751   0.0001 0.0001  Sun::Solaris::Kstat::_Stat::FETCH
     22.4   0.100  0.100      2   0.0500 0.0500  Sun::Solaris::Kstat::update
     8.99   0.040  0.150      1   0.0400 0.1500  main::discover_net
     8.99   0.040  0.040      1   0.0400 0.0400  Sun::Solaris::Kstat::new
     8.99   0.040  0.109      3   0.0132 0.0362  Sun::Solaris::Kstat::BEGIN
     6.74   0.030  0.030      3   0.0100 0.0099  vars::BEGIN
     2.25   0.010  0.118      2   0.0049 0.0591  main::BEGIN
     2.25   0.010  0.010      2   0.0049 0.0048  DynaLoader::BEGIN
     2.25   0.010  0.010      2   0.0049 0.0048  main::fetch_cpu
     2.25   0.010  0.009      2   0.0048 0.0046  main::fetch_disk
     0.00   0.000  0.000      1   0.0000 0.0000  Config::launcher
     0.00       - -0.000      1        -      -  DynaLoader::dl_load_file
     0.00       - -0.000      1        -      -  version::(bool
     0.00       - -0.000      1        -      -  version::(cmp
     0.00       - -0.000      1        -      -  Config::TIEHASH
    
    Total Elapsed Time = 1.564006 Seconds
      User+System Time = 0.424006 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     27.8   0.118  0.118    751   0.0002 0.0002  Sun::Solaris::Kstat::_Stat::FETCH
     23.5   0.100  0.100      2   0.0500 0.0500  Sun::Solaris::Kstat::update
     11.7   0.050  0.108      3   0.0165 0.0362  Sun::Solaris::Kstat::BEGIN
     9.43   0.040  0.159      1   0.0397 0.1594  main::discover_net
     7.08   0.030  0.030      1   0.0300 0.0300  Sun::Solaris::Kstat::new
     7.08   0.030  0.030      3   0.0100 0.0099  vars::BEGIN
     2.36   0.010  0.010      1   0.0100 0.0100  DynaLoader::dl_load_file
     2.36   0.010  0.020      1   0.0099 0.0198  DynaLoader::bootstrap
     2.36   0.010  0.010      2   0.0049 0.0048  DynaLoader::BEGIN
     0.00   0.000  0.000      1   0.0000 0.0000  Config::launcher
     0.00       - -0.000      1        -      -  version::(bool
     0.00       - -0.000      1        -      -  version::(cmp
     0.00       - -0.000      1        -      -  Config::TIEHASH
     0.00       - -0.000      1        -      -  Config::import
     0.00       - -0.000      1        -      -  Config::AUTOLOAD
      

    SDR 0.73.1 sysrec based on sysperfstat

      Initial State: System CPU 99% Utilisation
    
       PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
      4018 sdr      4944K 2272K run     18    4   0:10:57  94% perl/1
      4043 sdr      4096K 3456K cpu0    59    0   0:00:00 1.6% prstat/1
       625 noaccess  199M  115M sleep   59    0   0:04:30 0.3% java/18
       127 root     6496K 4016K sleep   59    0   0:01:37 0.1% nscd/35
      3960 sdr      7512K 5536K sleep   59    0   0:00:04 0.1% sshd/1
      3970 sdr      1792K 1400K sleep   59    0   0:00:00 0.1% ksh/1
      3901 sdr      7512K 5448K sleep   59    0   0:00:04 0.0% sshd/1
       217 root     3144K 1624K sleep  100    -   0:00:10 0.0% xntpd/1
      3916 sdr      3088K 2352K sleep   59    0   0:00:00 0.0% ksh93/1
       623 root     8656K 1976K sleep   59    0   0:00:02 0.0% sendmail/1
       131 root     3680K 2424K sleep   59    0   0:00:03 0.0% picld/7
       123 daemon   4912K 2976K sleep   59    0   0:00:00 0.0% kcfd/3
       236 daemon   3160K 1072K sleep   59    0   0:00:00 0.0% rpcbind/1
       289 root       16M 5168K sleep   59    0   0:00:06 0.0% fmd/21
       628 root     5600K 1664K sleep   59    0   0:00:00 0.0% sshd/1
     NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU                             
        12 sdr      9744K   19M   0.9%   0:13:00  95%
         1 noaccess  114M  115M   5.6%   0:04:30 0.3%
        27 root       45M   51M   2.5%   0:02:58 0.1%
         1 smmsp    1560K 6768K   0.3%   0:00:00 0.0%
         2 daemon   1824K 6072K   0.3%   0:00:00 0.0%
    Total: 43 processes, 190 lwps, load averages: 1.08, 0.96, 0.64
    
      $ /opt/sdr/bin/sysrec  1 2              
      1303201057:67.27:86.51:3.14:0.74:0.29:0.01:0.26:0.00:52.79:14.47:32.74:1.05:0.98:0.68
      1303201058:99.42:86.51:0.00:0.00:0.00:0.00:0.00:0.00:96.44:2.98:0.58:1.05:0.98:0.68
    
      Total Elapsed Time = 1.731365 Seconds
      User+System Time = 0.581364 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     17.2   0.100  0.100      2   0.0500 0.0500  Sun::Solaris::Kstat::update
     12.0   0.070  0.120      1   0.0700 0.1200  main::discover_net
     11.8   0.069  0.265      5   0.0139 0.0531  main::BEGIN
     10.3   0.060  0.060      1   0.0600 0.0600  Sun::Solaris::Kstat::new
     8.43   0.049  0.049    763   0.0001 0.0001  Sun::Solaris::Kstat::_Stat::FETCH
     5.16   0.030  0.030      3   0.0100 0.0099  vars::BEGIN
     3.44   0.020  0.020      4   0.0050 0.0050  Exporter::import
     3.44   0.020  0.030      2   0.0100 0.0149  Tie::Hash::BEGIN
     3.44   0.020  0.020      2   0.0099 0.0098  XSLoader::load
     1.72   0.010  0.010      1   0.0100 0.0100  POSIX::AUTOLOAD
     1.72   0.010  0.010      1   0.0100 0.0100  Exporter::export
     1.72   0.010  0.010      3   0.0033 0.0033  AutoLoader::import
     1.72   0.010  0.010      4   0.0025 0.0025  DynaLoader::dl_load_file
     1.72   0.010  0.010      2   0.0050 0.0049  Exporter::as_heavy
     1.72   0.010  0.040      3   0.0033 0.0132  POSIX::SigRt::BEGIN
    
     Total Elapsed Time = 1.650410 Seconds
      User+System Time = 0.610410 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     17.6   0.108  0.108    763   0.0001 0.0001  Sun::Solaris::Kstat::_Stat::FETCH
     16.3   0.100  0.100      2   0.0500 0.0500  Sun::Solaris::Kstat::update
     9.83   0.060  0.060      1   0.0600 0.0600  Sun::Solaris::Kstat::new
     8.03   0.049  0.275      5   0.0099 0.0551  main::BEGIN
     6.55   0.040  0.040      3   0.0133 0.0133  vars::BEGIN
     4.91   0.030  0.039      7   0.0042 0.0056  POSIX::BEGIN
     3.28   0.020  0.020      4   0.0050 0.0050  DynaLoader::dl_load_file
     3.28   0.020  0.020      2   0.0100 0.0099  Exporter::as_heavy
     3.28   0.020  0.020      2   0.0100 0.0098  Tie::Hash::BEGIN
     3.28   0.020  0.119      1   0.0197 0.1194  main::discover_net
     1.64   0.010  0.010      1   0.0100 0.0100  POSIX::bootstrap
     1.64   0.010  0.010      1   0.0100 0.0100  Sun::Solaris::Kstat::bootstrap
     1.64   0.010  0.010      3   0.0033 0.0033  AutoLoader::import
     1.64   0.010  0.010      4   0.0025 0.0025  Exporter::import
     1.64   0.010  0.030      3   0.0033 0.0099  POSIX::SigRt::BEGIN
    
    Total Elapsed Time = 1.830410 Seconds
      User+System Time = 0.610410 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     16.3   0.100  0.100      2   0.0500 0.0500  Sun::Solaris::Kstat::update
     11.1   0.068  0.068    763   0.0001 0.0001  Sun::Solaris::Kstat::_Stat::FETCH
     9.83   0.060  0.060      1   0.0600 0.0600  Sun::Solaris::Kstat::new
     9.67   0.059  0.285      5   0.0119 0.0571  main::BEGIN
     8.19   0.050  0.119      1   0.0497 0.1194  main::discover_net
     6.55   0.040  0.040      3   0.0133 0.0133  vars::BEGIN
     4.91   0.030  0.039      7   0.0042 0.0056  POSIX::BEGIN
     3.28   0.020  0.020      2   0.0100 0.0099  Exporter::as_heavy
     3.28   0.020  0.020      2   0.0100 0.0098  Tie::Hash::BEGIN
     3.28   0.020  0.020      2   0.0099 0.0098  DynaLoader::bootstrap
     1.64   0.010  0.010      1   0.0100 0.0100  POSIX::bootstrap
     1.64   0.010  0.010      3   0.0033 0.0033  AutoLoader::import
     1.64   0.010  0.010      4   0.0025 0.0025  Exporter::import
     1.64   0.010  0.030      3   0.0033 0.0099  POSIX::SigRt::BEGIN
     1.64   0.010  0.010      2   0.0049 0.0048  DynaLoader::BEGIN
      

    sysrec is based on sysperfstat and adds certain new metrics on top of it. Profiling the code shows that the footprint is slightly higher, but not by much. On production systems sysrec will not run every second unless you are interactively debugging or investigating a problem. In the long run sysrec runs at a more relaxed interval of 60 seconds or more. If you plan to use SDR on old hardware, keep these notes in mind !

  8. I see that nicrec is a binary file, why ?

    The main goal of the SDR Recording Module is to deliver system and specialized recorders written in a dynamic scripting language such as Perl. However, we have experimented with having some recorders as C or Java applications. In SDR 0.73 for Solaris based systems nicrec switched from Perl to C, although the Perl version was not completely replaced. Unfortunately Tim Cook, the author of this tool, stopped working on it. We removed the Solaris native version of nicrec in 0.73.1 and ship the default Perl version of nicrec !

  9. Can I run SDR inside a VM guest ?

    Yes, SDR supports VM guests. VirtualBox and VMware are the most common VM technologies and SDR supports them.

  10. Can I run SDR inside a Solaris zone ?

    Yes, SDR supports Solaris zones. Check your configuration first: you can manage all zones from the global zone of the machine. If resource management is in use, then running SDR inside a Solaris zone is most likely worth considering.

  11. I see big timekeeping inaccuracies when running SDR inside a VM guest. Why ?

    Make sure you read "Timekeeping in VMware Virtual Machines". Certain settings are required on the guest OS in order to minimize the chance of inaccurate times. It has been observed that when the guest OS CPU utilization reaches 80-90%, certain SDR recorders start losing time. This is caused by the guest OS. Our internal testing recorded such issues on a RHEL 5.4 32-bit guest, kernel 2.6.18-164. Solaris 10 10/09 s10x_u8wos_08a X86, kernel 141445-09, seems to handle such situations reasonably well.

  12. Why is jvmrec using jstat ?

    jvmrec is built on top of the jstat utility delivered with Sun/Oracle's JDK, which already records GC statistics from a running Java application. jvmrec currently supports neither JRockit nor J9 VMs. IBM's J9 JDK is fairly similar to Sun/Oracle's HotSpot VM, but it does not include any command line utility similar to jstat to collect and report GC statistics.

  13. What is webrec ?

    SDR should be able to analyze not only the operating system's resources but also the applications deployed on a given computer system infrastructure. Currently a large number of applications are deployed as web applications, using the HTTP protocol. A very clear requirement was to develop a light recorder to capture metrics from these applications. To start with, the response time of one or many HTTP actions was clearly needed.

    In order to gather the response times from a series of HTTP actions we have created webrec, a simple HTTP recorder based on Apache HttpClient. The recorder is written in Java and is capable of recording multiple HTTP actions grouped as workloads. One thread is used for each workload, so you can use multiple workloads to record multiple HTTP actions against different parts of your site or sites !

  14. Why should I care about webrec ?

    If you care how well your HTTP site runs, or how efficiently your web application has been developed, you need to measure it and store its activity over a long period of time. webrec helps you in this respect by measuring your application(s) and storing important metrics. Later you can conduct several types of analysis based on this set of data ! But first you need to have the data.

  15. Does webrec support GET, POST, HTTPS or authentication ?

    Currently webrec supports GET and cookies. Work is under way to implement POST. Later on we plan to add support for HTTPS and authentication. This describes the SDR 0.73 release from 2010 !

  16. Why do SDR recorders produce ':' separated data ?

    SDR Recorders are simple probes gathering OS or application metrics ! The recorders must be light and simple enough to report these metrics to the reporting system in use or to other tools. When we started to develop SDR we selected RRDtool as the reporting tool for SDR. Therefore each recorder, by default, reports data in a form that is easy for RRDtool to digest.

    However, it is easy to change the raw data from the current format to CSV or a custom format if you need to. Remember that the raw data from SDR Recorders is not really designed to be read directly by humans, even if this is possible !
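
    As an illustration of such a conversion, here is a minimal Perl sketch that turns one ':' separated record into a CSV line. It assumes that the first field is a Unix timestamp, as in the sysrec sample output shown earlier, and that no field contains ':' or ','. The name record_to_csv is hypothetical and not part of SDR.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Convert one ':' separated recorder line into a CSV line.
# Assumption: field 1 is a Unix timestamp, the remaining fields are
# plain numbers. record_to_csv is a hypothetical helper, not an SDR tool.
sub record_to_csv {
    my ($record) = @_;
    chomp $record;
    my ($timestamp, @metrics) = split /:/, $record;

    # Render the timestamp in UTC so the output does not depend on the
    # time zone of the machine doing the conversion.
    my $when = strftime('%Y-%m-%d %H:%M:%S', gmtime($timestamp));

    return join ',', $when, @metrics;
}

# First fields of the sysrec sample line shown earlier in this FAQ
print record_to_csv('1303201057:67.27:86.51:3.14'), "\n";
# prints: 2011-04-19 08:17:37,67.27,86.51,3.14
```

    The same function can be applied line by line to a raw data file, leaving the original ':' separated file untouched for RRDtool.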

  17. You keep talking about SDR main recorders or the default recorders... Are there basic, main recorders and other types of recorders ?

    SDR Recorders are organized in two groups: the main (or default) recorders, responsible for collecting metrics about overall system performance (CPU, memory, disk, network), and additional recorders that monitor applications or other parts of the operating system: processes, Java applications, virtual machines.

    The main recorders are sysrec, cpurec, diskrec and nicrec. They look over the four main resources a computing system has: CPU, Memory, Disk and Network.

    All the other recorders are additional and are used to monitor other classes of applications or parts of the system. They are: netrec, procrec, jvmrec, zonerec, corerec and webrec.

  18. I see different release numbers for different operating systems. Why?

    The SDR Recording module contains recorders specific to different operating systems and applications. Linux 2.6 is our main development platform. Solaris 10 comes next. We cannot support every release, and therefore our recorders have different versions for different operating systems. In addition, some operating system releases do not offer the same level of functionality as our main platform.

  19. Do all recorders run and output raw data at the same second ?

    Short answer, no. If you have a fast enough system and you start all recorders at once, then all recorders will collect data at the same second. It is important to note that all main recorders are Perl based probes using the Time::HiRes and POSIX modules with high-resolution timers. This way we ensure that we don't lose time and that each recorder keeps to the same second at which it was started.

    We have seen on Linux and Solaris x86 based systems that the main recorders are able to keep to the same second if started at once !
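
    The timer technique can be sketched in a few lines of Perl. This is a minimal illustration, not the code of any SDR recorder: it arms a repeating interval timer with Time::HiRes and blocks in POSIX pause, so the wake-up schedule is driven by the timer rather than by how long each sample takes. The name sample_ticks is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time setitimer ITIMER_REAL);
use POSIX qw(pause);

# Collect $count wall-clock timestamps, one per $interval seconds.
# sample_ticks is a hypothetical helper used only for this illustration.
sub sample_ticks {
    my ($interval, $count) = @_;

    # Empty handler: the signal is only needed to wake up pause()
    local $SIG{ALRM} = sub { };

    # First value is the initial wait, second is the wait thereafter
    setitimer(ITIMER_REAL, $interval, $interval);

    my @ticks;
    for (1 .. $count) {
        pause;                  # block until the next timer signal
        push @ticks, time;      # a recorder would gather its metrics here
    }

    setitimer(ITIMER_REAL, 0, 0);   # disarm the timer
    return @ticks;
}

# Three samples, 0.2 seconds apart
printf "%.3f\n", $_ for sample_ticks(0.2, 3);
```

    Because the timer keeps firing at a fixed period, a slow sample shortens the following wait instead of pushing every later sample back, which is what happens with a plain sleep in the loop.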


Reporting

  1. Why can't I download a package for the reporting module ?

    A package is not available at this moment. The reporting side always differs from case to case, so a generic package that works in all cases might be difficult to make. In the near future a simple generic reporting package will be offered, but it will require custom procedures to set up the reports, from case to case.

  2. What operating systems does the reporting module support ?

    UNIX: Solaris SPARC, x86, FreeBSD or Linux based operating systems.

  3. Why can't I input a custom date under System Statistics - 7 days ?

    A drill down report is currently being designed. You will be able to select a certain period of time, in addition to the preformatted data, data mine across all servers' data and display all server metrics.

  4. The reporting module seems to be a bunch of static images, why ?

    The SDR Reporting module is not an analytics package. It is not based on a relational database system, nor does it have any substantial server side programming.

    The module includes the plots for different metrics and data gathered from all configured servers. In addition the reporting module displays other useful information: server characteristics, a per server stretch factor (a metric derived from the load average), a workload management part analyzing the applications running on that server, and a prediction module.

    There are several reasons why:

    • Time: with a minimal number of clicks you should obtain the needed information, pre-formatted for you, in a matter of seconds. If you require certain custom reports, the reporting module has to be configured to do that.

    • Simplicity: minimize the number of defects by avoiding complex server side software. Simply modify or customize the reporting module, based on your input.

    • Efficiency: avoid complex Flash based solutions, which sometimes eat up your computer system resources: CPU, Network bandwidth.

    • Education: avoid spending lots of your cash on long and boring education packages, where you simply learn how to click between functions of a software package.

    • Recovery: avoid long hours trying to restore a large database where all your aggregated data is stored. Simple and easy recovery was another reason to select Round Robin Database and a simple scripting language to do the job. In our lab, restoring one year of data for 1 server took approximately 10 minutes.

  5. How is the reporting module developed: using open-source software ?

    Yes. The reporting engine uses open-source software and all its components are in turn available as open-source software ! This way you can easily build your own reporting engine and integrate it as you like on your preferred operating system platform. However, a prebuilt reporting package is not offered at this time, unless you are a registered user of SDR !

  6. The reporting module is powered by a HTTP server. What server is that ?

    Starting with SDR 0.73 the default webserver is Engine X, NGINX.

  7. Can I use a different HTTP server instead of the default one ?

    Sure, you can use any HTTP server you like. Just make sure you install your HTTP server under /opt/sdr/report/ws or point your installation to SDR. Remember the default docroot is defined under /opt/sdr/report/docroot !

  8. Can I compile my own version of Apache and use that for SDR ?

    Sure, you can use Apache HTTP server for SDR. Compile and install Apache HTTP server and point your installation to SDR. See the previous answer !

  9. Why are you using NGINX HTTP server for SDR Reporting module ?

    Because of its merits: speed, simplicity, architecture and scalability.


Development: recording, reporting

  1. I would like to write my own recorder but reuse parts of SDR, can I do this ?

    Yes, you can. At this moment we don't provide a complete SDK to help in writing and developing new recorders, but you can easily check some ground rules when writing new tools for SDR. You can also change and modify our recorders or reporting tools for your own purpose !

  2. Do you have an SDK for writing and developing new recorders ?

    No, we don't have an SDK for this ! You can follow certain examples and respect some rules when you plan to write new recorders or reporting utilities for SDR ! To start, download the SDR Recording package for your platform and start coding !

  3. What basic rules should I check and follow when writing a new recorder or reporting analyzer tool for SDR ?

    The majority of our recorders and reporting tools are Perl scripts using the SDR Perl distribution, compiled and tuned for SDR usage ! When you plan to write a new recorder, for example, it is important to follow certain basic rules:

    • Language: Perl5. You should try to write the new recorder or reporting utility in Perl 5. You can use your own Perl development environment, but make sure you properly integrate and test the new application using the SDR Perl5 distribution ! If you have a strong reason to use a different language, please submit a new bug in Bugzilla ! Of course certain recorders can be written in a different programming language, like C.

    • Test and Integration IT: This is the phase when you need to submit all your code for review and integration with all the other SDR tools ! At this phase you should be able to submit a number of test results for your new recorder or report utility (profiling, perlcritic and code coverage reports) to the gatekeeper of SDR and wait for integration approval !

    • Profiling: You should profile and check your code using the Devel::NYTProf profiler ! Present the profiling report in HTML format.

    • Code Coverage: You should properly test your code and measure how thoroughly your tests exercise it ! Use Devel::Cover for this task and present a detailed HTML report for this phase.

    • Perl Critic: Always run perlcritic against your new code and make sure it passes the default level of perlcritic. Make sure you also run perlcritic at severity level 4 !

    • Recorder Template: Use the following template when developing a new recorder:

            #!/opt/sdr/perl/bin/perl -w
            #
            # System Data Recorder: newrec
            # 
            # newrec - records XYZ telemetry 
            #
            # USAGE:    newrec [-hv] [interval [count]]
            #    eg,
            #           newrec 60      # print continuously every minute
            #                          # XYZ telemetry
            #
            # COPYRIGHT: Copyright (c) 2011 System Data Recorder
            #
            #  This program is free software; you can redistribute it and/or
            #  modify it under the terms of the GNU General Public License
            #  as published by the Free Software Foundation; either version 2
            #  of the License, or (at your option) any later version.
            #
            #  This program is distributed in the hope that it will be useful,
            #  but WITHOUT ANY WARRANTY; without even the implied warranty of
            #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
            #  GNU General Public License for more details.
            #
            #  You should have received a copy of the GNU General Public License
            #  along with this program; if not, write to the Free Software Foundation,
            #  Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
            #
            #  (http://www.gnu.org/copyleft/gpl.html)
      
            #
            # SDR VERSION: MAJOR.MINOR.X
            # 
            # HISTORY
            # 17-Jul-2010 Perl version, RRD format output  - 0.73 sp
            # 26-Dec-2010 New Output fixes                 - 0.73 sp
      
      
            use strict;
            use Getopt::Std;
            use Time::HiRes qw(time alarm setitimer ITIMER_REAL);
            use POSIX qw(pause);
            use Sys::Statistics::Linux;    # example only, needed by the S::S::L object below
            # use other Perl specific modules
      
      
            # Debug Only
            # use Data::Dumper;
      
            #
            # Command line arguments
            #
            usage() if defined $ARGV[0] and $ARGV[0] eq "--help";
            getopts('hv') or usage();
            usage() if defined $main::opt_h;
            revision() if defined $main::opt_v;
      
            # process [interval [count]]
            my ($interval, $loop_max);
            if (defined $ARGV[0]) {
                $interval = $ARGV[0];
                $loop_max = defined $ARGV[1] ? $ARGV[1] : 2**32;
                usage() if $interval == 0;
            }
            else {
                $interval = 1;
                $loop_max = 1; 
            }
      
            # Variables
            my $loop = 0;              # current loop number
            $main::opt_h = 0;          # help option
            $main::opt_v = 0;          # revision option
            $| = 1;                    # autoflush
      
      
            # ######### #
            # MAIN BODY #
            # ######### #
      
            # Set a timer for S::S::L object, for example
            $SIG{ALRM} = sub { };
      
            setitimer(ITIMER_REAL, .2, .2);
            my $lxs = Sys::Statistics::Linux->new(
                cpustats  => 1
            );
            ### .2sec sleep using a timer
            pause;
      
            # how often do we trigger (seconds)?
            my $first_interval = $interval;
      
            # signal handler is empty
            $SIG{ALRM} = sub { };
      
            # first value is the initial wait, second is the wait thereafter
            setitimer(ITIMER_REAL, $first_interval, $interval);
      
            while (1) {
      
                ### Get Stats
                # ...
      
                # Print Stats
                # ...
      
                ### Check for end
                if ( ++$loop == $loop_max ) {
                    last;
                }
      
                ### Interval
                pause;
            }
      
            #
            # usage - print usage and exit.
            #
            sub usage {
              print STDERR <<END;
            USAGE: newrec [-hv] | [interval [count]]
               eg, newrec         # print summary since boot only
                   newrec 5       # print continually every 5 seconds
                   newrec 1 5     # print 5 times, every 1 second
             Fields:
              ...
            END
            exit 0;
            }
      
      
            # revision - print revision and exit
            #
            sub revision {
                   print STDERR <<END;
            newrec: major.minor.x, YYYY-MM-DD
            END
                   exit 0;
            }
        

  4. I love Python and I really would like to write my recorder using this language. Can I do that ?

    At the moment SDR Recording/Reporting does not package its own Python environment. Perl5 has been selected as the main scripting engine for the recording and reporting modules. However, if you consider Python more capable than Perl and can produce a test case showing Solaris or Linux recorders developed in Python the same way we do in Perl, please submit a new bug in Bugzilla !

  5. Why can't I use my own Perl distribution, I really dislike the way SDR deploys Perl.

    SDR uses its own Perl build in order to minimize dependencies on the operating system. SDR Recording and Reporting ships all the Perl modules needed to run the recorders and reporting tools correctly. Of course you can use your own Perl distribution, but make sure you have all the needed modules and test before you do that. SDR uses Perl 5.12.x as the baseline version for the recording and reporting modules.

  6. I am fluent in C and some recorders really can be rewritten in C to keep resource usage minimal. Can I do that ?

    See the answers above. Of course you can write a recorder in C, but you should have a strong reason to do that. You should also make sure you profile and test the code as thoroughly as possible ! We encourage people to use scripting languages like Perl5 for easy development and troubleshooting !

  7. Why should I use Perl Critic ? I'm confident my code is readable and correct. I don't have time to run Perl Critic against my new recorder !

    There are certain rules before new code is accepted into SDR: you need to profile it, run a code coverage test and execute perlcritic against it before integration ! If you fail to provide these reports, you cannot integrate with SDR. Perl Critic is a static source code analysis engine which enforces that your code follows certain guidelines ! This way we ensure that code from anyone is easy to manage and troubleshoot !