SDR - Frequently Asked Questions
This is the FAQ section for SDR
The main idea of SDR is to combine operating system metrics (CPU, memory, disk and network I/O), expressed in terms of utilization and queuing, and couple them with RRD, R and PDQ for statistical analysis and prediction. SDR places great importance on the collected raw data, which is simply stored on commodity disk drives (SATA, SAS) without the need for a relational database management system.
SDR lets you run a number of light recorders, each built for a specific purpose: monitoring CPU, disk I/O, network I/O, or some application. At some point you will want to add or modify certain metrics in your recordings. With SAR you cannot do that unless you are a kernel developer or you plan to enhance the tool yourself. SDR, on the other hand, can be modified and enhanced in a matter of minutes.
For example, let's say you plan to monitor the number of file descriptors per process. You can't use SAR for that, so most likely you will need to write your own probe or script, or use another standard OS utility. Instead, you can simply use procrec and, if you are not happy with the results, change the recorder as you please.
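To make the file descriptor example concrete, here is a minimal sketch of such a probe. It is written in Python purely for illustration, assumes a Linux-style /proc filesystem, and is not how procrec itself is implemented; the function name count_open_fds is made up for this example.

```python
import os


def count_open_fds(pid):
    """Return the number of open file descriptors for pid, or None if unavailable.

    Assumption: a Linux-style /proc filesystem where /proc/<pid>/fd holds
    one entry per open descriptor.
    """
    fd_dir = f"/proc/{pid}/fd"
    try:
        return len(os.listdir(fd_dir))
    except OSError:
        # No /proc, no such process, or no permission to inspect it.
        return None


if __name__ == "__main__":
    # Print "pid count" for the current process.
    print(os.getpid(), count_open_fds(os.getpid()))
```

A probe this small is easy to change when the metric turns out not to be what you need, which is the point made above.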
SAR remains a very useful monitoring tool and ships with SDR by default for Linux-based operating systems. SDR can use SAR, if needed.
SDR tries to stay simple, follow GCaP methodologies and help you build a capacity plan for your IT infrastructure. The majority of current commercial IT performance monitoring solutions are complex and large. They include application performance monitoring and end-to-end monitoring modules along with event alerting. This makes them good for certain goals and bad for others, and you need to dedicate a lot of time to learning and administering such systems. SDR focuses on performance monitoring, analysis and forecasting. In the end you should have a simple capacity planning model for your site or application, with minimal time and investment.
They are all good at their job. Some started as network monitoring solutions, some measure clusters and grids, and the majority show plots of data from all over. SDR tries to establish a logic between the collected data and to follow a path towards capacity planning. As mentioned, SDR is a proof of concept built around GCaP methodologies.
collectd is a fine system performance collector based on plugins.
It supports several operating system platforms and outputs data in
different formats (RRD, CSV) using output plugins. The collectd daemon combines different
techniques to fetch the performance metrics, for example the CPU utilization
on Linux or Solaris:
"Why is the CPU usage split up in so many files? Can I change that?
The short answer is: That is because otherwise backwards compatibility would be impossible
and you would have to re-create your files from scratch regularly. And, "no".
The long answer and explanation of the short answer is: collectd runs on a variety of
operating systems. Each operating system has it's own method for accounting
CPU states, memory consumption, swap usage, and so on. If all these data sources where
in one data set, every new supported operating system or any addition to an already
supported operating system would mean that we need to modify the data set.
...
FAQ collectd.org"
Some points between SDR and collectd:
Language: SDR is based on a dynamic scripting language. We want few recorders, not dozens of plugins! There are 4 main recorders, responsible for overall system performance, CPU, disk and network, plus some additional recorders for other jobs: HTTP workloads, JVM workloads, per-process statistics, ... We use Perl 5, a mature language with a serious development environment and a vast number of modules available on CPAN.
Interface: each recorder must be a command-line utility, simple and easy to use for long-term trend or temporary recordings. Each recorder should be simple to start and stop, without any dependencies on network protocols or on other operating system or third-party utilities. Each recorder must use OS interfaces to fetch performance metrics, like Solaris' kstat or Linux's proc interfaces, and store this data on disk in an output format that is easy for the Reporting module to use. The recording process consists of one or many recorders which run at a fixed interval of time. The interval between samples is defined in seconds. SDR supports sub-second intervals!
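The fixed-interval recording loop described above can be sketched as follows. This is an illustration in Python, not SDR's Perl code: the function record and its parameters are assumptions made for this example. Deadlines are computed from the start time so that a slow probe does not push later samples back, and the interval may be fractional.

```python
import time


def record(probe, interval, count, clock=time.monotonic, sleep=time.sleep):
    """Call probe() `count` times, `interval` seconds apart (may be fractional).

    Returns a list of (unix_timestamp, value) pairs. Each deadline is
    derived from the start time rather than from the previous sample,
    so the time spent inside probe() does not accumulate as drift.
    """
    samples = []
    start = clock()
    for i in range(count):
        samples.append((int(time.time()), probe()))
        if i < count - 1:
            remaining = start + (i + 1) * interval - clock()
            if remaining > 0:
                sleep(remaining)
    return samples


if __name__ == "__main__":
    import os

    # Hypothetical probe: the 1-minute load average, sampled every 0.5 s.
    for stamp, value in record(lambda: os.getloadavg()[0], 0.5, 3):
        print(f"{stamp}:{value:.2f}")
```

Passing clock and sleep as parameters keeps the loop easy to exercise without actually waiting.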
Simplicity: each recorder must have a manual page and a simple help option describing the metrics collected from the OS or applications. A recorder should not contain complex logic supporting many operating system interfaces within the same recorder! Do one thing at a time, but do it well!
Modularity: we believe one tool cannot do all jobs! So we have assigned different tasks to different recorders, making it easy and simple to update one recorder without breaking the others.
Reporting: SDR tries to deliver a compact and ready-to-use interface for reporting all collected data, intended for capacity planning and performance analysis!
GCaP: SDR is built on top of principles from the Guerrilla Capacity Planning class by Prof. Dr. Neil Gunther of Performance Dynamics.
Agent-based system. Leaving aside all the marketing nonsense about this topic, agentless systems are indeed much simpler to set up, but they are often confusing: they do not list and define which probes are sent to each target, and they are more complex because they rely entirely on the network to send and receive data. In the majority of cases agentless systems use operating system utilities, like mpstat or vmstat, to collect various metrics from each target. There is nothing innovative in such an approach. SDR, on the other hand, has recorders which report utilization and saturation along with all the other OS metrics. SDR also tries to stay simple and keep control over the collected raw data with minimal resources. Periodically the raw data is synced to a master site.
You can experiment with SDR recorders and use them directly as custom probes in an agentless system, if you wish. SDR recorders have been designed to operate no matter what the reporting side is.
It depends. There is no such thing as one tool for all jobs. Try to list and clearly define your goals: raw data access, performance metrics for each monitored server, application performance monitoring, end-to-end response time, whether you really need a capacity planning setup for your site, etc. Consult with your team members and your system or service manager, and select a number of software packages to evaluate. Select the one that best fits your organization.
Because it is the human experience behind the performance analysis process that matters, not a specific software package! The majority of IT companies simply buy software to replace people or their competence. That's wrong, and GCaP really helps you understand this and correct it. In addition, GCaP has all the pieces needed to understand performance analysis and to help you build a capacity plan for a site or a specific application, without using any software package or selling you one!
System Data Recorder, so it means recording. However, you can have custom-made packages for your installation where all recorded data can be displayed and lots of reports are available. The main idea of SDR is to combine the recording side with a light reporting module based on RRD and Perl.
No. Sensor Data Repository is used by ipmitool. In ipmitool, SDR means something totally different. Here SDR stands for System Data Recorder!
SDR consists of two main parts: Recording and Reporting. You can run one without the other, however many installations require both. SDR Recording at the moment requires Solaris due to its tight integration with the KSTAT interface. Work is under way to port the SDR Recording module to FreeBSD/OpenBSD/NetBSD and RedHat. SDR Reporting can be installed under any POSIX-based operating system: Solaris, FreeBSD or RedHat. Feel free to contribute code for your operating system if you would like to speed up the process of supporting more operating systems.
The SDR recording part includes several recorders designed to collect data from particular parts of your systems (CPU, memory, disk and network) and additional recorders for other jobs: JVM, Solaris Zones, applications, etc. Instead of having one or two general recorders, we designed 4 main recorders which can be easily maintained and ported, and others specialized for other purposes. Simplicity was the main criterion! For more information see Faq 52.
Simplicity was one of the main reasons. The KSTAT interface in Solaris can be accessed via a Perl or C program. Brendan Gregg, the author of sysperfstat, inspired me to keep using the same approach: KSTAT scripts. When I was not able to obtain the information from KSTAT, I used a simple Ksh script calling basic OS utilities. This last part needs improvement; examples are zonerec and jvmrec. The main goal is to use as few utilities as possible and gather all data from OS interfaces.
We are a small group, and the majority of us have day jobs. We do most of the work at night and during public, summer or winter holidays. There is a full-time developer implementing webrec, the response time analyzer, in Java. We will do our best to reply to your emails promptly!
www.systemdatarecorder.org is a research group focusing on: performance analysis, visualization and capacity planning. We are researching and building new tools which should help people and corporations in:
systemdatarecorder.org is a not-for-profit project, so you can easily use our work for commercial or non-commercial purposes.
If you are interested in getting support or ordering a specific SDR module, there is a commercial company which offers such services: SystemDataRecorder Oy, Finland!
For sure. Visit the copyrights section and read the documentation carefully.
Recording Module: You can easily reuse all our recorders as custom monitoring probes along with your own software, or simply add them to your solution. Recorders are developed as GPL-based software!
Reporting Module: You can easily build your own reporting module based on the source code. You can add all our reporting tools and enhance them as you please. If you would like to receive help, support or custom enhancements, you can contact System Data Recorder Oy, which sells commercial support and customization for SDR!
Please send us an email if you found SDR useful. Thank you!
For sure. See above example.
Yes, we would like to receive help ! Thank you ! Please take a look at Section 100 SDR Development and check the following list of open issues:
Please contact us if you would like to be assigned to other tasks !
As already mentioned, SDR tries to stay simple and follow GCaP methodologies. In addition, SDR focuses on a simple performance monitoring strategy and a very simple and flexible reporting module based on Perl 5, RRDtool, R and PDQ. This approach keeps things easy to learn and teach, and faster to implement and customize.
There is no such thing as one tool for all jobs. So we strongly believe that SDR should focus only on performance monitoring and analysis: no event monitoring, no alert management and no extra complexity of a relational database system, which would require extensive programming and maintenance.
We are open and flexible to learn and work with anyone. We believe in open source projects and in the idea that recording must be made simple and easy for any SysAdmin or System Manager, no matter whether they use Linux or UNIX. We are not interested in building a large GUI or complex UI which requires extensive programming and effort. Instead we like simple, fast interfaces. We don't want you to waste your time clicking and building reports or developing large templates with SDR. Instead, SDR will be ready and offer you the needed data up front. When you need to adjust or change things, SDR lets you do that by scripting.
Part of Infrastructure Monitoring, this includes all aspects of server performance monitoring, focusing on the physical/virtual server and the operating system running on it:
The requirements and the definition of all monitoring points
The operating system monitoring points:
Overall CPU Utilization
Per CPU utilization and additional stats: rate of kernel mutexes, rate of context switches, percentage of CPU time servicing interrupts
Memory Utilization and additional stats: total size of used memory, total size of used swap space
Disk IO Utilization: reads + writes across all disks
Per disk utilization and additional stats
Network IO Utilization: reads + writes across all NICs
Per NIC utilization and additional stats: packets that were dropped/sec, collisions, errors
Network Protocol Stats: TCP, UDP, IP
Virtualization stats: containers and virtual machines
Per process stats: owner, state, nice, the priority of the process, the no. of light weight processes, the no. of open file descriptors
Java Virtual Machine Garbage Collection statistics: survivor S0/S1 utilization, old space utilization, permanent space utilization, number of young generation GC events
Additional monitors: platform specific monitors, example SPARC processor utilization
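As an illustration of the first monitoring point in the list above, overall CPU utilization, here is a minimal Python sketch that derives it from two samples of the aggregate "cpu" line in Linux's /proc/stat. It is an assumption-laden example, not the code of sysrec or cpurec: the function names are made up, and idle time is taken to be the idle plus iowait columns.

```python
def cpu_times(stat_line):
    """Split an aggregate 'cpu' line from /proc/stat into (busy, total) ticks.

    Only the first eight counters (user .. steal) are used; the guest
    counters that follow are already included in user time.
    """
    fields = [int(x) for x in stat_line.split()[1:9]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
    total = sum(fields)
    return total - idle, total


def cpu_utilization(before, after):
    """Percent of CPU time spent non-idle between two 'cpu' lines."""
    busy0, total0 = cpu_times(before)
    busy1, total1 = cpu_times(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed


if __name__ == "__main__":
    import time

    def read_cpu_line():
        # The first line of /proc/stat is the system-wide total.
        with open("/proc/stat") as f:
            return f.readline()

    first = read_cpu_line()
    time.sleep(1)
    print(f"{cpu_utilization(first, read_cpu_line()):.2f}")
```

The kernel exposes cumulative counters, so utilization is always a difference between two samples divided by the elapsed ticks.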
The more recorders, the better !
SDR, by default, collects data from different operating systems running on physical or virtual servers. The currently supported operating systems are Linux-based systems, like Ubuntu LTS or RedHat Enterprise Linux, and Solaris systems. This was the original idea behind SDR. In addition we are trying to expand this list to include other operating systems (MacOSX, FreeBSD, Windows), but since we are a small group of people this takes time.
On the other hand, the recording module can easily be ported to retrieve data from other devices or computers. For example, we gather data every 5 minutes from a weather device and store it on our reporting server. So feel free to port or develop new recorders for your devices, following our principles!
At this moment, yes ! We are trying to add support for other operating systems as well: FreeBSD, MacOSX or Windows !
As of SDR Recording 0.73.x, this is the list of supported operating systems:
Linux 2.6+ based kernels:
CentOS 5.x x64
CentOS 6.x x64
RedHat Enterprise Linux 5.x x64
RedHat Enterprise Linux 6.x x64
Ubuntu 10.x LTS x64
Ubuntu 11.x LTS x64
Solaris 8,9 x64 and SPARC (partial)
Solaris 10,11 x64 and SPARC
To gather data from the various Solaris zones, the KSTAT interface should be used. Currently there is an open effort to improve this. Meanwhile, prstat can be used to obtain data for each zone. Extended Process Accounting can also be used to obtain information about each process running on the physical machine. However, at this moment I'm looking into new ways to improve this.
Make sure you use at least SDR 0.70 which includes updates related to sysrec and ZFS.
SDR mainly uses the Perl language to fetch and parse system statistics. This keeps all recorders simple to read, understand and maintain. We have also introduced certain recorders as native binaries, for example nicrec, to experiment and see the benefits. In general, a Perl application will have a different footprint than a native application developed in, let's say, C. On older hardware with a very low CPU frequency, the Perl application will add a bit of overhead compared with the native application.
On systems with a low CPU frequency, like Ultra 5 or Ultra 10 Sun hardware, certain recorders will have a high footprint when executed every second, for example. sysrec is one of them. Below is a short description of such a case:
System Configuration: Sun Microsystems sun4u Sun Blade 100 (UltraSPARC-IIe)
502 MHz SUNW,UltraSPARC-IIe
$ sysrec 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
3990 sdr 11M 8760K sleep 49 0 0:02:18 5.7% sysrec/1
3874 sdr 9216K 8520K run 54 0 0:09:37 3.7% prstat/1
3992 sdr 4608K 3984K cpu0 59 0 0:01:30 3.1% prstat/1
127 root 6496K 4016K sleep 59 0 0:01:35 1.0% nscd/35
$ sysrec 2
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
3874 sdr 9216K 8520K sleep 25 0 0:08:17 5.3% prstat/1
3984 sdr 11M 8760K sleep 59 0 0:00:01 2.8% sysrec/1
127 root 6496K 4016K sleep 59 0 0:01:09 0.9% nscd/35
3987 sdr 4096K 3456K cpu0 59 0 0:00:00 0.4% prstat/1
$ sysrec 3
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
3874 sdr 9216K 8520K run 19 0 0:08:14 4.9% prstat/1
3981 sdr 11M 8760K sleep 59 0 0:00:01 2.1% sysrec/1
3983 sdr 4096K 3472K cpu0 54 0 0:00:00 1.4% prstat/1
127 root 6496K 4016K sleep 59 0 0:01:09 0.9% nscd/35
A very busy system:
$ sysrec 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
4000 sdr 4944K 2520K run 0 4 0:01:01 84% perl/1
3990 sdr 11M 8760K sleep 49 0 0:02:27 5.7% sysrec/1
4008 sdr 4096K 3456K cpu0 59 0 0:00:01 3.9% prstat/1
127 root 6496K 4016K sleep 59 0 0:01:36 0.4% nscd/35
[...]
1303199130:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:4.00:0.00:0.98:0.53:0.38
1303199131:99.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:3.00:1.00:0.98:0.53:0.38
1303199132:100.00:86.52:0.00:0.00:1.00:0.00:0.00:0.00:96.00:4.00:0.00:0.98:0.53:0.38
1303199133:98.99:86.52:0.00:0.00:0.00:0.00:0.00:0.00:95.99:3.00:1.01:0.98:0.54:0.38
1303199134:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:4.00:0.00:0.98:0.54:0.38
1303199135:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:97.00:3.00:0.00:0.98:0.54:0.38
1303199136:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:4.00:0.00:0.98:0.54:0.38
1303199137:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:4.00:0.00:0.99:0.54:0.38
1303199138:99.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:96.00:3.00:1.00:0.99:0.54:0.38
1303199139:100.00:86.52:0.00:0.00:0.00:0.00:0.00:0.00:97.00:3.00:0.00:0.99:0.54:0.38
[...]
To analyze sysrec's footprint we turn to the original version of sysrec, sysperfstat, and check its footprint on a Sun hardware workstation: an Ultra 10 running Solaris 10 U7. We will profile the sysperfstat and sysrec recorders and compare their footprints!
sysperfstat, original version by Brendan Gregg
Initial State: System CPU 99% Utilisation
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
4018 sdr 4944K 2272K run 38 4 0:03:47 94% perl/1
4021 sdr 4096K 3472K cpu0 59 0 0:00:10 4.9% prstat/1
127 root 6496K 4016K sleep 59 0 0:01:37 0.3% nscd/35
625 noaccess 199M 115M sleep 59 0 0:04:29 0.3% java/18
3960 sdr 7512K 5536K sleep 59 0 0:00:04 0.1% sshd/1
217 root 3144K 1624K sleep 100 - 0:00:10 0.0% xntpd/1
NPROC USERNAME SWAP RSS MEMORY TIME CPU
12 sdr 9728K 19M 0.9% 0:05:59 99%
27 root 45M 51M 2.5% 0:02:58 0.3%
1 noaccess 114M 115M 5.6% 0:04:29 0.3%
1 smmsp 1560K 6768K 0.3% 0:00:00 0.0%
2 daemon 1824K 6072K 0.3% 0:00:00 0.0%
Total: 43 processes, 190 lwps, load averages: 1.07, 0.61, 0.37
$ ./sysperfstat 1 2
------ Utilisation ------ ------ Saturation ------
Time %CPU %Mem %Disk %Net CPU Mem Disk Net
11:08:56 66.92 31.26 3.17 0.75 0.30 0.01 0.26 0.00
11:08:57 100.00 31.28 0.00 0.00 0.00 0.00 0.00 0.00
sysperfstat using the OS's Perl version:
Total Elapsed Time = 1.584845 Seconds
User+System Time = 0.444845 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
29.0 0.129 0.129 751 0.0002 0.0002 Sun::Solaris::Kstat::_Stat::FETCH
24.7 0.110 0.110 2 0.0550 0.0550 Sun::Solaris::Kstat::update
8.99 0.040 0.040 1 0.0400 0.0400 Sun::Solaris::Kstat::new
6.74 0.030 0.030 3 0.0100 0.0099 vars::BEGIN
4.50 0.020 0.150 1 0.0200 0.1500 main::discover_net
4.50 0.020 0.109 3 0.0066 0.0362 Sun::Solaris::Kstat::BEGIN
2.25 0.010 0.118 2 0.0049 0.0591 main::BEGIN
2.25 0.010 0.010 2 0.0049 0.0048 DynaLoader::BEGIN
2.25 0.010 0.010 4 0.0025 0.0025 strict::unimport
2.25 0.010 0.020 6 0.0016 0.0033 AutoLoader::BEGIN
2.25 0.010 0.010 2 0.0049 0.0048 main::fetch_cpu
0.00 0.000 0.000 1 0.0000 0.0000 Config::launcher
0.00 - -0.000 1 - - DynaLoader::dl_load_file
0.00 - -0.000 1 - - version::(bool
0.00 - -0.000 1 - - version::(cmp
Total Elapsed Time = 1.774845 Seconds
User+System Time = 0.444845 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
24.5 0.109 0.109 751 0.0001 0.0001 Sun::Solaris::Kstat::_Stat::FETCH
22.4 0.100 0.100 2 0.0500 0.0500 Sun::Solaris::Kstat::update
11.2 0.050 0.150 1 0.0500 0.1500 main::discover_net
8.99 0.040 0.109 3 0.0132 0.0362 Sun::Solaris::Kstat::BEGIN
6.74 0.030 0.030 1 0.0300 0.0300 Sun::Solaris::Kstat::new
6.74 0.030 0.030 3 0.0100 0.0099 vars::BEGIN
2.25 0.010 0.118 2 0.0049 0.0591 main::BEGIN
2.25 0.010 0.010 2 0.0049 0.0048 DynaLoader::BEGIN
2.25 0.010 0.030 7 0.0014 0.0042 Config::FETCH
0.00 0.000 0.000 1 0.0000 0.0000 Config::launcher
0.00 - -0.000 1 - - DynaLoader::dl_load_file
0.00 - -0.000 1 - - version::(bool
0.00 - -0.000 1 - - version::(cmp
0.00 - -0.000 1 - - Config::TIEHASH
0.00 - -0.000 1 - - Config::import
sysperfstat using SDR's Perl version:
Total Elapsed Time = 1.654845 Seconds
User+System Time = 0.444845 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
24.5 0.109 0.109 751 0.0001 0.0001 Sun::Solaris::Kstat::_Stat::FETCH
22.4 0.100 0.100 2 0.0500 0.0500 Sun::Solaris::Kstat::update
8.99 0.040 0.150 1 0.0400 0.1500 main::discover_net
8.99 0.040 0.040 1 0.0400 0.0400 Sun::Solaris::Kstat::new
8.99 0.040 0.109 3 0.0132 0.0362 Sun::Solaris::Kstat::BEGIN
6.74 0.030 0.030 3 0.0100 0.0099 vars::BEGIN
2.25 0.010 0.118 2 0.0049 0.0591 main::BEGIN
2.25 0.010 0.010 2 0.0049 0.0048 DynaLoader::BEGIN
2.25 0.010 0.010 2 0.0049 0.0048 main::fetch_cpu
2.25 0.010 0.009 2 0.0048 0.0046 main::fetch_disk
0.00 0.000 0.000 1 0.0000 0.0000 Config::launcher
0.00 - -0.000 1 - - DynaLoader::dl_load_file
0.00 - -0.000 1 - - version::(bool
0.00 - -0.000 1 - - version::(cmp
0.00 - -0.000 1 - - Config::TIEHASH
Total Elapsed Time = 1.564006 Seconds
User+System Time = 0.424006 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
27.8 0.118 0.118 751 0.0002 0.0002 Sun::Solaris::Kstat::_Stat::FETCH
23.5 0.100 0.100 2 0.0500 0.0500 Sun::Solaris::Kstat::update
11.7 0.050 0.108 3 0.0165 0.0362 Sun::Solaris::Kstat::BEGIN
9.43 0.040 0.159 1 0.0397 0.1594 main::discover_net
7.08 0.030 0.030 1 0.0300 0.0300 Sun::Solaris::Kstat::new
7.08 0.030 0.030 3 0.0100 0.0099 vars::BEGIN
2.36 0.010 0.010 1 0.0100 0.0100 DynaLoader::dl_load_file
2.36 0.010 0.020 1 0.0099 0.0198 DynaLoader::bootstrap
2.36 0.010 0.010 2 0.0049 0.0048 DynaLoader::BEGIN
0.00 0.000 0.000 1 0.0000 0.0000 Config::launcher
0.00 - -0.000 1 - - version::(bool
0.00 - -0.000 1 - - version::(cmp
0.00 - -0.000 1 - - Config::TIEHASH
0.00 - -0.000 1 - - Config::import
0.00 - -0.000 1 - - Config::AUTOLOAD
SDR 0.73.1 sysrec based on sysperfstat
Initial State: System CPU 99% Utilisation
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
4018 sdr 4944K 2272K run 18 4 0:10:57 94% perl/1
4043 sdr 4096K 3456K cpu0 59 0 0:00:00 1.6% prstat/1
625 noaccess 199M 115M sleep 59 0 0:04:30 0.3% java/18
127 root 6496K 4016K sleep 59 0 0:01:37 0.1% nscd/35
3960 sdr 7512K 5536K sleep 59 0 0:00:04 0.1% sshd/1
3970 sdr 1792K 1400K sleep 59 0 0:00:00 0.1% ksh/1
3901 sdr 7512K 5448K sleep 59 0 0:00:04 0.0% sshd/1
217 root 3144K 1624K sleep 100 - 0:00:10 0.0% xntpd/1
3916 sdr 3088K 2352K sleep 59 0 0:00:00 0.0% ksh93/1
623 root 8656K 1976K sleep 59 0 0:00:02 0.0% sendmail/1
131 root 3680K 2424K sleep 59 0 0:00:03 0.0% picld/7
123 daemon 4912K 2976K sleep 59 0 0:00:00 0.0% kcfd/3
236 daemon 3160K 1072K sleep 59 0 0:00:00 0.0% rpcbind/1
289 root 16M 5168K sleep 59 0 0:00:06 0.0% fmd/21
628 root 5600K 1664K sleep 59 0 0:00:00 0.0% sshd/1
NPROC USERNAME SWAP RSS MEMORY TIME CPU
12 sdr 9744K 19M 0.9% 0:13:00 95%
1 noaccess 114M 115M 5.6% 0:04:30 0.3%
27 root 45M 51M 2.5% 0:02:58 0.1%
1 smmsp 1560K 6768K 0.3% 0:00:00 0.0%
2 daemon 1824K 6072K 0.3% 0:00:00 0.0%
Total: 43 processes, 190 lwps, load averages: 1.08, 0.96, 0.64
$ /opt/sdr/bin/sysrec 1 2
1303201057:67.27:86.51:3.14:0.74:0.29:0.01:0.26:0.00:52.79:14.47:32.74:1.05:0.98:0.68
1303201058:99.42:86.51:0.00:0.00:0.00:0.00:0.00:0.00:96.44:2.98:0.58:1.05:0.98:0.68
Total Elapsed Time = 1.731365 Seconds
User+System Time = 0.581364 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
17.2 0.100 0.100 2 0.0500 0.0500 Sun::Solaris::Kstat::update
12.0 0.070 0.120 1 0.0700 0.1200 main::discover_net
11.8 0.069 0.265 5 0.0139 0.0531 main::BEGIN
10.3 0.060 0.060 1 0.0600 0.0600 Sun::Solaris::Kstat::new
8.43 0.049 0.049 763 0.0001 0.0001 Sun::Solaris::Kstat::_Stat::FETCH
5.16 0.030 0.030 3 0.0100 0.0099 vars::BEGIN
3.44 0.020 0.020 4 0.0050 0.0050 Exporter::import
3.44 0.020 0.030 2 0.0100 0.0149 Tie::Hash::BEGIN
3.44 0.020 0.020 2 0.0099 0.0098 XSLoader::load
1.72 0.010 0.010 1 0.0100 0.0100 POSIX::AUTOLOAD
1.72 0.010 0.010 1 0.0100 0.0100 Exporter::export
1.72 0.010 0.010 3 0.0033 0.0033 AutoLoader::import
1.72 0.010 0.010 4 0.0025 0.0025 DynaLoader::dl_load_file
1.72 0.010 0.010 2 0.0050 0.0049 Exporter::as_heavy
1.72 0.010 0.040 3 0.0033 0.0132 POSIX::SigRt::BEGIN
Total Elapsed Time = 1.650410 Seconds
User+System Time = 0.610410 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
17.6 0.108 0.108 763 0.0001 0.0001 Sun::Solaris::Kstat::_Stat::FETCH
16.3 0.100 0.100 2 0.0500 0.0500 Sun::Solaris::Kstat::update
9.83 0.060 0.060 1 0.0600 0.0600 Sun::Solaris::Kstat::new
8.03 0.049 0.275 5 0.0099 0.0551 main::BEGIN
6.55 0.040 0.040 3 0.0133 0.0133 vars::BEGIN
4.91 0.030 0.039 7 0.0042 0.0056 POSIX::BEGIN
3.28 0.020 0.020 4 0.0050 0.0050 DynaLoader::dl_load_file
3.28 0.020 0.020 2 0.0100 0.0099 Exporter::as_heavy
3.28 0.020 0.020 2 0.0100 0.0098 Tie::Hash::BEGIN
3.28 0.020 0.119 1 0.0197 0.1194 main::discover_net
1.64 0.010 0.010 1 0.0100 0.0100 POSIX::bootstrap
1.64 0.010 0.010 1 0.0100 0.0100 Sun::Solaris::Kstat::bootstrap
1.64 0.010 0.010 3 0.0033 0.0033 AutoLoader::import
1.64 0.010 0.010 4 0.0025 0.0025 Exporter::import
1.64 0.010 0.030 3 0.0033 0.0099 POSIX::SigRt::BEGIN
Total Elapsed Time = 1.830410 Seconds
User+System Time = 0.610410 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
16.3 0.100 0.100 2 0.0500 0.0500 Sun::Solaris::Kstat::update
11.1 0.068 0.068 763 0.0001 0.0001 Sun::Solaris::Kstat::_Stat::FETCH
9.83 0.060 0.060 1 0.0600 0.0600 Sun::Solaris::Kstat::new
9.67 0.059 0.285 5 0.0119 0.0571 main::BEGIN
8.19 0.050 0.119 1 0.0497 0.1194 main::discover_net
6.55 0.040 0.040 3 0.0133 0.0133 vars::BEGIN
4.91 0.030 0.039 7 0.0042 0.0056 POSIX::BEGIN
3.28 0.020 0.020 2 0.0100 0.0099 Exporter::as_heavy
3.28 0.020 0.020 2 0.0100 0.0098 Tie::Hash::BEGIN
3.28 0.020 0.020 2 0.0099 0.0098 DynaLoader::bootstrap
1.64 0.010 0.010 1 0.0100 0.0100 POSIX::bootstrap
1.64 0.010 0.010 3 0.0033 0.0033 AutoLoader::import
1.64 0.010 0.010 4 0.0025 0.0025 Exporter::import
1.64 0.010 0.030 3 0.0033 0.0099 POSIX::SigRt::BEGIN
1.64 0.010 0.010 2 0.0049 0.0048 DynaLoader::BEGIN
sysrec is based on sysperfstat and adds certain new metrics on top of it. Profiling the code, we can see that the footprint is a bit higher, but not by much. On production systems sysrec will not run every second unless you are interactively debugging or investigating some problem. In the long run sysrec does run at a more relaxed interval: 60+ seconds or so. If you plan to use SDR on old hardware, keep these notes in mind!
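To put a number on "a bit higher", the User+System times reported by the profiler runs above can be compared directly. The short Python sketch below does only that arithmetic; the helper names are made up for this example. The averages work out to roughly 0.44 s for sysperfstat and 0.60 s for sysrec. Note that each run covers just two samples plus Perl start-up and module loading, so this comparison probably overstates the difference for a long-running recorder.

```python
# User+System times in seconds, copied from the profiler runs above.
sysperfstat_runs = [0.444845, 0.444845, 0.444845, 0.424006]
sysrec_runs = [0.581364, 0.610410, 0.610410]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def relative_overhead(baseline, candidate):
    """Fraction by which the candidate's mean exceeds the baseline's mean."""
    return mean(candidate) / mean(baseline) - 1.0


if __name__ == "__main__":
    extra = relative_overhead(sysperfstat_runs, sysrec_runs)
    print(f"sysperfstat mean: {mean(sysperfstat_runs):.3f} s")
    print(f"sysrec mean:      {mean(sysrec_runs):.3f} s")
    print(f"sysrec uses about {extra:.0%} more CPU time")
```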
The main goal of the SDR Recording module is to deliver system and specialized recorders using a dynamic scripting language, like Perl. However, we have experimented with having some recorders as C or Java applications. nicrec under SDR 0.73 for Solaris-based systems switched from Perl to C, although we did not totally replace the Perl version. Unfortunately Tim Cook, the author of this tool, stopped working on it. We removed the Solaris native version of nicrec in 0.73.1 and ship the default Perl version of nicrec!
Yes, SDR supports VM guests. VirtualBox and VMware are the most common VM technologies and SDR supports them.
Yes, SDR supports Solaris zones. Make sure you check your configuration: you can manage all zones from the global level of the machine. If resource management is in use, then you can most likely consider using SDR inside a Solaris zone.
Make sure you read "Timekeeping in VMware Virtual Machines". Certain settings are required on the guest OS in order to minimize the chance of inaccurate times. It has been observed that when a guest OS's CPU utilization is 80-90%, certain SDR recorders start losing time. This is caused by the guest OS. Our internal testing has recorded such issues on a guest RHEL 5.4 32-bit, kernel 2.6.18-164. Solaris 10 10/09 s10x_u8wos_08a X86, kernel 141445-09, seems to handle situations like these well enough.
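One simple way to check whether a recorder has lost time on a guest is to scan the timestamps of its raw data for gaps larger than the sampling interval. The following Python sketch shows the idea; the function find_time_gaps is a hypothetical helper written for this example and is not part of SDR.

```python
def find_time_gaps(timestamps, interval, tolerance=0):
    """Return (previous, current, missed) for each oversized gap.

    A gap is reported when two consecutive timestamps are more than
    interval + tolerance seconds apart; `missed` is the number of
    samples that should have fallen in between.
    """
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if delta > interval + tolerance:
            missed = int(delta // interval) - 1
            gaps.append((prev, cur, missed))
    return gaps


if __name__ == "__main__":
    # Timestamps in the style of the raw data shown earlier, with one
    # sample deliberately left out to simulate lost time.
    stamps = [1303199130, 1303199131, 1303199133, 1303199134]
    for prev, cur, missed in find_time_gaps(stamps, interval=1):
        print(f"gap between {prev} and {cur}: {missed} sample(s) missing")
```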
jvmrec is built on top of the jstat utility delivered with Sun/Oracle's JDK, which already reports GC statistics from a running Java application. jvmrec currently supports neither JRockit nor J9 VMs. IBM's J9 JDK is similar enough to Sun/Oracle's HotSpot VM, but it does not include any command-line utility similar to jstat to collect and report GC statistics.
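To show what building on jstat involves, here is a small Python sketch that parses output in the style of `jstat -gcutil` (one header line followed by data rows). It is not jvmrec's code. The column layout in the sample is the one printed by older HotSpot JDKs that still have a permanent generation, and the numbers are invented for illustration.

```python
def to_number(token):
    """Convert a jstat cell to float; non-numeric cells (e.g. '-') become None."""
    try:
        return float(token)
    except ValueError:
        return None


def parse_gcutil(text):
    """Parse jstat -gcutil style output into a list of {column: value} dicts."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [
        dict(zip(header, (to_number(cell) for cell in row)))
        for row in rows
        if len(row) == len(header)
    ]


# Illustrative sample only: the values below are made up.
SAMPLE = """
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  42.17  63.90  18.25  55.10     12    0.210     1    0.095    0.305
"""

if __name__ == "__main__":
    for row in parse_gcutil(SAMPLE):
        print(row["S0"], row["S1"], row["O"], row["P"], row["YGC"])
```

Keying each value by its column name, instead of by position, keeps the parser usable when a JDK release adds or renames columns.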
SDR should be capable of analyzing not only the operating system's resources but also the applications deployed on a given computer system infrastructure. Currently a large number of applications are deployed as web applications, using the HTTP protocol. A very clear requirement was to develop a light recorder to capture metrics from these applications. To start with, the response time of one or many HTTP actions was clearly needed.
In order to gather the response times from a series of HTTP actions we have created webrec, a simple HTTP recorder based on Apache HttpClient. The recorder is written in Java and is capable of recording multiple HTTP actions grouped as workloads. One thread is used for each workload, so you can use multiple workloads to record multiple HTTP actions on different parts of your site or sites!
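The one-thread-per-workload idea can be sketched in a few lines. The example below is in Python with the standard library rather than Java with Apache HttpClient, so it illustrates the design and is not webrec's implementation. The names run_workloads and time_action, and the example.org URLs, are assumptions made for this sketch.

```python
import threading
import time
import urllib.request


def fetch(url):
    """Issue a GET, read the whole body and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()
        return response.status


def time_action(url, fetch=fetch):
    """Return (status, elapsed_seconds) for one HTTP action."""
    started = time.monotonic()
    status = fetch(url)
    return status, time.monotonic() - started


def run_workloads(workloads, fetch=fetch):
    """Run each workload (name -> list of URLs) in its own thread.

    Returns name -> list of (url, status, elapsed_seconds); status and
    elapsed are None when the request failed.
    """
    results = {name: [] for name in workloads}

    def worker(name, urls):
        for url in urls:
            try:
                status, elapsed = time_action(url, fetch)
            except OSError:  # urllib's URLError is a subclass of OSError
                status, elapsed = None, None
            results[name].append((url, status, elapsed))

    threads = [threading.Thread(target=worker, args=item) for item in workloads.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Hypothetical workloads; replace the URLs with pages of your own site.
    report = run_workloads({
        "frontpage": ["http://example.org/"],
        "docs": ["http://example.org/a", "http://example.org/b"],
    })
    for name, actions in report.items():
        for url, status, elapsed in actions:
            print(name, url, status, elapsed)
```

Each thread appends only to its own workload's list, so the workloads do not need a lock to record their results.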
If you care how well your HTTP site runs, or how efficiently your web application has been developed, you need to measure it and store its activity over a long period of time. webrec helps you in this respect by measuring your application(s) and storing important metrics. Later you can conduct several types of analysis based on this data. But first you need to have the data.
Currently webrec supports GET and cookies. Work is under way to implement POST. Later on we are planning to add support for HTTPS and authentication. This is based on the SDR 0.73 release from 2010!
SDR recorders are simple probes gathering OS or application metrics! The recorders must be light and simple enough to report these metrics to whichever reporting system will be used, or to other tools. When we started to develop SDR we selected RRDtool as its reporting tool. Therefore each recorder, by default, reports data in a form that is easy for RRDtool to digest.
However, it is easy to switch the raw data from the current format to CSV or a custom format, if you need to. Remember that the raw data from SDR recorders is not really designed for direct human consumption, even if this is possible!
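As an illustration of how little is needed for such a conversion, the Python sketch below turns colon-separated raw records, like the sysrec lines shown earlier, into CSV. The function raw_to_csv is a hypothetical helper, and any column names you pass in are your own labels: the raw data itself carries no header.

```python
import csv
import io


def raw_to_csv(lines, columns=None):
    """Convert colon-separated raw records to CSV text.

    `columns` is an optional list of header names supplied by the
    caller; it is written as the first CSV row when given.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if columns:
        writer.writerow(columns)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        writer.writerow(line.split(":"))
    return out.getvalue()


if __name__ == "__main__":
    raw = [
        "1303201057:67.27:86.51:3.14:0.74:0.29:0.01:0.26:0.00:52.79:14.47:32.74:1.05:0.98:0.68",
        "1303201058:99.42:86.51:0.00:0.00:0.00:0.00:0.00:0.00:96.44:2.98:0.58:1.05:0.98:0.68",
    ]
    print(raw_to_csv(raw), end="")
```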
SDR recorders are organized in two groups: the main (default) recorders, responsible for collecting metrics about overall system performance (CPU, memory, disk, network), and additional recorders monitoring applications or different parts of the operating system: processes, Java applications, virtual machines.
The main recorders are sysrec, cpurec, diskrec and nicrec. They look after the four main resources of a computing system: CPU, memory, disk and network.
All the other recorders are additional and are used to monitor other classes of applications or parts of the system. They are: netrec, procrec, jvmrec, zonerec, corerec and webrec.
The SDR Recording module contains recorders specific to different operating systems and applications. Linux 2.6 is our main development platform; Solaris 10 comes next. We cannot support all sorts of releases, and therefore our recorders will have different versions for different operating systems. In addition, some operating system releases do not include the same level of functionality as our main platform.
Short answer: no. If you have a fast enough system and you start all recorders at once, then all recorders will collect data in the same second. Note that all main recorders are Perl-based probes that use the Time::HiRes and POSIX modules with high-resolution timers. This way they do not lose time and keep sampling on the same second at which they were started.
On Linux and Solaris x86 systems we have seen that the main recorders keep the same second when started at once.
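The timing approach can be sketched as follows, assuming an interval timer from Time::HiRes combined with POSIX pause, as the recorder template does. The helper name sample_times and the 0.2 second interval are illustrative only. Because the timer fires on a fixed schedule, the time spent inside the loop does not add up as drift the way it would with a plain sleep.

```perl
use strict;
use warnings;
use Time::HiRes qw(time setitimer ITIMER_REAL);
use POSIX qw(pause);

# Take $count timestamps, one per $interval seconds, driven by an
# interval timer instead of sleep().
sub sample_times {
    my ($interval, $count) = @_;
    my @stamps;

    local $SIG{ALRM} = sub { };                  # empty handler, only wakes pause()
    setitimer(ITIMER_REAL, $interval, $interval);

    for my $i (1 .. $count) {
        push @stamps, time();                    # high-resolution wall-clock time
        pause() if $i < $count;                  # wait for the next timer tick
    }

    setitimer(ITIMER_REAL, 0, 0);                # stop the timer
    return @stamps;
}

# Three samples, 0.2 seconds apart
printf "%.3f\n", $_ for sample_times(0.2, 3);
```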
Not at this moment. The reporting side differs from case to case, so a generic package that works in all cases might be difficult to make. In the near future a simple generic reporting package will be offered, but it will still require custom procedures to set up the reports for each case.
UNIX: Solaris SPARC, x86, FreeBSD or Linux based operating systems.
A drill-down report is currently being designed. In addition to the pre-formatted data, you will be able to select a certain period of time, data-mine across all servers' data and display all server metrics.
The SDR Reporting module is not an analytics package. It is not based on a relational database system, nor does it have any substantial server-side programming.
The module includes plots for the different metrics and data
gathered from all configured servers. In addition, the reporting module
displays other useful information: server characteristics,
a per-server stretch factor (a metric derived from the
load average), a workload management part analyzing the applications
running on that server, and a prediction module.
There are several reasons why:
Time: with a minimal number of clicks you should obtain the needed information, pre-formatted for you, in a matter of seconds. If you require custom reports, the reporting module has to be configured to produce them.
Simplicity: minimize the number of defects by avoiding complex server-side software. Simply modify or customize the reporting module, based on your input.
Efficiency: avoid complex Flash-based solutions, which sometimes eat your computer's resources: CPU, network bandwidth.
Education: avoid spending lots of cash on long and boring training packages, where you simply learn how to click between the functions of a software package.
Recovery: avoid long hours trying to restore a large database where all your aggregated data is stored. Simple and easy recovery was another reason to select a Round Robin Database and a simple scripting language for the job. In our lab, restoring one year of data for one server took approximately 10 minutes.
Yes. The reporting engine uses open-source software, and all its components are likewise available as open-source software. This way you can build your own reporting engine and integrate it as you like on your preferred operating system platform. However, a prebuilt reporting package is not offered at this time unless you are a registered user of SDR.
Starting with SDR 0.73 the default web server is NGINX (Engine X).
Sure, you can use any HTTP server you like. Just make sure you install your HTTP server under /opt/sdr/report/ws or point your installation to SDR. Remember that the default docroot is /opt/sdr/report/docroot.
Sure, you can use the Apache HTTP server for SDR. Compile and install Apache and point your installation to SDR, as described above.
Because of its merits: speed, simplicity, architecture and scalability.
Yes, you can. At this moment we don't provide a complete SDK for writing and developing new recorders, but there are some ground rules to follow when writing new tools for SDR. You can also change and modify our recorders or reporting tools for your own purposes.
No, we don't have an SDK for this. You can follow certain examples and respect some rules when you plan to write new recorders or reporting utilities for SDR. To start, download the SDR Recording package for your platform and start coding.
Most of our recorders and reporting tools are Perl scripts that use the SDR Perl distribution, compiled and tuned for SDR usage. When you plan to write a new recorder, for example, it is important to follow certain basic rules:
Language: Perl 5. You should try to write the new recorder or reporting utility in Perl 5. You can use your own Perl development environment, but make sure you properly integrate and test the new application with the SDR Perl 5 distribution. If you have a strong reason to use a different language, please submit a new bug in Bugzilla. Of course, certain recorders can be written in a different programming language, such as C.
Test and Integration IT: this is the phase when you submit all your code for review and integration with the other SDR tools. At this phase you should be able to submit a number of test results for your new recorder or reporting utility (profiling, perlcritic and code coverage reports) to the SDR gatekeeper and wait for integration approval.
Profiling: profile and check your code with the Devel::NYTProf profiler. Present the profiling report in HTML format.
Code Coverage: test your code properly and measure how thoroughly the tests exercise it. Use Devel::Cover for this task and present a detailed HTML report for this phase.
Perl Critic: always run perlcritic against your new code and make sure it passes the default perlcritic level (severity 5). Also run perlcritic at severity level 4.
Recorder Template: Use the following template when developing a new recorder:
#!/opt/sdr/perl/bin/perl -w
#
# System Data Recorder: newrec
#
# newrec - records XYZ telemetry
#
# USAGE: newrec [-hv] | [interval [count]]
# eg,
# newrec 60 # print continuously every minute
# # XYZ telemetry
#
# COPYRIGHT: Copyright (c) 2011 System Data Recorder
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# SDR VERSION: MAJOR.MINOR.X
#
# HISTORY
# 17-Jul-2010 Perl version, RRD format output - 0.73 sp
# 26-Dec-2010 New Output fixes - 0.73 sp
use strict;
use Getopt::Std;
use Time::HiRes qw(time alarm setitimer ITIMER_REAL);
use POSIX qw(pause);
use Sys::Statistics::Linux;   # needed by the example object in MAIN BODY
# use other Perl specific modules
# Debug Only
# use Data::Dumper;
#
# Command line arguments
#
usage() if defined $ARGV[0] and $ARGV[0] eq "--help";
getopts('hv') or usage();
usage() if defined $main::opt_h;
revision() if defined $main::opt_v;
# process [interval [count]]
my ($interval, $loop_max);
if (defined $ARGV[0]) {
    $interval = $ARGV[0];
    $loop_max = defined $ARGV[1] ? $ARGV[1] : 2**32;
    usage() if $interval == 0;
}
else {
    $interval = 1;
    $loop_max = 1;
}
# Variables
my $loop = 0; # current loop number
$main::opt_h = 0; # help option
$main::opt_v = 0; # revision option
$| = 1; # autoflush
# ######### #
# MAIN BODY #
# ######### #
# Set a timer for S::S::L object, for example
$SIG{ALRM} = sub { };
setitimer(ITIMER_REAL, .2, .2);
my $lxs = Sys::Statistics::Linux->new(
    cpustats => 1
);
### .2sec sleep using a timer
pause;
# how often do we trigger (seconds)?
my $first_interval = $interval;
# signal handler is empty
$SIG{ALRM} = sub { };
# first value is the initial wait, second is the wait thereafter
setitimer(ITIMER_REAL, $first_interval, $interval);
while (1) {
    ### Get Stats
    # ...
    # Print Stats
    # ...
    ### Check for end
    if ( ++$loop == $loop_max ) {
        last;
    }
    ### Interval
    pause;
}
#
# usage - print usage and exit.
#
sub usage {
    # no space is allowed between << and the heredoc terminator
    print STDERR <<END;
USAGE: newrec [-hv] | [interval [count]]
   eg, newrec        # print summary since boot only
       newrec 5      # print continually every 5 seconds
       newrec 1 5    # print 5 times, every 1 second
Fields:
...
END
    exit 0;
}
# revision - print revision and exit
#
sub revision {
    print STDERR <<END;
newrec: major.minor.x, YYYY-MM-DD
END
    exit 0;
}
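The template accepts any interval that is not numerically zero, so a non-numeric argument is only caught as zero. A stricter check could look like the following sketch. parse_interval_count is a hypothetical helper and not part of SDR; the defaults (one sample with no arguments, 2**32 iterations when no count is given) are copied from the template.

```perl
use strict;
use warnings;

# Parse "[interval [count]]" like the template, but reject non-numeric
# or non-positive values. Returns (interval, count), or an empty list
# on invalid input.
sub parse_interval_count {
    my @args = @_;

    # No arguments: a single sample, as in the template
    return (1, 1) unless @args;

    my ($interval, $count) = @args;
    return unless $interval =~ /^\d+(?:\.\d+)?$/ && $interval > 0;

    # Count is optional; the template falls back to 2**32 iterations
    $count = 2**32 unless defined $count;
    return unless $count =~ /^\d+$/ && $count > 0;

    return ($interval, $count);
}

my ($interval, $count) = parse_interval_count(@ARGV)
    or die "USAGE: newrec [-hv] | [interval [count]]\n";
print "interval=$interval count=$count\n";
```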
At the moment SDR Recording/Reporting does not package its own Python environment. Perl 5 has been selected as the main scripting engine for the recording and reporting modules. However, if you believe Python is more capable than Perl and you can produce a test case showing Solaris or Linux recorders developed in Python the same way we do in Perl, please submit a new bug in Bugzilla.
SDR uses its own Perl build in order to minimize dependencies on the operating system. SDR Recording and Reporting include all the Perl modules needed to run the recorders and reporting tools correctly. Of course you can use your own Perl distribution, but make sure you have all the needed modules and test before you do that. SDR uses Perl 5.12.x as the baseline version for the recording and reporting modules.
See the example above. Of course you can write a recorder in C, but you should have a strong reason to do so. You should also profile and test the code as thoroughly as possible. We encourage using scripting languages, like Perl 5, for easy development and troubleshooting.
There are certain rules for accepting new code into SDR: you need to profile it, run a code coverage test and execute perlcritic against it before integration. If you fail to provide these reports, your code cannot be integrated with SDR. Perl Critic is a static source code analysis engine that enforces coding guidelines. This way we can easily manage and troubleshoot code from anyone.