Saturday, September 17, 2011

AWK command

http://www.vectorsite.net/tsawk_1.html#m1

* The Awk text-processing language is useful for such tasks as:
  • Tallying information from text files and creating reports from the results.
  • Adding additional functions to text editors like "vi".
  • Translating files from one format to another.
  • Creating small databases.
  • Performing mathematical operations on files of numeric data.
Awk has two faces: it is a utility for performing simple text-processing tasks, and it is a programming language for performing complex text-processing tasks.
The two faces are really the same, however. Awk uses the same mechanisms for handling any text-processing task, but these mechanisms are flexible enough to allow useful Awk programs to be entered on the command line, or to implement complicated programs containing dozens of lines of Awk statements.
Awk statements comprise a programming language. In fact, Awk is useful for simple, quick-and-dirty computational programming. Anybody who can write a BASIC program can use Awk, although Awk's syntax is different from that of BASIC. Anybody who can write a C program can use Awk with little difficulty, and those who would like to learn C may find Awk a useful stepping stone -- with the caution that Awk and C have significant differences beyond their many similarities.
There are, however, things that Awk is not. It is not really well suited for extremely large, complicated tasks. It is also an "interpreted" language -- that is, an Awk program cannot run on its own; it must be executed by the Awk utility itself. That means that it is relatively slow, though it is efficient as interpreted languages go, and that the program can only be used on systems that have Awk. There are translators available that can convert Awk programs into C code for compilation as stand-alone programs, but such translators have to be purchased separately.
One last item before proceeding: What does the name "Awk" mean? Awk actually stands for the names of its authors: "Aho, Weinberger, & Kernighan". Kernighan later noted: "Naming a language after its authors ... shows a certain poverty of imagination." The name is reminiscent of that of an oceanic bird known as an "auk", and so the picture of an auk often shows up on the cover of books on Awk.

[1.2] AWK COMMAND-LINE EXAMPLES

* It is easy to use Awk from the command line to perform simple operations on text files. Suppose we have a file named "coins.txt" that describes a coin collection. Each line in the file contains the following information:
  metal  weight in ounces   date minted   country of origin   description
The file has the contents:
   gold     1    1986  USA                 American Eagle
   gold     1    1908  Austria-Hungary     Franz Josef 100 Korona
   silver  10    1981  USA                 ingot
   gold     1    1984  Switzerland         ingot
   gold     1    1979  RSA                 Krugerrand
   gold     0.5  1981  RSA                 Krugerrand
   gold     0.1  1986  PRC                 Panda
   silver   1    1986  USA                 Liberty dollar
   gold     0.25 1986  USA                 Liberty 5-dollar piece
   silver   0.5  1986  USA                 Liberty 50-cent piece
   silver   1    1987  USA                 Constitution dollar
   gold     0.25 1987  USA                 Constitution 5-dollar piece
   gold     1    1988  Canada              Maple Leaf
We could then invoke Awk to list all the gold pieces as follows:
awk '/gold/' coins.txt
This tells Awk to search through the file for lines of text that contain the string "gold", and print them out. The result is:
   gold     1    1986  USA                 American Eagle
   gold     1    1908  Austria-Hungary     Franz Josef 100 Korona
   gold     1    1984  Switzerland         ingot
   gold     1    1979  RSA                 Krugerrand
   gold     0.5  1981  RSA                 Krugerrand
   gold     0.1  1986  PRC                 Panda
   gold     0.25 1986  USA                 Liberty 5-dollar piece
   gold     0.25 1987  USA                 Constitution 5-dollar piece
   gold     1    1988  Canada              Maple Leaf
* This is all very nice, a critic might say, but any "grep" or "find" utility can do the same thing. True, but Awk is capable of doing much more. For example, suppose we only want to print the description field, and leave all the other text out. We could then change the invocation of Awk to:
awk '/gold/ {print $5,$6,$7,$8}' coins.txt
This yields:
   American Eagle
   Franz Josef 100 Korona
   ingot   
   Krugerrand   
   Krugerrand   
   Panda   
   Liberty 5-dollar piece 
   Constitution 5-dollar piece 
   Maple Leaf
This example demonstrates the simplest general form of an Awk program:
awk <search pattern> {<program actions>}
Awk searches through the input file for each line that contains the search pattern. For each of these lines found, Awk then performs the specified actions. In this example, the action is specified as:
{print $5,$6,$7,$8}
The purpose of the "print" statement is obvious. The "$5", "$6", "$7", and "$8" are "fields", or "field variables", which store the words in each line of text by their numeric sequence. "$1", for example, stores the first word in the line, "$2" has the second, and so on. By default, a "word" is defined as any string of printing characters separated by spaces or tabs.
Since "coins.txt" has the structure:
  metal  weight in ounces   date minted   country of origin   description
-- then the field variables are matched to each line of text in the file as follows:
   metal:        $1
   weight:       $2
   date:         $3
   country:      $4
   description:  $5 through $8
The program action in this example prints the fields that contain the description. The description field in the file may actually include from one to four fields, but that's not a problem, since "print" treats any undefined field as an empty string. The alert reader will notice that the "coins.txt" file is neatly organized so that the only piece of information that contains multiple fields is at the end of the line. This is a little contrived, but that's the way examples are.
* Awk's default program action is to print the entire line, which is what "print" does when invoked without parameters. This means that the first example:
   awk '/gold/'
-- is the same as:
awk '/gold/ {print}'
Note that Awk recognizes the field variable $0 as representing the entire line, so this could also be written as:
awk '/gold/ {print $0}'
This is redundant, but it does have the virtue of making the action more obvious.
* Now suppose we want to list all the coins that were minted before 1980. We invoke Awk as follows:
   awk '{if ($3 < 1980) print $3, "    ",$5,$6,$7,$8}' coins.txt
This yields:
   1908      Franz Josef 100 Korona
   1979      Krugerrand 
This new example adds a few new concepts:
  • No search pattern is specified. Without a search pattern, Awk will match all lines in the input file, and perform the actions on each one.
  • We can add text of our own to the "print" statement (in this case, four spaces) simply by enclosing the text in quotes and adding it to the parameter list.
  • An "if" statement is used to check for a date field earlier than 1980, and the "print" statement is executed only if that condition is true.
    There's a subtle issue involved here, however. In most computer languages, strings are strings, and numbers are numbers. There are operations that are unique to each, and one must be specifically converted to the other with conversion functions -- we don't concatenate numbers, and we don't perform arithmetic operations on strings.
    Awk, on the other hand, makes no strong distinction between strings and numbers. In computer-science terms, it is a "weakly-typed" language. All the fields are regarded as strings, but if that string also happens to represent a number, numeric operations can be performed on it. So we can perform an arithmetic comparison on the date field.
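The string-versus-number behavior can be seen in a small sketch. The input line "1979 0.5" and the variable names "is_old", "sum", and "joined" are made up for illustration; they do not come from "coins.txt".

```shell
# Hypothetical one-line input: a date field and a weight field.
echo "1979 0.5" | awk '{
    is_old = ($1 < 1980)   # numeric comparison on a field that looks like a number
    sum    = $1 + $2       # arithmetic on the two fields
    joined = $1 $2         # the same two fields concatenated as strings
    print is_old, sum, joined
}'
# prints: 1 1979.5 19790.5
```

The same field serves as a number in the comparison and the sum, and as a string in the concatenation, with no conversion function in sight.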
* The next example prints out how many coins are in the collection:
   awk 'END {print NR,"coins"}' coins.txt
This yields:
13 coins
The first new item in this example is the END statement. To explain this requires extending the general form of an Awk program to:
awk 'BEGIN              {<initializations>} 
        <search pattern 1> {<program actions>} 
        <search pattern 2> {<program actions>} 
        ...
        END                {<final actions>}'
The BEGIN clause performs any initializations required before Awk starts scanning the input file. The subsequent body of the Awk program consists of a series of search patterns, each with its own program action. Awk scans each line of the input file for each search pattern, and performs the appropriate actions for each string found. Once the file has been scanned, an END clause can be used to perform any final actions required.
So this example doesn't perform any processing on the input lines themselves. All it does is scan through the file and perform a final action: print the number of lines in the file, which is given by the "NR" variable. NR stands for "number of records". NR is one of Awk's "pre-defined" variables. There are others; for example, the variable NF gives the number of fields in a line, but a detailed explanation will have to wait for later.
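As a rough sketch of how the three kinds of clause fit together, the following feeds Awk two lines in the style of "coins.txt" (typed in directly here rather than read from the file) and reports NR and NF along the way. The message strings are arbitrary.

```shell
# Two sample lines are piped in, so no input file is needed.
printf 'gold 1 1986 USA American Eagle\nsilver 10 1981 USA ingot\n' |
awk 'BEGIN  { print "start" }                                  # runs before any input is read
     /gold/ { print "gold line", NR, "has", NF, "fields" }     # runs for each matching line
     END    { print NR, "lines read" }'                        # runs after the last line
# prints:
#   start
#   gold line 1 has 6 fields
#   2 lines read
```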
* Suppose the current price of gold is $425, and we want to figure out the approximate total value of the gold pieces in the coin collection. We invoke Awk as follows:
   awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt
This yields:
value = $2592.5
In this example, "ounces" is a "user defined" variable, as opposed to the "standard" pre-defined variables. Almost any string of characters can be used as a variable name in Awk, as long as the name doesn't conflict with some string that has a specific meaning to Awk, such as "print" or "NR" or "END". There is no need to declare the variable, or to initialize it. A variable handled as a string variable is initialized to the "null string", meaning that if we try to print it, nothing will be there. A variable handled as a numeric variable will be initialized to zero.
So the program action:
   {ounces += $2}
-- sums the weight of the piece on each matched line into the variable "ounces". Those who program in C should be familiar with the "+=" operator. Those who don't can be assured that this is just a shorthand way of saying:
{ounces = ounces + $2}
The final action is to compute and print the value of the gold:
END {print "value = $" 425*ounces}
The only thing here of interest is that the two print parameters, the literal '"value = $"' and the expression "425*ounces", are separated by a space, not a comma. This concatenates the two parameters together on output, without any intervening spaces.
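The difference between a space and a comma in "print" can be checked directly. In this sketch the weight is assigned by hand in a BEGIN clause instead of being summed from the file; 6.1 is the total that gives the $2592.5 result shown earlier.

```shell
awk 'BEGIN {
    ounces = 6.1                      # assigned directly for the demonstration
    print "value = $" 425*ounces      # space: the two items are concatenated
    print "value = $", 425*ounces     # comma: the two items are separated by OFS
}'
# prints:
#   value = $2592.5
#   value = $ 2592.5
```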

[1.3] AWK PROGRAM EXAMPLE

* All this is fun, but each of these examples only seems to nibble away at "coins.txt". Why not have Awk figure out everything interesting at one time?
The immediate objection to this idea is that it would be impractical to enter a lot of Awk statements on the command line, but that's easy to fix. The commands can be written into a file, and then Awk can be told to execute the commands from that file as follows:
   awk -f <awk program file name>
Given an ability to write an Awk program in this way, then what should a "master" "coins.txt" analysis program do? Here's one possible output:
Summary Data for Coin Collection:
  
     Gold pieces:                   nn
     Weight of gold pieces:         nn.nn
     Value of gold pieces:       n,nnn.nn

     Silver pieces:                 nn
     Weight of silver pieces:       nn.nn
     Value of silver pieces:     n,nnn.nn

     Total number of pieces:        nn
     Value of collection:        n,nnn.nn

The following Awk program generates this information:
# This is an awk program that summarizes a coin collection.
   #
   /gold/    { num_gold++; wt_gold += $2 }      # Get weight of gold.
   /silver/  { num_silver++; wt_silver += $2 }  # Get weight of silver.
   END { val_gold = 485 * wt_gold;              # Compute value of gold.
         val_silver = 16 * wt_silver;           # Compute value of silver.
         total = val_gold + val_silver;
         print "Summary data for coin collection:";  # Print results.
         printf ("\n");
         printf ("   Gold pieces:                   %2d\n", num_gold);
         printf ("   Weight of gold pieces:         %5.2f\n", wt_gold);
         printf ("   Value of gold pieces:        %7.2f\n",val_gold);
         printf ("\n");
         printf ("   Silver pieces:                 %2d\n", num_silver);
         printf ("   Weight of silver pieces:       %5.2f\n", wt_silver);
         printf ("   Value of silver pieces:      %7.2f\n",val_silver);
         printf ("\n");
         printf ("   Total number of pieces:        %2d\n", NR);
         printf ("   Value of collection:         %7.2f\n", total); }
This program has a few interesting features:
  • Comments can be inserted in the program by preceding them with a "#".
  • Note the statements "num_gold++" and "num_silver++". C programmers should understand the "++" operator; those who are not can be assured that it simply increments the specified variable by one.
  • Multiple statements can be written on the same line by separating them with a semicolon (";").
  • Note the use of the "printf" statement, which offers more flexible printing capabilities than the "print" statement. "Printf" has the general syntax:
       printf("<format_code>",<parameters>)
    There is one format code for each of the parameters in the list. Each format code determines how its corresponding parameter will be printed. For example, the format code "%2d" tells Awk to print an integer in a field at least two characters wide, and the format code "%7.2f" tells Awk to print a floating-point number in a field at least seven characters wide (counting the decimal point), with two digits to the right of the decimal point.
    Note also that, in this example, each string printed by "printf" ends with a "\n", which is a code for a "newline" (ASCII line-feed code). Unlike the "print" statement, which automatically advances the output to the next line when it prints a line, "printf" does not automatically advance the output, and by default the next output statement will append its output to the same line. A newline forces the output to skip to the next line.
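A short sketch of these "printf" points, using values taken from the summary output; the "|" separators are only there to make the field widths visible.

```shell
awk 'BEGIN {
    printf("%2d|%7.2f|", 9, 2958.5)   # no "\n", so the next output stays on this line
    printf("%5.2f\n", 6.1)            # "\n" ends the line
    print "done"                      # "print" supplies its own newline
}'
# prints:
#    9|2958.50| 6.10
#   done
```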
* This program can be stored in a file named "summary.awk", and invoked as follows:
   awk -f summary.awk coins.txt
The output is:
Summary data for coin collection:

      Gold pieces:                    9
      Weight of gold pieces:          6.10
      Value of gold pieces:        2958.50

      Silver pieces:                  4
      Weight of silver pieces:       12.50
      Value of silver pieces:       200.00

      Total number of pieces:        13
      Value of collection:         3158.50


============================================================================
Quick Reference:

This final section provides a convenient lookup reference for Awk programming.
* Invoking Awk:
   awk [-F<ch>] {pgm} | {-f <pgm file>} [<vars>] [-|<data file>]
-- where:
ch:          Field-separator character.
   pgm:         Awk command-line program.
   pgm file:    File containing an Awk program.
   vars:        Awk variable initializations.
   data file:   Input data file.
* General form of Awk program:
BEGIN              {<initializations>} 
   <search pattern 1> {<program actions>} 
   <search pattern 2> {<program actions>} 
   ...
   END                {<final actions>}
* Search patterns:
/<string>/     Search for string.
   /^<string>/    Search for string at beginning of line.
   /<string>$/    Search for string at end of line.
The search can be constrained to particular fields:
$<field> ~ /<string>/   Search for string in specified field.
   $<field> !~ /<string>/  Search for string not in specified field.
Strings can be ORed in a search:
/(<string1>)|(<string2>)/
The search can be for an entire range of lines, bounded by two strings:
/<string1>/,/<string2>/
The search can be for any condition, such as line number, and can use the following comparison operators:
== != < > <= >=
Different conditions can be ORed with "||" or ANDed with "&&".
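The pattern forms above can be tried on a few sample lines. This is only a sketch: the four lines held in the shell variable "data" are a cut-down stand-in for "coins.txt", with just the metal, weight, date, and country fields.

```shell
# Hypothetical sample data: metal, weight, date, country.
data='gold 1 1986 USA
silver 10 1981 USA
gold 0.5 1981 RSA
gold 1 1988 Canada'

# Match in a specific field: country ($4) contains "USA".
printf '%s\n' "$data" | awk '$4 ~ /USA/'
# Negated field match: print the countries that are not "USA".
printf '%s\n' "$data" | awk '$4 !~ /USA/ {print $4}'
# Conditions joined with "&&": gold pieces dated after 1985.
printf '%s\n' "$data" | awk '$1 == "gold" && $3 > 1985 {print $3}'
# Range of lines: from the first "silver" line through the next "RSA" line.
printf '%s\n' "$data" | awk '/silver/,/RSA/ {print NR}'
```

The first command prints the two USA lines, the second prints "RSA" and "Canada", the third prints "1986" and "1988", and the fourth prints the line numbers 2 and 3.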
Regular-expression metacharacters can be used within a search string:
   [<charlist or range>]   Match on any character in list or range.
   [^<charlist or range>]  Match on any character not in list or range.
   .                       Match any single character.
   *                       Match 0 or more occurrences of preceding string.
   ?                       Match 0 or 1 occurrences of preceding string.
   +                       Match 1 or more occurrences of preceding string.
If a metacharacter is part of the search string, it can be "escaped" by preceding it with a "\".
* Special characters:
   \n     Newline (line feed).
   \b     Backspace.
   \r     Carriage return.
   \f     Form feed.
A "\" can be embedded in a string by entering it twice: "\\".
* Built-in variables:
   $0; $1,$2,$3,...  Field variables.
   NR                Number of records (lines).
   NF                Number of fields.
   FILENAME          Current input filename.
   FS                Field separator character (default: " ").
   RS                Record separator character (default: "\n").
   OFS               Output field separator (default: " ").
   ORS               Output record separator (default: "\n").
   OFMT              Output format (default: "%.6g").
* Arithmetic operations:
+   Addition.
   -   Subtraction.
   *   Multiplication.
   /   Division.
   %   Mod.
   ++  Increment.
   --  Decrement.
Shorthand assignments:
x += 2  -- is the same as:  x = x + 2
   x -= 2  -- is the same as:  x = x - 2
   x *= 2  -- is the same as:  x = x * 2
   x /= 2  -- is the same as:  x = x / 2
   x %= 2  -- is the same as:  x = x % 2
* The only unique string operation is concatenation, which is performed simply by listing two strings connected by a blank space.
* Arithmetic functions:
   sqrt()     Square root.
   log()      Base e log.
   exp()      Power of e.
   int()      Integer part of argument.
* String functions:
  • length(): Length of string.
  • substr(<string>,<start of substring>,<max length of substring>): Get substring.
  • split(<string>,<array>,[<field separator>]): Split string into array, with initial array index being 1.
  • index(<target string>,<search string>): Find index of search string in target string.
  • sprintf(): Perform formatted print into string.
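A brief sketch of the string functions, run from a BEGIN clause so that no input file is needed. The test string is one of the coin descriptions; the variable names "s", "n", "parts", and "t" are arbitrary.

```shell
awk 'BEGIN {
    s = "Franz Josef 100 Korona"
    print length(s)              # number of characters: 22
    print substr(s, 7, 5)        # 5 characters starting at position 7: Josef
    n = split(s, parts, " ")     # parts[1] .. parts[n] hold the words
    print n, parts[1]            # 4 Franz
    print index(s, "100")        # position of "100" within s: 13
    t = sprintf("%03d", 7)       # formatted result goes into a string
    print t                      # 007
}'
```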
* Control structures:
   if (<condition>) <action 1> [else <action 2>]
   while (<condition>) <action>
   for (<initial action>;<condition>;<end-of-loop action>) <action>
Scanning through an associative array with "for":
for (<variable> in <array>) <action>
Unconditional control statements:
break       Break out of "while" or "for" loop.
   continue    Perform next iteration of "while" or "for" loop.
   next        Get and scan next line of input.
   exit        Finish reading input and perform END statements.
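As an illustration of "next" and of scanning an associative array with "for", this sketch counts pieces by metal. The input lines and the "copper" entry are invented for the example. Since Awk does not promise any particular order for "for (... in ...)", the output is piped through "sort".

```shell
printf 'gold 1\nsilver 10\ncopper 3\ngold 0.5\n' |
awk '
    $1 == "copper" { next }            # skip this line and move on to the next one
    { count[$1]++ }                    # associative array indexed by metal name
    END {
        for (metal in count)           # visits each index once, in no fixed order
            print metal, count[metal]
    }' | sort
# prints:
#   gold 2
#   silver 1
```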
* Print:
print <i1>, <i2>, ...   Print items separated by OFS; end with newline.
   print <i1> <i2> ...     Print items concatenated; end with newline.
* Printf():
General format:
   printf(<string with format codes>,[<parameters>])
Newlines must be explicitly specified with a "\n".
General form of format code:
   %[<number>]<format code>
The optional "number" can consist of:
  • A leading "-" for left-justified output.
  • An integer part that specifies the minimum output width. (A leading "0" causes the output to be padded with zeroes.)
  • A fractional part that specifies either the maximum number of characters to be printed (for a string), or the number of digits to be printed to the right of the decimal point (for floating-point formats).
The format codes are:
   d    Prints a number in decimal format.
   o    Prints a number in octal format.
   x    Prints a number in hexadecimal format.
   c    Prints a character, given its numeric code.
   s    Prints a string.
   e    Prints a number in exponential format.
   f    Prints a number in floating-point format.
   g    Prints a number in exponential or floating-point format.
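A few of these format codes and modifiers in action, as a sketch; the square brackets are only there to show where each field begins and ends, and the values are arbitrary.

```shell
awk 'BEGIN {
    printf("[%-6s]\n", "gold")                # leading "-": left-justified in 6 columns
    printf("[%06.2f]\n", 6.1)                 # leading "0": padded with zeroes to width 6
    printf("[%.3s]\n", "silver")              # ".3" on a string: at most 3 characters
    printf("[%x] [%o] [%c]\n", 255, 8, 65)    # hexadecimal, octal, character code
}'
# prints:
#   [gold  ]
#   [006.10]
#   [sil]
#   [ff] [10] [A]
```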
* Awk can perform output redirection (using ">" and ">>") and piping (using "|") from both "print" and "printf".

Wednesday, September 14, 2011

Solaris information

Gathering Solaris system information

A UNIX administrator may be asked to gather system information about his/her Solaris systems. Here are the commands used on a Solaris 7 system to gather various system information.

Processors
The psrinfo utility displays processor information. When run in verbose mode, it lists the speed of each processor and when the processor was last placed on-line (generally the time the system was started unless it was manually taken off-line).


/usr/sbin/psrinfo -v
Status of processor 1 as of: 12/12/02 09:25:50
  Processor has been on-line since 11/17/02 21:10:09.
  The sparcv9 processor operates at 400 MHz,
        and has a sparcv9 floating point processor.
Status of processor 3 as of: 12/12/02 09:25:50
  Processor has been on-line since 11/17/02 21:10:11.
  The sparcv9 processor operates at 400 MHz,
        and has a sparcv9 floating point processor.



The psradm utility can enable or disable a specific processor.


To disable a processor:
/usr/sbin/psradm -f processor_id
To enable a processor:

/usr/sbin/psradm -n processor_id


The psrinfo utility will display the processor_id when run in either standard or verbose mode.


RAM

The prtconf utility will display the system configuration, including the amount of physical memory.


To display the amount of RAM:


/usr/sbin/prtconf | grep Memory
Memory size: 3072 Megabytes





Disk space

Although there are several ways you could gather this information, the following command lists the number of kilobytes in use versus the total kilobytes available in local file systems stored on physical disks. The command does not include disk space usage from the /proc virtual file system, the floppy disk, or swap space.

df -lk | egrep -v "Filesystem|/proc|/dev/fd|swap" | awk '{ total_kbytes += $2 } { used_kbytes += $3 } END { printf "%d of %d kilobytes in use.\n", used_kbytes, total_kbytes }'
19221758 of 135949755 kilobytes in use.



You may want to convert the output to megabytes or gigabytes and display the statistics as a percentage of utilization.
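One possible way to do that conversion is sketched here. To keep the example self-contained it does not call df; instead it feeds awk three made-up lines laid out like "df -lk" output (a header, then device, total kilobytes, used kilobytes, and so on). The device names and sizes are invented, and on a real system the df and egrep stages from the command above would take the place of the printf.

```shell
# Fabricated df-style input; "NR > 1" skips the header line.
printf '%s\n' \
  'Filesystem kbytes used avail capacity Mounted on' \
  '/dev/dsk/c0t0d0s0 1048576 262144 786432 25% /' \
  '/dev/dsk/c0t0d0s7 3145728 786432 2359296 25% /export' |
awk 'NR > 1 { total_kbytes += $2; used_kbytes += $3 }
     END {
         # 1024 kilobytes per megabyte; "%%" prints a literal percent sign.
         printf "%.1f of %.1f MB in use (%.1f%%)\n",
                used_kbytes / 1024, total_kbytes / 1024,
                100 * used_kbytes / total_kbytes
     }'
# prints: 1024.0 of 4096.0 MB in use (25.0%)
```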


The above command will list file system usage. If you are interested in listing physical disks (some of which may not be allocated to a file system), use the format command as the root user, or the iostat -En command as a non-privileged user.




Processor and kernel bits

If you are running Solaris 2.6 or earlier, you are running a 32-bit kernel.


Determine bits of processor:
isainfo -bv


Determine bits of Solaris kernel:
isainfo -kv