
Group Functions in SQL

Group functions, also called aggregate functions, operate on the values of a column across a set of rows and return a single result for that set, as though the column were one collective group of data. They are sometimes called group-by functions because they are often used with a special clause of select statements, the group by clause, which splits the rows into groups so that each group function returns one result per group.
The syntax for the GROUP BY clause is:
    SELECT column1, column2, … column_n, aggregate_function (expression)
    FROM tables
    WHERE predicates
    GROUP BY column1, column2, … column_n;
aggregate_function can be a function such as SUM, COUNT, MIN, or MAX.
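To make the grouping step concrete, here is a minimal pure-Python sketch of what a query such as SELECT department, SUM(sales) ... GROUP BY department does conceptually: rows with equal values in every GROUP BY column are collected into one group, and the aggregate is then computed once per group. The helper name group_by_sum and the dict-based row format are hypothetical, chosen only for illustration; a real database engine may group by sorting or hashing and is far more optimized.

```python
from collections import defaultdict


def group_by_sum(rows, key_cols, value_col):
    """Hypothetical helper approximating
    SELECT key_cols..., SUM(value_col) FROM rows GROUP BY key_cols...
    Each row is a dict; None stands in for SQL NULL."""
    groups = defaultdict(list)
    for row in rows:
        # Partition step: rows whose GROUP BY columns are all equal share one key.
        # As in SQL grouping, None values compare equal, so NULL keys form one group.
        key = tuple(row[c] for c in key_cols)
        groups[key].append(row[value_col])

    result = {}
    for key, values in groups.items():
        # Aggregate step: SUM skips NULLs and yields NULL when no value is left.
        non_null = [v for v in values if v is not None]
        result[key] = sum(non_null) if non_null else None
    return result


rows = [
    {"department": "Books", "sales": 100},
    {"department": "Books", "sales": 50},
    {"department": "Toys", "sales": None},
]
print(group_by_sum(rows, ["department"], "sales"))
```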
Here’s a list of the most commonly used group functions:

  • avg(x): averages the non-NULL values of column x in the rows returned by the select statement
  • count(x): counts the non-NULL values of column x in the rows returned by the select statement; count(*) counts all rows, including those that contain NULLs
  • max(x): returns the maximum value of column x over all rows returned by the select statement
  • min(x): returns the minimum value of column x over all rows returned by the select statement
  • stddev(x): calculates the standard deviation of the non-NULL values of column x over all rows returned by the select statement
  • sum(x): calculates the sum of the non-NULL values of column x over all rows returned by the select statement
  • variance(x): calculates the variance of the non-NULL values of column x over all rows returned by the select statement
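To illustrate the NULL behavior described in the list, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle; the table t and column x are hypothetical. SQLite provides avg, count, max, min and sum but no built-in stddev or variance, so those two are computed with Python's statistics module over the same non-NULL values. As we understand the Oracle documentation, stddev and variance return sample statistics (n-1 denominator), which is why statistics.stdev and statistics.variance are used rather than their population counterparts.

```python
import sqlite3
import statistics

# In-memory SQLite database standing in for Oracle; table t and column x are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(10,), (20,), (None,), (30,)])

# avg, count(x), max, min and sum skip the NULL row; count(*) counts every row.
avg_x, count_x, count_all, max_x, min_x, sum_x = conn.execute(
    "SELECT avg(x), count(x), count(*), max(x), min(x), sum(x) FROM t"
).fetchone()

# No stddev/variance in SQLite: compute the sample versions in Python
# over the same non-NULL values.
values = [x for (x,) in conn.execute("SELECT x FROM t WHERE x IS NOT NULL")]
variance_x = statistics.variance(values)  # sample variance, n - 1 denominator
stddev_x = statistics.stdev(values)       # square root of the sample variance

print(avg_x, count_x, count_all, max_x, min_x, sum_x, variance_x, stddev_x)
```

The gap between count_x and count_all is the NULL row, which every function above ignores except count(*).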

Example using the SUM function
For example, you could use the SUM function to return the name of each department along with the total sales for that department:
SELECT department, SUM(sales) as "Total sales"
FROM order_details
GROUP BY department;

Because the SELECT list contains a column (department) that is not wrapped in the SUM function, you must use a GROUP BY clause, and department must be listed in it.
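A minimal sketch of the SUM example, run against an in-memory SQLite database rather than Oracle (an assumption made so the snippet runs anywhere Python does). The order_details rows are hypothetical sample data. An ORDER BY is added because GROUP BY by itself does not guarantee any particular output order. Note that SQLite, unlike Oracle, tolerates a non-aggregated column in the SELECT list without a GROUP BY, so it will not reproduce the error Oracle raises in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_details (department TEXT, sales INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?)",
    [("Books", 100), ("Books", 50), ("Toys", 70), ("Toys", 30), ("Garden", 25)],
)

cur = conn.execute(
    """
    SELECT department, SUM(sales) AS "Total sales"
    FROM order_details
    GROUP BY department
    ORDER BY department
    """
)
# cursor.description carries the column names, including the quoted alias.
header = [d[0] for d in cur.description]
rows = cur.fetchall()

print(header)
for row in rows:
    print(row)
```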
Example using the COUNT function
Similarly, you could use the COUNT function to return the name of each department and the number of employees in that department who earn more than $25,000 per year:
SELECT department, COUNT(*) as "Number of employees"
FROM employees
WHERE salary > 25000
GROUP BY department;
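A minimal sketch of the COUNT example in an in-memory SQLite database instead of Oracle; the employees rows are hypothetical. It shows the order of operations: WHERE removes rows first, then GROUP BY groups what is left. A department with no employee above $25,000 (HR here) therefore produces no group at all, rather than a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 30000),
        ("Bob", "Sales", 20000),
        ("Cid", "Sales", 41000),
        ("Dee", "IT", 52000),
        ("Eve", "HR", 24000),
    ],
)

rows = conn.execute(
    """
    SELECT department, COUNT(*) AS "Number of employees"
    FROM employees
    WHERE salary > 25000      -- filters individual rows before grouping
    GROUP BY department
    ORDER BY department
    """
).fetchall()

for row in rows:
    print(row)
```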
ROLLUP
ROLLUP is an extension of group by that produces subtotals at each level of aggregation, which then “roll up” into a grand total. The levels follow the columns listed inside rollup( ) from right to left: rollup(deptno, job) groups by (deptno, job), then by deptno alone, and finally over all rows. The totaling is therefore based on a one-dimensional hierarchy of the grouped columns. For example, let’s say we want a payroll breakdown for our company by department and job position. The following query gives us that information:
select deptno, job, sum(sal) as salary
from emp
group by rollup(deptno, job);
    DEPTNO JOB           SALARY
---------- --------- ----------
        10 CLERK           1300
        10 MANAGER         2450
        10 PRESIDENT       5000
        10                 8750
        20 ANALYST         6000
        20 CLERK           1900
        20 MANAGER         2975
        20                10875
        30 CLERK            950
        30 MANAGER         2850
        30 SALESMAN        5600
        30                 9400
                          29025
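Notice that the NULL values in the DEPTNO and JOB columns of the rollup output mark subtotal and grand-total rows. If you want, you can use the nvl( ) function to substitute a more meaningful value. If the grouped columns can themselves contain NULLs, use the grouping( ) function to tell the two apart: it returns 1 when the NULL was produced by the rollup and 0 otherwise.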
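SQLite, which ships with Python, does not support ROLLUP, so the sketch below emulates rollup(deptno, job) by combining its three grouping levels, (deptno, job), (deptno) and the grand total, with UNION ALL. The rows are the (deptno, job, sal) values of Oracle's classic SCOTT.EMP demo table, which reproduce the totals in the output above. The g_dept and g_job flags are a hand-made stand-in for Oracle's grouping( ) function, and the labels 'ALL DEPTS' and 'ALL JOBS' are arbitrary; a CASE on the flag is used instead of nvl( ) so that a genuine NULL in the data would not be mislabeled as a subtotal.

```python
import sqlite3

# (deptno, job, sal) rows from Oracle's classic SCOTT.EMP demo table.
EMP_ROWS = [
    (20, "CLERK", 800), (30, "SALESMAN", 1600), (30, "SALESMAN", 1250),
    (20, "MANAGER", 2975), (30, "SALESMAN", 1250), (30, "MANAGER", 2850),
    (10, "MANAGER", 2450), (20, "ANALYST", 3000), (10, "PRESIDENT", 5000),
    (30, "SALESMAN", 1500), (20, "CLERK", 1100), (30, "CLERK", 950),
    (20, "ANALYST", 3000), (10, "CLERK", 1300),
]

# SQLite has no ROLLUP, so its three levels are spelled out with UNION ALL:
# (deptno, job), then (deptno), then the grand total ().
# g_dept / g_job mimic Oracle's grouping(): 1 means the column was rolled up.
ROLLUP_SQL = """
SELECT CASE WHEN g_dept = 1 THEN 'ALL DEPTS' ELSE CAST(deptno AS TEXT) END AS dept_label,
       CASE WHEN g_job = 1 THEN 'ALL JOBS' ELSE job END AS job_label,
       salary
FROM (
    SELECT deptno, job, SUM(sal) AS salary, 0 AS g_dept, 0 AS g_job
    FROM emp GROUP BY deptno, job
    UNION ALL
    SELECT deptno, NULL, SUM(sal), 0, 1 FROM emp GROUP BY deptno
    UNION ALL
    SELECT NULL, NULL, SUM(sal), 1, 1 FROM emp
)
ORDER BY g_dept, deptno, g_job, job
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, job TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP_ROWS)

rows = conn.execute(ROLLUP_SQL).fetchall()
for row in rows:
    print(row)
```

Sorting on the flags first keeps each department's detail rows ahead of its subtotal and puts the grand total last, matching the layout of the Oracle output.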
CUBE
Cube is an extension similar to rollup. The difference is that cube takes a specified set of grouping columns and creates subtotals for every possible combination of them: for n columns it produces 2^n groupings, so cube(deptno, job) groups by (deptno, job), by deptno alone, by job alone, and over all rows. The result summarizes the data along every combination of the columns or expressions in the group by clause, which is also known as n-dimensional cross-tabulation. In the following example, notice how cube not only gives us the payroll breakdown of our company by DEPTNO and JOB, but also gives us the breakdown of payroll by JOB across all departments:
select deptno, job, sum(sal) as salary
from emp
group by cube(deptno, job);
    DEPTNO JOB           SALARY
---------- --------- ----------
        10 CLERK           1300
        10 MANAGER         2450
        10 PRESIDENT       5000
        10                 8750
        20 ANALYST         6000
        20 CLERK           1900
        20 MANAGER         2975
        20                10875
        30 CLERK            950
        30 MANAGER         2850
        30 SALESMAN        5600
        30                 9400
           ANALYST         6000
           CLERK           4150
           MANAGER         8275
           PRESIDENT       5000
           SALESMAN        5600
                          29025
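Because SQLite has no CUBE either, a minimal sketch can emulate it by generating one GROUP BY query per subset of the grouping columns (2^n of them for n columns) and joining them with UNION ALL. The helper cube_sql is hypothetical and interpolates identifiers directly into the SQL, so it should only be given trusted column and table names. The rows are the (deptno, job, sal) values of Oracle's classic SCOTT.EMP demo table, which reproduce the totals in the cube output above.

```python
import sqlite3
from itertools import combinations

# (deptno, job, sal) rows from Oracle's classic SCOTT.EMP demo table.
EMP_ROWS = [
    (20, "CLERK", 800), (30, "SALESMAN", 1600), (30, "SALESMAN", 1250),
    (20, "MANAGER", 2975), (30, "SALESMAN", 1250), (30, "MANAGER", 2850),
    (10, "MANAGER", 2450), (20, "ANALYST", 3000), (10, "PRESIDENT", 5000),
    (30, "SALESMAN", 1500), (20, "CLERK", 1100), (30, "CLERK", 950),
    (20, "ANALYST", 3000), (10, "CLERK", 1300),
]


def cube_sql(cols, measure, table):
    """Hypothetical helper: UNION ALL of one GROUP BY per subset of cols (2**n queries).
    Identifiers are interpolated directly, so pass only trusted names."""
    selects = []
    for size in range(len(cols), -1, -1):
        for subset in combinations(cols, size):
            # Columns outside this subset are rolled up and shown as NULL.
            items = [c if c in subset else f"NULL AS {c}" for c in cols]
            items.append(f"SUM({measure}) AS total")
            # g_<col> = 1 flags a rolled-up column, like Oracle's grouping().
            items += [f"{0 if c in subset else 1} AS g_{c}" for c in cols]
            sql = f"SELECT {', '.join(items)} FROM {table}"
            if subset:
                sql += f" GROUP BY {', '.join(subset)}"
            selects.append(sql)
    return "\nUNION ALL\n".join(selects)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, job TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP_ROWS)

query = (
    "SELECT deptno, job, total FROM ("
    + cube_sql(["deptno", "job"], "sal", "emp")
    + ") ORDER BY g_deptno, deptno, g_job, job"
)
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With two columns the helper emits four SELECTs, one per grouping set; passing a third column would emit eight.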
Excluding Group Data with HAVING
Once the data is grouped with group by, it is sometimes useful to weed out unwanted groups. For example, let’s say we want to list the average salary paid to employees in our company, broken down by department and job title. However, for this query, we only care about the combinations of department and job title where the average salary is over $2000. In effect, we want a where clause that applies to the groups produced by group by rather than to individual rows. This is what the having clause does: it is associated with group by and filters the grouped rows after the group functions have been computed. Take a look at an example of this clause:
select deptno, job, avg(sal)
from emp
group by deptno, job
having avg(sal) > 2000;
    DEPTNO JOB         AVG(SAL)
---------- --------- ----------
        10 MANAGER         2450
        10 PRESIDENT       5000
        20 ANALYST         3000
        20 MANAGER         2975
        30 MANAGER         2850
Consider the output of this query for a moment. First, Oracle computes the average salary for every combination of department and job title in the company. Then the having clause eliminates the combinations whose average salary is $2000 or less. This cannot be done with an ordinary where clause: where filters individual rows before they are grouped, so it cannot refer to a group function such as avg(sal), whereas this example requires that groups of rows be selected. In this query, the having clause limits the output to the qualifying group by rows.
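The having example can be reproduced in a minimal SQLite sketch (SQLite standing in for Oracle; the rows are the (deptno, job, sal) values of Oracle's classic SCOTT.EMP demo table). It also shows the two points made above: a group function inside where is rejected, and the same filter can alternatively be written as a where clause on an already-grouped inline view.

```python
import sqlite3

# (deptno, job, sal) rows from Oracle's classic SCOTT.EMP demo table.
EMP_ROWS = [
    (20, "CLERK", 800), (30, "SALESMAN", 1600), (30, "SALESMAN", 1250),
    (20, "MANAGER", 2975), (30, "SALESMAN", 1250), (30, "MANAGER", 2850),
    (10, "MANAGER", 2450), (20, "ANALYST", 3000), (10, "PRESIDENT", 5000),
    (30, "SALESMAN", 1500), (20, "CLERK", 1100), (30, "CLERK", 950),
    (20, "ANALYST", 3000), (10, "CLERK", 1300),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, job TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP_ROWS)

# HAVING filters whole groups after AVG has been computed for each one.
having_rows = conn.execute(
    """
    SELECT deptno, job, AVG(sal)
    FROM emp
    GROUP BY deptno, job
    HAVING AVG(sal) > 2000
    ORDER BY deptno, job
    """
).fetchall()

# Equivalent formulation: aggregate in an inline view, then filter with WHERE.
subquery_rows = conn.execute(
    """
    SELECT deptno, job, avg_sal
    FROM (SELECT deptno, job, AVG(sal) AS avg_sal FROM emp GROUP BY deptno, job)
    WHERE avg_sal > 2000
    ORDER BY deptno, job
    """
).fetchall()

# A group function directly in WHERE is rejected: WHERE runs before grouping.
where_error = None
try:
    conn.execute(
        "SELECT deptno, job, AVG(sal) FROM emp WHERE AVG(sal) > 2000 GROUP BY deptno, job"
    )
except sqlite3.Error as exc:
    where_error = str(exc)

print(having_rows)
print(having_rows == subquery_rows, where_error)
```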
