TSQL PIVOT

by jieyangh

The TSQL PIVOT command is an operation that essentially rotates a set of table data. The end result is a transform that converts rows into columns: the distinct values found in one column become columns of their own. One common usage of PIVOT is to normalize denormalized data. This may seem rather abstract and confusing at first, so here’s a simple example.
ACME Corp has an optional end user survey that allows them to track customer satisfaction. The survey asks users to rate the company on a scale of 1-5 across a variety of criteria. This data is stored in the database as a set of denormalized key/value pairs:

CREATE TABLE SURVEY_RESULTS (
   RESULT_ID int PRIMARY KEY IDENTITY(1,1)
, [USER_ID] int
, QUESTION_ID int
, SURVEY_SCORE int
)

The SURVEY_RESULTS table has a foreign key, QUESTION_ID, that maps to the QUESTION_ID column in the table SURVEY_QUESTIONS. The SURVEY_QUESTIONS table stores the actual survey questions themselves (in a real-world scenario, the SURVEY_QUESTION column would store a localization id that points to some resource file instead of holding the actual text of the question):

CREATE TABLE SURVEY_QUESTIONS (
QUESTION_ID int PRIMARY KEY
,SURVEY_QUESTION varchar(50))


ALTER TABLE SURVEY_RESULTS
	ADD CONSTRAINT fk_SURVEY_RESULTS_QUESTION_ID_SURVEY_QUESTIONS_QUESTION_ID FOREIGN KEY (QUESTION_ID)
REFERENCES SURVEY_QUESTIONS (QUESTION_ID)

INSERT 
  INTO SURVEY_QUESTIONS WITH (ROWLOCK)
       (QUESTION_ID, SURVEY_QUESTION)
VALUES (1, 'Rate our service')
      ,(2, 'Rate your customer satisfaction')
      ,(3, 'Rate our user interface')

INSERT
  INTO SURVEY_RESULTS WITH (ROWLOCK)
       ([USER_ID], QUESTION_ID, SURVEY_SCORE)
VALUES (1, 1, 5)
      ,(1, 2, 4)
      ,(1, 3, 5)
      ,(2, 1, 4)
      ,(2, 2, 3)
      ,(2, 3, 2)
      ,(3, 1, 3)
      ,(3, 2, 2)
      ,(3, 3, 1)

This design allows flexibility in adding and removing questions. Using a normalized table with a column for each question would mean that every time the survey changed, entire columns would need to be added or removed. A normalized table does make the survey results much easier to read, though, so it would be helpful to create a VIEW on the SURVEY_RESULTS table that converts the key/value rows into columns.

SELECT * FROM SURVEY_RESULTS

Gives us the following, which is hard to interpret at a glance due to its denormalized nature:

 
RESULT_ID   USER_ID     QUESTION_ID SURVEY_SCORE
----------- ----------- ----------- ------------
1           1           1           5
2           1           2           4
3           1           3           5
4           2           1           4
5           2           2           3
6           2           3           2
7           3           1           3
8           3           2           2
9           3           3           1

What we’d like to see is one row for each user, with the score for each question as its own column in that row. This can be achieved using PIVOT (In the real world we’d create a view for easy at-a-glance access to this data):

SELECT  
USER_ID, [1] AS 'Service score', [2] AS 'Customer satisfaction', [3] AS 'User Interface'
FROM
(SELECT [USER_ID], QUESTION_ID, SURVEY_SCORE
    FROM SURVEY_RESULTS) AS InputTable
PIVOT
(
AVG(SURVEY_SCORE)
FOR QUESTION_ID IN ([1], [2], [3])
) AS PivotTable;

This gives us the desired result set:


USER_ID     Service score Customer satisfaction User Interface
----------- ------------- --------------------- --------------
1           5             4                     5
2           4             3                     2
3           3             2                     1
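
As a rough sketch of the view mentioned above, the same PIVOT query can be wrapped in a CREATE VIEW statement. The view name V_SURVEY_SCORES_BY_USER is made up for this example, and the GO separators assume a client such as SSMS or sqlcmd, since CREATE VIEW has to be the first statement in its batch:

```sql
-- Hypothetical view name; not part of the original schema.
GO
CREATE VIEW V_SURVEY_SCORES_BY_USER
AS
SELECT [USER_ID]
     , [1] AS [Service score]
     , [2] AS [Customer satisfaction]
     , [3] AS [User Interface]
FROM
(SELECT [USER_ID], QUESTION_ID, SURVEY_SCORE
    FROM SURVEY_RESULTS) AS InputTable
PIVOT
(
AVG(SURVEY_SCORE)
FOR QUESTION_ID IN ([1], [2], [3])
) AS PivotTable;
GO

-- The pivoted data can then be queried like any other table.
SELECT * FROM V_SURVEY_SCORES_BY_USER ORDER BY [USER_ID];
```

Note that the list of question ids is fixed inside the view, so a view like this would need to be altered whenever a question is added to or removed from the survey.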

Let’s take a closer look at the PIVOT query. The first thing that we need to do is specify what data we are pivoting. This is done in the FROM clause. As you can see, this is just a simple table expression that selects three of the four columns from the SURVEY_RESULTS table: USER_ID, QUESTION_ID, and SURVEY_SCORE. For brevity’s sake, I will refer to this table expression by its alias, InputTable.

Now that we know what we are pivoting, we need to identify the pivot column and the set of values to pivot on. This is done in the FOR clause. In our example, we are pivoting on the QUESTION_ID column, and the set of values we are interested in are 1, 2, and 3, which correspond to the questions about service, customer satisfaction, and the user interface. For each value in this set, all rows in InputTable that have a matching value in the specified pivot column will be combined together based on an aggregation function. In our case, that would be the average survey score.

The astute reader will point out that applying an aggregation function implies that there is some sort of grouping being done. There is indeed an implicit grouping based on any column appearing inside InputTable that is not directly referenced inside of the PIVOT statement. In our case, USER_ID is mentioned inside the InputTable but it is not referenced inside the PIVOT statement, so the final result will be grouped based on USER_ID. The query will generate a row for each user in the final result set, along with three columns corresponding to the average score for each question. These are referenced in the outer SELECT statement and are based on the set of values provided to PIVOT. In our case, these are [1], [2], and [3]. Of course, since our sample data only has one answer from a given user per question, the aggregation function doesn’t really matter.
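
One way to see the implicit grouping is to write it out by hand. The following is a sketch, not a claim about how SQL Server executes PIVOT internally: a plain GROUP BY query with one conditional aggregate per question, which should return the same rows as the PIVOT query for our sample data.

```sql
-- Explicit GROUP BY version of the per-user PIVOT query.
-- Each CASE expression yields NULL for rows belonging to other questions,
-- and AVG ignores NULLs, so each column only averages its own question.
SELECT [USER_ID]
     , AVG(CASE WHEN QUESTION_ID = 1 THEN SURVEY_SCORE END) AS 'Service score'
     , AVG(CASE WHEN QUESTION_ID = 2 THEN SURVEY_SCORE END) AS 'Customer satisfaction'
     , AVG(CASE WHEN QUESTION_ID = 3 THEN SURVEY_SCORE END) AS 'User Interface'
  FROM SURVEY_RESULTS
 GROUP BY [USER_ID];
```

Here the grouping column is spelled out in the GROUP BY clause, whereas PIVOT infers it from whatever columns of InputTable are left over. This is also why RESULT_ID is kept out of InputTable: if it were included, it would become a grouping column too, and every survey answer would land in its own row.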

Let’s suppose we want to average the scores across all users. In that case, we simply modify the query slightly and remove USER_ID from the InputTable (note that we also cast SURVEY_SCORE as a decimal, because the average of an int column is itself an int and would have its fractional part truncated). With no columns left over to group by, the entire result set is implicitly treated as a single group, which is exactly what we want:

SELECT  
 [1] AS 'Service score', [2] AS 'Customer satisfaction', [3] AS 'User Interface'
FROM
(SELECT QUESTION_ID, CAST(SURVEY_SCORE AS DECIMAL) AS SURVEY_SCORE
    FROM SURVEY_RESULTS) AS InputTable
PIVOT
(
AVG(SURVEY_SCORE)
FOR QUESTION_ID IN ([1], [2], [3])
) AS PivotTable;

As expected, this query returns exactly one row with the average score for each question appearing in its own column:

 
Service score                           Customer satisfaction                   User Interface
--------------------------------------- --------------------------------------- ---------------------------------------
4.000000                                3.000000                                2.666666
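
To illustrate why the cast matters, here is a small side-by-side sketch using question 3, whose scores are 5, 2, and 1. The column aliases IntegerAverage and DecimalAverage are made up for this example:

```sql
-- AVG over an int column returns an int, so 8 / 3 comes back as 2.
-- Casting to DECIMAL first keeps the fractional part, matching the
-- 2.666666 shown in the result above.
SELECT AVG(SURVEY_SCORE)                  AS IntegerAverage
     , AVG(CAST(SURVEY_SCORE AS DECIMAL)) AS DecimalAverage
  FROM SURVEY_RESULTS
 WHERE QUESTION_ID = 3;
```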

And that covers the PIVOT statement in a nutshell. What about UNPIVOT? As you can probably guess, the UNPIVOT command essentially rotates the data in the opposite direction: It takes a set of normalized table data and denormalizes it, converting columns into rows. Its syntax is similar and its use is left as an exercise to the reader.
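
As one possible starting point for that exercise, the sketch below feeds the per-user PIVOT query into UNPIVOT to rotate the data back into key/value rows. The alias WideTable is made up for this example:

```sql
-- Rotate the pivoted columns [1], [2], [3] back into rows.
SELECT [USER_ID], QUESTION_ID, SURVEY_SCORE
FROM
(SELECT [USER_ID], [1], [2], [3]
    FROM (SELECT [USER_ID], QUESTION_ID, SURVEY_SCORE
            FROM SURVEY_RESULTS) AS InputTable
    PIVOT
    (
    AVG(SURVEY_SCORE)
    FOR QUESTION_ID IN ([1], [2], [3])
    ) AS PivotTable
) AS WideTable
UNPIVOT
(
-- SURVEY_SCORE receives the cell values; QUESTION_ID receives the
-- name of the column each value came from.
SURVEY_SCORE FOR QUESTION_ID IN ([1], [2], [3])
) AS UnpivotTable;
```

Two caveats apply. QUESTION_ID comes back as a string holding the source column name ('1', '2', '3') rather than an int, and UNPIVOT leaves out cells that are NULL, so the round trip is not always an exact inverse of PIVOT.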


Category: Database, Dev, SQL Tags: PIVOT, sql server, TSQL

Jieyang Hu
