What the Heck are Pivot Tables?

Today’s going to be a bit of a cheat because GifGuide is still semi-on hiatus and Pivot Tables don’t involve code. They can, however, help you avoid coding in instances where you just want to summarize a massive pile of data.

No need for Python. No need for VBA. It’s just built into Excel.

Let’s get some data to play with. Kaggle’s got a fun set called IMDb movies extensive dataset with records on 85,855 movies.

It comes with titles, years, genres, and countries. A Pivot Table can take those 85,855 rows and help you answer questions like, “What genre of movie was most popular in each decade from the 1910s to the 2010s (according to this IMDb sample)?” and “Does that change depending on where the movie was made?”

Before jumping into the Pivot Table, I’m going to adjust the way the dataset is formatted to make it easier to pull together later. You’ll notice below that the genre column isn’t parsed. If a movie is both a drama and a sci-fi, it’s thrown into the same cell.

That’s going to make life difficult, so I’m going to separate all that out by first splitting the cells into separate columns anytime there’s a comma (with Text to Column) and then I’m removing all the duplicates. That left me with 25 unique genres.

ActionAdultAdventureAnimationBiographyComedyCrime
DocumentaryDramaFamilyFantasyFilm-NoirHistoryHorror
MusicMusicalMysteryNewsRealty-TVRomanceSci-Fi
SportThrillerWarWestern

Next I’m copying that list and pasting (special paste > transpose) to make myself 25 columns that are just going to exist to let me know if a particular row is tagged with that genre.

There’s probably a better way of doing this (and if you know it, please add it to the comments), but I just used a =ISNUMBER(SEARCH()) to get those populated, then replaced FALSE with 0 and TRUE with 1.

The end result looked like this:

Now that we’ve got the preliminaries out of the way, let’s make a Pivot Table.

Make sure you’ve clicked into your dataset, and then go to Insert>Tables>PivotTable.

That’s going to bring up the PivotTable dialog box. The defaults should already be set, so you’ll just click OK.

That’s where you’ll get the PivotTable Fields sidebar. PivotTables are made up of Rows, Columns, Values, and Filters. We’re going to start by setting up the first three of those.

I’m focused on the change over time, so I’m going to set my rows as years. To do that, you just drag and drop “Years” into the Rows section. I’m separating my totals by genre, so my columns are going to be the 25 columns I just parsed. To have those summed out, drag and drop them into the Values field, and Columns will populate on its own.

PivotTable in action.

Congratulations. You now have a PivotTable. Huzzah.

This is looking closer to what we wanted, but the original question referenced decades, right? So how do we group the individual years into decades? Highlight each group of years in the decade, right-click, and then hit group. You’ll know it’s worked because your group will have a little plus sign on the left hand side so that you can collapse and expand it.

From there, you can rename the groups to make them easier to identify. And here’s the real interesting thing–you can horizontally sort the data. As in, you can make the genre columns change positions based on their totals while keeping the decades in order.

If you want to go by grand total, just right-click any of the cells after the Grand Total cell, and select Sort>Largest to Smallest.

Based on that, it’s clear that drama is overwhelmingly the most popular genre in the dataset. But what if we want to know specifically about the 60’s?

Right click the values in the 60’s row, then get yourself to the Sort by Value window. Under sort direction, you’ll select Left to Right instead of Top to Bottom and then click Okay.

Now for the real fun part of PivotTables. You can filter based on a completely different field. Remember how our original data set also had country of origin? Well, let’s say I want to know the totals specifically for films that are marked as being from the United States. We can do that. Just drag and drop “Country” into the Filter field.

Then you can hit the filter and select the option for “Select Multiple Items” to get everything with USA in the country field. That will include films that had multiple countries of origin assigned.

With your snazzy new PivotTable, you can also create PivotCharts to visualize your data.

So for instance…

Behold. A Stacked Bar Chart.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s