Python: How to Parse CSV Files

The CSV Module

Did you know that if you go to Google Docs and try to load a csv file with 11,920 rows and 92 columns, you’ll likely get a warning like this?

There is, tragically, a limit to the amount of rows Google Sheets will process.

That’s awkward.

Especially if you really only need a little slice of the information inside the CSV. The headers, for instance. Or a couple of it’s columns.

Python can help with that.

The Basic For Loop

To get at the information, we’re going to start with a few lines of code to open the file and loop through the rows. Keep in mind if you have a massive file, it’s going to take a while to loop through every row, but it’ll let you see everything you’re working with.

Oh, what fun.

Once you get the CSV open, the for loop is going to return each row as its own list in brackets. The rowCount variable is in place to keep track of which row the current loop is on. That’s the bones of all the functions in the script below.

A CSV Script to Extract Two Specific Columns

A warning. Before using this script, you’ll need to adjust the filepath of the CSV and specify a place to save the condensed CSV.

The CSV Script

It contains four functions total. The HowManyRows function counts the total number of rows using the rowCount variable and prints it to the console.The ColumnCount function does the same for the number of columns.

The ColumnLabels function returns the values in the first row (the headers) and helpfully tells you their position in the row’s list. Don’t forget, lists start counting at 0. That’s why, in my results below, I have 96 columns, but my last column is numbered 95.

Using that information, you can create a new condensed csv with the CondenseCSV function. That one requires 4 arguments–the old CSV file path, the new CSV file path, and the position of the columns you want to keep from the old CSV file. Before calling that function, I’ve plugged in a prompt to get each number from the user in console.

To the user, that’ll look like this:

the console

So say I’ve looked through this, and I know I only need the columns labeled ID and URL. I’d put in 0 and 1, like so.

the result

The new CSV file will magically appear. The file should be significantly smaller because it’s only got the data from two columns. And once it’s smaller, you’re more likely to get Google sheets to open it without error.

Here’s a link to the script in Gist.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s