Just when I thought I was going to have a nice, relaxing Processing weekend, I realized my Airbnb scraper was broken. Fun fact: if Airbnb changes its HTML, this script is WILDLY unlikely to work. I’m getting the sense that Airbnb is going to be the wild duck to my Sumatran tiger.
Since the scraper is how I’m gathering data for all the other scripts I’m trying to write, I spent the weekend fixing it.
Yeah, I’d never had to debug a scraper before, and the main thing I learned is that it’s a task best undertaken with the Page Inspector.
What’s the Page Inspector?
Originally, I was violently rage-clicking through Notepad documents full of HTML. Turns out, there’s a marvelous Firefox tool called the Page Inspector. Right-click the part of the page whose HTML you want to see, and it’ll show you that specific piece of HTML, COLOR CODED.
Why didn’t I use this bastard the first time around? Didn’t know it was a thing.
It’s a thing.
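Before opening the inspector, it helps to know which of the strings the scraper searches for have disappeared from the page. Here’s a minimal sketch in Python (not part of the AHK script). The marker lists are the strings the scraper looks for on the search results pages and on each listing page; `missing_markers` is a hypothetical helper, and the snippet in the usage example is made up. Like AHK’s `InStr`, the check ignores case.

```python
# Strings the scraper searches for on each kind of page.
RESULTS_MARKERS = ("data-hosting_id", "pricerate")
ROOM_MARKERS = ("og:description", "latitude", "longitude")


def missing_markers(html, markers):
    """Return the markers that no longer appear in the HTML (case-insensitive)."""
    page = html.lower()
    return [m for m in markers if m.lower() not in page]


if __name__ == "__main__":
    # Made-up snippet where the price's class name has changed.
    snippet = '<div data-hosting_id="12345"><span class="price-amount">$150</span></div>'
    print(missing_markers(snippet, RESULTS_MARKERS))  # ['pricerate']
```

An empty list means the markers are all still there, and the problem is more likely in the trimming arithmetic than in Airbnb’s HTML.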
Hold up. This is longer. What else did you change?
Added a GUI because it was really necessary.
And the code to loop through several pages of Airbnb search results, because a single page only has 18 listings. To get a decent data set to play with, Imma need more than that. Now, the Airbnb URL looks hellish, but it’s actually somewhat logical. Inside all the &’s and %’s, you’ll find the city, check-in date, check-out date, page number, and coordinates for the northeast and southwest map corners.
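To see that logic, here’s a Python sketch (not part of the AHK script) that pulls the city, page number, and map corners out of the scraper’s default URL using the standard library’s `urllib.parse`. This particular URL has no check-in or check-out dates, so the sketch doesn’t look for them. The function name `describe_search_url` is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The default search URL from the scraper's GUI (page 1 of San Francisco results).
URL = ("https://www.airbnb.com/s/San-Francisco--CA?guests=1&adults=1&children=0"
       "&infants=0&ss_id=x3k8ohu0&ss_preload=true&source=bb&page=1"
       "&allow_override%5B%5D=&ne_lat=37.912159028101655&ne_lng=-122.32165604091637"
       "&sw_lat=37.86051531470982&sw_lng=-122.56060867763512&zoom=11"
       "&search_by_map=true&s_tag=aB5-b2TQUnited-States")


def describe_search_url(url):
    """Pull the city, page number, and map corners out of a search URL."""
    parts = urlsplit(url)
    # parse_qs maps each parameter name to a list of values.
    query = parse_qs(parts.query, keep_blank_values=True)
    return {
        # The city lives in the path: /s/San-Francisco--CA
        "city": parts.path.rsplit("/", 1)[-1],
        "page": int(query["page"][0]),
        "northeast": (float(query["ne_lat"][0]), float(query["ne_lng"][0])),
        "southwest": (float(query["sw_lat"][0]), float(query["sw_lng"][0])),
    }


if __name__ == "__main__":
    for key, value in describe_search_url(URL).items():
        print(f"{key}: {value}")
```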
The GUI needs the URL of the first search results page so the script can chop it in two around “page”: everything before it, and everything after the page number. Then the script loops through the page numbers to get all the data.
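The chop-and-rebuild step looks roughly like this in Python. `page_urls` is a hypothetical helper, and like the AHK version it assumes the pasted URL is for page 1 (a single-digit page number). It searches for “page=” rather than just “page”, a small tweak so a stray “page” elsewhere in the URL is less likely to match. The URL in the usage example is a trimmed-down stand-in for a real search URL.

```python
def page_urls(first_page_url, page_count):
    """Rebuild the search URL once per page by swapping in the page number.

    Assumes the URL is for page 1, so exactly one digit follows "page=".
    """
    pos = first_page_url.lower().find("page=")
    if pos == -1:
        raise ValueError("URL has no page= parameter")
    head = first_page_url[:pos]                  # everything before "page="
    tail = first_page_url[pos + len("page=1"):]  # everything after "page=1"
    return [f"{head}page={n}{tail}" for n in range(1, page_count + 1)]


if __name__ == "__main__":
    url = "https://www.airbnb.com/s/San-Francisco--CA?source=bb&page=1&zoom=11"
    for u in page_urls(url, 3):
        print(u)
```

Rebuilding the query string with `urllib.parse` would be sturdier, but plain string splitting stays close to what the script actually does.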
The Script Annotated to Within an Inch of its Life
Although it turned out to be a relatively simple fix, I had to walk through the original script line by line to figure out where exactly the problem was. While I was at it, I added more comments to help me along the next time it breaks.
The Copy/Paste Version:
```autohotkey
; --- Build the input form ---
Gui, Add, Text, , Temporary txt Filepath
Gui, Add, Edit, vFilepath W400, C:\Users\Desktop\Airbnb.txt
Gui, Add, Text, , Temporary CSV Filepath
Gui, Add, Edit, vcvFilePath W400, C:\Users\Desktop\Airbnb.csv
Gui, Add, Text, , How many pages?
Gui, Add, Edit, vpagecount,
Gui, Add, Text, , Page URL
; The `% escapes make AHK treat % literally instead of as a variable reference.
Gui, Add, Edit, vINITIALURL R8 W400, https://www.airbnb.com/s/San-Francisco--CA?guests=1&adults=1&children=0&infants=0&ss_id=x3k8ohu0&ss_preload=true&source=bb&page=1&allow_override`%5B`%5D=&ne_lat=37.912159028101655&ne_lng=-122.32165604091637&sw_lat=37.86051531470982&sw_lng=-122.56060867763512&zoom=11&search_by_map=true&s_tag=aB5-b2TQUnited-States
Gui, Add, Button, , OK
Gui, Show, , The Scraper
Return

; Exit the script when the window is closed.
GuiClose:
ExitApp

ButtonOK:
Gui, Submit
; Split the URL around "page=1" so each page number can be swapped in.
; Assumes the pasted URL is for page 1 (a single-digit page number).
If InStr(INITIALURL, "page")
{
    foundPos := InStr(INITIALURL, "page", false, 1, 1)
    ; LEFTURL: everything after "page=1" (the tail of the URL)
    StringTrimLeft, LEFTURL, INITIALURL, foundPos + 5
    ; RIGHTURL: everything up to and including the "p" of "page" (the head)
    StringTrimRight, RIGHTURL, INITIALURL, StrLen(INITIALURL) - foundPos
}

; Arrays for the scraped fields, plus an index counter for each.
listing := []
price := []
desc := []
lat := []
lon := []
dind := 0
latind := 0
lonind := 0
ind := 0
LoopCount := 1

; Pass 1: download each results page and pull out hosting IDs and prices.
Loop, %pagecount%
{
    AURL := RIGHTURL . "age=" . LoopCount . LEFTURL
    URLDownloadToFile, %AURL%, %FilePath%
    FileRead, AllTheText, %FilePath%
    ; Split the HTML into chunks at every "<", trimming ">" from each end.
    Loop, Parse, AllTheText, <, >
    {
        If InStr(A_LoopField, "data-hosting_id")
        {
            ; Keep the text between data-hosting_id=" and the closing
            ; quote that comes before data-review_count.
            foundPos := InStr(A_LoopField, "data-hosting_id=")
            StringTrimLeft, OutputVar, A_LoopField, foundPos + 16
            foundPos := InStr(OutputVar, "data-review_count")
            StringTrimRight, FinalVar, OutputVar, StrLen(OutputVar) - foundPos + 3
            listing[ind] := FinalVar
        }
        else If InStr(A_LoopField, "pricerate")
        {
            ; The price is the text after the tag's closing ">".
            foundPos := InStr(A_LoopField, ">", false, 1, 1)
            StringTrimLeft, OutputVar, A_LoopField, foundPos
            price[ind] := OutputVar
            ; This listing/price pair is done; move to the next slot.
            ind++
        }
    }
    FileDelete, %FilePath%
    LoopCount++
}

; Pass 2: visit each listing's page for its description and coordinates.
For index, value in listing
{
    URLDownloadToFile, https://www.airbnb.com/rooms/%value%, %FilePath%
    FileRead, AllTheText, %FilePath%
    ; Keep only the text up to the first "Twitter".
    foundPos := InStr(AllTheText, "Twitter", false, 1, 1)
    StringTrimRight, TText, AllTheText, StrLen(AllTheText) - foundPos
    ; Overwrite the temp file with the trimmed text.
    FileDelete, %FilePath%
    FileAppend, %TText%, %FilePath%
    Loop, Parse, TText, <, >
    {
        ; Each value sits inside content="..." on its tag.
        If InStr(A_LoopField, "og:description")
        {
            foundPos := InStr(A_LoopField, "content")
            StringTrimLeft, OutputVar, A_LoopField, foundPos + 8
            StringTrimRight, FinalVar, OutputVar, 9
            desc[dind] := FinalVar
            dind++
        }
        else If InStr(A_LoopField, "latitude")
        {
            foundPos := InStr(A_LoopField, "content")
            StringTrimLeft, OutputVar, A_LoopField, foundPos + 8
            StringTrimRight, FinalVar, OutputVar, 9
            lat[latind] := FinalVar
            latind++
        }
        else If InStr(A_LoopField, "longitude")
        {
            foundPos := InStr(A_LoopField, "content")
            StringTrimLeft, OutputVar, A_LoopField, foundPos + 8
            StringTrimRight, FinalVar, OutputVar, 9
            lon[lonind] := FinalVar
            lonind++
        }
    }
}
FileDelete, %FilePath%

; Write one CSV row per listing.
U := 0
For index, value in listing
{
    description := desc[U]
    ; Replace commas with spaces so the description stays in one CSV column.
    StringReplace, description, description, `,, %A_Space%, All
    li := listing[U]
    pri := price[U]
    latitude := lat[U]
    longitude := lon[U]
    FileAppend, %index%`,%li%`,%pri%`,%latitude%`,%longitude%`,%description%`n, %cvFilePath%
    U++
}
MsgBox Done
Return
```
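The core trick in both passes is splitting the page at every “<” and fishing out the chunks that contain a marker string. Here’s a Python sketch of the first pass, run against a made-up HTML snippet (`SAMPLE` is hypothetical; Airbnb’s real markup may look nothing like it, and `parse_results_page` and `to_csv` are illustrative names). Two things are swapped in: it finds the ID’s closing quote instead of counting back from `data-review_count`, and it writes the CSV with Python’s `csv` module, which quotes any field containing a comma rather than replacing commas with spaces the way the script does for descriptions.

```python
import csv
import io

# Hypothetical results-page markup shaped like what the script expects.
SAMPLE = (
    '<div class="listing" data-hosting_id="12345" data-review_count="8">'
    '<span class="pricerate">$150</span>'
    '<div class="listing" data-hosting_id="67890" data-review_count="2">'
    '<span class="pricerate">$1,200</span>'
)

ID_MARKER = 'data-hosting_id="'


def parse_results_page(html):
    """Pair each hosting ID with the pricerate chunk that follows it."""
    rows = []
    current_id = None
    # Like AHK's "Loop, Parse, AllTheText, <, >": split at "<", trim ">".
    for chunk in html.split("<"):
        chunk = chunk.strip(">")
        if ID_MARKER in chunk:
            # The ID is the text between the marker and the next quote.
            start = chunk.index(ID_MARKER) + len(ID_MARKER)
            end = chunk.find('"', start)
            current_id = chunk[start:end] if end != -1 else None
        elif "pricerate" in chunk and current_id is not None:
            # The price is whatever follows the tag's closing ">".
            _, _, price = chunk.partition(">")
            rows.append((current_id, price))
            current_id = None
    return rows


def to_csv(rows):
    """Write index,id,price lines; csv quotes fields that contain commas."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, (hosting_id, price) in enumerate(rows):
        writer.writerow([index, hosting_id, price])
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_results_page(SAMPLE)), end="")
```

Running it prints two rows, `0,12345,$150` and `1,67890,"$1,200"`, with the second price wrapped in quotes because of its comma.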
I don’t know how many weeks I’ve been saying this now, but next weekend it’s back to Processing. Pretty graphics. Maybe even a gif or two.