---
layout: post
title: Scraping Hubble (ruby http crawler)
---
Hubble takes the most beautiful images in the universe. The team is also cool enough to post the images on their site. I wanted to download them for use as backgrounds, or as art in future openFrameworks experiments. Ruby has a couple of libraries that make this easier: the script I wrote to pull the images down uses HTTParty and the standard library’s Net::HTTP. HubbleScrapper doesn’t take any arguments, but it does have some interesting tidbits in it.
There are two methods used to follow redirect (“moved”) responses. HTTParty follows them internally, and that is what I use for the index of the search. For the images themselves, however, this was not working correctly: it pulled the preview rather than the full image. I found an example method that uses the standard library and incorporated it into my fetch method. I’m not sure why HTTParty and the standard-library method behave differently; I would have to look at the internals.
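For illustration, here is a minimal sketch of following redirects by hand with Net::HTTP. It is modelled on the common recursive pattern for the standard library and is not necessarily the exact method in HubbleScrapper; the name `fetch_following_redirects` and the default limit of 10 are my own assumptions.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetch a URL, following redirects up to `limit` times.
def fetch_following_redirects(uri_str, limit = 10)
  # Guard against redirect loops before touching the network.
  raise ArgumentError, 'too many HTTP redirects' if limit <= 0

  response = Net::HTTP.get_response(URI(uri_str))

  case response
  when Net::HTTPSuccess
    response
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the current URL.
    next_uri = URI.join(uri_str, response['location']).to_s
    fetch_following_redirects(next_uri, limit - 1)
  else
    # Raises an exception for non-2xx responses (e.g. 404, 500).
    response.value
  end
end
```

Because each hop is handled explicitly, it is easy to log the intermediate `Location` values and see where a redirect chain actually ends up, which is one way to investigate a preview-versus-full-image mismatch.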
I used a simple consumer model to handle threading. This isn’t producer-consumer, since production (collecting the image links) is completed before the threading starts. Basically, 8 threads are created and later joined, and each thread fetches images from the list of links independently.
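A consumer-only pool like this can be sketched with Ruby’s thread-safe `Queue`. This is a rough illustration rather than the actual HubbleScrapper code: `consume_all`, `WORKER_COUNT`, and the block-based interface are names I made up for the example.

```ruby
WORKER_COUNT = 8 # assumed to match the 8 threads described above

# Hypothetical helper: run `block` on every item using a fixed pool of threads.
# All items are queued up front, so there is no producer running concurrently.
def consume_all(items, worker_count = WORKER_COUNT, &block)
  queue = Queue.new
  items.each { |item| queue << item }
  results = Queue.new

  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        begin
          # Non-blocking pop raises ThreadError once the queue is empty.
          item = queue.pop(true)
        rescue ThreadError
          break
        end
        results << block.call(item)
      end
    end
  end

  # Wait for every worker to drain the queue and finish.
  workers.each(&:join)

  # Results arrive in completion order, not input order.
  Array.new(results.size) { results.pop }
end

# Usage sketch (fetch_image is a placeholder for whatever downloads one link):
#   consume_all(image_links) { |link| fetch_image(link) }
```

Since the queue is filled before any thread starts, a non-blocking `pop` is enough to tell a worker it is done; a true producer-consumer setup would need a blocking pop plus some kind of stop signal.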