A Simple Spider for the Researcher Ranking Project

An important issue in ranking researchers is constructing the citation network between them. To do that, I need to crawl the citation data from the HistCite website, whose URL is http://www.garfield.library.upenn.edu/histcomp/index.html


I downloaded a Python spider from the web and revised it to make it usable. It is not perfect, but it is good enough for this project. It is really cool to watch the command window scroll by as it downloads thousands of pages. Wow, is this what they call geek behaviour?


Source Code

The original version comes from this website: http://xlvector.net/blog/?p=18
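
For readers who just want the general idea, here is a rough sketch of what such a spider might look like. It is not the revised script itself, only a minimal breadth-first crawler written under my own assumptions: Python 3, standard library only, and a crawl restricted to URLs that start with a given prefix. The names LinkParser, extract_links, fetch, crawl, prefix and max_pages are all made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs (fragments removed) linked from html."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def fetch(url):
    """Download one page and decode it (illustrative default fetcher)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, prefix, fetch_page=fetch, max_pages=1000):
    """Breadth-first crawl of pages whose URL starts with prefix.

    Returns a dict mapping each downloaded URL to its HTML text.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError as error:  # urllib's URLError is a subclass of OSError
            print("skipped", url, error)
            continue
        pages[url] = html
        print("downloaded", url)
        # Queue every unseen link that stays inside the allowed prefix.
        for link in extract_links(url, html):
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Under these assumptions, the crawl would be started with something like crawl("http://www.garfield.library.upenn.edu/histcomp/index.html", "http://www.garfield.library.upenn.edu/histcomp/"). The fetch_page parameter is there so the crawler can be tried against a fake in-memory site before pointing it at the real one, and max_pages keeps a first run from going on forever.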

