Dear Readers,
Does anyone have a good scraper for Google Scholar that they would be willing to share (or point me to)? I’m looking for something simple – input a search string (including a “cited by” search) and capture basic metadata (author, title, publication, year) of the results.
Thanks!
Dan
Trey
/ January 2, 2012
If you are comfortable with a little Python, I believe this should do the trick with a little modification.
Jason Kerwin
/ January 2, 2012
I’m not sure if it could be fully automated, but Zotero can grab whole pages of results and then output them in whatever citation format you want, including BibTeX.
Also, a search for “zotero scrape google scholar” turned up this result, which may or may not actually use Zotero: http://thebiobucket.blogspot.com/2011/11/r-function-google-scholar-webscraper.html
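For the Zotero route, once a page of results has been exported as BibTeX, the four fields asked about in the original post (author, title, publication, year) could be pulled out with a few lines of Python. This is only a minimal sketch under stated assumptions: the sample entries are made up, `parse_bibtex` is a hypothetical helper, and it assumes one field per line with each entry's closing brace on its own line. A real export may well need a proper BibTeX parser.

```python
import re

# Made-up sample standing in for a BibTeX export; not real search results.
SAMPLE_BIBTEX = """
@article{doe2010example,
  author = {Doe, Jane and Roe, Richard},
  title = {An Example Title},
  journal = {Journal of Examples},
  year = {2010}
}
@inproceedings{smith2011another,
  author = {Smith, Alex},
  title = {Another Example},
  booktitle = {Proceedings of the Example Conference},
  year = {2011}
}
"""

# One entry: "@type{key," followed by fields, closed by "}" on its own line (assumption).
ENTRY_RE = re.compile(r'@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}', re.DOTALL)
# One field per line: name = {value} or name = "value", optional trailing comma.
FIELD_RE = re.compile(r'(\w+)\s*=\s*[{"](.*?)[}"]\s*,?\s*$', re.MULTILINE)


def parse_bibtex(text):
    """Return a list of dicts with author, title, publication, and year."""
    records = []
    for _entry_type, _key, body in ENTRY_RE.findall(text):
        fields = {name.lower(): value.strip()
                  for name, value in FIELD_RE.findall(body)}
        records.append({
            "author": fields.get("author", ""),
            "title": fields.get("title", ""),
            # Journal articles and conference papers name the venue differently.
            "publication": fields.get("journal") or fields.get("booktitle", ""),
            "year": fields.get("year", ""),
        })
    return records


records = parse_bibtex(SAMPLE_BIBTEX)
for record in records:
    print(record)
```

This only reshapes a file you have already exported by hand; it does not fetch anything from Google Scholar itself.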
Ben Bolker
/ February 26, 2012
I have written a little bit about this in the past: http://bmb-common.blogspot.com/2011/02/does-google-scholar-suck-or-am-i-just.html (although the Python scraper above is new to me). So far (although I have to look at this new one carefully), I haven’t seen one that doesn’t either (a) suck, because it relies on GS’s front page, which has seriously incomplete metadata, or (b) implicitly violate GS’s terms of service, which ask you not to ‘spider’ the page …
Ben Bolker
/ February 26, 2012
See also stackoverflow.com/questions/7523961/google-scholar-with-matlab/ which comments further on the terms-of-service issue. The situation really is kind of sad, although I guess we can all have the money we paid to use Google Scholar refunded …