R Workshop with Dominic Nyhuis

"Automated Web Data Collection for the Social Sciences Using R"

April 05 - April 06, 2018 - 10:00 - 16:30
University of Bremen, UNICOM, House 7, Conference Room (7.3280)
Mary-Somerville-Str. 7
28359 Bremen

Computational Social Science is a growing field with many facets, so the BIGSSS Methods Center and SOCIUM Methodenforschung are presenting a pair of events on the topic. One of them is a workshop on April 5-6, 2018, 10:00 - 16:00h, at the University of Bremen, UNICOM, House 7, Conference Room (7.3280):

Dr. Dominic Nyhuis from Goethe-Universität Frankfurt will give a two-day workshop on "Automated Web Data Collection for the Social Sciences Using R".


The vast amount of data available on the web is fundamentally changing research practices in the social sciences. By mastering the tools for automated web data collection, a single researcher can construct a data set that not long ago would have required tremendous effort and expense. The course is intended to provide an applied overview of the skills required for automatically collecting data from the web, giving a cursory introduction to some of the most important techniques. In particular, the workshop will introduce the basic structure of HTML to convey the underlying architecture and mechanics of websites. XPath will be introduced as a syntax for addressing specific elements of a website and as a tool for extracting them as needed. Regular expressions, which allow further processing of textual data gathered from the web, are also covered. Finally, client-server interactions via HTTP and the structure of URLs are discussed to show how web interactions work in practice. The applied elements of the workshop use the programming language R; a basic familiarity with R is therefore a prerequisite for attending the course.