Saving posts to a Facebook group to file

Thu 05 June 2014 by Eoin Travers

For one reason or another, I found myself today wanting to download a record of all posts to a certain Facebook group. This might also interest those doing qualitative research, given that Facebook is (or at least was) probably the biggest discursive medium around, particularly amongst teenagers, whom qualitative researchers love to study. Doing so took a little work, and a little Googling, but here's how it's done.

Graph

Facebook have a tool called Graph, which developers and analysts use to access the same content that normal users see presented nicely in their browser or in the app. Using your Facebook account, you can register as a developer for free and use Graph to explore the data underlying Facebook. You'll need two pieces of information to download your group's data: the group's ID, which I got here, and your own access token, which you can generate by following the instructions on the main Graph page.

Using the Graph API Explorer, it's pretty intuitive to browse the data. Make sure the little dropdown menu in the top left is set to 'GET', not 'POST' or 'DELETE', paste your group ID into the box, click 'Submit', and you should see some information about the group. Add '/feed' to the end of the group ID, and you'll see the posts, much as your browser sees them.
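For anyone curious what the Explorer is doing behind the scenes, here is a minimal sketch of the URL that such a GET request corresponds to. The helper name `feed_url` and the `limit` default are my own choices for illustration, and the Explorer may also insert an API version into the path, so treat this as an approximation rather than exactly what Facebook sends.

```python
from urllib.parse import urlencode

GRAPH_ROOT = "https://graph.facebook.com"  # public Graph API host


def feed_url(group_id, access_token, limit=25):
    """Build the URL for a GET request to a group's feed (hypothetical helper)."""
    # The access token and page size travel as query-string parameters
    query = urlencode({"access_token": access_token, "limit": limit})
    # Appending '/feed' to the group ID selects the group's posts
    return "%s/%s/feed?%s" % (GRAPH_ROOT, group_id, query)


print(feed_url("YOUR_GROUP_ID", "YOUR_ACCESS_TOKEN"))
```

Pasting the printed URL into a browser, with real values filled in, should show the same JSON that the Explorer displays.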

Automating the process

While you can click through each page this way and download the content, this quickly becomes dull with long feeds. Using Python, and the information I found in this Stack Overflow post, you can automatically download the whole feed as follows (make sure you run this script in the directory you want to save the data to):

from facepy import GraphAPI
import json

# Credentials you used on the Graph website
group_id = "YOUR_GROUP_ID"  # Group ID from the Explorer (no backslash needed)
access_token = "YOUR_ACCESS_TOKEN"

graph = GraphAPI(access_token)
pages = graph.get(group_id + "/feed", page=True, retry=3, limit=1000)
# Each page is a dict holding a batch of posts; save one JSON file per page
for i, p in enumerate(pages):
    print('Downloading page %i' % i)
    with open('content%i.json' % i, 'w') as outfile:
        json.dump(p, outfile, indent=4)
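Once the pages are saved, you will probably want the posts back in one list. The following is a rough sketch, assuming the files are named content0.json, content1.json and so on, as in the script above, and that each page keeps its posts under a "data" key, which is how Graph feed responses are normally laid out. The function name `load_posts` is just something I made up for this example.

```python
import glob
import json
import os
import re


def load_posts(directory="."):
    """Collect the posts from every content<N>.json file, in page order."""
    numbered = []
    for path in glob.glob(os.path.join(directory, "content*.json")):
        match = re.search(r"content(\d+)\.json$", os.path.basename(path))
        if match:  # skip files that do not follow the content<N>.json pattern
            numbered.append((int(match.group(1)), path))

    posts = []
    # Sort by page number so content10.json comes after content9.json
    for _, path in sorted(numbered):
        with open(path) as infile:
            page = json.load(infile)
        # Assumption: each saved page holds its posts in a "data" list
        posts.extend(page.get("data", []))
    return posts


print("%i posts loaded" % len(load_posts()))
```

Run it from the same directory as the downloaded files, or pass the directory as an argument.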
