Digital Archiving Resources
Here’s a short list of resources I’ve compiled for digital backups and archival work. It’s certainly not exhaustive; it just includes the things I find most helpful.
Storage Solutions
Our UChicago accounts give us access to a lot of cloud-based storage, all with the caveat that we’ll lose access upon graduation. Through Microsoft we have OneDrive, and through Google we have Google Drive. However, I personally prefer Box, which is considered more secure (and thus IRB approved!) and integrates very well with your personal computer: you can edit and move files around just as you would in Windows File Explorer or Mac’s Finder, and they’ll automatically upload to the cloud.
For when our accounts run out, I think a small portable SSD is a must. Samsung has some really great, inexpensive options (some with password protection) that are fast, secure, and IRB approved.
Videos
Many videos can be downloaded immediately with a simple right-click. If not, ClipGrab (https://clipgrab.de/update/en) is a really useful way to download things like YouTube videos. It can’t download Twitch VODs, so for those I’ve used Clipr (https://clipr.xyz/twitch-video-downloader) and Twitch Leecher (https://github.com/Franiac/TwitchLeecher/releases). As with all software mentioned here, be careful when installing and make sure you read through each step of the installer. These kinds of programs are infamous for bundling malware, but in my experience these three have been fine.
Recording your screen is much easier on iPhones, though there are some apps available for Android. I don’t own a Mac, but all Windows 10 computers come with the Xbox Game Bar preinstalled. The Game Bar is meant for recording gameplay, but by playing around with some settings, you can get it to treat almost any app as a “game,” which lets you record most of what’s on your screen. There are also free screen-recording applications, though they might add watermarks or be demanding on less powerful computers.
Audio
Like most videos, a lot of audio can be downloaded instantly with a right-click. If not, there are a few options. Audacity (https://www.audacityteam.org/) is great, open-source audio editing software that also lets you record any sound coming from your computer (https://manual.audacityteam.org/man/tutorial_recording_audio_playing_on_the_computer.html). This is super helpful when you can’t download something, but it will also capture notification sounds if you have them enabled, so watch out for that. As a last resort, if you know a little HTML, you can use the “inspect element” feature in Google Chrome. Most audio files are linked somewhere in the page’s code, and if you can find the link, you can download the file. An easy place to practice this is the Poetry Foundation’s website (https://www.poetryfoundation.org/podcasts/75245/the-mother): find a poem that is read aloud, then use “inspect element” to find and download the recording.
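Using “inspect element” to grab a recording is really just reading the page’s HTML and looking for the tag that points at the audio file. Here is a minimal Python sketch of that same search, assuming the audio is embedded with a standard <audio> or <source> tag whose src attribute holds the file’s address. The function name find_audio_urls and the example.com addresses are hypothetical, and some sites build their players with JavaScript, in which case the link may not appear in the HTML at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AudioSourceFinder(HTMLParser):
    """Collects src attributes from <audio> and <source> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("audio", "source"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths like "/media/reading.mp3" against the page URL.
                self.urls.append(urljoin(self.base_url, src))


def find_audio_urls(html, base_url):
    """Return every audio file URL declared in the page's HTML (hypothetical helper)."""
    finder = AudioSourceFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.urls


sample = """
<audio controls>
  <source src="/media/reading.mp3" type="audio/mpeg">
</audio>
"""
print(find_audio_urls(sample, "https://www.example.com/poems/1"))
# ['https://www.example.com/media/reading.mp3']
```

Once you have a URL, Python’s urllib.request.urlretrieve(url, "reading.mp3") would save a copy, with the same caveats about copyright as any other download.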
Pictures
Pictures are also usually easy to download with a right-click. If that doesn’t work for whatever reason, the built-in screenshot tools on Windows (the Snipping Tool) and Mac make it easy to grab screenshots, or you can take screenshots on your phone. Chrome’s “inspect element” can be helpful here, too.
Social Media Posts and History
For many social media posts, it’s probably best to just take screenshots. As I learned when Twitter recently culled my informants’ accounts, these posts can be precarious. However, there are some other ways to save your activity online. For example, most social media platforms let you download an archive of your account, which includes text versions of your messages with other users and the full text of posts (such as tweets you’ve liked). These archives often don’t include images or videos, so it’s likely best to download those individually. If you can’t download certain photos (like Instagram posts) but don’t want to lose any quality, Chrome’s “inspect element” can work, too. Some tools have been developed for Python and R that let you download your own and others’ social media histories, such as R’s rtweet package. However, not every site is so easy to download from, so for some sites I recommend the tactic I describe next.
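A downloaded archive is usually a folder of data files rather than something readable at a glance. As an illustration, here is a Python sketch for pulling the text out of a Twitter archive. It assumes (check against your own download) that tweets are stored in a JavaScript file of the form window.YTD.tweets.part0 = [ ... ], where each entry wraps a tweet object with created_at and full_text fields. Those file and field names are assumptions about Twitter’s export format, and other platforms structure their archives differently.

```python
import json


def load_archive_js(text):
    """Parse a file shaped like `window.YTD.<name>.part0 = [...]` into Python objects.

    Assumption: everything after the first '=' is a plain JSON array.
    """
    _, _, payload = text.partition("=")
    return json.loads(payload.strip().rstrip(";"))


def tweet_texts(entries):
    """Yield (created_at, full_text) pairs.

    Assumption: each entry is {"tweet": {...}}; bare tweet objects are accepted too.
    """
    for entry in entries:
        tweet = entry.get("tweet", entry)
        yield tweet.get("created_at", ""), tweet.get("full_text", "")


# A one-tweet stand-in for the real file, which you would read with
# open(path, encoding="utf-8").read().
sample = (
    'window.YTD.tweets.part0 = [{"tweet": {"created_at": '
    '"Fri Jul 24 12:00:00 +0000 2020", "full_text": "hello"}}]'
)
for created_at, text in tweet_texts(load_archive_js(sample)):
    print(created_at, "|", text)
# Fri Jul 24 12:00:00 +0000 2020 | hello
```

Writing these pairs out with Python’s csv module would give you a plain spreadsheet of your posts that doesn’t depend on the platform staying online.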
Webpages/Webscraping
Many websites are easy enough to download as PDFs or HTML webpages, albeit with some loss, such as the site’s clean layout. Other websites are regularly archived by the Internet Archive (https://archive.org/). But the best option, depending on the type of data you want to collect, is webscraping. I have only ever webscraped using R, and it isn’t the most straightforward process, but there’s a great video (https://www.youtube.com/watch?v=NwtxrbqE2Gc&) that I found helpful. In essence, webscraping works like this: you choose a website and, using code, find elements on that website to download. Two things need to be in place: you need to be able to change the URL to move between different posts, and you need to know which HTML element holds the content you’re downloading.
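My own scraping was done in R, but for anyone working in Python, here is a minimal sketch of the two pieces described above using only the standard library: one function that downloads a page, and one that pulls out the text of every <p> tag. The function names are hypothetical, and real sites often need more than this (a more forgiving parser such as BeautifulSoup, pauses between requests, a check of the site’s terms of use), so treat it as an illustration of the idea rather than a finished scraper.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ParagraphExtractor(HTMLParser):
    """Collects the visible text of every <p> element on a page."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # in HTML, an unclosed <p> ends when the next <p> starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:  # text inside nested tags like <b> is kept too
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # catch a final <p> that was never closed

    def _flush(self):
        if self._in_p:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._buffer = []


def extract_paragraphs(html):
    """Return the text of each <p> tag, in page order."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def fetch_html(url):
    """Download one page as text, assuming it is UTF-8 encoded."""
    # Some sites reject urllib's default User-Agent, so send a descriptive one.
    request = Request(url, headers={"User-Agent": "archiving-script/0.1"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


print(extract_paragraphs("<div><p>Today is a <b>good</b> day.</p><p>Rest.</p></div>"))
# ['Today is a good day.', 'Rest.']
```

On a live site, you would call extract_paragraphs(fetch_html(url)) for each page you want and save the results as you go, so that a crash halfway through doesn’t cost you everything.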
For example, I downloaded thousands of horoscopes from a horoscope website whose URL changed based on the date and sign. The URL for an Aries’ daily horoscope would be something like www.website.com/12/07242020 (12 = Aries, 07242020 = the date, July 24, 2020). The horoscope itself was stored in a <p> tag on the page. So I wrote a program in R that incrementally changed the URL by adding or subtracting numbers, and then downloaded whatever text was in that URL’s <p> tag. I’m sure this sounds complicated, but essentially, R would do some math to turn “12/07242020” into “12/07232020” and download whatever was there. I am bad at coding, but luckily R is OK with inelegant solutions.
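Here is a Python sketch of the URL-stepping idea, using the pattern from the horoscope example: the sign number, then the date written as MMDDYYYY. The base address is a placeholder, as in the example. One small design change from “adding or subtracting numbers”: stepping through real calendar dates with datetime avoids invalid dates at month boundaries, since the day before 08012020 is 07312020, not 08002020.

```python
from datetime import date, timedelta

BASE_URL = "https://www.website.com"  # placeholder address, not a real horoscope site


def horoscope_url(sign_number, day):
    """Build a URL like <base>/12/07242020: sign number, then the date as MMDDYYYY."""
    return f"{BASE_URL}/{sign_number}/{day:%m%d%Y}"


def urls_backwards(sign_number, start, n_days):
    """Yield URLs for `start` and the n_days - 1 days before it."""
    for offset in range(n_days):
        yield horoscope_url(sign_number, start - timedelta(days=offset))


for url in urls_backwards(12, date(2020, 8, 2), 3):
    print(url)
# https://www.website.com/12/08022020
# https://www.website.com/12/08012020
# https://www.website.com/12/07312020
```

Each URL would then be downloaded and the text of its <p> tag saved, ideally with a short pause between requests (for example, time.sleep(1)) so the site isn’t flooded.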
If you’re making websites for social research, I recommend using Glitch (https://glitch.com/) to prototype hand-coded websites, GitHub (https://github.com/) to host them (through its Pages feature), and Google Analytics (https://analytics.google.com/) to collect data on how visitors use them. I’ve used these three tools in the past to build simple online games and see how players interact with them.
Interviews
Skype is great because it’s much more accessible than Zoom for many people, but it doesn’t automatically separate audio and video. That said, I’ve found that most people really like talking on the phone. Google Voice is a great option for this because it lets you record an entire (four-hour!) phone call and then download it as an MP3. You get a free Google Voice number, and you can download the app to make and take those calls on your personal phone. For transcription, I recommend Temi (https://www.temi.com/) and Otter.ai (https://otter.ai/). Unfortunately, I despise every qualitative coding software I’ve ever used, so if anyone has open-source recommendations, I would love to hear them!
With everything, make sure to gather consent when appropriate and be aware of copyright when considering publication. Keep track of which Google searches led you where, and always archive anything you find interesting (it might be impossible to find that tweet or TikTok in the future). If you have any other helpful resources, throw them in the comments!