Technical specs for this website

website
Published

April 4, 2025

I’m using this website as a way to help organize and share key documents and resources. The research protocols are in flux at this stage in the project’s development, and this will make it easier to distribute up-to-date drafts with partners, while simultaneously enhancing transparency.

This post outlines the technical specifications for this website and outlines a roadmap for its further development. It will therefore be continually updated as the site evolves.

Fundamentals

This website is based on Quarto, a platform for writing and publishing scientific and technical writing. I had used quarto before but without fully understanding it, and now I am starting to see its elegance.

I had started off using Hugo, but there were too many limitations that Quarto was able to accomodate. You can find an older version of this post reflecting that setup here: #2346852.

The site is hosted on GitHub Pages. The repo is located at https://github.com/zackbatist/CITF-Postdoc.

Generating PDFs

As an avid user, one thing I really like about Quarto is the ability to generate PDFs alongside html versions served over the web. I started tinkering with includes but I need to review how Quarto passes info from YAML frontmatter. It is not at all straightforward and I will need to experiment a bit more with this to get the hang of it.

Archiving and Version Control

Every change is tracked using git. I would also like to archive each research protocol in Zenodo once they reach a point of stability. This would ensure that they ca be aassigned DOIs and detailed metadata, which will make them easier to reference.

However, I do not want to rely on Zenodo’s GitHub integration for two reasons: (1) I want this to be as platform-agnostic as possible, and (2) that system relies on GitHub’s release system which operates on the level of the whole repository rather than specific files.

I might be able to write a custom CI workflow to archive specific files to Zenodo using their API. But, I want to be able to toggle this option, rather than have it occur for every single detected change. Maybe I can accomplish this by pushing the changes that I want to archive to a dedicated branch that the CI workflow is configured to operate on. Or it might be easier to simply do this manually, since I’m not sure I will be using it that often anyway.

Git Submodules

Since I’m dealing with potentially sensitive information, including recordings, transcripts and notes deriving from interviews that participants may opt to keep confidential, I need to isolate those files so they remain private and secure. To accomplish this, I will set up private git repos on the MCHI-administered GitLab instance and integrate them into my main project repository (hosted on GitHub, and hence less secure) as submodules. I will then configure my quarto rendering options to ignore any subdirectories deemed sensitive enough to not share publicly (which may not include everything in the submodule). I intend to follow this guide written by Tania Rascia to pull this off.

Just as a quick reminder of the procedure for committing and syncing changes, based on my testing:

  1. Make changes to files in the submodule.
  2. Commit and sync those changes to the remote submodule.
  3. Obtain a record of those changes from the remote submodule:
    • git submodule update --remote
    • This can be verified using git status; changes should be indicated as (new commits).
  4. Commit the record of commits to the main repo’s remote.

qc

I’m using qc, a novel qualitative data analysis tool designed for computational thinking. It’s a command line tool that makes QDA compatible with textfile based social science workflows, which more-or-less aligns with my overall approach. qc also plays nice with my plan to isolate potentially sensitive information on a private git submodule, and is loosely compatible with my strategy for posting memos and research outputs publicly via this website.

I’m also communicating with its maintainer, Chris Proctor, who is an education researcher and computer scientist based at SUNY Buffalo. This is the perfect combination for this kind of work since he specializes in instilling computational literacies, he has the technical chops to actually develop this system, and most of all: qualitative data analysis is the bread and butter of education research, so he is developing from an informed perspective. I really like what I see so far, and I’m eager to collect data of my own so that I can maximize the full potential that this system affords. I will write another post about my experiences once I have more to report.

Installing qc

I had a bit of a hard time installing qc using the recommended method (using pipx install qualitative-coding) due to NumPy and spaCy dependency issues, so I had to install from source. First I created a virtual environment:

python3.12 -m venv qc
source qc/bin/activate

I then downloaded the source code from the GitHub repo and installed it within the virtual environment:

pip install qualitative-coding-main.zip

Then cd into the directory and initialize a qc project:

cd qc
qc init

I applied some custom settings because I’m using Codium instead of VSCode. These basically replace the editor that’s called up when I use the code command.

codebook: codebook.yaml
corpus_dir: corpus
database: qualitative_coding.sqlite3
log_file: qualitative_coding.log
memos_dir: memos
qc_version: 1.7.3
verbose: true
editor: codium
editors:
  codium:
    name: Codium
    code_command: 'codium "{corpus_file_path}" "{codes_file_path}" --wait'
    memo_command: 'codium "{memo_file_path}" --wait'

I also set verbose: true to help troubleshoot any issues that might come up and reduce my dependence on Chris.

Regarding the submodule, there is a conflict with how the qc directory had to be generated. I needed to create the qc directory using venv, but I also had to create it by pulling it down from the GitLab server. So I had to create the virtual environment, exit the environment, rename the directory to a placeholder name, pull down the fresh submodule, commit and push the new submodule to the parent git repository, and then copy over all the contents from the placeholder directory into the submodule (and then commit changes and push). Not really a major issue, but a minor incompatibility in the workflow that’s worth noting.