• Bootstrap an independent data scraper

    • From repository rileyjshaw/rileyjshaw-new

    Project scraper

    The projects on my site are automatically scraped and formatted at publish time using the scripts in this directory. Read more about my reasoning below, or skip to the directory structure.

    Why?

    Gatsby's source and transformer plugins are powerful, and I used them in the initial development of this site. I eventually decided that separating my collection process would be good for flexibility, control, and offline work.

    Flexibility

    GraphQL's filters and transforms are powerful, and Gatsby's APIs add more options for how data is fetched, cached, and transformed. However, complicated or non-standard data transforms and sanitization are much easier outside of Gatsby's ecosystem. For instance, the API starts to feel clunky for one-off treatment of specific content nodes.
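
    For a sense of what that looks like outside of Gatsby, here's a minimal sketch of the kind of one-off fix I mean, run as a plain Node script against a scraped JSON file. The path and field names (`uid`, `title`) are assumptions based on the formatted output described under "Directory structure" below, not the site's real schema:

    // Hypothetical one-off fix applied before the site build ever sees the data.
    const fs = require('fs');

    const path = '_generated/scraped-projects-formatted.json';
    const projects = JSON.parse(fs.readFileSync(path, 'utf8'));

    // Give special treatment to a single node without touching plugin config.
    const patched = projects.map(project =>
    	project.uid === 'dweet-123' // hypothetical UID
    		? {...project, title: 'A better title than the source provides'}
    		: project
    );

    fs.writeFileSync(path, JSON.stringify(patched, null, '\t'));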

    Control

    I've had a good experience with Gatsby but I may decide to migrate my site to another platform or format someday. Keeping my data entirely separate from the site's framework makes migrating my data as easy as copy/pasting this directory. It's just a few JS files!

    Offline

    Gatsby caches data fetched through its source plugins in the .cache directory by default. The .cache directory is deleted after:

    • gatsby clean is called.
    • package.json changes, for example when a dependency is updated or added.
    • gatsby-config.js changes, for example when a plugin is added or modified.
    • gatsby-node.js changes, for example when a new Node API is invoked.
    • …etc.

    I found I was frequently triggering .cache wipes during development. At best this meant I was pinging APIs and atom feeds more than necessary. At worst, it made working offline with project data impossible.

    Directory structure

    Here's how the scraper is organized for now:

    scrape-projects.js
    	The megafile to replace Gatsby's source plugins. This pulls project data
    	from all online sources and saves it into `_generated/` (a rough sketch
    	follows this tree).
    
    _generated/
    	Files generated by the `scrape-projects.js` above. DO NOT EDIT THESE FILES
    	MANUALLY! They will be overwritten.
    
    	scraped-projects-raw.json
    		Not quite the raw response, but pretty close. This file
    		contains all the data that I may decide to use someday, but
    		haven't yet. Organized by `type` in a nested object.
    
    	scraped-projects-formatted.json
    		Standardized into a smaller format that can be smashed together with
    		`curation/` data. Flattened into an array with `type` annotations on
    		each node, as well as unique, unchanging project IDs (`UID`).
    
    curation/
    	This is where all custom curation and processing go, e.g. tagging content.
    	Projects are modified based on their generated UID.
    
    	tweaks.js
    		Mainly for one-off changes, e.g. fixing formatting errors from immutable
    		online sources. This file can also be used to apply changes to groups
    		of files.
    
    	tags.js
    		TODO: figure out where `tags`, `lastTagged`, and `coolness` data are
    		going to live.
    
    sources/
    	Offline data files and collections to complement the online data cached in
    	`_generated/`.
    
    	standalone-projects.json
    		TODO: Move these over from the `src/data` directory.
    
    tools/
    	Custom tools to help classify, organize, or edit project nodes without
    	opening a text editor. Custom tools are only built for data that is too
    	difficult to keep updated or standardized manually.
    	TODO: Hook these up to a Node server so they edit the JSON files directly.
    
    	tagger.html
    		Finds untagged or incorrectly tagged projects, as well as projects
    		that were last tagged before a new tag type was added. Provides an
    		interface to preview and re-tag each project.
    
    	cool-sort.html
    		TODO: sort or insert nodes based on their "coolness".
    
    test/
    	Quick test files to ensure data is downloaded without any dropped nodes,
    	UIDs are unique, etc.
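
    To make the flow above concrete, here's a rough sketch of the shape `scrape-projects.js` could take. The stub fetcher, field names, and UID format are placeholders so the example runs on its own; the real file pulls from live APIs and atom feeds and does more sanitization:

    const fs = require('fs');

    // Stand-in for a real source fetcher (GitHub, Dwitter, atom feeds, …).
    async function fetchDweets() {
    	return [{id: 123, title: 'untitled dweet', date: '2019-08-01'}];
    }

    const sources = [{type: 'dweet', fetch: fetchDweets}];

    async function scrape() {
    	const raw = {};
    	const formatted = [];

    	for (const {type, fetch} of sources) {
    		const nodes = await fetch();

    		// scraped-projects-raw.json: close to the raw responses, nested by type.
    		raw[type] = nodes;

    		// scraped-projects-formatted.json: a flat array with `type` annotations
    		// and a unique, unchanging UID on each node.
    		for (const node of nodes) {
    			formatted.push({type, uid: `${type}-${node.id}`, title: node.title, date: node.date});
    		}
    	}

    	fs.mkdirSync('_generated', {recursive: true});
    	fs.writeFileSync('_generated/scraped-projects-raw.json', JSON.stringify(raw, null, '\t'));
    	fs.writeFileSync('_generated/scraped-projects-formatted.json', JSON.stringify(formatted, null, '\t'));
    }

    scrape().catch(console.error);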
    


  • Archive pre-2019 Heroku site; Update README.md

    • From repository rileyjshaw/xoxo-bingo

    Excerpt from the new README:

    ## timeline
    2015: first bingo! [eli](https://twitter.com/veryeli) and i used the attendee
    directory to generate a unique card for everyone (twitter login kept it private
    🔒). squares on your card were other attendees - if you met someone on your
    card you got to check it off. we made it cuz we’re shy. most of it is in the
    `pre-2019` folder!
    
    2016: we made the cards prettier by pulling in people’s twitter photos and
    doing imgmagick to them 🔮
    
    2017: no xoxo, no bingo… missed u all
    
    2018: xoxo was in the midst of changing their infrastructure, so i lost access
    to the attendee directory. [hannah](https://twitter.com/herlifeinpixels),
    [jason](https://twitter.com/justsomeguy) and i met in a cafe before the kickoff
    ceremony and designed a static version with input from the community. hannah
    and jason made 25 icons in like two minutes, it was incredible!!!
    
    2019: i've been too cheap to get https://xoxo.bingo in previous years, but
    [andy](https://twitter.com/andymcmillan) noticed a thread on slack and hooked
    us up! thx andy.
    

    leading up to xoxo2018, i realized we wouldn't have access to the new attendee registry. andy and i discussed ad-hoc private access and other ways to make it work, but it was too much. so hannah, jason and i made a static version with "achievements" sourced from the slack community.

    it was fun to get excited about things specific to that year, like the podcast airstream and the blue ox. and it's gonna be that way from now on! feel free to create an issue or msg on slack if you have ideas for this year's bingo squares.

    since it's staying a static site, i moved everything off of heroku. all site content will live in 2019-and-on/.


  • Firehose: proof of concept

    • From repository rileyjshaw/rileyjshaw-new

    I'm experimenting with auto-generating nodes for https://rileyjshaw.com/lab from a variety of data sources. This project may eventually replace https://rileyjshaw.com.

    This is the initial commit, completed quickly as a proof of concept. There's nothing much to show, but I want to deploy ASAP so I can test the full pipeline.

    So far, everything has worked! Data from a variety of sources is already appearing on my local server. To reproduce:

    So far, I'm surfacing data from:

    Setting this up was EASY, which makes me excited for the future of this experiment :)

  • Add an index for each individual project

    • From repository rileyjshaw/canvas

    I started this repository in the spirit of OpenFrameworks and TouchDesigner: I wanted all the libraries I might need close at hand, with a simple, abstracted API for drawing to `<canvas>`, SVG, etc. I wanted a personal playpen / pigpen to test ideas in.

    For that reason, I didn't need nice features like routing or pages. :) If I wanted to see an old sketch, I'd change the root component and re-render. It worked for me!

    But I planned to eventually make an easier way to browse existing experiments. It would benefit me a bit, and casual viewers a lot.

    I haven't updated this repository in nearly two years, and I honestly never expect to again. I'm doing less browser-based creative coding these days, and trying to stretch my work in other directions.

    About an hour ago, I decided to create an index page or dropdown to close this project out and keep it accessible in perpetuity. When I cloned the repo and started looking at the build pipeline, I almost noped the whole idea. I built this with create-react-app, so even adding new pages for each project the recommended way involves:

    1. Installing some sort of React-compliant router.
    2. Spending… hours? figuring out which version of which router works with the project's outdated dependencies, OR,
    3. Upgrading the entire project, likely involving major upgrades to Webpack, Babel, etc.
    4. Installing something called react-snapshot, which apparently builds static files for you? But there's still a pushState history API? The README listed some tutorials, so I opened them.
    5. …once I'd reached this point, I realized I'd need another method if I wanted to be done within the hour.

    At that point, I could have searched the web for "create-react-app static routes 2017 easy" and gone down that rabbit hole before giving up. OR, I could have given up immediately. Or I could do what I did, which was a good idea:

    I changed the root component 38 times by hand, typed "npm run build" into my terminal by hand, and dragged the built files BY HAND into unique directories that I created BY HAND.

    Gasp!

    I spent another minute in my editor surrounding the output of ls -d with anchor tags for a root index. (yes, by hand)

    The most time-intensive part of the process was writing this commit message. I'm confident that if I'd tried to automate the process or rebuild the project "the right way", I'd be at this for a few more hours.

    The result is a little sketchy. Namely, I'm sure the total payload of each page is a bit bigger, and caching takes a hit. But I think an extra kilobyte will be tolerated by the 3 people who ever visit this corner of my website.

    And wow it was so easy. And if I ever decide to add a new sketch, I can do the same simple steps by hand. No dependency mismatch with my local versions. No reading old docs. Just build, drag, repeat, forever.

    I guess I'm writing this as a reminder to myself: it's usually possible to break back out to 1995 in a pinch.


  • Add Dwitter data and some initial scraper options

    • From repository rileyjshaw/rileyjshaw.github.io

    I love the tidal wave of projects on /lab, and I want to emphasize that for v3.0 of the website. I update pages across the web daily: Glitch, Codepen, gist.github.com, Dwitter, Hackster, etc. Plus there's social media…

    I'm okay with manual curation for the most part, but for websites like Dwitter where contributions are inherently unpolished / untitled, it doesn't make sense for me to hand-pick and manually update a giant list.

    Also: I'm not sure how long Dwitter will be around for. Periodically saving the underlying code / images / etc. gives me more ownership over the presentation and preservation of my data. It changes my relationship with these sites from content hosts to publishing platforms. That makes me feel more secure with my zillion links.

    TODO(?): Automatically fetch new content during the publishing step?
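
    If I do automate it, here's a sketch of what that periodic save could look like. I'm assuming a JSON endpoint shaped roughly like `https://www.dwitter.net/api/dweets?author=rileyjshaw`; the URL, parameters, response shape, and output path are guesses to verify against the real API, and it relies on a runtime with a built-in `fetch` (Node 18+):

    const fs = require('fs');

    async function saveDweets(author) {
    	// Assumed endpoint and parameters; double-check against the real API.
    	const res = await fetch(`https://www.dwitter.net/api/dweets?author=${author}`);
    	if (!res.ok) throw new Error(`Dwitter responded with ${res.status}`);
    	const dweets = await res.json();

    	// Keep a local copy of the underlying code so presentation and preservation
    	// don't depend on the host sticking around.
    	fs.writeFileSync(`dweets-${author}.json`, JSON.stringify(dweets, null, '\t'));
    }

    saveDweets('rileyjshaw').catch(console.error);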
