Web Scraping and Analyzing YoYo Tricks

Overview

Today I’m going to skim over a recent passion project of mine that involves yoyo tricks. I have been throwing around yoyos since I was a teenager. Proof: Me, junior year, in the high school talent show (I’m no better today):

The purpose of this post is to provide a high-level overview of an end-to-end analysis in R, starting with unstructured data from YoYoTricks.com and ending with an interactive graph.

The motivation here was to build a catalog of all yoyo tricks and how they relate to one another. This could be useful in determining the most foundational yoyo tricks, the most complex yoyo tricks, or even in building a recommendation engine that suggests new tricks to learn based on what a user already knows.

Scrape

So, the first step in the process was scraping the YoYoTricks.com site map to obtain a list of all yoyo trick names and the link to each trick tutorial. For this, I used R (rvest for HTML parsing, dplyr for transformations) and SelectorGadget to help define which CSS selectors to collect.

The code below scrapes the site map to obtain a data frame containing the trick name (title) and its corresponding link.

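library(rvest)  # read_html(), html_nodes(), html_attrs()
library(dplyr)  # bind_rows(), filter(), %>%
library(knitr)  # kable()

# Each link's attributes come back as a named character vector; turn each into a one-row data frame.
read_html('https://yoyotricks.com/sitemap/') %>% html_nodes('.cat-item-1 a') %>% html_attrs() %>%
  lapply(function(row) {
    data.frame(title = row['title'], link = row['href'], stringsAsFactors = FALSE)
  }) %>%
  bind_rows() %>%
  # Keep only the links that point at trick tutorials.
  filter(grepl(x = link, pattern = 'https://yoyotricks.com/yoyo-tricks/.+')) %>%
  head(5) %>% kable()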

| title | link |
|:---|:---|
| First Base 90 Second Review | https://yoyotricks.com/yoyo-tricks/first-base-90-second-review/249259/ |
| Rolling Inverted GT | https://yoyotricks.com/yoyo-tricks/rolling-inverted-gt/248367/ |
| Snack Yoyo Mini Review | https://yoyotricks.com/yoyo-tricks/snack-yoyo-mini-review/247082/ |
| Slack Heart Mount | https://yoyotricks.com/yoyo-tricks/slack-heart-mount/246171/ |
| Large Haugan Collider | https://yoyotricks.com/yoyo-tricks/large-haugan-collider/244899/ |

Enrich

From here, I iterated over each trick page, scraping for enrichment data, such as trick category, tags, and (most importantly) references to other tricks.

| id | title | author | category | tags | link |
|---:|:---|:---|:---|:---|:---|
| 1 | Vanish Grind | Adam B. | Yoyo Tricks;Yoyo String Tricks (1A) | Green-Triangle;grind | https://yoyotricks.com/yoyo-tricks/vanish-grind/233896/ |
| 2 | Daniel Day-Lewis | Adam B. | Yoyo Tricks;Long String Tricks | | https://yoyotricks.com/yoyo-tricks/daniel-day-lewis/232973/ |
| 3 | Bouncy Castle | Jake E. | Yoyo Tricks;Yoyo String Tricks (1A) | Hop;Green-Triangle | https://yoyotricks.com/yoyo-tricks/bouncy-castle/228525/ |
| 4 | Monochrome | Cory H. | Yoyo Tricks;Yoyo String Tricks (1A) | Slack-String | https://yoyotricks.com/yoyo-tricks/monochrome/224191/ |
| 5 | Wax On, Wax Off | Jake E. | Yoyo Tricks;Yoyo String Tricks (1A) | Repeater;chopsticks | https://yoyotricks.com/yoyo-tricks/wax-on-wax-off/224190/ |
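
As a rough sketch of the per-page enrichment step described above, here is how a single trick page might be parsed with rvest. The inline HTML and the `.category` and `.tag` selectors are made up for illustration (the real page markup isn't shown in this post), and `collapse_text()` and `enrich_trick()` are hypothetical helper names, so treat this as one possible shape for the step rather than the exact code behind the table.

```r
library(rvest)

# Toy stand-in for a single trick page; the markup and class names are hypothetical.
page <- read_html('
  <article>
    <a class="category" href="#">Yoyo Tricks</a>
    <a class="category" href="#">Yoyo String Tricks (1A)</a>
    <a class="tag" href="#">grind</a>
    <p>Builds on
      <a href="https://yoyotricks.com/yoyo-tricks/monochrome/224191/">Monochrome</a> and
      <a href="https://yoyotricks.com/yoyo-tricks/bouncy-castle/228525/">Bouncy Castle</a>.
    </p>
  </article>')

# Collect the text of every node matching a CSS selector into one ";"-separated string.
collapse_text <- function(page, css) {
  page %>% html_nodes(css) %>% html_text() %>% paste(collapse = ';')
}

enrich_trick <- function(page) {
  hrefs <- page %>% html_nodes('a') %>% html_attr('href')
  list(
    category   = collapse_text(page, '.category'),
    tags       = collapse_text(page, '.tag'),
    # Links that point at other trick tutorials count as references.
    references = hrefs[grepl('^https://yoyotricks.com/yoyo-tricks/.+', hrefs)]
  )
}

info <- enrich_trick(page)
str(info)
```

Running something like `enrich_trick()` over every link from the scrape step, and binding the results together, would yield a catalog in the shape shown above.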

With an enriched trick catalog in hand, I did some R-fu to produce a data frame of references between tricks.

In the table below, the from and to values are trick ids from the catalog table above. The weight column indicates how many times the from trick references the to trick.

| from | to | weight |
|---:|---:|---:|
| 1 | 294 | 1 |
| 2 | 90 | 2 |
| 2 | 235 | 2 |
| 2 | 281 | 2 |
| 2 | 335 | 3 |
| 3 | 4 | 2 |
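
For the curious, a weighted edge list like this can be derived by counting repeated mentions. The sketch below assumes a hypothetical `mentions` data frame with one row per mention of a trick on another trick's page; the rows are contrived so that the counts reproduce the six sample edges above, and base R's `aggregate()` is just one way to do the counting.

```r
# Hypothetical raw input: one row each time the page for trick `from` mentions trick `to`.
mentions <- data.frame(
  from = rep(c(1, 2, 2, 2, 2, 3),         times = c(1, 2, 2, 2, 3, 2)),
  to   = rep(c(294, 90, 235, 281, 335, 4), times = c(1, 2, 2, 2, 3, 2))
)

# Give every mention a weight of 1, then sum the weights per (from, to) pair.
mentions$weight <- 1
edges <- aggregate(weight ~ from + to, data = mentions, FUN = sum)

# Sort for readability and reset the row names.
edges <- edges[order(edges$from, edges$to), ]
rownames(edges) <- NULL
print(edges)
```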

Visualize & Explore

We now have everything needed to create a network graph visualization:

  • nodes (trick catalog)
  • edges (trick references)

Whoa, that’s messy! Regardless, the underlying graph structure is useful for a variety of analyses.

Let’s see which tricks are referenced the most:
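
A rough way to compute such a ranking from an edge table is to count, for each trick, how many distinct tricks point at it and how many total mentions it receives. The edge values below are toy data, not rows from the real dataset, and the `n_referrers` column name is just a label chosen for this sketch.

```r
# Toy edge list in the same from/to/weight shape as the reference table.
edges <- data.frame(
  from   = c(1, 1, 2, 2, 3, 4),
  to     = c(5, 6, 5, 7, 5, 6),
  weight = c(1, 2, 1, 1, 3, 1)
)

# Total number of mentions each trick receives.
ranking <- aggregate(weight ~ to, data = edges, FUN = sum)

# Number of distinct tricks that reference each trick (its in-degree).
ranking$n_referrers <- as.vector(table(edges$to)[as.character(ranking$to)])

# Most-referenced tricks first; ties broken by total mentions.
ranking <- ranking[order(-ranking$n_referrers, -ranking$weight), ]
print(ranking)
```

Joining `ranking$to` back to the catalog ids would attach trick titles to the result.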

Next, let’s look at a few tricks that are the most similar, based solely on the referenced tricks they have in common.

Note that the similarity measure is ignoring many important factors, such as trick style, category, etc., but this is sufficient for this exercise. Also, to simplify things, only complex tricks are considered (tricks that reference more than 5 other tricks).
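
To make that filtering and pairing concrete, here is a minimal sketch that keeps only tricks with more than 5 references and counts the references each pair shares. The `refs` list and its ids are invented for illustration, and using the raw shared count as the pair weight is an assumption of this sketch.

```r
# Hypothetical reference sets: trick id -> ids of the tricks it references.
refs <- list(
  `10` = c(1, 2, 3, 4, 5, 6),
  `11` = c(1, 2, 3, 4, 5, 7),
  `12` = c(1, 2),               # only 2 references, so it is filtered out below
  `13` = c(5, 6, 7, 8, 9, 10)
)

# Keep only "complex" tricks: those referencing more than 5 other tricks.
complex_tricks <- refs[lengths(refs) > 5]

# Every unordered pair of complex tricks, one pair per row.
pairs <- t(combn(names(complex_tricks), 2))

# Number of referenced tricks each pair has in common.
shared <- apply(pairs, 1, function(p) {
  length(intersect(complex_tricks[[p[1]]], complex_tricks[[p[2]]]))
})

similar <- data.frame(from = pairs[, 1], to = pairs[, 2], shared = shared,
                      stringsAsFactors = FALSE)
similar <- similar[order(-similar$shared), ]
print(similar)
```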

At a glance, the tricks “Gravity Whip” and “Independent Tangler” have high similarity, as indicated by the width of the edge connecting them.

A look at the sub-graph of these two similar tricks shows they do indeed share many references to other fundamental tricks:

Just how similar are they? One measure, the Jaccard similarity coefficient, is obtained by dividing the number of tricks in the intersection of the two trick reference sets by the number of tricks in their union.

## [1] "Similarity score: 0.555555555555556"
## [1] "55.56% of referenced tricks in common"
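
The Jaccard calculation itself is only a few lines of base R. The two reference sets below are toy ids, not the actual references of “Gravity Whip” and “Independent Tangler”, and `jaccard()` is a helper name chosen for this sketch.

```r
# Jaccard similarity: size of the intersection divided by size of the union.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

# Toy reference sets (trick ids), purely for illustration.
refs_a <- c(4, 90, 235, 281)
refs_b <- c(90, 235, 281, 335)

score <- jaccard(refs_a, refs_b)  # 3 shared ids out of 5 distinct ids
print(paste('Similarity score:', score))
```

A score of 1 would mean the two tricks reference exactly the same set of tricks, and 0 would mean they share none.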

Conclusion

If you’re interested in further data exploration, I encourage you to spin up a kernel on Kaggle with the YoYoTricks dataset:

https://www.kaggle.com/dm3ll3n/yoyotricks

If you discover something interesting, let me know!

Until next time,
Donald
