(This article was first published on pacha.dev/blog, and kindly contributed to R-bloggers).
I have been busy with field exams, so I have not had much time to work on the blog.
The spuriouscorrelations package started as a fun project for one of my tutorials.
Here is an interesting case of correlation: the number of people who drowned by falling into a pool and the number of films Nicholas Cage appeared in.
library(spuriouscorrelations)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats': filter, lag
The following objects are masked from 'package:base': intersect, setdiff, setequal, union
library(ggplot2)

unique(spurious_correlations$var1)
 [1] Suicides by hanging, strangulation and suffocation
 [2] Number of people who drowned by falling into a pool
 [3] Number of people who died by becoming tangled in their bedsheets
 [4] Murders by steam, hot vapours and hot objects
 [5] Computer science doctorates awarded in the US
 [6] Sociology doctorates awarded in the US
 [7] Civil engineering doctorates awarded in the US
 [8] People who drowned after falling out of a fishing boat
 [9] Drivers killed in collision with railway train
[10] Total US crude oil imports
[11] Number of people who drowned while in a swimming-pool
[12] Suicides by crashing of motor vehicle
[13] Number of people killed by venomous spiders
[14] Mathematics doctorates awarded
14 Levels: Civil engineering doctorates awarded in the US ...
drownings <- spurious_correlations %>%
  filter(
    var1 == "Number of people who drowned by falling into a pool"
  ) %>%
  select(year, var1, var2, var1_value, var2_value)

cor(drownings$var1_value, drownings$var2_value)
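A correlation like the one computed above can look striking even when the two series have nothing to do with each other. As a rough illustration, here is a minimal sketch that uses simulated noise only and none of the package data. The series length, the number of candidate series, the seed, and the 0.6 cutoff are all arbitrary assumptions chosen for the example.

```r
# Illustration with simulated data only; nothing here comes from the package.
set.seed(2015) # arbitrary seed so the run is repeatable

n_obs <- 10 # hypothetical number of yearly observations
n_candidates <- 1000 # hypothetical number of unrelated series to search through

# One "target" series and many candidate series, all independent noise
target <- rnorm(n_obs)
candidates <- matrix(rnorm(n_obs * n_candidates), nrow = n_obs)

# Pearson correlation of the target with every candidate column
r <- as.vector(cor(target, candidates))

# Strongest correlation found by searching, despite no real relationship
best <- which.max(abs(r))
best_r <- r[best]

# Share of candidates whose absolute correlation exceeds an arbitrary 0.6 cutoff
share_strong <- mean(abs(r) > 0.6)

print(best_r)
print(share_strong)
```

With short series and enough candidates to pick from, some pairs will line up by chance, which is one plausible reason pairings like drownings and films can be found at all.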
Now let's plot the data.
# compute a scale factor so that max(var2_value * factor) ≈ max(var1_value)
max1 <- max(drownings$var1_value)
max2 <- max(drownings$var2_value)
ratio <- max1 / max2

ggplot(drownings, aes(x = year)) +
  # drownings are drawn on the primary (left) axis
  geom_line(aes(y = var1_value, color = "Drownings")) +
  # films are rescaled so both lines share the same vertical range
  geom_line(aes(y = var2_value * ratio, color = "Films")) +
  scale_y_continuous(
    name = "Number of drownings",
    # the secondary axis undoes the rescaling so its labels show film counts
    sec.axis = sec_axis(~ . / ratio, name = "Number of films"),
    limits = c(0, NA)
  ) +
  scale_color_manual(
    name = "",
    values = c(
      "Drownings" = "blue",
      "Films" = "red"
    )
  ) +
  theme_minimal() +
  labs(
    title = "Number of people who drowned by falling into a pool vs.\nNumber of films Nicholas Cage appeared in",
    caption = "Source: Spurious Correlations (Vigen 2015)"
  )
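The scale factor in the plot does two jobs: it stretches the second series to the range of the first, and its inverse turns the secondary axis labels back into the original units. A minimal sketch of that round trip, using made-up numbers rather than the dataset (the two vectors below are hypothetical):

```r
# Hypothetical values, not taken from the dataset
var1_value <- c(100, 120, 90, 110) # series shown on the primary (left) axis
var2_value <- c(2, 4, 1, 3) # series shown on the secondary (right) axis

# Same scale factor as in the plotting code
ratio <- max(var1_value) / max(var2_value)

# Forward transform: the values actually drawn for the second series
scaled <- var2_value * ratio

# Inverse transform: the mapping used for the secondary axis labels
recovered <- scaled / ratio

# The rescaled maximum matches the primary maximum
print(isTRUE(all.equal(max(scaled), max(var1_value))))

# Dividing by the ratio recovers the original values
print(isTRUE(all.equal(recovered, var2_value)))
```

Because the transform is a pure rescaling with no offset, both axes share the same zero, which is why `limits = c(0, NA)` works for both series at once.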
pak::pkg_install("pachadotdev/spuriouscorrelations")