Strong Bad Email Statistics

From Homestar Runner Wiki

(Difference between revisions)
(+2 images)
(Added By Length section)
Line 1: Line 1:
Various statistics of interest involving [[Strong Bad Email]] data.
Various statistics of interest involving [[Strong Bad Email]] data.
-
==Graphs==
+
==Strong Bad Email By Length==
-
===Line Graph===
+
This section involves data taken from the list [[Strong Bad Email By Length]].
-
[[Image:SBELinReg.png|thumb|400px|left|Linear Regression line shows an increase over time]]
+
[[Image:SBEScatter.png|thumb|200px|left|A scatter plot of chronological number vs. length, with outliers.]]
-
===Scatter Plot===
+
 
-
[[Image:SBEScatter.png|thumb|left|Scatter Plot]]<br>
+
 
 +
*The scatter plot shows a fairly strong positive correlation between Email Number and Email Length.  The r value between these two variables without deleting outliers is .844. 
 +
**An r value of 1 would indicate a perfect, positive correlation.  A value of -1 indicates a perfect, negative correlation.  Therefore, .844 indicates a fairly strong, positive correlation.
 +
*This plot shows there are a handful of clear outliers which are likely affecting the correlation.  In the plot below, the outliers have been removed.  A Least Squares Regression Line (LSRL) has also been added.
 +
**The outliers were defined as those emails with a residual value of 40 or greater, or -40 or less.
 +
 
 +
 
 +
 
 +
[[Image:SBELinReg.png|thumb|200px|right|A scatter plot of chronological number vs. length, without outliers.]]
 +
 
 +
 
 +
 
 +
*The LSRL can be used to extrapolate, or guess the length of future emails.  The r value of this line is .946.
 +
**The equation for the LSRL is y = 1.3848x + 44.831.  y = Time (seconds); x = Email number
 +
*This method of guessing is not 100% accurate, since it is unlikely the e-mails will ever be, say, 20 minutes long.  This equation should not be considered a foolproof method for guessing the length of an e-mail.

Revision as of 08:02, 27 March 2005

Various statistics of interest involving Strong Bad Email data.

Strong Bad Email By Length

This section involves data taken from the list Strong Bad Email By Length.

A scatter plot of chronological number vs. length, with outliers.


  • The scatter plot shows a fairly strong positive correlation between email number and email length. With the outliers included, the r value between these two variables is 0.844.
    • An r value of 1 would indicate a perfect positive correlation, and a value of -1 a perfect negative correlation. A value of 0.844 therefore indicates a fairly strong positive correlation.
  • The plot shows a handful of clear outliers, which are likely affecting the correlation. In the plot below, the outliers have been removed and a Least Squares Regression Line (LSRL) has been added.
    • The outliers were defined as those emails with a residual value of 40 or greater, or -40 or less.
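
As a rough illustration of the steps described above, the following Python sketch computes an r value, fits a least-squares line, and drops points whose residual is 40 or more in either direction. It assumes the residuals are measured against a line fitted to the full data set, which the text does not state. The function names and the sample numbers are hypothetical and are not taken from the actual email list.

```python
from math import sqrt


def _sums(xs, ys):
    """Return mean_x, mean_y and the sums of squares/products used below."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(xs, ys):
    """Correlation coefficient r, between -1 and 1."""
    _, _, sxx, syy, sxy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares regression line."""
    mean_x, mean_y, sxx, _, sxy = _sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def drop_outliers(xs, ys, threshold=40.0):
    """Keep only points whose residual is strictly inside +/- threshold.

    Assumption: residuals are taken against the line fitted to all points.
    """
    slope, intercept = least_squares_line(xs, ys)
    kept = [
        (x, y)
        for x, y in zip(xs, ys)
        if abs(y - (slope * x + intercept)) < threshold
    ]
    return [x for x, _ in kept], [y for _, y in kept]


# Made-up data: ten "emails" on a straight line, with one unusually long one.
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
lengths = [10, 20, 30, 40, 250, 60, 70, 80, 90, 100]

print("r with outliers:   ", pearson_r(numbers, lengths))
clean_numbers, clean_lengths = drop_outliers(numbers, lengths)
print("r without outliers:", pearson_r(clean_numbers, clean_lengths))
```

In this made-up data only the fifth point is removed, and the r value rises once it is gone, which mirrors the change from 0.844 to 0.946 reported for the real data.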


A scatter plot of chronological number vs. length, without outliers.


  • The LSRL can be used to extrapolate, that is, to estimate the length of future emails. With the outliers removed, the r value is 0.946.
    • The equation for the LSRL is y = 1.3848x + 44.831, where y is the length in seconds and x is the email number.
  • This estimate is not fully reliable: the line keeps rising indefinitely, and it is unlikely that the emails will ever be, say, 20 minutes long. The equation should not be considered a foolproof method for predicting the length of an email.
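
The regression equation can be tried out with a short Python sketch. The slope and intercept are the ones quoted above; the function names are hypothetical, and the results are only as trustworthy as the linear trend itself.

```python
SLOPE = 1.3848       # seconds added per email, from the LSRL above
INTERCEPT = 44.831   # seconds, from the LSRL above


def predicted_length_seconds(email_number):
    """Estimated length in seconds for a given email number."""
    return SLOPE * email_number + INTERCEPT


def email_number_for_length(seconds):
    """Email number at which the line reaches the given length."""
    return (seconds - INTERCEPT) / SLOPE


def format_mm_ss(seconds):
    """Render a number of seconds as minutes:seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


print(format_mm_ss(predicted_length_seconds(100)))  # 3:03
print(email_number_for_length(20 * 60))             # roughly 834
```

By this line, an email would not reach 20 minutes until somewhere around number 834, which shows how far the equation has to be stretched before it gives implausible answers.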