Talk:Strong Bad Email Statistics

From Homestar Runner Wiki

Revision as of 21:39, 6 February 2006

Ding! Strong Bad Email Statistics is a featured article, which means it showcases an important part of the Homestar Runner body of work and/or highlights the fine work of this wiki. We also might just think it's cool. If you see a way this page can be updated or improved without compromising previous work, feel free to contribute.



Comments on the Progress

This page is coming together quite nicely - I like what DMurphy has created thus far - I think the "No Loafing Pie Chart" was a great choice - it adds a perfect amount of humor to this page. The descriptions of the interpretations of the data are thorough and that's grood, er..good for anyone interested in the interesting field of statistics. I just hope the average fans of Strong Bad Emails will be as enthusiastic about the statistical information as those who have a subscription to Nerdular Nerdence! Nevertheless, I think everyone (ability in mathematics notwithstanding) can learn something from the data charts. I hope to see/make a pie chart soon! --The Paper 01:49, 27 Mar 2005 (MST)

...Yep, I have no clue what you all are saying. --Color Printer 06:42, 27 Mar 2005 (MST)

Database

Anyone wishing to help with this project who doesn't want to undertake the tedious task of data entry can request a database from The Paper or me. The database is a very user-friendly Excel file with data on Time, Location, Name, and various other categories for each of the 126 emails. Just e-mail me and I'll be happy to send you a copy of the file. --DMurphy 15:45, 27 Mar 2005 (MST)

Wow!

DMurphy and The Paper- you guys did a great job on this! It really makes this wiki look smart. This rocks! →FireBird 23:58, 27 Mar 2005 (MST)

This page is awesome and beautiful and beautifully awesome. Aurora Szalinski 11:56, 28 Mar 2005 (MST)

My god...

You guys have WAY too much time on your hands.

Yep, they do. --Color Printer 10:56, 28 Mar 2005 (MST)

Not really... any statistics major could put this together in 30 minutes or less. The hard part is data collection, which was already done before I got the idea to start this page. I know it looks like this would take days of work, but Excel makes the project easy to complete. Anyway, it's spring break for me, and I didn't have anything planned for the night I started this, so I decided to make a new page. --DMurphy 16:23, 29 Mar 2005 (MST)

bottom 10

Why do you say bottom 10 has 2 emails? The second one was just Fwd:'s and Re:'s.

Still, after the joke, I think we were supposed to assume that an email eventually popped up. — It's dot com 05:43, 30 Aug 2005 (UTC)

Nominated for featured article

This article has been nominated for article of the week and I for one would like to see that happen — it's a really great article. One thing that needs to happen first, though, is it needs a good opening summary paragraph. I might try my hand at writing one, but it would probably sound better coming from someone who really understands all this statistical stuff. Maybe summarize why these charts were created and what we can learn from studying the emails this way. — Joey (talk·edits) 04:48, 22 Aug 2005 (UTC)

You know what? I really like this page, so when I'm back from school I'll try my hand at writing one. Elcool (talk)(contribs) 05:16, 22 Aug 2005 (UTC)
I'd go for it if a good paragraph was written for it. Joey: a better place to talk about this would be HRWiki:Featured Article Selection. —BazookaJoe 12:39, 22 Aug 2005 (UTC)
So? What do you think? Is it good enough? Elcool (talk)(contribs) 15:17, 22 Aug 2005 (UTC)
I'm sure my English teacher would think this is great, E.L. --Ookelaylay 00:55, 31 Aug 2005 (UTC)
Thanks! Elcool (talk)(contribs) 05:21, 31 Aug 2005 (UTC)
E.L. Cool, that's a great intro paragraph. I'm glad to see this on the front page. Way to go! — Joey (talk·edits) 00:38, 2 Sep 2005 (UTC)
Why, thank you! Elcool (talk)(contribs) 04:17, 2 Sep 2005 (UTC)

Intro paragraph

I think the intro paragraph takes the purpose of the page in the wrong direction. It shouldn't be worded to sound like this is a place for making predictions, because that's not what I came to this page for. I came to look at the charts. (In other words, keep the focus in the past, and not the future.) —BazookaJoe 22:55, 22 Aug 2005 (UTC)

The only part that is about making predictions is "...or how the future emails will be". If you want to change it, go right ahead. Elcool (talk)(contribs) 05:16, 23 Aug 2005 (UTC)
I like what has been added to the opening paragraph. It reads much more smoothly now and I think the users of the wiki will notice. Thanks for the active interest in keeping this article up-to-date and looking smart. I think DMurphy may have left the project, but we're certainly keeping his creation in tip-top shape. Much appreciated. =) —THE PAPER PREEEOW 00:51, 24 Aug 2005 (UTC)

Block Computer

Did you guys count the Block Computer from "Other Days"? Technically, that has 2 emails.--Martin925 23:15, 29 Aug 2005 (UTC)

We have rather "strict" (read: arbitrary) guidelines, meaning that the first computer (or device) that Strong Bad uses in each particular email is the one we consider "used". In other words, we are aware of Block but we do not count it as one of the "others". —THE PAPER PREEEOW 05:39, 30 Aug 2005 (UTC)

Time spent with...

Perhaps a new chart - time spent with each computer? Just a product of "Percentage of emails by computer era" and "Average length of emails by computer era".

Maybe even "Percentage of time Strong Bad spends physically in front of the computer," by email number or by era, but that would mean a lot of new data collection so probably not worth the effort. --phlip TC 04:16, 7 Oct 2005 (UTC)
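As a rough illustration of the first suggestion, the "time spent with each computer" figure could be computed as below. This is only a sketch assuming the two existing per-era figures are available as fractions and seconds; the era labels and numbers are placeholders, not data from the article.

```python
# Hypothetical sketch: share of total email running time per computer era,
# computed as (fraction of emails in the era) x (average email length in the era).
# Era labels and values passed in are placeholders, not real article data.

def time_share_by_era(pct_emails, avg_length_s):
    """Return each era's share (0..1) of total running time.

    pct_emails:   era -> fraction of all emails made on that computer
    avg_length_s: era -> average email length in seconds on that computer
    """
    # Fraction of emails times average length is proportional to total time.
    weighted = {era: pct_emails[era] * avg_length_s[era] for era in pct_emails}
    total = sum(weighted.values())
    # Normalise so the shares sum to 1, which is what a pie chart needs.
    return {era: w / total for era, w in weighted.items()}
```

For example, with made-up inputs of half the emails on each of two computers averaging 120 s and 180 s, the shares come out to 0.4 and 0.6. Multiplying by the total number of emails instead of normalising would give absolute seconds rather than a share.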

Scatterplots

I don't know about you, but on the next scatterplots, I would love to have Email titles with arrows pointing to all the outliers, above and below. —BazookaJoe 03:28, 12 October 2005 (UTC)

I will certainly take this into consideration when/if I make a new scatterplot graph. Thanks for the helpful suggestion. —THE PAPER PREEEOW 03:45, 12 October 2005 (UTC)

Back

Well, I stopped in to see how things are going today. I've been quite busy with school and other things, but I do have some time at the moment. I added a general statistics section. I'm also about to upload the Time Spent With Each Computer chart someone suggested. As far as adding arrows to the scatter plot goes, it's possible, but would require adding all the labels by hand. As there are about 15 outliers, not only would it be time consuming but it also might get a bit messy. Perhaps just a label for the outliers with the highest residual and lowest residual? Those would be Vacation and Colonization, respectively. --DMurphy 03:51, 12 October 2005 (UTC)
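Picking out only the highest- and lowest-residual emails, as suggested above, could be automated along these lines. This is a sketch assuming lengths are plotted against email number and fitted with an ordinary least squares line; the function names are hypothetical and no real email data is included.

```python
# Hypothetical sketch: fit a least squares regression line (LSRL) of email
# length against email number, then find the emails with the highest and
# lowest residuals (observed length minus predicted length).

def lsrl(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def extreme_residuals(titles, xs, ys):
    """Return (title with highest residual, title with lowest residual)."""
    slope, intercept = lsrl(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    idx = range(len(residuals))
    hi = max(idx, key=residuals.__getitem__)
    lo = min(idx, key=residuals.__getitem__)
    return titles[hi], titles[lo]
```

Only those two titles would then need hand-placed arrows on the scatterplot, which keeps the chart from getting messy.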

"Unreliable" Date?

I just deleted "sb_email 22 was released between Vacation Postcard #5 and invisibility." That date is as "reliable" as any others taken from LiveJournals. Why was it in the "unreliable" section in the first place? Thunderbird 02:00, 17 December 2005 (UTC)

Updates

It would be nice if this page were updated again. This, of course, looks like a lot of work. It would be nice if there were a way to streamline the process and have some scripts that automate the creation of some of the wiki text elements and charts (nothing too fancy) for inclusion on this page. My guess is that Excel's string functions would help. Anyway, speaking of Excel, the link to the original .xls file is gone. It seems strongfans has reorganized its site, but I have no idea where to even begin looking for this file. Does anyone have a copy? It would be nice if the Excel file could be kept in the wiki vaults instead of offsite, since we depend on that data for this page so much. --Stux 15:45, 28 January 2006 (UTC)

  • I attempted to upload the file a while back, but the wiki doesn't allow it. I gave a copy to The Paper and InterruptorJones a while back, so they may have maintained the DB. Otherwise, I have a DB that's not updated (I think it goes through 125ish). It really doesn't take a lot to update the DB... the hard part is making all the pretty graphs and it's such a time-sensitive page that it's tough to keep it updated. I think it would be best to just wikify the data and put it on a separate page. No way I'm up for doing that though... heh. If you want to Wikify the graphics too, be my guest, but I have absolutely no idea how to do that. Email me at dolan.murphy@gmail.com for the db. -DMurphy 23:11, 31 January 2006 (UTC)
  • Also, I'm not entirely familiar with wiki scripting, but I'm not sure it would be able to, for example, calculate the least squares regression line (LSRL) or the rank of each email's length, so it may be best served in an Excel file. Then it's just a matter of (a) figuring out how to host the file on the wiki or (b) finding a dedicated host who will update it after every e-mail and who also knows how to calculate the LSRL equation (as explained in the article). I think this article could be a LOT better if we figure out how to open the database to the community. But I know I'm tired of spending time updating graphs, so if there's some way to have the wiki calculate the LinReg graphs and other things, please, be my guest. And if it's too much to update the LSRL every week, we could just base it on the first 100 data points... that should be accurate enough anyway. -DMurphy 23:30, 31 January 2006 (UTC)
    Hi DMurphy! Again, thanks for the reply to my inquiry. I have emailed you about a copy of the Excel file. As for wikifying some of the graphics: at the very least, one of them can be wikified. SBESTimeSpentChart.png doesn't need all that text in a picture; it can instead be turned into text for this section, alongside the static graphics used in the original picture. That is probably the easy part. I seriously doubt that this wiki has dynamic code generation enabled, for security reasons. My guess is that the best bet is to have a template of sorts from which a local program (such as a Perl script) can generate the appropriate wiki code to be cut and pasted onto the real page. This would at least allow on-the-fly recreation of most of the data once a new email is released. The hard part is then making the graphs. I do think there are command-line programs that would let you make them, but they'd likely look different from the Excel versions seen now. This would also flood the file history with new uploads (not necessarily a bad thing). It might be more feasible to update the graphs every 1 or 2 months, depending on SBE frequency. Regardless, some local scripting would make the job a lot easier. Once I get the Excel files I'll play around with things and see what I can get. Thanks! --Stux 20:21, 1 February 2006 (UTC)
Sorry for the late reply. I will get the excel database to you as quickly as possible. As for the link, I will try to get it working in the next day or so. —THE PAPER PREEEOW 21:39, 6 February 2006 (UTC)
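The "local script that generates wiki code" idea from this thread might look roughly like the following. It is written in Python rather than the Perl mentioned above, and it assumes the database is exported to CSV with hypothetical columns `number`, `title` and `length_seconds`. It ranks emails by length and prints MediaWiki table markup to paste onto the page. The `wikitable` class is an assumption about this wiki's stylesheet; a plain `{| border="1"` would also work.

```python
import csv
import io

def read_emails(csv_text):
    """Parse an exported CSV (hypothetical columns: number, title, length_seconds)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"number": int(r["number"]), "title": r["title"], "length": int(r["length_seconds"])}
        for r in reader
    ]

def length_rank_wikitable(emails):
    """Return MediaWiki table markup ranking emails from longest to shortest."""
    ranked = sorted(emails, key=lambda e: e["length"], reverse=True)
    # Assumes the wiki's CSS defines the "wikitable" class.
    lines = ['{| class="wikitable"', "! Rank !! Email !! Title !! Length"]
    for rank, e in enumerate(ranked, start=1):
        minutes, seconds = divmod(e["length"], 60)  # show lengths as m:ss
        lines.append("|-")
        lines.append(f"| {rank} || {e['number']} || {e['title']} || {minutes}:{seconds:02d}")
    lines.append("|}")
    return "\n".join(lines)
```

After each new email, one would add a row to the exported file, rerun the script, and paste the output onto the page. The graphs would still need to be regenerated separately.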