RIP.

It’s the twilight of the nerds. Along with Cam Charron, Darryl Metcalf (creator of Extra Skater) has been hired to join the nascent analytics department in Toronto. That’s terrific news for them personally and for Leafs fans in general, but it means a massive brain drain for NHL fans.

I expect ExtraSkater.com to be gone for good. Metcalf likely won’t have time to devote to it, and it might represent a conflict of interest to work on it. With that hiring, the Maple Leafs have taken away the single best resource for hockey fans and geeks and writers and coaches. Plus the other 29 GMs.

We can’t allow that. We need to replace Extra Skater.

If you’re not aware, ExtraSkater.com launched last fall as an all-purpose statistical resource for the NHL. The site scraped the NHL’s official data and presented it in a much more useful and convenient way. I was a huge fan and used it in nearly every article I’ve written on RMNB in the last year.

Point of order: the shutdown of Extra Skater, it turns out, seems to have had nothing to do with the NHL updating its websites’ terms of service last week. Here’s what the TOS say now:

For example, you may not:

[...]

Engage in unauthorized spidering, scraping, or harvesting of content or information, or use any other unauthorized automated means to compile information;

As Jen LC said (btw, follow her on Twitter; she’s brilliant), the new NHL.com TOS are kind of vague and seem to be targeted more at pirated video feeds and user-data harvesting than at stat geeks.

Plus, Elliotte Friedman’s take on the Extra Skater shutdown explicitly said the TOS change wasn’t a factor:

And Greg Wyshynski at Puck Daddy seemed to echo Jen’s impression in speaking with league representatives:

The League claims that’s not the case [that they're opposing stat sites]. They know sites are harvesting data from NHL.com with software; they say the provision was “not added to counter anything that we know of at this moment.”

So it’s clear that a stat site like Extra Skater is not naughty. I’d also propose that it’s essential. It’s Metcalf’s choice if he wants to remove his site, but hockey’s intellectual commons need a resource like it. It creates deeper understanding of the game, it helps writers fill column inches, and it’s vital to comprehensive arguments for staffing changes.

I think a replacement for Extra Skater is needed, possible, and imperative. So let’s do it. Let’s start the conversation right here.

Is this a good idea?

Yes.

Although there may be other entrants into the space, I think we can create something novel and explicitly public. I think the technology (discussed below) will make the job easier. I think Darryl’s site (still available on the Wayback Machine!) is a great model to build on. I think the appetite and passion to create it are there.

It’s a good project that can be done well.

What’s the pitch?

We combine a ton of existing technology to create the Son of Extra Skater (please don’t suggest names in the comments– that should come WAY later).

We’ve already got a lot of advantages:

  • The lessons of Extra Skater– both UI and data
  • Platforms for interaction and visualization like Twitter Bootstrap and Google Charts API
  • Existing scripts for reading NHL play-by-play and time-on-ice data
  • Either volunteer labor or crowdfunding
  • A centralized source control solution like GitHub
  • Cheap cloud hosting like AWS or Azure

Finally, and this is the important part: this would be open source and under a Creative Commons license. The source code for this new site would be hosted on a site like GitHub so that the community won’t ever lose the accumulated knowledge of the project.

The eventual site would likely sell ads to cover hosting and maintenance.

In addition to supplying similar functionality to ExtraSkater.com (RIP), the new site might also link to torrents of the actual data tables for use in applications like Tableau and R. We could increase the level of peer review in the industry by standardizing and becoming more transparent.
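A data-table export could be as simple as a CSV file, which both Tableau and R can read. Here’s a minimal sketch in Python. The column names and the sample row are made up for illustration; they aren’t real stats or a real schema.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical table: one row per player, with invented numbers.
sample = [{"player": "Example Skater", "games": 10, "shot_attempts": 42}]
print(to_csv(sample, ["player", "games", "shot_attempts"]))
```

The same text could be written to a file and shared however we end up distributing the tables.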

Who’s gonna do this?

Me, for starters. I’ve got 15 years of experience building complex systems for thousands of users. I can write requirements, specifications, and project plans. I know the technology and I can make the prototypes. I can organize this thing, but I’m not gonna write the final code.

I’ve already heard from a dozen developers today who have offered to help out, but we’d need a team. We’d need a DBA for the back-end and the data scraping. We’d need a front-end UI developer– maybe two. We’d need an IT person to get the server up. More on this below.

How should we fund it?

It depends on how we build it. I can imagine three options:

  1. Crowdfund it. Then we’d write specs, make a prototype, build a development team (likely offshore), and make the thing.
  2. Volunteer only. In this case, the start-up costs would probably be smaller.
  3. Hybrid. Raise funds for core functionality and then allow volunteer developers to add to the code base via GitHub.

How much will it cost?

Depends on how we build it. If we were to hire offshore developers, I’d estimate less than $10,000.

If we were to pay everyone for their work, maybe as high as $22,000.

If someone has a big piece of the puzzle all ready (e.g., the back end), then around $15k.

If everyone works for free, it’ll just cost our operating fees.

Those are wild guesses. We’ll find out when we pick a development plan and lock down requirements.

What technology should we use?

I’m open to suggestions, but I think the best bet would be as follows:

  • PHP server-side programming language
  • MySQL database
  • Twitter Bootstrap for user interface
  • Google Charts API for data visualization
  • GitHub for source control and project management
  • Amazon Web Services or Azure for hosting

Plus a whole lot more. I know Darryl used most of that tech for ExtraSkater.com. I’m familiar with all of it, and based on what I’ve seen most of the volunteers are too.
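To give a feel for how the back end could hand data to Google Charts, here’s a rough sketch of building the JSON shape that a Google Charts DataTable accepts (a `cols` list plus a `rows` list of cells). It’s written in Python purely for illustration, not in the PHP proposed above, and the `to_datatable` helper, column labels, and sample values are all hypothetical.

```python
import json


def to_datatable(columns, rows):
    """Build a Google Charts DataTable-style dict.

    columns: list of (label, type) pairs, e.g. ("Team", "string")
    rows: list of tuples, one value per column
    """
    return {
        "cols": [{"label": label, "type": col_type} for label, col_type in columns],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }


# Invented numbers, only to show the shape of the output.
table = to_datatable(
    [("Team", "string"), ("Shot attempts", "number")],
    [("WSH", 3), ("TOR", 1)],
)
print(json.dumps(table, indent=2))
```

Whatever language we end up picking, the idea is the same: the server sends this JSON and the page passes it to the charting library.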

And it’d be all open?

Yes. The source code would be free and open source, available on GitHub for anyone to use or copy or whatever. It would be around even if someone dies or gets hired by Edmonton (same thing). I would not own it, and RMNB would not own it. It’s for the people.

(The site would be ad-supported– that’d help us pay for hosting and maintenance.)

Of course, anyone could download the source code and spin off their own site whenever they feel like it.

I’m in. What’s next?

Well, first we gotta build consensus. Does anyone think I’m full of it? Is there something I’m missing? Are we okay with building something and then giving it away? Do we wanna build it ourselves or just raise money and then I’ll hire some developers to do it? Does anyone have any existing technology we can build off? Is anyone planning a similar project? Should we join forces with them? Does Darryl just wanna put his source code up on Git right now and we’ll forget this ever happened?

Once we’ve made those decisions, here’s how I’d see us going:

  1. Write requirements. They’d be public and transparent, perhaps authored by me, with as many collaborators and commenters as are interested.
  2. Develop a prototype. Make a clickable model of the interface to validate our requirements and provide a resource for developers.
  3. Develop back-end functionality. This is the stuff users don’t see. We’d use existing scripts to get and organize NHL data, then store it in a useful schema, and then optimize it to be used by the UI. This is maybe the hardest part.
  4. Develop the front-end functionality. This is the actual interface that everyone uses. Like ExtraSkater.com, it’d rely heavily on functionality provided by the Bootstrap platform.
  5. Iteratively develop additional functionality. Bells and whistles.
  6. Test and launch! Kegger at my house afterwards.
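To make step 3 a little more concrete, here’s a minimal sketch of the “store it in a useful schema, then query it for the UI” idea. It uses Python and an in-memory SQLite database only to stay self-contained; it is not the PHP/MySQL stack proposed above. The event fields, table layout, and sample rows are assumptions made up for illustration and do not reflect the NHL’s actual play-by-play format. In this sketch, `team` means the team that took the attempt.

```python
import sqlite3

# Hypothetical, hand-made events standing in for parsed play-by-play rows.
SAMPLE_EVENTS = [
    {"game_id": 1, "period": 1, "team": "WSH", "event": "SHOT"},
    {"game_id": 1, "period": 1, "team": "TOR", "event": "MISS"},
    {"game_id": 1, "period": 2, "team": "WSH", "event": "BLOCK"},
    {"game_id": 1, "period": 2, "team": "WSH", "event": "GOAL"},
    {"game_id": 1, "period": 3, "team": "TOR", "event": "FACEOFF"},
]

# Event types this sketch counts as shot attempts.
ATTEMPT_EVENTS = ("SHOT", "MISS", "BLOCK", "GOAL")


def load_events(conn, events):
    """Create the events table if needed and insert the given rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "game_id INTEGER, period INTEGER, team TEXT, event TEXT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (:game_id, :period, :team, :event)", events
    )
    conn.commit()


def shot_attempts_by_team(conn, game_id):
    """Return {team: number of shot attempts} for one game."""
    placeholders = ",".join("?" for _ in ATTEMPT_EVENTS)
    rows = conn.execute(
        f"SELECT team, COUNT(*) FROM events "
        f"WHERE game_id = ? AND event IN ({placeholders}) "
        f"GROUP BY team ORDER BY team",
        (game_id, *ATTEMPT_EVENTS),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
load_events(conn, SAMPLE_EVENTS)
print(shot_attempts_by_team(conn, 1))
```

The real work is in the parsing and in choosing the schema; once events are in a table, most of what the UI needs is aggregate queries like this one.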

For now, I want to open up the conversation. Let’s use the comments below to exchange ideas.

If you’re interested in volunteering labor, tell us who you are and what you know. I’ll build a spreadsheet of everyone’s contact info and expertise.

If you’re interested in donating, let us know. It’ll help us gauge interest.

We can do this.

  • http://stjude.org Matthew Helm

    Crowd source it. $500 donation = guaranteed NHL front office job.

  • swhirly

    This is awesome. I’m a graphic designer, therefore have nothing to offer but applause and good wishes.

  • Shaun Phillips

    Sent you guys a PM on facebook. Hope I can help!

  • http://www.russianmachineneverbreaks.com Ian Oland

    I’m so down to help and call dibs on the CSS and logo.

  • Chris Cerullo

    Sounds awesome and I hope it eventually pans out. I’m somewhat computer dumb so I am of no help.

  • Shaun Phillips

    Always need beta testers. :)

  • Jeff Fairchild

    I would like to help. Will you definitely be using PHP? May I suggest Ruby?

  • Brad Tumy

    Maybe look at an existing open source analytics package vs trying to write your own? http://www.elasticsearch.org/overview/

  • chris

    I would go with Rails or Python/Django over PHP.

  • shiny

    I’d love to help but I’m barely computer literate as it is. I’ll chip in a few bucks (it’s all I got, I’m broke) to help get it off the ground as necessary

  • EricJG

    I sent a Facebook message regarding this.

  • Michael Gallimore

    I will gladly fork over $. No real technical skills to offer.

  • HockeyCoachBen

    Willing to help however needed…technical or otherwise.

  • http://puckplusplus.wordpress.com Matt Cane

    I’d be happy to help out with the back-end side of things. My expertise is mostly on the scraping/parsing and database/SQL side of things but willing to pitch in with whatever needs doing.

  • Sarah

    I’m just relieved to hear that we didn’t lose ExtraSkater either to the #WrathOfBettman or to Google’s sharks. I’ll cheerfully donate Chipotle gift cards and sympathy to this brave endeavor.

  • Jeff Yoders

    AWS. It’s awesome.

  • Graduate Blackhawks

    I’m learning PHP, so I can help with that. I’ll send y’all an email as well.

  • Shaun Phillips

    Same here. Yay for data/schema geeks! :)

  • Lisa

    Create a way to donate and I will. Good luck.

  • Shaun Phillips

    Agreed. We’ve used it for some webmap hosting. There’s been a few kinks to work out but that’s mostly been on the ESRI software side of things and our own IT dept, not Amazon.

  • Shaun Phillips

    Will trade SQL code for Chipotle.

  • Iggy

    I’d be happy to donate

  • Iggy

    Didn’t mean for that to post immediately. I’m also wondering if there can be a forum area where people can discuss specific stats, processes, or try to come up with new measurements.

  • Sarah

    Hey, you can’t code on an empty stomach. Want guac?

  • Shaun Phillips

    I will pass on the guac to avoid burrito confusion and accidentally killing our fearless leader in this hockey+stats geek+comp sci braingasm.

  • Sarah

    Yikes, good point. Still, by now I think Chipotle owes Peter some sort of backing for all this free advertising he’s given them, whether it’s sponsoring the new site or guacless burritos for life.

  • James Mirtle

    Sounds good. I’d pay for a site as good as Extra Skater for sure.

    You may have competition, too, which will be interesting.

  • DustinPenncakes

    I like the idea and would help fund it.

  • Brouwer Rangers

    I’m worried this sounds like a heavy lift that might distract you from the important work you’re doing at RMNB.

    That said, if you need pun-based, human-mascot friendly ideas once the name issue is addressed, you know how to get in touch with us.

  • Bob

    And PostgreSQL instead of MySQL.

  • Diller M

    I don’t understand any of what was just written. Except the extra skater part

  • Sarah

    I’m worried that a full time job, RMNB, and now this will distract Peter from deciding on a girlfriend. And that could result in extra rants.

  • CM

    Happy to help. In my real life I’m a mathematician specializing in statistical target detection and numerical modelling (despite the names sounding relevant, neither of those would be helpful!) I’ve also got plenty of experience with SQL, and Python/Django.

    My heart would be in doing analysis on the scraped content.

    Good luck. I really hope someone succeeds in this space.

  • http://puckplusplus.wordpress.com Matt Cane

    Agreed on that.

  • Jon Farrell

    HELL YES PETER! Best idea I’ve heard all day, I would donate to this for sure.

  • Pat Magee

    https://www.youtube.com/watch?v=FQRW0RM4V0k

    I’m sure Greenberg would be down to clown….

  • Nick Thurston

    I’d love to be a part of this project. I am a software/web developer by trade, currently working at a company on several PHP/MySQL based web applications, the majority of which use a Bootstrap front end for the UI. So I’m very familiar with the stack you’ve outlined and would love to get involved. I’m curious if you’re considering any PHP frameworks or if you intend to write something custom. I’ve started using Laravel for most of my new projects and cannot recommend it enough. Let me know what I can do to help.

  • http://www.russianmachineneverbreaks.com Ian Oland

    THE CORSI HORSIES

  • http://www.russianmachineneverbreaks.com/ Peter Hassett

    stay tuned

  • Sarah

    Huzzah! Sounds like Panera girl turned out to be a Caps fan. This is almost as exciting as a new stats site! #FifthMatryoshka #RMNBwedding2.0

  • Andrew Merewitz

    I’d second Python over PHP

  • Trevor Schiavone

    Sent you guys a PM on facebook. I’d love to help out with this if it gets going

  • ezzeloharr

    Was secretly planning on starting something like this myself – I’d be happy to contribute monetarily or with my time as needed.

  • Joanna Farmer

    This sounds like a great idea! I’d be willing to help too. Most of my experience is in JavaScript (experience with Google Charts too), plus HTML/CSS. I’ve dabbled in PHP, but from the comments below, looks like you’ve already got a few back-end volunteers.

  • Royal Arse

    Invested a few hours to source some raw data and upload it. Needs a TON of work, but we have a working prototype here called Next Skater.

    Visit nextskater.meteor.com for more info.

    My background is doing web development and stuff like this: http://www.hamiltoneconomicsummit.ca/economy-animation.html

    Love to hear more feedback, I will be continuously improving on this.

  • Dave

    I’d third it. Same with Postgres.

  • http://www.russianmachineneverbreaks.com/ Peter Hassett

    I’ve been on a few separate email threads tonight talking with interested parties, but I’m not sure I’ve got what I need yet. I think I’ll know tomorrow.

    My goal is to build a high-quality, utilitarian stat site that is based in free, open-source software. If it turns out that another group is building something similar, I’ll either support them or try to work in parallel.

    Hmm.

  • http://www.russianmachineneverbreaks.com/ Peter Hassett

    I’m agnostic on language, but I had an offshore team I could use in PHP MySQL. I’m flexible.

  • Enigmatic Russian

    I’d happily donate.

  • dylan wheatley

    talk to the r/hockey troglodytes they’ll help with this for sure

  • Jim

    I’d gladly help out on the DB end, specifically design & support. You get the data in there, and give me a UI to plug into, and I can make sense of the in between.

  • Chris Cerullo

    I would definitely do this. I need this site to happen for my future articles.

  • http://openid.aol.com/jaredvolkl jaredvolkl

    I’d contribute to this if it were to be done in Ruby.

  • Fedor

    It was one of my life goals to beat Peter to #RMNBwedding2.0, but uh, okay.

  • JenniferH

    So glad I’m not the only one.

  • Shaun Phillips

    Ugh. Hate PostgreSQL. Maybe just because I learned MySQL/SQL Server first.

  • lizmcneill

    Even if you can’t create web design/CSS yourself, they could probably use your feedback.

  • Shaun Phillips

    Just curious why? I haven’t done enough postgre to really know any advantages/disadvantages of it.

  • John

    Insies.

  • https://github.com/ThrowsException Chester O’Neill

    Postgres is far stricter with datatypes and usually considered much safer. MySQL has the tendency to play a little fast and loose with your data types and tries to automatically convert them in update and set statements. That said, MySQL is generally easier to use and get started with, while Postgres is a much “heavier” database engine because it is much more SQL compliant than MySQL.

  • https://github.com/ThrowsException Chester O’Neill

    This sounds like an awesome idea. I’d be happy to help. I’m a career programmer who’s worked on large scale web projects in the past. How did Extra Skater get their stats in the past anyway?

  • https://github.com/ThrowsException Chester O’Neill

    Seconded

  • Shaun Phillips

    Thanks! I’ve only used MySQL, SQLite, and SQL Server. I’ve looked at some Postgres a bit and ran away in terror over the strange terms/syntax. :)

  • https://github.com/ThrowsException Chester O’Neill

    Yea, the syntax is a bit odd once you’ve looked at the other engines but it is actually the most SQL compliant and arguably the most powerful. Postgres has been getting a lot more traction lately over mySQL because even simple applications now are starting to deal with a lot of data. But this conversation is probably better taken somewhere else =P

  • Anže

    I’d love to help. Can help with server-side stuff or front-end stuff. Regarding the technology I’m adaptable, but if I were to choose I would skip PHP for rails or django.

  • Matt Steeves

    I’d be happy to donate to the cause… stats for the people!

  • jrowny

    AWS is great, they have everything you need in one place. File storage with S3, CDN, ec2 instances etc. But when you’re just getting started, you may look into DigitalOcean. Ridiculously easy VPS for $5/month. Linode is another cheap VPS host with a great reputation at $10/month.

    Second, a large amount of data on your site will be cachable. Use VARNISH! Varnish will reduce your hardware costs and increase productivity significantly!

  • jrowny

    PHP is totally fine, it’s just so unpleasant to work with. Python is good too, but I’m less fond of Django (A python web framework). It just does a little “too much.”

    I’d also suggest that if you have competent front-end people, you don’t need Bootstrap. Bootstrap is incredibly heavy unless highly customized and you do not need 90% of what they give you. It’s great for nerds who can’t design but a waste of bandwidth if you have some decent front-enders.

  • jrowny

    The latest versions of Postgres also allow you to store JSON as a data structure. So it’s like having a Mongo or Couch in your RDBMS! Pretty neat.

  • http://www.russianmachineneverbreaks.com/ Peter Hassett

    90% of my DB experience is in SQL/MySQL and then the leftover is all NoSQL. I’ll have updates on this later, but I think it’s fine if we choose a technology that the developers prefer. I didn’t know about using JSON in the DB. That would make a ton of data-access layer and integration stuff (e.g., Google Charts datavis) a lot easier.

  • Royal Arse

    Also, the code is on Github: https://github.com/DeBraid/nextskater

  • Philippe Serhal

    I knew someone would propose this within days. Alright, I’m in. A few notes:

    – AWS should be preferred if only because almost every Web developer out there today is familiar with it already.

    – For similar reasons and others, I’m not sure why you’d start a new data-driven dynamic site from scratch using PHP+MySQL in 2014. Python is now the lingua franca here, and NoSQL approaches should definitely be considered (but lack of wide adoption might be a big con here, following my own reasoning).

    – Bootstrap is great, but I think you’re overestimating its scope? It’ll give us some nice navbars and pretty buttons and modals and alerts, but that’s all aesthetic.

    – I would give a huge, huge vote to d3.js (https://github.com/mbostock/d3/wiki/Gallery) over the Google Charts API. D3 is the most amazing JavaScript library ever written. I’m not kidding. I’d love to play around with Parallel Coordinates (demo: http://exposedata.com/parallel/) and Crossfilter (http://square.github.io/crossfilter/) and so on.

    Anyway, I’m in. I can work on pretty much the whole stack, but I can’t design interfaces or anything that’s meant to look pretty to users.

  • EB in LA

    Replacing Extra Skater is a worthwhile endeavor. I’d like to help.

    I’m currently a web dev, developing in PHP and MySQL. I’ve done lots of web scraping in perl. I’m pretty adaptable if you end up going with different technologies.

    More important, I’m a long time hockey fan, long time Caps fan and follower of advanced stats in hockey and other sports. (I’ve followed this site for a while, but this is my first post).

  • dannypage

    I’m experienced in Ruby on Rails and Python/Django frameworks. If we get a github up, it will be a breeze to get this started! Let me know if you need more info for contact. (@dannypage on Twitter)

  • Schugaze

    down to be a co-sugar daddy.

  • Brian C

    I’m a software engineer with several years of web app development experience – .net stack and python/django, mostly middle and backend. Would love to volunteer and help out.

  • Mike M

    I’d recommend setting up something like a Trello Board where everyone can brainstorm/collaborate. Failing that, let’s get a GitHub project going as soon as possible so people can start committing code.

  • Graham Jackson

    I would love to help out. I would recommend for a stack: Ruby on Rails (back-end), Ember (front-end), and PostgreSQL as a DB.

  • Miles Davis DaCat

    I’m in as long as it remains open-sourced. Can assist w/ programming & back end stuff — not really a front-end GUI or WordPress layout guy but can learn anything. Have experience in a variety of programming languages, scripting languages, dbms’, data harvesting, etc.

    I’m agnostic as well, but old school-trained (Oracle, IBM, Java, Perl, PHP, MySQL, etc.) but have learnt the *new* stuff when necessary for various projects on the side and at work (i.e. JSON, REST, OAuth, etc.).

    Should this evolve into an open source project (i.e. sourceforge, OSI, etc.) let me know & I’d be more than happy to assist.

  • Louis Roy

    I’m a full-stack developer with 7+ years of experience (PHP / MySQL / HTML / CSS / JavaScript / Insert any other Web technology here)

    I’ve been looking at starting a hockey related project for some years now and I’d be happy to help.

    GitHub profile : https://github.com/louisroy/

  • http://twitter.com/1bscarbro Benjamin Scarbro

    I had a geocities website once.

    I know how to bold something.

    Otherwise, I’m useless. Can I still come to the kegger?

  • Shaun Phillips

    Mongo like candy

  • Trevor

    I have no technological skills to offer, but I’d be happy to donate 20 or 30 bucks. This would be so great

  • Matt Riegler

    Hey name’s Matt Riegler and I am volunteering my services any way that I can.

    Primarily I’m offering my services as a writer and stathead, as I’m not savvy with coding.

    Grew up a stat and puck head since childhood, keeping my own stats and working as Editor In Chief at multiple sites.

    Whatever I can do to help make this site a reality, and the best that it can be, I’m in for.

    You can find me @MattRiegler. Please respond to me there.

  • Christy Rodriguez

    Hi! I wanted to help and contribute in any way I can… I do a lot of IT tech support work as well as testing for both workstations and servers. I have experience with HTML and XML scripting. I know it may not be much but I’m more than willing to help in any way that is needed. Will test and work for cookies, please!

  • Rob

    As I said on Twitter I’m in to help with anything technology related, even as a volunteer.

  • Dave Berenbaum

    I’m interested in helping. I have experience with data analysis, modeling, and a wide variety of programming and web development, including Python (and Django), C# (and .NET), Java, JavaScript (and some d3 and various other APIs), HTML, SQL, R, and a little bit of C and Bash scripting. I’m also a big hockey fan and would love to support an effort to move hockey along in the advanced stats world.

  • Emir

    Most definitely interested in contributing. I don’t have any programming or coding experience, but would be willing to help out in other ways (beta testing, project management, future developments, etc.).. feel free to get in touch

  • andvari101

    I’d be available to help code. 10 years of industry experience in Python and Php, but I’ve used just about every technology mentioned in the comments so far in one form or another.

    I like the idea of a trello board. We use that service for projects at work along with pivotal tracker and flowdock, which also might be worth looking into.

  • John M

    If it provides a distraction from writing more articles about Brooks Orpik, maybe that will be a healthy development.

  • simmsation

    The parsing of the PbP files generates csv files. A second set of parsing shall then turn the on ice events into linemates, attempts, scores, strength, opposition line etc. That shall occur in Excel with logic. That is my area of expertise. Then we upload to the database of choice. Present it as you wish.

  • John K

    If possible: take data from capgeek.com – they at least allow you to grab the current year’s salary amount, and begin some statistics such as p/60 per 1M and the like (scoring efficiency vs cost). It would be an interesting way to look at how well teams spend money in general.

  • http://www.russianmachineneverbreaks.com/ Peter Hassett

    The really clever thing Darryl did is begin to scrape other sites and match up the data. That could be something this site could do eventually to generate stuff like GVS on the fly.

  • http://blog.23x.net @jearle

    Avoid “flavour of the month” systems. I’m going to suggest HA LEMP with Varnish. Yes, stacking Varnish and Nginx may seem like overkill, but it isn’t. PHP means a broad amount of support from the community, where FotM systems will get you nothing but unscalable code with political one-upmanship. :)

    Ps. Sysadmin here, not a developer. Use that to shape your opinion, either way, of my suggestion.