Net Benefits  

  #1  
02-04-10, 03:44 PM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
Research to Predict NPTE Success

So, in my quest to become a better speaker, I have decided to emulate every aspect of Ankur's personage. So, I am doing some econometrics research to use things like Speaker Points, z-score, opp wins, prelim wins, etc. to model elim round success, most likely at the NPTE. As such, I need tournament data to run through tests to get a model...

So I am calling upon all you TD's and other knowledgeable people who have data, preferably in Excel format (although if you just have the data files that TRPC or some other program produces, I'll gladly take that). It would take me forever (probably to the point of me abandoning this project) to enter in all the data myself, so anything you have is greatly appreciated. Please email me at rswanson@pugetsound.edu if you can help out.

Also, if anyone has any suggestions about other predictors (independent variables) I could look into/things they would be interested in seeing, feel free to let me know via email or a post here. My exact research and method isn't set completely in stone, so if anyone has a good idea, I might incorporate it in my research.

Thanks a lot,
Rob Swanson (nerdier half of the cheetahswan)
__________________
Rob Swanson
RIP Cheetahswan
  #2  
02-05-10, 12:40 AM
richmindseed
Registered User
 
Join Date: January 15th, 2007
Location: Berkeley, CA
Posts: 176
First, I strongly suggest you get a better role model. Might I suggest the guy who outspoke everyone at your last tournament by 3 HL points? =)

Second, I would suggest the transitive W/L method created at (http://art-of-logic.blogspot.com/200...ransitive.html) as something worth looking at. If you're as nerdy as I am when it comes to the whole "how we should tab tournaments" thing, some of the older posts on strength-of-schedule pairings, etc., might be interesting. The downside here, of course, is that computing these metrics is non-trivial (incidence matrix = sadness), but it's still an interesting look. I also suggest reading the "a numbers game" blog; they've got a lot of similar research (directed at policy debate, but still) that might give you ideas.
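The linked post's exact definition of transitive W/L isn't reproduced in this thread, so the sketch below assumes one plausible reading: a team gets credit for every opponent it beat directly or through a chain of wins (A beat B, B beat C, so A is credited with C). It treats the results as a directed graph and runs a breadth-first search from each team instead of building an incidence matrix. The `(winner, loser)` input format and the function name are hypothetical.

```python
from collections import deque


def transitive_wins(results):
    """Count, for each team, how many teams it beat directly or via a chain of wins.

    results: iterable of (winner, loser) pairs (hypothetical format).
    """
    beat = {}
    teams = set()
    for winner, loser in results:
        beat.setdefault(winner, set()).add(loser)
        teams.update((winner, loser))

    counts = {}
    for team in teams:
        # Breadth-first search over "beat" edges starting from this team.
        seen, queue = set(), deque([team])
        while queue:
            current = queue.popleft()
            for nxt in beat.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(team)  # a win cycle can lead back to the team itself
        counts[team] = len(seen)
    return counts
```

With cycles (B beats C, C beats D, D beats B), every team in the cycle gets credit for the others, which is one reason a real metric probably needs extra rules. Each search costs roughly (teams + results) steps, so the whole computation is about teams x (teams + results).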

Third, make sure to include both JVAR and Z-Score: there is a slight difference in their calculation, so they're actually independent. If I remember correctly, Z-Score adjusts for the population mean/stddev and JVAR does not (link to an old thread explaining the precise formula was posted on the nmh thread). If I am incorrect and they BOTH adjust for the population mean, constructing an unadjusted version to examine would be interesting...

Fourth, a concern with measuring success at the NPTE would be that it's a little harder to measure elim round success in a double-elimination game: would you just look to how far a team got? Would ballot counts factor into it at all? Is there a strength-of-outround-schedule component (whether computed by seeding going into elimination rounds or by final finish of teams or by npte rankings before the start of the tournament) that needs to be included? I think the simplest solution here is to just accept as god-given the final rankings that the npte itself gives teams, since that ensures a stable metric and automatically breaks ties (which could otherwise screw with your model), but it bears thinking about.

If you can't tell, I'm a big fan of this idea, and I look forward to reading whatever findings you get. I'd really like to see tournaments publishing more of this kind of data (I mentioned a few ideas in that NMH thread), and I welcome any analysis that gives us more insight into how our game works. Relatedly, let me know if you want help with any part of this - I think the number-crunching is best done in one place, but there might be some other hand I could lend, so...
__________________
Ankur Mandhania
Berkeley '10
  #3  
02-05-10, 01:46 AM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
Ankur, that post was tremendously helpful. Thanks, and I'm sure we will be in contact (more than usual) as I go through this research.

Transitive wins looks like a nightmare to compute, especially on the scale that I am planning (I want lots and lots of data).

I think your distinction between z-score and JVAR is correct (at least my intuition says you're right). I will obviously be sure of this before I'm done...

Yes, the final rankings at the NPTE would probably be easiest. I haven't thought any of the details through quite yet, but one downside is that there would probably be some multicollinearity involved (since ties are broken based on prelim seed, which is determined by a number of the factors I intend to include as independent variables).

(To anyone, although especially Ankur): Feel free to suggest other independent variables for me to look into... I've gotten suggestions of number of regular season rounds, maybe total prelim win % during the regular season... all these are interesting.
  #4  
02-05-10, 02:20 AM
richmindseed
Registered User
 
Join Date: January 15th, 2007
Location: Berkeley, CA
Posts: 176
Another parameter that comes to mind: #splits vs #sweeps in NPTE prelims. My gut is that teams that can win convincingly are more likely to do better, but I see an argument for "we can hang with anyone, so all we had to do was get hot in outrounds"...
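Counting splits vs. sweeps is straightforward once each prelim decision is recorded with its ballot count. This minimal sketch assumes a hypothetical per-round record of `(team, ballots_for, ballots_against)` from three-judge panels; the field names are illustrative, not any tab program's export format.

```python
def split_sweep_counts(rounds):
    """Tally panel decisions per team.

    rounds: iterable of (team, ballots_for, ballots_against), one entry per
    team per round. Assumes odd-sized panels, so no decision is tied.
    """
    counts = {}
    for team, pro, con in rounds:
        c = counts.setdefault(
            team, {"sweeps_won": 0, "splits_won": 0, "splits_lost": 0, "swept": 0}
        )
        if con == 0:
            c["sweeps_won"] += 1   # e.g. a 3-0 win
        elif pro == 0:
            c["swept"] += 1        # e.g. a 0-3 loss
        elif pro > con:
            c["splits_won"] += 1   # e.g. a 2-1 win
        else:
            c["splits_lost"] += 1  # e.g. a 1-2 loss
    return counts
```

A single "wins convincingly" covariate could then be `sweeps_won / (sweeps_won + splits_won)`, the share of a team's wins that were sweeps.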

Total prelim win% has the nice feature of already being computed for you via NPTE rankings. #regular season wins would have to be hand-calculated to account for where each tournament breaks to, etc...this is not hard, given that the data is on the website, but still something to do.
  #5  
02-05-10, 10:06 AM
asmitty
regulating on ****ty posters
 
Join Date: October 7th, 2005
Location: stuck in the past
Posts: 765
rob--this would be difficult to compute, but one or more of the following 4 stats:

a) total prelim win % vs teams qualifying to the NPTE
b) total elim win % vs teams qualifying to the NPTE
c) total prelim win % vs teams in elimination rounds at the NPTE
d) total elim win % vs teams in elimination rounds at the NPTE

seems like it would have pretty strong predictive power.
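Given a list of round results and a set of teams that qualified to (or broke at) the NPTE, all four stats above reduce to one filtered win percentage. This is a sketch under assumptions: the `(winner, loser)` format and team names are hypothetical, and the hard part in practice, as the replies note, is collecting results from every tournament in the first place.

```python
def win_pct_vs(results, team, opponents):
    """Win percentage for `team`, counting only rounds against teams in `opponents`.

    results: iterable of (winner, loser) pairs. Pass prelim and elim results
    separately to get stats (a)/(c) versus (b)/(d).
    Returns None if the team never faced anyone in `opponents`.
    """
    wins = losses = 0
    for winner, loser in results:
        if winner == team and loser in opponents:
            wins += 1
        elif loser == team and winner in opponents:
            losses += 1
    games = wins + losses
    return wins / games if games else None
```

Swapping the `opponents` set between NPTE qualifiers and NPTE elim teams gives stats (a) and (c) from the same prelim results.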
__________________
"i was talking to cee-lo backstage, and i asked him "when you were growing up in atlanta, did you encounter any racism?" and he said something really interesting. he said, "i'm kanye west"
--sarah silverman
  #6  
02-05-10, 11:48 AM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
Quote:
Originally Posted by asmitty
rob--this would be difficult to compute, but one or more of the following 4 stats:

a) total prelim win % vs teams qualifying to the NPTE
b) total elim win % vs teams qualifying to the NPTE
c) total prelim win % vs teams in elimination rounds at the NPTE
d) total elim win % vs teams in elimination rounds at the NPTE

seems like it would have pretty strong predictive power.
Yeah unfortunately this would be nearly impossible since I won't have tournament data from all regular season tournaments that go into the NPTE rankings. You are probably right that this would be a pretty strong predictor.
  #7  
02-05-10, 12:00 PM
asmitty
regulating on ****ty posters
 
Join Date: October 7th, 2005
Location: stuck in the past
Posts: 765
You might be able to limit your data set to all tournaments getting the full bonus--this would make the project much more manageable in scope and would probably not interfere with the validity of your findings vis a vis npte success b/c these tournaments make up the core of basically any npte elims team's schedule.
  #8  
02-05-10, 09:07 PM
dseltzer
Registered User
 
Join Date: January 4th, 2006
Location: Carbondale, IL
Posts: 1,702
There was some discussion a couple of years ago about correlations between success at particular tournaments and placements at NPTE. Oddly enough, I think I remember something about UPS and Point Loma seeming to be predictive??? (That combination just seems so odd, but I really think someone looked at it in 2006 or so.)
__________________
Debbie
(lover of John Dewey's educational theory and a good debate, for all the same reasons!)
  #9  
02-06-10, 08:05 PM
JWill
Registered User
 
Join Date: April 6th, 2009
Location: long beach
Posts: 94
i think elim losses should be evaluated based on ballots

i.e. if 2 teams are getting to the same elim rounds but one team is consistently dropping on 2-1's and the other is dropping on 3-0's

and that raises another issue: if they lose to a "good" team (if that's weighted into it), ballots should count; i.e. losing on a 2-1 to a good team should still show some kind of success because obviously it wasn't a crush

i think that should probably mean something . . .especially at a tournament with split panels


obviously figuring all this out would be pretty ridiculous though . . .
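One way to sketch this idea is a ballot share rather than a plain win/loss count, so a 1-2 loss counts for more than a 0-3 loss, with each elim round weighted by how strong the opponent was. The seed-based weight below is a hypothetical choice made for illustration; any strength measure (prelim seed, NPTE ranking) could be swapped in.

```python
def weighted_elim_ballot_share(elims, n_breaking):
    """Share of elim ballots won, weighting each round by opponent strength.

    elims: iterable of (ballots_for, ballots_against, opponent_seed), with seeds
    running 1..n_breaking. Hypothetical weight: facing the 1 seed counts 1.0,
    facing the last seed counts 1/n_breaking.
    Returns None if the team has no elim rounds.
    """
    num = den = 0.0
    for pro, con, opp_seed in elims:
        weight = (n_breaking - opp_seed + 1) / n_breaking
        num += weight * pro
        den += weight * (pro + con)
    return num / den if den > 0 else None
```

With 16 breaking teams, a 1-2 loss to the 1 seed scores 1/3, while a 3-0 win over the 16 seed followed by a 0-3 loss to the 1 seed scores only 1/17, because the easy win carries little weight.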
__________________
long beach jenkins/williamson
  #10  
02-06-10, 11:40 PM
aec
Registered User
 
Join Date: August 28th, 2004
Posts: 1,294
You'd probably be better off spending the time researching or practicing. Not trying to discourage quantitative meta-research, which would be incredibly interesting and potentially helpful, just saying.
  #11  
02-06-10, 11:51 PM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
Quote:
Originally Posted by aec
You'd probably be better off spending the time researching or practicing. Not trying to discourage quantitative meta-research, which would be incredibly interesting and potentially helpful, just saying.
This is in no way an attempt to make myself do better at the NPTE... just some research I'm doing.
  #12  
02-18-10, 04:21 PM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
so i've been thinking about this, and there's likely to be a TON of multicollinearity with a lot of these variables (total speaks and high-low speaks are probably the most obvious example, although i might run two different regressions, one with each of these two...). this means that it will probably be difficult to say for a lot of the variables which individual ones are best explaining how well teams do. any ideas for solving this?
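A standard first diagnostic for this kind of multicollinearity is the variance inflation factor (VIF). For two predictors such as total speaks and high-low speaks, it reduces to 1 / (1 - r^2), where r is their correlation. Commonly cited rules of thumb flag VIFs above roughly 5 or 10, but those cutoffs are conventions, not tests. This sketch assumes the two columns are equal-length lists of team-level values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def two_predictor_vif(xs, ys):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).

    With more predictors, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    regressing predictor j on all the other predictors.
    """
    r2 = pearson(xs, ys) ** 2
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
```

VIF diagnoses the problem rather than solving it; the remedies mentioned in this thread (dropping one of the pair on theoretical grounds, or running separate regressions with each) still apply.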

this also means that i need lots of data to help distinguish the variables with multicollinearity. so, if your name is not joe gantt or konrad hack (who have both very generously volunteered data) and you have tournament data, then I would greatly appreciate it if you could contact me (rswanson@ups.edu).
  #13  
02-24-10, 11:10 PM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
Can some knowledgeable person define z-score and judge variance for me and explain the difference?

thanks again for all the help,
rob
  #14  
02-24-10, 11:29 PM
richmindseed
Registered User
 
Join Date: January 15th, 2007
Location: Berkeley, CA
Posts: 176
Quote:
Originally Posted by RSwanson
Can some knowledgeable person define z-score and judge variance for me and explain the difference?

thanks again for all the help,
rob
Rob,

I'm not the kind of person you reference in your post, but I can try!

JVAR is, according to http://www.net-benefits.net/showpost...7&postcount=40,

JVAR = (((Assigned Score – Judge’s Mean)/Judge’s STD) * Pop STD) + Pop Mean

The metric I'm calling Z-score is simply the following.

If I is an index variable used for judges,

and SCORE_I = (Assigned_Score - I_Mean)/I_STD,

and J = {I | I judged competitor X},

then the Z-Score for X is SUM(SCORE_I) over J.

In other words, compute the Z-score for each person relative to their judge, and then sum over all judges the person has. This metric lacks the adjustment for population mean that JVAR does, which is the key differentiating factor. I could be wrong about this in terms of the labeling used by actual tab software (which is probably a Derek/Joe question)...calculating this from the raw data is reasonably trivial, however, so you should be okay.
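Both metrics can be computed from raw ballots with the formulas as written in this post. This is a minimal sketch, not any tab program's implementation: it assumes a hypothetical `(judge, speaker, score)` record per score, uses population standard deviations, and treats a judge who gave every speaker the same score as contributing z = 0 (all three are assumptions).

```python
from statistics import mean, pstdev


def z_score_and_jvar(ballots):
    """Return (z_total, jvar_total), two dicts keyed by speaker.

    ballots: iterable of (judge, speaker, score), one entry per score given.
    z_total:    sum over the speaker's judges of (score - judge mean) / judge SD
    jvar_total: sum of ((score - judge mean) / judge SD) * pop SD + pop mean
    """
    ballots = list(ballots)
    by_judge = {}
    for judge, _speaker, score in ballots:
        by_judge.setdefault(judge, []).append(score)
    judge_moments = {j: (mean(s), pstdev(s)) for j, s in by_judge.items()}

    all_scores = [score for _, _, score in ballots]
    pop_mean, pop_sd = mean(all_scores), pstdev(all_scores)

    z_total, jvar_total = {}, {}
    for judge, speaker, score in ballots:
        j_mean, j_sd = judge_moments[judge]
        # Assumption: a judge with no spread in their scores gives z = 0.
        z = (score - j_mean) / j_sd if j_sd > 0 else 0.0
        z_total[speaker] = z_total.get(speaker, 0.0) + z
        jvar_total[speaker] = jvar_total.get(speaker, 0.0) + z * pop_sd + pop_mean
    return z_total, jvar_total
```

Under these formulas, each JVAR term is the judge-relative z rescaled by the population SD and shifted by the population mean, so a speaker's summed JVAR equals pop SD x Z-score + (number of scores) x pop mean. When every speaker has the same number of scores, the two metrics rank speakers identically and are perfectly collinear; they only come apart when score counts differ.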
  #15  
02-25-10, 08:50 AM
syphos
The Pro from Dover
 
Join Date: April 28th, 2003
Location: Binghamton, NY
Posts: 4,711
Quote:
Originally Posted by RSwanson
Can some knowledgeable person define z-score and judge variance for me and explain the difference?

thanks again for all the help,
rob
Some quick thoughts:

What's your unit of analysis? What's your dependent variable? It's difficult to suggest exact covariates if I don't know what you are doing. If the unit of observation is the team, then I think you could pilfer much of the data from the NPTE website and reshape it as necessary. My first stab at this would probably be to take placement at the NPTE as the dependent variable (1 through whatever the last place is now) and just run a simple linear regression on it. While the data is ordinal, the dependent variable has a large enough range that ordinal probit/logit would be less useful for interpretation; also, adopting linear models seems to be a norm (at least in international relations) once the ordinal values reach eight or more categories. While it means you are violating a bunch of fun assumptions, getting out-of-bounds predictions, and possibly not having a BLUE estimator, it probably still is (by mere assertion). However, make sure you follow whatever guidelines your professor would want you to follow.
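A minimal sketch of this first-stab model: regress NPTE placement on team-level covariates with plain OLS. It's written in dependency-free Python so it runs anywhere; the covariates in the usage note (prelim wins, opp wins) are hypothetical, and in practice you would use a stats package that also reports standard errors.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of rows, each starting with 1.0 for the intercept.
    y: list of outcomes (e.g. final NPTE placement).
    Solved with Gauss-Jordan elimination and partial pivoting; fine for a
    handful of covariates, not a replacement for a real stats package.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("X'X is singular: some covariates are perfectly collinear")
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]
```

Each row would look like `[1.0, prelim_wins, opp_wins]`, with `y` holding placements, and `ols(X, y)` returns `[intercept, b_prelim_wins, b_opp_wins]`. Because a lower placement number is better, a negative coefficient means the covariate is associated with a better finish.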

If you only use the elimination rounds, you lose a lot of information, as your data is either right-censored (a team cannot place lower than 16th) or you artificially truncate the sample and ensure that it is biased. A full listing of placings would be useful. You can also expand the sample by including all teams in the NPTE database and listing any non-NPTE team at the lowest rank of the NPTE tournament. This does create some cross-observation correlation that violates a few assumptions of our standard regression models, but omitting those observations is usually a bigger sin than transforming the data. If you are particularly unhappy about having a missing observation for those cases, you could always look into multiple imputation as a way to create values for that dependent variable (basically, predicted performance), but that could be beyond what you want to do (however, it is the winning approach for missing data).

The more standard way to control for a biased sample is a Heckman selection model. This is probably the method I would employ if I was using the 1-x rank value for the dependent variable. The first stage of the equation tests NPTE attendance (a binary dependent variable that employs probit) and the second stage tests performance at the NPTE. If you can, you want the first and second stage covariate list to be different as it is easier for each equation to be identified and, perhaps, your theory can inform the models as to what variables might influence attendance but not performance. For example, if you used google maps to find the travel distance between the NPTE tournament and the attending team's school, you would include that in the first stage, but not the second.

On multicollinearity: a variable and its modified variable may not always introduce the collinearity that is bad for a regression, and you may still be fine (for example, it is normal in the IR literature to use economic indicators as well as changes in those economic indicators - and often the correlation between the two is low - and we even include linear and squared terms that are highly collinear). However, if they are significantly collinear, or multicollinear, you may be able to drop some of those based on theory (though it sounds like you are doing this for economics, and thus, theory-based empirical research is less the reason to eliminate/include variables).

I would probably include a set of 3-5 binary variables for regional scores (the base region for the excluded variable could probably be the largest one - Southwest). You will likely have to input the data yourself, but it shouldn't take too long (though, if you are using Excel instead of a command-based stats package, this may be more time-consuming). Likewise, once you have coded the regional variable, you can make a single binary variable indicating whether the tournament was held in the team's own region - possibly to measure a home-field advantage due to the judging pool. This binary variable serves more as an interaction term with the regional scores, but could be of interest.
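The regional coding described here can be sketched as a small helper that builds one row of dummy variables per team. Only "Southwest" comes from the post; the other region labels in the test are hypothetical placeholders, as is the function name.

```python
def region_dummies(team_region, regions, base, tournament_region=None):
    """Dummy-code a team's region, leaving `base` out as the reference category.

    Optionally adds a home_region flag: 1 if the tournament was held in the
    team's own region, else 0.
    """
    row = {f"region_{r}": int(team_region == r) for r in regions if r != base}
    if tournament_region is not None:
        row["home_region"] = int(team_region == tournament_region)
    return row
```

Leaving the base region out avoids the dummy-variable trap: if every region had its own dummy, the dummies would sum to the intercept column and the model would be perfectly collinear. A team from the base region gets all zeros, so each region coefficient reads as "difference from the Southwest".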

If you are using multiple NPTEs over the tournament's history, then a count variable for the size of the tournament can be informative (assuming that the dependent variable is an ordinal measure of placement at the npte). Obviously, if you attended the npte in 2004, you cannot place 47th, and that would inflate your placement. Or, you could just have a control for the year it had taken place (this could be just a progressive count from the first year you observe in the data).

Also, include other relevant school specific factors: school size, private/public, 4 year or 2 year (if there is enough variation), and school endowment are all probably related to either the ability to travel or the ability to perform.
__________________
Dr. Cox: Lady, people aren't chocolates. Do you know what they are mostly? Bastards. Bastard-coated bastards with bastard fillings. But I don't find them half as annoying as I find naive bubble-headed optimists who walk around vomiting sunshine.
  #16  
02-25-10, 02:20 PM
topspin8257
Boyer-esque
 
Join Date: October 19th, 2005
Location: Easton, PA
Posts: 38
Rob,

I've been doing a similar analysis of the L.D. results from the 2009 NFA National Tournament. I bring this up not because I'm trying to compete, but to point out that those results are published online in PDF format, and they might be a good set of sample data to help prepare and test your programming / statistics / etc. before you apply it directly to the data you're analyzing.

I'll forward that data to the address above.
__________________
Joseph Dudek
University of the Pacific Debate

"What we have to do... is to find a way to celebrate our diversity and debate our differences without fracturing our communities."
- Hillary Rodham Clinton
  #17  
02-25-10, 04:39 PM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
Thanks Joe!

Michael: I have yet to read over your post, but I'm sure there is some incredibly useful stuff, so I thank you in advance as well.
  #18  
02-25-10, 04:54 PM
UberNovi
Registered User
 
Join Date: December 22nd, 2004
Posts: 147
Quote:
Originally Posted by syphos
Some quick thoughts:

However, if they are significantly collinear, or multicollinear, you may be able to drop some of those based on theory (though it sounds like you are doing this for economics, and thus, theory-based empirical research is less the reason to eliminate/include variables).
Not trying to distract from the discussion, but can you explain the last part of that sentence?
  #19  
02-26-10, 08:42 AM
syphos
The Pro from Dover
 
Join Date: April 28th, 2003
Location: Binghamton, NY
Posts: 4,711
Quote:
Originally Posted by UberNovi
Not trying to distract from the discussion, but can you explain the last part of that sentence?
There are plenty of quantitative economics studies (and research methodologies) that are more projects in data mining and finding a good predictive model (perhaps like the project here, but the research design is not entirely clear to me yet) than in finding a model that seeks explanatory causes (what I am used to dealing with). Including atheoretical variables to see what sticks in the model is more likely to induce collinearity, while a theory-based approach may eliminate some of those variables due to spurious relationships - some relationships are caused by antecedent factors, and those variables ought not be included.

However, beyond the serious thrust of the point, the broad generalization was partly in jest and was inherently wrong, as most generalizations are. It's more about what you are trying to do with your empirical models and what your audience finds acceptable.
  #20  
02-26-10, 11:17 AM
UberNovi
Registered User
 
Join Date: December 22nd, 2004
Posts: 147
I suppose I could see your point, it just runs contrary to my experience/instruction with econometrics. Speaking in terms of OLS/Empirical studies, it was taught to me that theoretical analysis should guide the selection and analysis of explanatory variables.

Collinearity in my experience is frowned upon even if it improves the overall model fit.

Atheoretical variables should be frowned upon; such data-mining projects haven't been part of my personal experience.

Could just be my experience. I'm not intimately familiar with political quantitative research (I'm pretty sure that's what you said your background was), so the differences between econometrics and political research could be lost on me.

On the direct topic - I think the dependent variable could easily be defined by number of ballots won at the npte. I think this helps with some of the measurement problems with defining "winning" that have been discussed.
  #21  
02-26-10, 01:22 PM
syphos
The Pro from Dover
 
Join Date: April 28th, 2003
Location: Binghamton, NY
Posts: 4,711
Quote:
Originally Posted by UberNovi
I suppose I could see your point, it just runs contrary to my experience/instruction with econometrics. Speaking in terms of OLS/Empirical studies, it was taught to me that theoretical analysis should guide the selection and analysis of explanatory variables.

Collinearity in my experience is frowned upon even if it improves the overall model fit.

Atheoretical variables should be frowned upon; such data-mining projects haven't been part of my personal experience.

Could just be my experience. I'm not intimately familiar with political quantitative research (I'm pretty sure that's what you said your background was), so the differences between econometrics and political research could be lost on me.
The partial jab at economics is not a jab at econometrics or econometricians. Econometric methods are tools employed by a variety of fields with diverse ideas of proper application - you can read the same logit regression applied in economics, psychology, political science, or the biosciences with a different interpretation of acceptable bias, proper refinement techniques, post-estimation analysis, or sampling procedures. Much of the acceptable practice in medical journals would be rejected in first-round reviews at an economics or political science journal (partly because of how our samples are generated, we have to be trickier with what we do).

Generally, the school of thought I was raised in suggests that theory should guide our variable selection when possible - so I would agree with the sentiment expressed above. However, for a predictive model of performance, causality may not be as important - it really depends on the confines of what you want to do (or what is demanded by the assignment). For example, if you include regular-season wins as a predictor of success at the NPTE, it is not a causal variable (does winning itself generate more wins? Not directly), but perhaps it is a proxy for team strength. Generally, the variables you really want are not something you can directly measure (research hours by team, number of practices, team size, effective coaching staff, time spent on net-benefits, etc.). Yet, ideally, we want to avoid spurious correlations that have little to do with the actual events we are measuring, as they will eventually be wrong, even if they are generally right:

Redskins performance predict presidential elections

Height and Name length predict presidential elections


Perhaps there is a strong correlation between success at the NPTE and who wears the pinkest shirt or top.
  #22  
04-16-10, 02:35 PM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
Time to resurrect this thread. My project has changed substantially since the beginning of this thread, as I just didn't have the right data to test what I initially said I was looking at. But I think I have an equally (if not more) interesting project now. I just finalized my results last night, and the paper will be coming in a little under a month, so I'll talk about my results then.

I ended up looking at differences between MPJ and strikes tournaments in terms of which prelim statistics best predict elim success. The prelim stats I looked at were wins, prelim seed, z-score, total speaks, H/L speaks, and opp wins. I compared regressions for MPJ tournaments and strikes tournaments and came up with some interesting results (yes, they are different).

I figured I'd let people know what my project is now in case they have anything to say about it. I'm also curious whether anyone wants to take a stab at predicting what my results were.

Hope everyone is having a good post-nationals life,
Rob
  #23  
04-17-10, 10:53 AM
APotter
Registered User
 
Join Date: April 20th, 2009
Posts: 69
Sorry if Dingess and I screwed them up at all.
__________________
William Jewell BP (Brooks/Potter)

The DP will rise again!

Beaks Up!
  #24  
04-17-10, 01:47 PM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
Quote:
Originally Posted by APotter
Sorry if Dingess and I screwed them up at all.
Haha, if this is a reference to you guys rocking it at this year's NPTE, then the answer is no, not at all. Mainly because this year's NPTE isn't included in the data set. But one observation (aka one team at one tournament) is hardly enough to bias the entire result (except maybe if the very bottom breaking seed at the NPDA won the tournament), since I had 280 data points for strikes tournaments and 196 for MPJ tournaments. Non-breaking teams were also not included in my data set, which contained 15 different tournaments.
  #25  
04-17-10, 01:58 PM
SoCalian
Registered User
 
Join Date: December 4th, 2006
Location: San Diego, CA
Posts: 863
did you look at (or consider looking at) tournaments that use neither MPJ nor strikes?
__________________
Ian Sharples
Point Loma's Team Wildcard
  #26  
04-17-10, 05:32 PM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
Quote:
Originally Posted by SoCalian
did you look at (or consider looking at) tournaments that use neither MPJ nor strikes?
No, unfortunately I did not have the data to do that. That would, however, be an interesting subject for future research.
  #27  
04-19-10, 11:45 AM
syphos
The Pro from Dover
 
Join Date: April 28th, 2003
Location: Binghamton, NY
Posts: 4,711
Quote:
Originally Posted by RSwanson
Time to resurrect this thread. My project has changed substantially since the beginning of this thread, as I just didn't have the right data to test what I initially said I was looking at. But I think I have an equally (if not more) interesting project now. I just finalized my results last night, and the paper will be coming in a little under a month, so I'll talk about my results then.

I ended up looking at differences between MPJ and strikes tournaments in terms of which prelim statistics best predict elim success. The prelim stats I looked at were wins, prelim seed, z-score, total speaks, H/L speaks, and opp wins. I compared regressions for MPJ tournaments and strikes tournaments and came up with some interesting results (yes, they are different).

I figured I'd let people know what my project is now in case they have anything to say about it. I'm also curious whether anyone wants to take a stab at predicting what my results were.

Hope everyone is having a good post-nationals life,
Rob
I would be interested in the paper/data when you are done.

I am guessing that, in comparing the two types of tournaments, you initially ran a pooled sample estimation followed by identical regressions on each sub-sample?
  #28  
04-19-10, 10:26 PM
RSwanson
Registered User
 
Join Date: August 4th, 2008
Location: Chicago, IL
Posts: 169
Quote:
Originally Posted by syphos
I would be interested in the paper/data when you are done.

I am guessing that, in comparing the two types of tournaments, you initially ran a pooled sample estimation followed by identical regressions on each sub-sample?
I first ran a pooled sample estimation, but I haven't included those results in my analysis. I compared the results from each sample directly. What additional info do you think the pooled estimate would provide?
  #29  
Old 04-19-10, 10:31 PM
DEADMONEY
Moderator
 
Join Date: August 13th, 2004
Posts: 439
^----------------Nerd
  #30  
Old 04-19-10, 11:02 PM
syphos
The Pro from Dover
 
Join Date: April 28th, 2003
Location: Binghamton, NY
Posts: 4,711
Quote:
Originally Posted by RSwanson
I first ran a pooled sample estimation, but I haven't included those results in my analysis. I compared the results from each sample directly. What additional info do you think the pooled estimate would provide?
It depends on what your research question is. If your question is merely "what determines outround success," then I think a pooled sample is warranted, as you are looking for general correlates in the population of outrounds (given your sample). However, if you just want to compare the two judging systems, then it may be less relevant.

The pooled sample is often used to see whether the covariates perform well across the board in a statistically discernible manner. Generally, since you constrain the error variance of the subgroups to be identical while also averaging each covariate's effect across both groups, you provide a stricter test for the hypotheses you are testing. Perhaps opp wins will always do well (strength of schedule shows promise for outround performance no matter what the format is), and that is pretty useful to know. Our regressions look at mean behavior, and a mean of means for the regressors can tell us which predictors of success are generally useful without having to engage in an MPJ v. strikes analysis. Also, the pooled model might get a better grasp of the covariates, depending on the regression technique and the sample size of each subgroup: if either sample is too small (relative to the regression type), your standard errors may grow too large to make the analysis useful.

Running the same model on two samples tells you that the two samples may be different, but you are hard pressed to treat them as proper experiments and discern causation, given the selection bias and non-randomness of it all. The change in correlation across the two models is statistically hard to evaluate when the estimates are not derived from the same sample, and the analysis risks some inferential fallacies that the statistical assumptions would not normally support. The pooled model allows you to identify variables that perform well across both samples. Generally, if a variable does not perform well in the pooled model but does in one particular disaggregation, you can infer some usefulness from it.

For example, if the pooled model and the MPJ model both give statistically insignificant results for H/L speaks while the strikes model shows significance for the same variable, I think you can devote at least a paragraph to why that may be the case and why it is interesting. Deductively speaking, the argument would be straightforward, but the statistical evidence could be compelling, and it is only really available with the pooled model.

Finally, if the pooled model provides scant information about the independent variables but your sub-sample regressions provide strong results, that is telling as well. It could indicate that different processes are at work in the two samples and that pooling them is incorrect (something we need the pooled results to know). A variable could have effects in opposite directions in the two samples that wash out its significance in the pooled model.
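One standard way to ask "is pooling incorrect?" formally is a Chow test, which compares the residual sum of squares of the pooled fit with the combined residual sum of squares of the two separate fits. This is a suggested technique, not something anyone in the thread says they ran. The sketch below assumes the same hypothetical `X`, `y`, `is_mpj` layout as a plain NumPy OLS setup, and, like the pooled OLS model itself, it assumes equal error variance in the MPJ and strikes groups. A large F suggests the coefficients differ between formats.

```python
import numpy as np


def ssr(X, y):
    """Sum of squared residuals and parameter count for OLS with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid), design.shape[1]


def chow_test(X, y, is_mpj):
    """Chow F statistic for 'same coefficients in the MPJ and strikes samples'.

    Returns (F, numerator df, denominator df). Under the null of identical
    coefficients (and equal error variance), F follows an F(k, n - 2k)
    distribution, where k counts the intercept plus the predictors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    is_mpj = np.asarray(is_mpj, dtype=bool)
    ssr_pooled, k = ssr(X, y)
    # Unrestricted model: each format gets its own intercept and slopes.
    ssr_split = ssr(X[is_mpj], y[is_mpj])[0] + ssr(X[~is_mpj], y[~is_mpj])[0]
    df_num, df_den = k, len(y) - 2 * k
    f_stat = ((ssr_pooled - ssr_split) / df_num) / (ssr_split / df_den)
    return f_stat, df_num, df_den
```

If SciPy is available, `scipy.stats.f.sf(F, df_num, df_den)` turns the statistic into a p-value.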

Do you have any reason to believe that the slopes for the covariates should be different for the MPJ v. strikes samples? If not, then you could just add a binary variable for MPJ in the pooled equation to capture a shift in the mean (teams do better/worse given MPJ). However, if you think MPJ or strikes mollify/enhance just 1-2 variables, then that would justify a pooled model with a binary variable for MPJ as well as interaction terms for the affected slopes.
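A minimal sketch of that pooled specification, in Python/NumPy: an MPJ dummy shifts the intercept, and interaction terms let only the chosen slopes differ by format. The function name and the column indices are hypothetical illustrations, not anyone's actual model. The t-statistic on an interaction coefficient (coefficient divided by its standard error) is what would indicate whether MPJ changes that predictor's slope.

```python
import numpy as np


def pooled_with_interactions(X, y, is_mpj, interact_cols=()):
    """Pooled OLS with an MPJ dummy and MPJ x predictor interactions.

    Model: y = b0 + X @ b + d * MPJ + sum_j g_j * MPJ * X[:, j] + e
    Coefficient order: [b0, b for each column of X, d, g for each j in interact_cols].
    Returns (coefficients, classical standard errors).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)  # one column per predictor
    mpj = np.asarray(is_mpj, dtype=float)               # 1.0 for MPJ, 0.0 for strikes
    cols = [np.ones(len(y)), *X.T, mpj]
    cols += [mpj * X[:, j] for j in interact_cols]       # slope shifts under MPJ
    design = np.column_stack(cols)
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return beta, se
```

With `interact_cols=()` this reduces to the dummy-only model, which fits the case where you expect only the mean to shift between formats.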
All times are GMT -6.


Powered by vBulletin® Version 3.6.7
Copyright ©2000 - 2010, Jelsoft Enterprises Ltd.
Copyright Net-Benefits 2001-2003