Request for Transparency

Posted: April 30th, 2015, 8:55 am
by -BY
Another LDC ended and the drama knocks, pleading with people to finally let it in.
Going by the chat discussions and, very soon, the topic responses as well, there really is something off with the current judging system.
I'm requesting a change myself, one people might actually be able to agree on.

It's in fact pretty simple. When the judges are done with their work, they should contact each other and spend a bit of time discussing their judgings.
Doing everything anonymously while planting bombs and waiting for the results to explode is definitely not the way to go. Never was, never is and never will be.
I'm not talking about everyone going for uniform scores. But really extreme differences between the scores should get evened out. There has not been a single moment in the entire LDC history in which these differences have been justified. You may call it subjective. But it's in fact people blindly hating, for whatever reasons. While we're at it, the judges might also compare all levels to each other in that final discussion and see whether specific levels should get a somewhat lower or higher score in general. These serious discussions should also help everyone reflect on their own thoughts.
Maybe you will find out in the end that you have been biased without properly noticing it.

So yeah. I'm at a point where I cannot keep watching this anymore. Let's discuss this in all seriousness and see if we can implement it in future LDCs.

Re: Request for Transparency

Posted: April 30th, 2015, 9:10 am
by Venexis
Just said this in IRC, but as a backup, I would love to see this become the norm.

My biggest worry was of not being consistent- I could forgive myself even if nobody else did if I had rated every single level, but that wasn't the case, and I've been known to be a bit more strict than the average in the past. The most stressful part was that one of my scores would fall well outside the "acceptable range" as defined by full judges, and... it kinda happened. I rated MK's level significantly harsher than any other judge, and MoD's/SK's half a point better than anyone else.

I guess when you break it down those gaps aren't that significant compared to levels like CY's, who was affected by nearly 1.5 points, but... it'd definitely be cool to communicate freely with other judges, if not participants (I think there are merits for this too: did I totally miss the point of an abstract level like MK's? Never gonna know for sure unless I talk to the creator), as I at least encountered a significant amount of difficulty trying to not unintentionally ♥♥♥♥ up a score by rating as I typically would as a full judge.

I feel like my scores overall were much higher than normal to attempt to passively compensate for that, by 1-2.5 points depending on the case.

Re: Request for Transparency

Posted: April 30th, 2015, 10:01 am
by *Emelia K. Fletcher
yeah, a moderation session between judges would be nice for getting a general idea of things

although i still stand by the fact that playing a level is about the experience a player (and THE player) will have, and not the experience a creator or anyone else wanted them to have unless the idea hinges on it (which technically was the case for my level but hell did i communicate that poorly)


i mean look at the ♥♥♥♥ disparity between mine and anyone else's rating

Re: Request for Transparency

Posted: April 30th, 2015, 10:52 am
by MessengerOfDreams
https://docs.google.com/spreadsheets/d/ ... sp=sharing

Just some data before we go into accusations:

The placements for each judge show that each of them has a small handful of levels on which they differed from everyone else, some more so than others. Also, nearly everyone had one level that they scored way higher or way lower than average, by anywhere between 2 and 6 points.

However, from what I've found, the average placements almost dead-on reflect the average score, if both were ranked by those averages. If we ranked in a WITBLOAT style, where we ranked from 1 to 18, the only differences would be that Doram and BY switch and some mild shifting in the lower ranks.
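A minimal sketch of that comparison, for anyone who wants to redo it on the spreadsheet data. It assumes ranking "by placement" means averaging each judge's own 1-to-N ordering of the levels; the judge names, level names and scores below are made up for illustration and are not the real LDC numbers.

```python
def rank_by_average_score(scores):
    """Order levels from best to worst by their mean score across judges."""
    levels = sorted(next(iter(scores.values())))
    avg = {lvl: sum(js[lvl] for js in scores.values()) / len(scores)
           for lvl in levels}
    # Higher average first; level name breaks ties deterministically.
    return sorted(levels, key=lambda lvl: (-avg[lvl], lvl))


def rank_by_average_placement(scores):
    """Order levels by the mean of each judge's own placement (1 = best)."""
    levels = sorted(next(iter(scores.values())))
    totals = {lvl: 0 for lvl in levels}
    for judge_scores in scores.values():
        # Each judge's personal ordering, best score first.
        ordered = sorted(levels, key=lambda lvl: (-judge_scores[lvl], lvl))
        for place, lvl in enumerate(ordered, start=1):
            totals[lvl] += place
    # Lower average placement first.
    return sorted(levels, key=lambda lvl: (totals[lvl] / len(scores), lvl))


# Hypothetical data: three judges, three levels.
example = {
    "JudgeA": {"Level1": 15, "Level2": 12, "Level3": 10},
    "JudgeB": {"Level1": 14, "Level2": 13, "Level3": 6},
    "JudgeC": {"Level1": 11, "Level2": 12, "Level3": 13},
}

by_score = rank_by_average_score(example)
by_placement = rank_by_average_placement(example)
print(by_score)
print(by_placement)
```

If the two lists come out identical or nearly so, one outlier judging per level is not moving the final order much, which is the pattern described above.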

I don't think there's negative, biased trends there. That seems to be derived entirely from low personal opinion from some spectators to some judges. As it stands, the patterns are consistent but affect little.

Later, I'll collect data on how scores would have changed under the old Supershroom system if everyone's highest and lowest scores had been removed, to see if it truly can bring equity with 6 judgings.
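A rough sketch of that calculation, assuming "highest and lowest scores removed" means dropping one top and one bottom judging per level before averaging. The function names and the six example scores are hypothetical, not taken from the actual results.

```python
def plain_average(scores):
    """Mean of all judgings for one level."""
    return sum(scores) / len(scores)


def trimmed_average(scores):
    """Mean after dropping the single highest and single lowest judging."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to drop the extremes")
    kept = sorted(scores)[1:-1]  # remove one lowest and one highest
    return sum(kept) / len(kept)


# Hypothetical level judged by six judges, with one harsh outlier (9.0).
example_scores = [14.0, 13.5, 15.0, 9.0, 14.5, 13.0]

print(round(plain_average(example_scores), 2))    # 13.17
print(round(trimmed_average(example_scores), 2))  # 13.75
```

With six judgings, trimming leaves four scores per level, so a single harsh or generous outlier no longer moves the average, but the result also rests on fewer opinions.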

Re: Request for Transparency

Posted: April 30th, 2015, 11:35 am
by Karyete
Wait, does that graph tell me that MK gave me the highest score of all his judgings? Fo reals?

Re: Request for Transparency

Posted: April 30th, 2015, 11:41 am
by *Emelia K. Fletcher
Karyete wrote:Wait, does that graph tell me that MK gave me the highest score of all his judgings? Fo reals?

<Chaukai> But for the most part your score trends similar to the average
<Chaukai> Except your karyete score
<EmeliaK> because karyete made a goddamn good level that was fun to play
<EmeliaK> it was probably the most fun i had out of the entire ldc

Re: Request for Transparency

Posted: April 30th, 2015, 11:49 am
by Karyete
now I feel all warm and fuzzy

Re: Request for Transparency

Posted: April 30th, 2015, 12:38 pm
by Venexis
MessengerOfDreams wrote:I don't think there's negative, biased trends there. That seems to be derived entirely from low personal opinion from some spectators to some judges. As it stands, the patterns are consistent but affect little.


To be perfectly clear, I agree with this. I don't really think this is necessary, but it sure would be incredibly nice to have, especially as a resource for backup judges.

Like, usually my "average" score is in the 10-12 range as opposed to the commonly seen 13-15 range... that could definitely be interpreted as bias, even if it is unintentional, as those harsher ratings would only apply to the few levels additional judges are needed for. A group communication session would go a long way toward detecting and preventing similar issues that otherwise would not have been evident until after the contest's results post.

Re: Request for Transparency

Posted: April 30th, 2015, 12:51 pm
by MessengerOfDreams
Well, judges having higher standards or higher praise doesn't really affect overall standards if that's just how they are, unless, like you, they're alternate judges judging only a small number of levels, or one is simply worried about overall LDC standings being comparable through history.

Also, I updated my chart significantly. The first is taking the "let's cut off the worst and best judgings" verdict from way back to show it makes nearly zilch difference. The next is just laying out exactly which order each judge took. Some are weird orders, but overall there are just a few special cases in a uniform order.

Re: Request for Transparency

Posted: April 30th, 2015, 8:03 pm
by Harmless
If we're suddenly going for 'let's have all the judges rate a level evenly' then we might as well pick only one judge per LDC from now on.

The variety of judges and their outlooks/expectations is the reason why I took up participating in LDCs: so I can hear multiple viewpoints. Yurimaster may have thought Supershroom's level was the best goddamn thing on the Earth, but EmKay clearly said otherwise, that it felt untested and very frustrating.

Though a meeting between the judges and encouraging the judges to talk with each other is a good idea. I did discuss some of this with MoD during the LDC, and having judges combine their thoughts would probably lead to better judgings. It would be ridiculous to ask for all the judgings to be roughly the same though.