
Re: Revolution on SM63 judgings (?)

Posted: April 22nd, 2016, 6:48 pm
by NanTheDark
In chat, I just described what I referred to as a "Mario Kart judging system". I think I could... elaborate on that. While I do think this could be a valid system, I don't think people will like it very much, but eh.

I called it a Mario Kart judging system because it takes some inspiration from how scores work in those games' Grand Prix modes.

The idea is that each judge takes the entries and orders them according to how they think the placings for the contest should be, from 1st to whatever the last place is. Depending on the placing, each participant gets a certain number of points. Those points are then added to the points from the other judges, and whoever has the most points overall wins.

The number of points given could be n-(p-1), with n being the number of participants and p being their placing. However, we could also go full Mario Kart and assign a proper point value to each place, like 15 points for 1st place, 12 for 2nd, etc. (That might actually be better.)
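A minimal Python sketch of the two point schemes, in case anyone wants to try them out. `linear_points` implements n-(p-1) as stated above; `table_points` uses a fixed per-place table. Only the 15 and 12 come from the post; the remaining table values and both function names are hypothetical placeholders, not a proposal.

```python
from __future__ import annotations

# Hypothetical table: only 15 (1st) and 12 (2nd) come from the post;
# the other values are placeholders for illustration.
HYPOTHETICAL_TABLE = (15, 12, 10, 8, 6, 4, 2, 1)


def linear_points(n: int, p: int) -> int:
    """Points for placing p (1 = best) among n entrants: n - (p - 1)."""
    if not 1 <= p <= n:
        raise ValueError("placing must be between 1 and n")
    return n - (p - 1)


def table_points(p: int, table: tuple[int, ...] = HYPOTHETICAL_TABLE) -> int:
    """Points for placing p from a fixed per-place table; places past its end get 0."""
    if p < 1:
        raise ValueError("placing must be at least 1")
    return table[p - 1] if p <= len(table) else 0


if __name__ == "__main__":
    print([linear_points(7, p) for p in range(1, 8)])  # [7, 6, 5, 4, 3, 2, 1]
    print([table_points(p) for p in range(1, 8)])      # [15, 12, 10, 8, 6, 4, 2]
```

With 7 entrants the linear scheme steps down by one point per place, while a table with a larger gap at the top (15 vs 12) weights 1st-place finishes more heavily.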

To better illustrate it, let's have an example. Let's say the following contestants join:
Hyperfungi
AngelOfNightmares
~MIDI Minimizer~
Astro lord
BurningIce
Treemaster
Applejuice

We have Judge A, B, and C. For the sake of this example I'll use the first point value method I described. Judge A judges like this:
1st ~MIDI Minimizer~ (+7)
2nd AngelOfNightmares (+6)
3rd Treemaster (+5)
4th Astro lord (+4)
5th Hyperfungi (+3)
6th BurningIce (+2)
7th Applejuice (+1)

Judge B judges like this:
1st Hyperfungi (+7)
2nd Treemaster (+6)
3rd BurningIce (+5)
4th Astro lord (+4)
5th AngelOfNightmares (+3)
6th ~MIDI Minimizer~ (+2)
7th Applejuice (+1)

And Judge C judges like this:
1st AngelOfNightmares (+7)
2nd Hyperfungi (+6)
3rd ~MIDI Minimizer~ (+5)
4th Treemaster (+4)
5th Astro lord (+3)
6th BurningIce (+2)
7th Applejuice (+1)

Adding up, the final result is this:
1st AngelOfNightmares (6+3+7=16)
1st Hyperfungi (3+7+6=16)
2nd Treemaster (5+6+4=15)
3rd ~MIDI Minimizer~ (7+2+5=14)
4th Astro lord (4+4+3=11)
5th BurningIce (2+5+2=9)
6th Applejuice (1+1+1=3)

Well, in this particular case there was a tie, but yeah. That's how it would work, more or less.
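The tally above can be reproduced with a short Python sketch, assuming the n-(p-1) scheme with n = 7. Tied totals share a place and the next total takes the next place number, matching the 1st, 1st, 2nd numbering in the example. Ordering tied entrants alphabetically is an arbitrary choice of this sketch, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Each judge's ranking, best first, as given in the example above.
RANKINGS = {
    "A": ["~MIDI Minimizer~", "AngelOfNightmares", "Treemaster", "Astro lord",
          "Hyperfungi", "BurningIce", "Applejuice"],
    "B": ["Hyperfungi", "Treemaster", "BurningIce", "Astro lord",
          "AngelOfNightmares", "~MIDI Minimizer~", "Applejuice"],
    "C": ["AngelOfNightmares", "Hyperfungi", "~MIDI Minimizer~", "Treemaster",
          "Astro lord", "BurningIce", "Applejuice"],
}


def tally(rankings: dict[str, list[str]]) -> dict[str, int]:
    """Sum n - (p - 1) points per entrant across all judges."""
    totals: dict[str, int] = defaultdict(int)
    for ranking in rankings.values():
        n = len(ranking)
        for p, entrant in enumerate(ranking, start=1):
            totals[entrant] += n - (p - 1)
    return dict(totals)


def final_standings(totals: dict[str, int]) -> list[tuple[int, str, int]]:
    """Return (place, entrant, points), best first, with tied totals sharing a place."""
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    standings = []
    place = 0
    previous = None
    for entrant, points in ordered:
        if points != previous:  # a new, lower total moves to the next place number
            place += 1
            previous = points
        standings.append((place, entrant, points))
    return standings


if __name__ == "__main__":
    for place, entrant, points in final_standings(tally(RANKINGS)):
        print(place, entrant, points)
```

Under the other common convention (standard competition ranking), Treemaster would be listed 3rd after the two-way tie rather than 2nd.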

Re: Revolution on SM63 judgings (?)

Posted: April 23rd, 2016, 6:59 am
by ~Yuri
It wouldn't quite work, because an individual judge's own rankings can have ties. It would also mean literally putting somebody above somebody else; maybe that's how competitions work, but it's not how judging art and similar things works. In a competition, you necessarily need to be above the others; when judging works of art and the like, you don't.

Re: Revolution on SM63 judgings (?)

Posted: April 23rd, 2016, 7:19 am
by nin10mode
I'm kinda against having story and music diminished down to a point each. I'm with Moy as far as judicial freedom goes, so if one person considers one aspect a major strength or even a detriment, it should show in the score. Besides, forcing them into just one or two points essentially makes separate categories for them, except the categories are minuscule and serve pretty much the same purpose Other/Misc did in the current system.

Re: Revolution on SM63 judgings (?)

Posted: April 23rd, 2016, 7:27 am
by MessengerOfDreams
Also, the difference between silver and gold can be rather arbitrary anyway, and in the grand legacy of the levels it doesn't hold them back. WITBLOs can still have massive lead changes, with runner-up, bronze, or even ribbon levels passing each other in the lists, and I gather that many non-winning levels can rise to prominent fame. Only in LDCs does placing one over the other matter to such an extent.

Re: Revolution on SM63 judgings (?)

Posted: April 23rd, 2016, 3:45 pm
by Oranjui
Okay, apparently I completely missed a few aspects of the first post when I first read it.

While the idea is good in theory--standardizing score formatting, reducing bias, preventing large discrepancies--I don't think it would work well in practice because it doesn't really allow for any freedom in judging style. Different aspects of gameplay and atmosphere have different levels of importance to each individual judge. Subcategories designed by any specific person just force each aspect to be weighted in a way that might not accurately reflect a judge's feelings about a level holistically. And that even applies to the 12-8 split between gameplay/fun and scenery/atmosphere. I'm basically restating what Nin and Moy have said at this point.

Also, the system as it currently stands is intentionally designed to make it difficult to get near-perfect scores. I can give a 10/10 in fun, 5/5 in scenery, but a 3/5 in other because nothing else stood out: an 18/20 overall. With the proposed system, I might still want to give both gameplay and atmosphere perfect scores (12/12 and 8/8, or 10/10 and 10/10), but I'd be forced to deduct from one category or another in order to express that I didn't think it was necessarily perfect, even if I didn't think it was specifically a fun/gameplay-related issue or an aesthetic/atmosphere-related issue, and I might reluctantly just give them a 20/20 because it doesn't feel right to take away points which I think they legitimately earned otherwise. That is to say, a 2-category system wouldn't accurately reflect every judge's sentiments and could easily cause score inflation (or deflation, even), which seems to be the same fear Moy has.
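Oranjui's arithmetic can be checked with a small score-sheet sketch. The formats and maxima come from the thread (Fun/Graphics/Other for 10/5/5, with Graphics being the "scenery" category above; Gameplay/Atmosphere for 12/8 and 10/10). The `total` helper and its validation rules are assumptions for illustration only.

```python
from __future__ import annotations

# Category maxima for the formats discussed in the thread.
FORMATS = {
    "10/5/5": {"Fun": 10, "Graphics": 5, "Other": 5},
    "12/8": {"Gameplay": 12, "Atmosphere": 8},
    "10/10": {"Gameplay": 10, "Atmosphere": 10},
}


def total(format_name: str, scores: dict[str, float]) -> float:
    """Add up one judge's category scores after checking them against the format."""
    maxima = FORMATS[format_name]
    if set(scores) != set(maxima):
        raise ValueError(f"{format_name} expects categories {sorted(maxima)}")
    for category, score in scores.items():
        if not 0 <= score <= maxima[category]:
            raise ValueError(f"{category} must be between 0 and {maxima[category]}")
    return sum(scores.values())


if __name__ == "__main__":
    # Perfect Fun and Graphics, default 3/5 in Other.
    print(total("10/5/5", {"Fun": 10, "Graphics": 5, "Other": 3}))  # 18
    # The same level on a two-category sheet with both categories maxed out.
    print(total("12/8", {"Gameplay": 12, "Atmosphere": 8}))         # 20
```

The two-category sheet has no separate slot for a "nothing else stood out" deduction, so the judge either takes it out of Gameplay or Atmosphere or gives the full 20.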

Also, as a final note, I'd prefer it if you could add other/open-ended options in your poll that don't assume this is automatically going to go through using only the changes you want to make; I'm fine with updating a few things with the judging format (maybe renaming Graphics to Atmosphere if enough other people want to), but I'm not going to vote if the only choices I have are two options, neither of which apply to me at all.

Overall, I feel like most of the changes proposed are just in general not very necessary to make, and they put too many limits on judicial freedom. I like our 10-5-5 system.

Re: Revolution on SM63 judgings (?)

Posted: April 24th, 2016, 5:46 am
by nin10mode
I think those last few points were edited in. Thanks for mentioning that though, I almost didn't see.

The major change is shifting to 2 categories, instead of 2 categories plus a third that basically amounts to filler and brownie points.
The standardized amounts for lag, music, etc. are what I assume everyone has a problem with, and I see that and agree. Without them, I think the 10/10 or 12/8 system is fine, though.

So Shroom, I'd recommend resetting the poll with new options.

Re: Revolution on SM63 judgings (?)

Posted: April 24th, 2016, 7:51 am
by Supershroom
Hold on, I need to catch up first ...

MessengerOfDreams wrote:I'd say if it was 10/5/5 before make it 10/10 if you want to essentially combine graphics and details together.

Well, details split up into gameplay-related stuff (bugs, loading times, innovation, effort) and atmosphere-related stuff, so I suggested moving the first half to Gameplay and the second half to Atmosphere, which is also why it leans more toward 12/8 than 10/10. You could also shift bugs, loading times and innovation to the second, non-Gameplay category, but that's not intuitive. First, this moves away from that stupid additive system of Other, towards the way Doram handled it in the ROLDCs. Second, it clearly assigns each thing to the category where it intuitively belongs, so nothing is counted twice.

Alt music and plot should definitely stay part of judging. We never complained about this before, did we?

Venexis wrote:My beef with this is that Youtube is not a part of the designer. We have level designer contests, not "best scavenge suitable resources from all corners of the internet" contests.

I disagree here: we are designing levels and hold Level Designing contests, not Level Designer contests. The engine isn't as advanced as SMBX's, so you can't just dump in whatever song you like and have it play directly in the level; instead we have to create an "imaginary environment" by running the track in a background tab while playing, to get around this deficit. If you say "we didn't create the theme", you can also say that we didn't create the tiles and items and the entire program either. The same goes for story / plot. Some people (not only MoD) have mastered this way of designing, and giving story / music less weight would put them at an unfair disadvantage. Outside resources have an influence, but it's surely not too big. We even have a library to help people find alt music, and I also didn't handle it by always removing points for using in-game music: if it fits well, you can get a few kudos for it too, but be aware that it's not as original as, e.g., using an SU track.

Slight discrepancies between silver and gold will happen from time to time. You can always easily find another "reason" why the runner-up didn't win in the end, other than having a worse music choice than the winner. Sometimes small, subtle and subjective trifles make the difference ...

In the end, we would all use either in-game tracks or pure silence while playing. Yaaaaaaaawn ...

Banners, however, have never been considered for judging, since they not only require resources that aren't available to everyone, but also aren't really part of playing the level itself; you mostly look at them before and afterwards. They're still neat to have if you make one, though.

Venexis wrote:It's not unreasonable to say that 18 is the maximum obtainable score under the current system, [...] therefore, it's also not unreasonable to assume that under this proposed system, 20s would become much more common

That is partially the aim for the future (of course with the implied hope that the levels getting a 19+ really deserve it). I've already described that as "cases where you wonder about how in the world you could ever get at least a 4/5 in Other from a judge", and that judge isn't necessarily a bad judge - in the recent 5th mini-LDC, I really wanted to give someone at least a 4 in Other, but I didn't feel able to do it without bringing unnecessary bias - the additive system ♥♥♥♥ it up, it ♥♥♥♥ up as well since stuff sometimes appears doubly, and that's the main reason why several people have said that Other never was reasonable for SM63, and why this thread exists. Getting close-to-perfect scores stays hard to do with the new system, but with the old one, it's literally impossible, by theory / definition. If you want to remove points from perfection, you usually need to explain stuff that could have been done better, but when not giving a 5 in Other, you DON'T have any possibility of that often enough.

As I've said, since dropping Other also removes a certain degree of transparency, we need some guidelines. Yes, they're called guidelines and not rules, but they're meant to say that, e.g., a -3 for loading times / lag is too heavy and out of proportion. In my experience, it has proven balanced that story/plot and lag/loading times each make up roughly 10 to 15% of the total, Graphics 25%, and the actual gameplay (including bugs) 50%. The 2 or 3 points for plot+music (right now, it looks like 3) don't have to be split equally between the two. You can just say "we have 12/8 or 10/10 points for these categories, and they're supposed to be filled with this and that". But to make this work without bias, you also need a host who is aware of proportions and questions any score that doesn't match its stated reasoning (such as 6/12 for "the entire level was brilliant, but the heavy lag throughout really killed it").
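The proportions above can be turned into points out of 20 with a short sketch. The post only says story/plot and lag/loading each take roughly 10 to 15%; this sketch assumes 15% for story/music (the "3 points" mentioned) and 10% for lag/loading, and counts lag/loading under Gameplay and story/music under Atmosphere, as proposed earlier in the thread. Under those assumptions the category maxima come out to 12/8, and a -3 for lag exceeds lag's whole 2-point share. All names here are hypothetical.

```python
from __future__ import annotations

# ASSUMED weights in percent of the total; story_music = 15 and lag_loading = 10
# are one reading of "roughly 10 to 15%", not exact figures from the post.
WEIGHTS_PERCENT = {
    "gameplay": 50,     # actual gameplay, including bugs
    "graphics": 25,
    "story_music": 15,
    "lag_loading": 10,
}

# Which of the two proposed categories each aspect is counted under.
CATEGORY_OF = {
    "gameplay": "Gameplay",
    "lag_loading": "Gameplay",
    "graphics": "Atmosphere",
    "story_music": "Atmosphere",
}


def points_per_aspect(total: int = 20) -> dict[str, float]:
    """Convert each aspect's percentage into points out of `total`."""
    if sum(WEIGHTS_PERCENT.values()) != 100:
        raise ValueError("weights must add up to 100%")
    return {aspect: pct * total / 100 for aspect, pct in WEIGHTS_PERCENT.items()}


def category_maxima(total: int = 20) -> dict[str, float]:
    """Sum the aspect points into the two proposed categories."""
    maxima: dict[str, float] = {}
    for aspect, points in points_per_aspect(total).items():
        category = CATEGORY_OF[aspect]
        maxima[category] = maxima.get(category, 0) + points
    return maxima


def exceeds_share(aspect: str, deduction: float, total: int = 20) -> bool:
    """True if a deduction for one aspect is larger than that aspect's whole share."""
    return deduction > points_per_aspect(total)[aspect]


if __name__ == "__main__":
    print(category_maxima())                # {'Gameplay': 12.0, 'Atmosphere': 8.0}
    print(exceeds_share("lag_loading", 3))  # True: lag is only worth 2 points here
```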

Finally, the "Mario Kart system" is NEVER a good idea for LDCs, and not only because it's prone to ties. Having no categories at all and giving simple scores out of 20 isn't a good idea either, since some clues and points of reference ARE necessary.

People tend to mention tiny flaws and problems about the new system, but the current one has even bigger flaws, I'm entitled to say that. In the end, there are only a few details needed to be decided, and there also seems to be a majority for "let judges use either 12/8 or 10/10 as they wish", so if some people really want the old other category to have the same weight, they can switch over to 10/10. Though I don't know for myself how that is supposed to work well ...

Re: Revolution on SM63 judgings (?)

Posted: April 24th, 2016, 10:20 am
by Oranjui
Supershroom wrote:Yeah, they're called guidelines and not rules, but they're meant to say that e.g. a -3 for loading time / lags is too heavy and out of proportion.
Do you have any specific documented instances where judges did this? I don't seem to recall any. If it's not a problem, then there's no reason to "fix" it.

Supershroom wrote:
Venexis wrote:It's not unreasonable to say that 18 is the maximum obtainable score under the current system, [...] therefore, it's also not unreasonable to assume that under this proposed system, 20s would become much more common

That is partially the aim for the future (of course with the implied hope that the levels getting a 19+ really deserve it). I've already described that as "cases where you wonder about how in the world you could ever get at least a 4/5 in Other from a judge", and that judge isn't necessarily a bad judge - in the recent 5th mini-LDC, I really wanted to give someone at least a 4 in Other, but I didn't feel able to do it without bringing unnecessary bias - the additive system ♥♥♥♥ it up, it ♥♥♥♥ up as well since stuff sometimes appears doubly, and that's the main reason why several people have said that Other never was reasonable for SM63, and why this thread exists. Getting close-to-perfect scores stays hard to do with the new system, but with the old one, it's literally impossible, by theory / definition. If you want to remove points from perfection, you usually need to explain stuff that could have been done better, but when not giving a 5 in Other, you DON'T have any possibility of that often enough.
This is just my own opinion, but I think it should be near-impossible to get a perfect score (please don't say that it's impossible by definition, because it's not), because there's always room for improvement. I like having an Other category with a default 3/5 score, because it pushes designers to go the extra mile and strive for perfection, whether that's a superb music choice, a stunning banner, engaging sidequests and subplots, extremely innovative design, really smooth animations, interesting bosses, or anything else a judge finds noteworthy. I'm getting a little philosophical here, but I greatly prefer a system that encourages people to push their own personal boundaries (the current 10/5/5 system) over a system that only asks people to get up to par with a judge's criteria for a good level (the 12/8 or 10/10 system).

Supershroom wrote:People tend to mention tiny flaws and problems about the new system, but the current one has even bigger flaws, I'm entitled to say that. In the end, there are only a few details needed to be decided, and there also seems to be a majority for "let judges use either 12/8 or 10/10 as they wish", so if some people really want the old other category to have the same weight, they can switch over to 10/10. Though I don't know for myself how that is supposed to work well ...
You're entitled to say that, but you have also completely silenced the apparently large group that doesn't feel comfortable with any of the options you've given in the poll, despite both Nin and me asking you to add options for us. I'm also entitled to say that what you find to be "tiny flaws and problems about the new system", I find to be major flaws and problems, all of which far outweigh the possible benefits of the proposal. There have definitely been some issues with our 10/5/5 system in the past, but we've been able to work them out easily and everyone's been pretty happy in the end for the most part, and it's not fair to assume that your system is objectively better and we're only nitpicking.

Re: Revolution on SM63 judgings (?)

Posted: April 24th, 2016, 11:03 am
by Venexis
[12:24 PM] Venexis: I still don't like the expectation of needing to do things like alternate music as a backup but I mean, as long as it's not really about getting the most accurate scores and instead just convenience then it doesn't matter
[12:26 PM] Nin-tan: I disagreed with the point values he gave to stuff but the two categories are [there should be an emote here]
[12:26 PM] Oranjui: I mean, plenty of people have pulled off high-scoring levels in the past without using alt music, right
[12:26 PM] Venexis: They have
[12:26 PM] Venexis: My issue is less about music and more about standardizing backup judge protocol
[12:27 PM] Oranjui: I think two categories might be neat to at least try out
[12:27 PM] Oranjui: but I still have a lot of concerns about it
[12:27 PM] Oranjui: and I don't like the predefined subcategories either
[12:27 PM] Nin-tan: The idea is that it was literally "+1 for music hurr" and with this people could use it as more than just a slider so judges have their personal placings right
[12:28 PM] Oranjui: maybe I should judge next LDC
[12:28 PM] Oranjui: and show yall your places in the world
[12:29 PM] Nin-tan: "Blah graphics and I think the music you chose fits amazingly well with the story you are going for as well 10/10"
[12:29 PM] Nin-tan: k oj
[12:30 PM] Venexis: I wouldn't mind removing backup judges entirely
[12:30 PM] Oranjui: that's -6 for saying that nin
[12:30 PM] Venexis: that sounds like a good thing
[12:30 PM] FrozenFire: You lost
[12:30 PM] Oranjui: In all seriousness I want to judge one eventually
[12:31 PM] Oranjui: may as well be this one
[12:31 PM] FrozenFire: Maybe you should make one first
[12:31 PM] FrozenFire: As in a level
[12:31 PM] FrozenFire: Make a level first
[12:31 PM] Oranjui: hey no
[12:31 PM] Oranjui: I entered both ROs and 25quared
[12:31 PM] FrozenFire: Not acceptable
[12:31 PM] Oranjui: except SM63 RO was super rushed
[12:31 PM] Oranjui: and so was the LL one but hey I did the thing
[12:32 PM] FrozenFire: Everything you do is unfinished/rushed :/
[12:32 PM] Nin-tan: I'm glad I'm discussing this without the attitude to handle jokes
[12:32 PM] Oranjui: I'd blame it on school but I probably just have commitment issues with things online
[12:33 PM] Nin-tan: What sounds good ven
[12:33 PM] Oranjui: anyway back on topic
[12:33 PM] Venexis: removing the idea of backups
[12:33 PM] Oranjui: removing backups and just getting 4 full judges for every ldc?
[12:33 PM] Venexis: it doesn't matter how someone rates as long as they're consistent on all entries
[12:33 PM] Venexis: backups do not do that
[12:33 PM] Nin-tan: Was there anything about that in a op
[12:33 PM] Oranjui: or just letting people judge their own levels
[12:34 PM] Venexis: If I was a full judge instead of a backup I could not give music points or any other points to my heart's content
[12:34 PM] Venexis: and it wouldn't ♥♥♥♥ anyone over because I'm rating that way for every level
[12:35 PM] FrozenFire: Is deducting points because the alt music didn't fit acceptable?
[12:36 PM] Venexis: as long as you're consistent
[12:36 PM] Venexis: IMO
[12:37 PM] Venexis: as a backup you just basically toss any semblance of consistency right out the window
[12:37 PM] Venexis: but for some reason people think that the judging format is what causes the unfairness instead of the massive glaring issue
[12:41 PM] Oranjui: I would just feel uncomfortable judging my own entry
[12:42 PM] Oranjui: but that might only be a personal thing
[12:42 PM] Nin-tan: I can't judge my own so
[12:43 PM] FrozenFire: I'd probably give it a very low score
[12:44 PM] Nin-tan: The idea of having judges able to enter is sketchy in RL events
[12:44 PM] Venexis: yup
[12:44 PM] Nin-tan: I realize its a game but
[12:44 PM] Venexis: It's not a game
[12:45 PM] Venexis: No offense but have you seen how worked up people get over these things
[12:45 PM] Venexis: they're ♥♥♥♥ numbers
[12:45 PM] Nin-tan: its a game as much as I think smash is a game
[12:45 PM] Nin-tan: Does that clear up my stance
[12:45 PM] Venexis: I've said it the whole time, that the issue hasn't ever been with the scoring format or whatever
[12:45 PM] Venexis: it's with our ass-backward practices
[12:46 PM] Nin-tan: I'm out of batteries gg
[12:46 PM] Venexis: but nobody seems to want to change that because of completely arbitrary restrictions like levels needing 3 judges for WITBLO
[12:46 PM] Venexis: and we can only get 3 judges if people who entered also judge
[12:47 PM] Nin-tan: In tournaments its less of a deal because its skill based but ldcs are scored subjectively
[12:48 PM] Nin-tan: So we have to deal with "but muh judge's rights" if we want to fix that
[12:48 PM] Venexis: I guess my stance on all of this is that it's not addressing the issue at all so I don't really care one way or another


tl;dr: my issue is solely with the concept of backup judges; they are bad and stressful. If I were a full judge instead of a backup, I could withhold music points, or any other points, to my heart's content, and it wouldn't ♥♥♥♥ anyone over because I'd be rating that way for every level. This is my specific issue, not the breakdown of points or the legitimacy of things like plot or music in a level's score. I no longer want to be the guy who is made to conform to everyone else's standards solely because 10% of the entries need an extra judge.
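The consistency argument can be illustrated with a toy example. The scores below are made up (four hypothetical entries W, X, Y, Z scored out of 20 by three judges) and the helper names are hypothetical. Making judge J3 uniformly 2 points harsher on every entry leaves the ranking unchanged; applying that harshness to every entry except Z, as happens when a backup judge with different standards scores Z in J3's place, moves Z from third to first.

```python
from __future__ import annotations

from typing import Optional

# Made-up scores out of 20 for four hypothetical entries; not real LDC data.
J1 = {"W": 16, "X": 15, "Y": 14, "Z": 15}
J2 = {"W": 15, "X": 16, "Y": 13, "Z": 15}
J3 = {"W": 15, "X": 15, "Y": 14, "Z": 15}


def shifted(judge: dict[str, float], delta: float,
            only: Optional[set[str]] = None) -> dict[str, float]:
    """Copy of a judge's scores with `delta` added to every entry, or only to those in `only`."""
    return {e: (s + delta if only is None or e in only else s) for e, s in judge.items()}


def ranking(judges: list[dict[str, float]]) -> list[str]:
    """Entries ordered by average score, best first; ties broken alphabetically."""
    entries = judges[0].keys()
    avg = {e: sum(j[e] for j in judges) / len(judges) for e in entries}
    return sorted(avg, key=lambda e: (-avg[e], e))


if __name__ == "__main__":
    # Every entry scored by the same three judges.
    print(ranking([J1, J2, J3]))                                     # ['W', 'X', 'Z', 'Y']
    # J3 is 2 points harsher on everything: the order does not change.
    print(ranking([J1, J2, shifted(J3, -2)]))                        # ['W', 'X', 'Z', 'Y']
    # J3 is harsher on W, X, Y, but Z is scored by a backup who is not.
    print(ranking([J1, J2, shifted(J3, -2, only={"W", "X", "Y"})]))  # ['Z', 'W', 'X', 'Y']
```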

Sorry Nin I ruined your name. But in my defense it did kill the entire forums and make me retype this. :p

Re: Revolution on SM63 judgings (?)

Posted: April 24th, 2016, 11:18 am
by MessengerOfDreams
Backup judging snafus are what lead to more LDC discrepancies and unfairness, not people using good alternate music. In the 7th LDC, Blab's level only got two scores, which contributed heavily to its win, and one of those was from a backup judge. I say don't remove backups, because then only people who didn't enter could judge, and we'd end up keeping people from entering the LDC just because we need them as judges. Though I hate that it's either that or "take away creative freedoms in the name of 'fairness'".