Re: Official SM63 Judging System Discussion

Posted: May 17th, 2016, 2:18 pm
by Oranjui
poll wrote:What would you like to see happen with the Other category? (select all that apply/top few options)
Poll ended at Tue, 17 May 2016 15:29:29

Option | Percent of votes
No changes | 24%
Remove the category | 12%
Merge with Graphics to create "Atmosphere" | 20%
Replace the additive score with a holistic score | 32%
Give it a different scoring weight | 0%
Replace with a new category like "Originality" | 12%
Something else (specify in the thread) | 0%
Total votes: 25


General sentiment seems to be that if any changes are to be made, we should at the very least replace Other's additive scoring system with a holistic one. But it also seems like a number of people are fine with keeping the current system as is, and a slightly smaller number are in favour of merging it with Graphics to create Atmosphere. A small number would also like to see the name of the category changed. It seems like everyone is comfortable with Other's current weight in the score.

There are a number of possible courses of action which the staff will discuss and then decide on for LDCs to come.
- No changes
- Change Other to holistic scoring system; no other changes
- Change Other to holistic scoring system, and change its name; no other changes
- Merge Other with Graphics to make Atmosphere and a 2-category system
- Merge Other with Graphics to make Atmosphere, but keep the third category under a different name

Speaking strictly personally, I think the second is the most likely to go through. Drastic changes to category organization, like merging two categories or changing names, have their own problems, mostly with regards to renormalizing past judgings, but they're possible.



The next topic of discussion will be announced at some point in the next few days, and a poll for it will go up once there's been sufficient discussion.

Re: Official SM63 Judging System Discussion

Posted: May 17th, 2016, 3:54 pm
by Venexis
I am not currently and will not ever endorse something that combines Other with any other category. Get rid of it, fine. Redistribute those points to make them more attainable, sure. But simply saying "people only used Other for [some other category] anyway" isn't going to fly with me, because it isn't true.

tl;dr: those last two options are terrible

Re: Official SM63 Judging System Discussion

Posted: August 17th, 2016, 4:27 am
by Supershroom
This has been pretty much on a hiatus. Yesterday I stated dissatisfaction about Yuri's world-record score and I was asked to elaborate on what I think about problems with judging in general, so I'm doing it here.

Now, what’s so bad about judgings like they're done these days: It’s all too unstandardized, and everyone is pretty much just following their own idiosyncrasies and coming up with some incomprehensible reasonings for deductions or non-deductions. This isn’t a new thing in fact, remember how e.g. there were debates about Buff trampling over story levels and SMIC disliking levels without puzzle elements and the like? Point is, personal preferences (and bias thereof) are way too dominant over objective reasonings; this pertains to all areas – gameplay, graphics, music, everything. In fact it’s been like this since the very beginning, but now the more advanced level designing skills are, the more controversy inevitably shows up, and I just feel that many people (not wanting to name anyone here) lack the qualification to be reliable judges because they're way too unforgiving on LDing styles other than their favourite, and because they're also too whiny when it comes to difficulty and just being a little patient if they die in the level a few times.

Even beyond that, some widespread judging standards are just grotesque. Yuri's level scored high just for being an impressive concept despite not being a full effort like it could have been, while someone like FluorescenceLight actually really managed to bring across these vibes of “Hey, I’m playing an adventure!”, and he was punished too much because he took a risk with his gameplay and tiling mechanics and just because his levels are larger and it ONLY takes ten seconds to load them … really, deductions for lag and loading times are WAY too extensive. This is what bugs me the most. If Yuri had actually made his level bigger and filled it with more decoration (which is the main factor for loading time), he probably would have barely won any extra points with this compared to what he gathered anyway, but he would have probably lost some points just because of loading time and the level being large. On the other side, sometimes bonuses are awarded for no loading time to levels that are so small that they really couldn't have loading time or lag unless you were extremely tampering. Isn’t that complete bollocks? I'm repeatedly suffering from it because I value length and decoration. Didn’t I try to weed out decoration where it wasn’t needed just to have a lower loading time? It’s just totally inevitable for me to have high loading times – if I try to fix that then it comes at the cost of my graphics and there’s no proper way out under this judging system.

Now, another important point is: “Star power”. It was discussed two years ago by Nan and it still exists. Having a bonus or a malus just because of your name and past works, no matter what your current work is. Yuri has a prominent name and I can't shake the impression that this is significant to why everyone was so stunned at his innovation and why he scored so high. If the very same level had been made by a less prominent guy like e.g. Dualfreezor, they would have still scored well but definitely not this high. On the other hand I’m still disadvantaged just because I’m giving some challenges in terms of platforming and everyone who’s just lacking gameplay skill already scores me low, since I'm previously known for overly frustrating levels. Regarding any aspects of "star power" bias, we don't have a simple solution like anonymous submitting since levels are often still recognizable by style, but we just need to get away from judging books by their cover.

Re: Official SM63 Judging System Discussion

Posted: August 17th, 2016, 5:00 am
by Karyete
I'm a little busy right now so I can't say much.

All I'm saying is that you complain about personal preference over objective reasonings, but you seem to be stating your own opinions as facts, i.e. how you feel about certain levels and users.

Re: Official SM63 Judging System Discussion

Posted: August 17th, 2016, 9:27 am
by MessengerOfDreams
That is a very biased post- probably more biased than the judging system you condemn. Especially now that they're nice and give you what you want. Hell, Venexis has launched a one-man crusade for you and you're upset you didn't get a winning score? That he scored Yuri higher than you? And you've changed none of your own flaws? You can't expect everyone else to change to benefit you. That's at least what I would do after getting caught doing the same things- or at least, I WOULDN'T COMPLAIN ABOUT HOW EVERYONE EXCEPT ME SUCKS.

Re: Official SM63 Judging System Discussion

Posted: August 17th, 2016, 10:06 am
by NanTheDark
Ok. I'm on. As a Judge of this contest, and a rather inexperienced one you might say (despite the fact that I've been following the LDing scene for like 5 years already), I feel I got some stuff to defend. If you guys think I suck at judging or giving scores, fine. That's alright. I might actually be bad at this. But I have to stand up for the other judges, for the LDing scene and everyone else involved in this.

Addressing first, the lack of "standardized methods" for judging. The thing is... there is no objective way to judge a level. You refer to "effort", that Yuri's level didn't take much "effort". And what is effort, according to you? Tons of decorations and tiles, huge levels? Because listen, huge levels? It's not a bad thing, if well designed (but you guys really wanted to test my PC's performance, didn't you all :P ). Graphics? They look nice. But they aren't what makes a level. A level carries a certain "flow", a way a player gets from one place to the next. But can you really define that objectively? I don't think it's something you can just grade according to a chart.

As for difficulty, at least myself, I often stated in my reviews "it's hard, but it's fair". What do I mean by that? I find difficulty to be fair when it's not greatly punishing (like, it being very easy to die, requiring you to walk all the way back to that spot again, to then die again, and having to walk all the way over there again), or not pixel-perfect (someone's level needed you to do a very long sprint to climb up a switch-activated block staircase, which was activated by a switch that was pretty far away and with a not very long timer), or when you don't die just because you didn't KNOW something was coming (enter a door and fall into a pit, Kaizo traps). In fact, regarding YOUR entry, I did not deduct score based on difficulty. I found it to be fair in difficulty.

Now, lag. The thing is, I'm sure you think you put a lot of effort into your levels. You work very hard at them right? The problem is, you work at it so hard that my PC can't load your freaking level. You put so many tiles and decorations and graphics that aren't even the main thing about making a level (Fun/Gameplay is the main category, at least it still is), that my PC takes like, half a minute to load your level.

But that's ok I guess, I mean, maybe my computer (a 5-year old Sony VAIO, Intel Core i3, stock graphics card and 4 GB of RAM) just sucks too much, maybe other people would take less to load it. Maybe it's just me.

Going through transitions takes like at least 10 seconds. That's alright, I can just be patient about it right?

And then, something I mentioned in my review of your level, I go through a transition. I wait 10, 30 seconds, 1 minute, 5. That transition freezes the game. How am I supposed to judge that part? How? I ended up just looking at it from the designer.

Now here's two things:

1. Judging. We even set deadlines for ourselves. We have to deal with both real life stuff and our duties as judges. And we have so much time to judge levels. And we feel the pressure. We know everyone wants to know who won. And often we can't play that many levels a day. So we have to make as much out of our time as possible. And when a level is a lagfest, that's time we lose. Time we do not have. We can't just spend a ton of time looking at super graphics-intensive levels. Your enormous golden statue means nothing if we can't lift it. And that takes me to...

2. Flash. This game is made in Flash. Flash was awesome back in the day, now it's pretty much accepted that it sucks. If this ran on something else, your beautiful levels would probably run fine, always 30 fps and whatnot. However, it's made in Flash. And when you design a game (because for some reason we love to see our levels as full blown games, don't we), you have to take the limits of your platform into consideration. You have to know how much you can and can't do. How many polygons can I have on screen, how many objects before it slows down, etc, etc, etc.

And THAT is why lag affects the scores, in my opinion. It really affects the player experience. I mean think if this were a level played by any random schmoe roaming the portal. Do you think they'll have a great time running around at the speed of 10 fps or so?

Now, why did Yuri win? You might not have noticed, but these Level Designer Contests have this one thing, what is it called, I forget... hmm... oh yeah! THEMES. This LDC's theme was Dream. Yuri's concept was very fitting of the theme, it was a confusing and unreal experience, that used not just signs to tell a story but also made things like blocks or other things that made no sense. OJ's level wasn't that great, but it fit the theme, which caused the good Venexis to give him a higher score. OJ was one of the lowest scores I gave, personally, but his level did carry the dream thing alongside it. Beating Band Land, the Aztaroth thing, etc, etc.

Your level was a space mountain with no context at all, or nothing too special about it. It was a beautiful space mountain, but that's. It.

Also Star Power, hmmm...

These were the awardees:
1st - ~Zero (known guy)
2nd - Forgotten (who is this man?)
3rd - DualFreezor07 (first time contestant)
3rd - FluorescenceLight (I've seen him around the forums I think)
5th - Supershroom

I don't think you have enough evidence to claim this is Star Power. Then again, the only participants who are "veterans" in this contest (at least that I know of) are Yuri and you. And all you got, is 5th.

You want a new judging system? Sure, we can work that crap out, discuss things, maybe we will figure out something better than our current stuff. But do not try to fix the system just because you didn't get 1st place. Do not try to fix the system because you feel your oh so game breaking Treemaster efforts aren't appreciated. Instead try to focus your efforts into figuring out what it is that makes a level good. Not to mention, to design for fun, because you like it, because you want to. Not to get a fake award in some website.

Re: Official SM63 Judging System Discussion

Posted: August 17th, 2016, 11:27 am
by l.m
Yo, leaving my input here: Yuri's level didn't win because it had the best graphics or the best platforming. That was simply not his focus. He won because he performed it based around a concept, an idea, and he developed that idea extremely well, without the need for tree spam, without the need for complex storyline, without the need for extremely thought-out puzzles. It was a fresh idea, something that is really called for. To reiterate Yuri's position on Art, it's something that requires both imagination and skill: You must have good ideas, and you must know how to execute them. It's something that's always refreshing, renewing itself- the blatant reason why this lame old 2009 fan Super Mario flash game hasn't died yet - and boy I understand how exciting it should be for the judges to find something that's not a hundred thousand pixels large in both height and width, something that doesn't have the same old chunks of platforming being executed over and over again, something new. And speaking of new: Shouldn't you be proud to see the many fresh-faced users starring their first levels on an LDC, and getting EXTREMELY GOOD scores? First place ain't nothing compared to this fact, the fact that Level Designing is refreshing itself.

Did the judges give an extremely high score for Yuri's entry? Yeah, sure, but they also gave everyone an equally high score, and Yuri's score was just slightly above them all. I also don't and will never blame them for being optimistic. Because it's their score, their opinion, their take on LDC Judging, not mine. That's why they will always be right, regardless of whatever statements or facts you bring up here to try to mock their judgings: It's s-u-b-j-e-c-t-i-v-e, it's their opinion, it's THEIR STANDARDS. If you think that Yuri should've gotten a lower place below you and the other newer contestants, that's just your opinion- in fact, a really selfish one, you might do well to just keep it to yourself.

Re: Official SM63 Judging System Discussion

Posted: August 17th, 2016, 1:25 pm
by Oranjui
This isn't related to the judging system really, but this is my opinion, as a judge, on some of the talk about "effort", and basically a deeper elaboration on my own reviews. This probably isn't going to contribute much to the conversation (I agree with everything FF and Nan and MoD and Kary and some people on chat said, and it would be redundant to say more), but I like to justify my opinions. Anyway, here's why I think everyone's scores are fine the way they are, and nobody "deserves" anything different:


Shroom, your entry didn't lack physical effort in terms of laying out all the individual tiles and items or designing platforming challenge. It didn't lack effort in terms of balancing the difficulty properly--I noted in my review that it felt mostly pretty fair throughout, if perhaps a little too easy. But it did lack effort in terms of making the experience engaging and unique for the player, and I think that's what counts the most. You might have spent a long time putting work into designing a long, detailed platformer, but with a complete absence of plot, there's little to no sense of investment for the average player, and if anything I think that's what dragged your score down from what it could have been (Note that you still earned an average 15.5/20. Not earth-shattering, but that's still a pretty high score; everyone judged your entry fairly and frankly, you should be grateful that you were allowed to enter at all, considering you were banned from the site). As you mentioned in your post, yeah, Fluorescence totally nailed that sense of adventure, and that's why he earned such a high score, even without any of the perceived "star power" you think influenced Yuri's score[2]. I felt none of that sense of adventure at all while playing your level, and to me that signals a lack of effort. In addition, your level suffered from a lack of novelty. You didn't implement anything that felt new, or original, or fresh, or different--not in terms of gameplay, graphics, or story, or anything else for that matter. It was just very basic (and very linear, despite being long[1]) platforming gameplay with few additional elements other than utilizing the "trampoline" mechanic. You did a pretty great job of executing the gameplay and graphics you were going for in your level, and that's why you scored what you did, but there was nothing to push it above and beyond.
You obviously put effort into the physical work of making such a long level, but you diverted none of that effort into making it stand out. And that's what counts--the type of effort you put in.

As for lag - you definitely need to be aware of the limitations of your platform (Flash) like Nan said, and you need to be conscious that most people don't have super beefy computers able to flawlessly run a game as unoptimized as SM63. Lag and loading times can, in fact, affect a player's gameplay experience. I tried to refrain from putting too much emphasis on those things because I knew there would end up being complaints like yours, but when it gets bad to the point that it hampers my enjoyment of the level, then yes, I have every right to lower your score. (But hell, I still gave you an 8/10 in Gameplay myself, even with the huge loading times and moderate lag. I'm not sure how much higher I could go, even if you managed to take out all the lag and loading time issues.)


Footnotes:
[1] I mentioned in my review that I thought your level felt like it was cut short. That doesn't mean you should have gone for a larger level overall, but instead that you should have added more depth to your level. Long platforming sections with all the graphical things tend to have a poor ratio of "time spent designing" to "time spent playing", especially when they're so linear and open. Adding elements of nonlinearity, story, puzzle, or extra challenge (just to name a few) could have helped you out a lot with making the level feel less brief.

[2] This only applies to myself, but I really haven't been active on the LD scene in the past several years, so I haven't known much about what people have been up to or who's "most famous" or whatever. I did my best to give everyone a fair and unbiased judging--including you, who I still think shouldn't have been allowed to participate in the first place, but that's another discussion that was already concluded. Yuri didn't place so highly because of "star power" but because he came up with an idea that really felt fresh and unique, something which, while somewhat short and perhaps graphically simplistic[2.1], really felt like he poured his heart and soul into.

[2.1] Yup, this is a footnote inside a footnote. Something I tried to stress in my reviews on the graphics side is that I personally believe that simple/minimalistic != "poor effort" if it's executed well and looks good with a limited palette (as in Yuri's case, though something like my own entry was probably too barebones), and inversely, complex/chaotic != "good effort" if it doesn't actually look good and comes across not as "detailed", but as messy instead. This goes back to my main point that concerns all aspects of level design--don't put all of your effort into length and size and simple challenges and pure graphical variety (though nobody really had that problem this contest); put a good amount of effort into engaging depth and good execution and originality and aesthetic elegance. Again, this is my personal opinion, but this is a component of why Yuri earned such a high score from me that I thought I should share.



I would have added this to my review, but it was already getting really long, and it's kind of rude to compare levels in a review anyway. How do you compare two paintings? It's all subjective, and I had hoped you might understand that, but apparently we still need to have objective, systematic quantification of art...

Reading over this post a second time, I want to make it clear that I'm not trying to bash your level (dude you got a 15/20 from me, I definitely think it was a good piece) but trying to help explain what I think you could have done better, and what made other people's entries feel more interesting to me than your own.

---

On-topic, I'm not staff anymore so I don't think it's fair for me to keep control over this thread. If there's still a significant amount of interest (i.e. more than literally just Shroom pushing his personal agenda) in trying to "standardize" judging, then someone else should take over for me. Otherwise it would probably be better for everyone if we just left this thread to die and let judges judge based on their own personal subjective experience--the way art should be judged, if it makes any sense to judge it at all.

Re: Official SM63 Judging System Discussion

Posted: August 17th, 2016, 1:41 pm
by MessengerOfDreams
There will be no objectivity as long as human beings are allowed to judge- and even if robots judge I doubt they will give Shroom a score as high as he wants.

Re: Official SM63 Judging System Discussion

Posted: August 17th, 2016, 9:47 pm
by Venexis
Two things-

MessengerOfDreams wrote:There will be no objectivity as long as human beings are allowed to judge- and even if robots judge I doubt they will give Shroom the score as high as he wants.


This is very true. Half the reason levels score so highly is that the people who play them are impressed by more than just the bare mechanics that anybody could evaluate and arrive at an identical conclusion, provided they had a sufficiently comprehensive rubric. And at that point... why not just have one judge, if removing all the quirks that appeal to judges and players on a human level means any truly impartial adjudicator will always get the same result, no matter how many times he plays, no matter how well (or poorly) the level has aged in comparison to newer works, no matter what the testing environment is?

We need human judges partly because they're the best we've got to work with, short of not having any judges at all, and because to not have a human factor in reviews would be far more of a loss to all aspects of the community than a point's difference here or there.

[7:14 PM] Venexis: I'd be willing to give it another go if y'all feel like my score's way too high but I really don't want to because doing different things in levels is pretty much the only thing I do and frankly not something people do often enough, especially in competitive environments. I want to reward that partly to prove a point (bigger, more complicated, laggier is NOT always better) and partly because it's what level designing used to be back when everyone had fun with it


l.m wrote:Yo, leaving my input here: Yuri's level didn't win because it had the best graphics or the best platforming. That was simply not his focus. He won because he performed it based around a concept, an idea, and he developed that idea extremely well, without the need for tree spam, without the need for complex storyline, without the need for extremely thought-out puzzles. It was a fresh idea, something that is really called for.


This is kind of related to the above. Most (if not all) serious competitors are here to make something great, or enjoy themselves, above winning first place (or even placing highly at all, for that matter). It's increasingly obvious with longer-term designers that their harshest critics are themselves. Even when they want to make something, the project is scrapped alarmingly often because it doesn't live up to their vision... not because of some potential low score at the end of the contest. It's about the journey, the vision, and the execution, all wrapped up in a nifty little pile of code that means a lot to its creator if nobody else. The score is such a tiny, trivial thing in comparison to that, and I think most of the community shares that priority ranking.

Second: Humans fuel innovation. In a system where judges are mechanically perfect, there is one consistent winning strategy that anyone can discover and abuse (20XX meme, anyone?[1]). There is a hardcoded "perfection requirement"- some combination of elements will be always guaranteed to give a perfect score. Eventually, someone will luck out and hit it exactly... and then what? The secret is out, now anyone can get a 20/20 by following some internet guide. There's no need to try new things, in fact, it would be actively discouraged, since the resulting score would almost certainly be less than perfect. It should be obvious why this is a bad thing. The fact that designing has seen such a dynamic evolution over the last six years, the fact that people are still trying and refining new concepts every single contest, is proof that human judges are working.

Bias is a fair concern, and we've fought to combat it wherever possible. In this contest, we did discuss, we did consider, and we did re-evaluate. That's about all we can do anymore, heh, at least, nobody's really given a concrete way to improve the system. But our judges are all reliable and knowledgeable people- otherwise they would not be judges- and a touch of bias, seeing through a lens of human emotion, can do more good than harm.

[1] http://knowyourmeme.com/memes/20xx (It's meme trash, yeah, but a surprisingly good read)