A spread of 3-5 points is a good goal, but honestly, I think much larger differences can be fine under the right circumstances.
For example: Judge 1 rates a level 18. Judge 2 gives a 16.5. Judge 3 gives a 15. Judge 4 gives a 13. Judge 5 gives an 11. That's a huge difference: 7 points (see Figure 1). But is it a problem? Well... probably not. The scores are spread more or less evenly over that seven-point range, which is distinctly different from getting 17s and 18s from Judges 1-4 and an 11 from Judge 5 (see Figure 2).

In Figure 1, all the scores roughly follow the general trend (green line). There is probably no issue despite the score range; the judges clearly just had differing opinions, and their reviews will no doubt reflect this.

In Figure 2, one point is MUCH below the general trend (green line). There is probably an issue, as the score distribution shows. Note that Figures 1 and 2 have the same range; only the distribution differs. Why is Judge 5 rating so much lower than everyone else, or why are Judges 1-4 rating so much higher?
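For anyone who wants to put a number on "same range, different distribution," here's a minimal sketch. It's not how the figures were made; it just compares the range with the largest gap between neighboring scores once they're sorted. The Figure 1 scores are the ones listed above. The Figure 2 values for Judges 1-4 are hypothetical stand-ins for "17s and 18s," and the "gap bigger than half the range" cutoff in the hypothetical `looks_like_outlier` helper is an arbitrary assumption, not a rule anyone has agreed on.

```python
def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


def largest_gap(scores):
    """Largest difference between neighboring scores after sorting.

    Evenly spread scores have small gaps; a lone outlier leaves one big gap.
    """
    ordered = sorted(scores)
    if len(ordered) < 2:
        return 0
    return max(high - low for low, high in zip(ordered, ordered[1:]))


def looks_like_outlier(scores):
    """Hypothetical flag: one gap covers more than half of the whole range.

    The one-half cutoff is an arbitrary assumption chosen for illustration.
    """
    return largest_gap(scores) > score_range(scores) / 2


figure_1 = [18, 16.5, 15, 13, 11]  # scores as given for Figure 1
figure_2 = [18, 17, 18, 17, 11]    # hypothetical values for Judges 1-4

for name, scores in (("Figure 1", figure_1), ("Figure 2", figure_2)):
    print(f"{name}: range {score_range(scores)}, "
          f"largest gap {largest_gap(scores)}, flag {looks_like_outlier(scores)}")
# Figure 1: range 7, largest gap 2, flag False
# Figure 2: range 7, largest gap 6, flag True
```

Both sets have a range of 7, but the largest gap is 2 in Figure 1 and 6 in Figure 2. That single isolated gap is what says "look at this one," not the range itself.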
Note that, as I have said a ♥♥♥♥ of times before, IF A SCORE IS JUSTIFIED, REGARDLESS OF HOW LOW/HIGH IT IS, IT IS FINE. Your role as a judge is simply to make sure that it is justifiable, and perhaps to take into account feedback from other judges. Did you explain it? Good; then leave it.
But the anomaly in Figure 2 is DEFINITELY something that should be looked into.
If our judges are fine not giving a case like the one in Figure 2 a second thought, perhaps they shouldn't be judges. Giving it that second thought is all the DISCUSSION phase is intended to be. It is not a "conform or get out" phase. I do not think a single person here is suggesting that, and frankly I'm getting a little tired of reiterating this.
EDIT: Emphasized the key words
*EDIT: Removed the size-200 formatting on those words because it was unnecessary