scalaunit-testingunicodescalacheck

Created unicode & unicode without whitespace generators in ScalaCheck


During testing we want to qualify unicode characters, sometimes with wide ranges and sometimes more narrow. I've created a few specific generators:

// Generate a wide varying of Unicode strings with all legal characters (21-40 characters):
val latinUnicodeCharacter = Gen.choose('\u0041', '\u01B5').filter(Character.isDefined)

// Generate latin Unicode strings with all legal characters (21-40 characters):
val latinUnicodeGenerator: Gen[String] = Gen.chooseNum(21, 40).flatMap { n =>
    Gen.sequence[String, Char](List.fill(n)(latinUnicodeCharacter))
}

// Generate latin unicode strings without whitespace (21-40 characters): !! COMES UP SHORT...
val latinUnicodeGeneratorNoWhitespace: Gen[String] = Gen.chooseNum(21, 40).flatMap { n =>
    Gen.sequence[String, Char](List.fill(n)(latinUnicodeCharacter)).map(_.replaceAll("[\\p{Z}\\p{C}]", ""))
}

The latinUnicodeCharacter generator picks from characters ranging from standard latin ("A," "B," etc.) up to higher order latin character (Germanic/Nordic and others). This is good for testing latin-based character input for, say, names.

The latinUnicodeGenerator creates strings of 21-40 characters in length. These strings include horizontal space (not just a space character but other "horizontal space").

The final example, latinUnicodeGeneratorNoWhitespace, is used for say email addresses. We want the latin characters but we don't want spaces, control codes, and the like. The problem: Because I'm mapping the final result String and filtering out the control characters, the String shrinks and I end up with a total length that is less than 21 characters (sometimes).

So the question is: How can I implement latinUnicodeGeneratorNoWhitespace but do it inside the generator in such a way that I always get 21-40 character strings?


Solution

  • You could do this by putting together a sequence of your non-whitespace characters, another of whitespace, and then picking from either only the non-whitespace, or from both together:

    import org.scalacheck.Gen
    
    val myChars = ('A' to 'Z') ++ ('a' to 'z')
    val ws = Seq(' ', '\t')
    
    val myCharsGenNoWhitespace: Gen[String] = Gen.chooseNum(21, 40).flatMap { n =>
      Gen.buildableOfN[String, Char](n, Gen.oneOf(myChars))
    }
    
    val myCharsGen: Gen[String] = Gen.chooseNum(21, 40).flatMap { n =>
      Gen.buildableOfN[String, Char](n, Gen.oneOf(myChars ++ ws))
    }
    

    I would suggest considering what you're really testing for, though—the more you restrict the test cases, the less you're checking about how your program will behave on unexpected inputs.