I had to validate a username while coding an iOS app today, so I dove into regular expression matching. The specification said “Usernames can include upper and lower case letters, numbers, periods, and underscores”, and I’m figuring they’re thinking ASCII only: a regex along the lines of [a-zA-Z0-9_.]+. Thinking of my Twitter friend @krzyżanowskim and his posts about not being able to include the proper ż in so many places, I wanted to do better.
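To make the problem concrete, here is a small sketch of what that ASCII-only pattern does with a name like his. The helper name isASCIIUsername is made up for illustration, and I've added ^ and $ so that the whole string has to match.

```swift
import Foundation

// Hypothetical helper for illustration: the ASCII-only rule from the spec,
// anchored with ^ and $ so the entire string must match.
func isASCIIUsername(_ username: String) -> Bool {
    let expr = #"^[a-zA-Z0-9_.]+$"#
    return username.range(of: expr, options: .regularExpression) != nil
}

print(isASCIIUsername("GeekAndDad"))     // true
print(isASCIIUsername("krzyżanowskim"))  // false: ż is outside a-z
```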
So I dove into the world of Unicode regular expressions and found this very useful resource. I think I’ve figured out what might be a decent answer (please let me know here or on Twitter @GeekAndDad if you have advice to offer :)).
To allow all letters (including accented ones), combining marks, numbers, symbols, underscores, and periods (.), while excluding all other punctuation and spaces, I came up with this bit of Swift code:
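import Foundation  // range(of:options:) with .regularExpression comes from Foundation

func validateUsername(username: String) -> Bool {
    // length tests go here and return false on failure

    // L = letters, M = combining marks, N = numbers, S = symbols, plus "_" and ".".
    // (\p{Me} is already covered by \p{M}, so it is redundant but harmless.)
    let expr = #"^[\p{L}\p{M}\p{N}_.\p{Me}\p{S}]+$"#
    let matchesRange = username.range(of: expr, options: .regularExpression)
    if matchesRange == nil {
        return false
    }
    // additional validation step here
    return true
}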
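Here is a quick sketch of how that behaves on a few inputs. The sample usernames are made up for illustration, and the function is repeated (minus the placeholder comments) so the snippet runs on its own.

```swift
import Foundation

// Same pattern as validateUsername above, repeated so this snippet is self-contained.
func validateUsername(username: String) -> Bool {
    let expr = #"^[\p{L}\p{M}\p{N}_.\p{Me}\p{S}]+$"#
    return username.range(of: expr, options: .regularExpression) != nil
}

// Sample inputs are invented for illustration.
let samples = ["krzyżanowskim", "Geek_And.Dad42", "two words", "what?!", "a+b", ""]
for name in samples {
    print("\"\(name)\" -> \(validateUsername(username: name))")
}
// "krzyżanowskim"  -> true
// "Geek_And.Dad42" -> true
// "two words"      -> false (space is not in the class)
// "what?!"         -> false (? and ! are punctuation)
// "a+b"            -> true  (+ is a math symbol, so \p{S} lets it through)
// ""               -> false (+ requires at least one character)
```

The a+b case is worth noticing: \p{S} covers currency signs, math operators, and most emoji as well. If that is more permissive than you want, dropping \p{S} from the character class would be the place to start.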
Note that I don’t have to check matchesRange’s lowerBound and upperBound against the string’s startIndex or endIndex because I’ve included the ^ and $ to indicate that everything in the string must match the expression.
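For comparison, here is a sketch of the bounds check I'm skipping: the same character class without ^ and $, with the match range compared against the string's own bounds. The function name validateUsernameByBounds is hypothetical. One reason to consider it: as far as I understand the ICU rules, $ may also match just before a line terminator at the very end of the input, so a trailing newline might slip past the anchored pattern. The explicit comparison doesn't have that wrinkle.

```swift
import Foundation

// Hypothetical alternative for illustration: no ^ or $ in the pattern,
// so we check ourselves that the match covers the whole string.
func validateUsernameByBounds(username: String) -> Bool {
    let expr = #"[\p{L}\p{M}\p{N}_.\p{Me}\p{S}]+"#
    guard let matchesRange = username.range(of: expr, options: .regularExpression) else {
        return false  // nothing in the string matched at all
    }
    return matchesRange.lowerBound == username.startIndex
        && matchesRange.upperBound == username.endIndex
}

print(validateUsernameByBounds(username: "krzyżanowskim"))  // true
print(validateUsernameByBounds(username: "two words"))      // false: the match stops at the space
print(validateUsernameByBounds(username: "trailing\n"))     // false: the match stops before the newline
```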
Seems to work so far. I’m impressed and grateful that Swift’s regular expressions support these Unicode specifiers (!!!).