Regular Expressions for unicode string matching in Swift are pretty cool!

I had to do some validation on a username while coding an iOS app today and so dove into regular expression matching. The specification said “Usernames can include upper and lower case letters, numbers, periods, and underscores” and I’m figuring they’re thinking ASCII only. RegEx of [a-zA-Z0-9_.]+ kind of thing. Thinking of my twitter friend @krzyżanowskim and his posts about not being able to include the proper ż in so many places, I wanted to do better.

So I dove into the world of unicode regular expressions and found this very useful resource. I think I’ve figured out what might be a decent answer (please let me know here or on Twitter @GeekAndDad if you have advice to offer :)).

To include all letters, accented letters, symbols, numbers, an underscore or period (.), but exclude all other punctuation or spaces, I came up with this bit of Swift code:

func validateUsername( username: String) -> Bool { 
  // length tests go here and return false on failure 

  let expr = #"^[\p{L}\p{M}\p{N}_.\p{Me}\p{S}]+$"#
  let matchesRange = username.range(of: expr, options: .regularExpression)
  if matchesRange == nil {
    return false
  }

  // additional validation step here

  return true
}

Note that I don’t have to check matchesRange‘s lowerBound and upperBound against the string’s startIndex or endIndex because I’ve included the ^ and $ to indicate that everything in the string must match the expression.

Seems to work so far. I’m impressed and grateful that Swift’s regular expressions support these unicode specifiers (!!!).

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: