Generating unique IDs using R

Here’s a function that generates a specified number of unique random IDs, of a certain length, in a reproducible way.

There are many reasons you might want a vector of unique random IDs. In this case, I embed my unique IDs in SurveyMonkey links that I send via mail merge. This way I can control the emailing process, rather than having messages come from SurveyMonkey, but I can still identify the respondents. If you are doing this for the same purpose, note that you first need to enable a custom variable in SurveyMonkey! I call mine "a" for simplicity.

The function

create_unique_ids <- function(n, seed_no = 1, char_len = 5){
  set.seed(seed_no)
  pool <- c(letters, LETTERS, 0:9) # 62 alphanumeric characters
  
  res <- character(n) # pre-allocating vector is much faster than growing it
  for(i in seq_len(n)){ # seq_len() handles n = 0 correctly; seq(0) would give c(1, 0)
    this_res <- paste0(sample(pool, char_len, replace = TRUE), collapse = "")
    while(this_res %in% res){ # if there was a duplicate, redo
      this_res <- paste0(sample(pool, char_len, replace = TRUE), collapse = "")
    }
    res[i] <- this_res
  }
  res
}

Here’s what you get:

> create_unique_ids(10)
 [1] "qxJ4m" "36ONd" "mkQxV" "ES9xW" "5nOhq" "xax1v" "DLElZ" "PXgSz" "YOWIG" "WbDTQ"

This function will get stuck in the while-loop if n exceeds the number of distinct alphanumeric strings of length char_len, and it slows down as n approaches that limit because more and more draws are duplicates. There are length(pool) ^ char_len such strings available. Under the default value of char_len = 5, that’s 62^5, or 916,132,832. This should not be a problem for most users.
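If you want to check that ceiling before generating anything, the arithmetic is a one-liner. Here is a minimal sketch, assuming the same 62-character pool as above; id_capacity is a hypothetical helper name, not part of the function itself.

```r
# Hypothetical helper: how many distinct IDs exist for a given length,
# assuming the 62-character alphanumeric pool (letters, LETTERS, 0:9).
id_capacity <- function(char_len = 5, pool_size = 62) {
  pool_size ^ char_len
}

id_capacity(5)  # 916132832
id_capacity(3)  # 238328

# Guard to run before generating IDs: stop early rather than loop forever.
n <- 500  # illustrative number of IDs wanted
stopifnot(n <= id_capacity(5))
```

In practice you would want n to be well below the ceiling, not just under it, since the retry loop gets slower as the pool fills up.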

On reproducible randomization

The seed argument is there so you can reproduce the ID vector. If you’re careful, and using version control, you should be able to retrace what you did even without setting a seed. There are downsides to reusing the same seed each time, too: the same seed and n always produce the same IDs in the same order, so if your input list gets shuffled, you’re now assigning already-used codes to different users.

No matter how you use this function, think carefully about how to record and reuse values such that IDs stay consistent over time.
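One way to keep IDs stable is to store them alongside the roster and only generate IDs for rows that don't have one yet. The sketch below is an assumption about how that might look, not something from my actual workflow: add_new_ids is a hypothetical helper, the roster is made up, and no seed is set inside the helper (call set.seed() yourself first if you need reproducibility).

```r
# Hypothetical helper: draw n_new IDs that collide neither with each other
# nor with IDs that have already been issued.
add_new_ids <- function(existing, n_new, char_len = 5) {
  pool <- c(letters, LETTERS, 0:9)
  new_ids <- character(n_new)  # pre-allocate, as in create_unique_ids()
  for (i in seq_len(n_new)) {
    repeat {
      candidate <- paste0(sample(pool, char_len, replace = TRUE), collapse = "")
      # accept only if unused among both old and newly drawn IDs
      if (!candidate %in% c(existing, new_ids)) break
    }
    new_ids[i] <- candidate
  }
  new_ids
}

# Made-up roster: two people already have IDs, two are new (NA).
roster <- data.frame(
  email = c("a@example.org", "b@example.org", "c@example.org", "d@example.org"),
  id    = c("qxJ4m", "36ONd", NA, NA),
  stringsAsFactors = FALSE
)

# Fill in only the missing IDs; existing ones are left untouched.
needs_id <- is.na(roster$id)
roster$id[needs_id] <- add_new_ids(roster$id[!needs_id], sum(needs_id))
```

Because existing IDs are never regenerated, shuffling or extending the roster can't reassign a code that has already been sent out.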

Exporting results for mail merging

Here’s what this might look like in practice if you want to generate these IDs, then merge them into SurveyMonkey links and export for sending in a mail merge.  In the example below, I generate both English- and Spanish-language links.

roster$id <- create_unique_ids(nrow(roster), seed_no = 23)
roster$link_en <- paste0("https://www.research.net/r/YourSurveyName?a=", roster$id, "&lang=en")
roster$link_es <- paste0("https://www.research.net/r/YourSurveyName?a=", roster$id, "&lang=es")
readr::write_csv(roster, "data/clean/roster_to_mail.csv", na = "")

Note that I have created the custom variable a in SurveyMonkey, which is why I can say a= in the URL.
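Before sending anything, it may be worth a quick sanity check that every row has its own ID and that each link carries it. This is a minimal sketch under stated assumptions: the roster and email addresses are made up, the two IDs are borrowed from the sample output above, and the survey URL is the same placeholder used earlier.

```r
# Made-up roster for illustration only.
roster <- data.frame(
  email = c("a@example.org", "b@example.org"),
  id    = c("qxJ4m", "36ONd"),
  stringsAsFactors = FALSE
)

base_url <- "https://www.research.net/r/YourSurveyName"  # placeholder survey URL
roster$link_en <- paste0(base_url, "?a=", roster$id, "&lang=en")

# Every row should have a non-missing, unique ID,
# and every link should end with that row's own ID and language.
stopifnot(
  !anyNA(roster$id),
  anyDuplicated(roster$id) == 0,
  all(endsWith(roster$link_en, paste0("?a=", roster$id, "&lang=en")))
)
```

Since the pool is purely alphanumeric, the IDs need no URL encoding when pasted into the query string.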