There’s been a Murder in R City! The R Murder Mystery is designed to be both a lesson to learn R concepts and commands and a fun game for R users to solve an intriguing crime. The mystery is directly based on The SQL Murder Mystery, which again was inspired by a crime in the neighboring Terminal City.

To solve the mystery one must use various R functions and methods.

The mystery

A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in R City. All the clues to this mystery are buried in a several files, and you need to use R to navigate through this vast network of information. Your first step to solving the mystery is to retrieve the corresponding crime scene report from the police department’s database.

The data

You have the following files at your disposal. You need to use several of them to be able to solve the mystery: File structure

You might notice that certain variables have a key and arrow associated to them.

Key Key: A unique identifier for each row in the table

Arrow Arrow: Used to reference data in one table to those in another table (i.e. they are identical in both tables).


Get data

Download the data here: Data for R - Murder mystery

NOTE:

  • The police have no good system to collect crime scene reports. Therefore the file structure of the crime scene reports is somewhat different compared to the other data.
  • The download you downloaded is the raw data. It is good practice to place this data inside a data-raw folder.

How to solve the mystery

Without help

You can in in fact solve the entire mystery based on the information given above. This can be difficult, and as such, most people are recommended to follow along with the guided section below.

Guided

Begin in by revealing part 1. Reveal new sections as you complete the previous one.

If you at any point get stuck you can reveal hints to help yourselves along. You can also view the solution code for each section. Begin by revealing the this hint.

Hint 0: Click to reveal!

Try to refrain from using hints. However, if you get stuck, begin by reveal the first hint and repeat. The first hint is: remember to load packages or use package namespaces ‘::’.


Sections

Part 0: Read data

Download and read in the data. If you ever make use of the code solutions, the data for those are stored in the relative path of “data-raw/”. You can use the here() function to set a relative path.

Hint 1: Click to reveal!

The data is packaged. You might need to use a tool to extract it.

Hint 2: Click to reveal!

Use ‘unzip()’ to extract the data.


Solution

# Unzip data
unzip(here("data-raw/r-murder-mystery.zip"),
      exdir = here("data-raw/"))

Part 1: Investigate the crime scene report

Investigate the crime scene report for valuable information. For this task you will need to read in the crime scene file, and filter and select the column and that you need.

Hint 1: Click to reveal!

You want to look at the crime scene reports and look for a murder that happened on jan. 15, 2018 in R City

Hint 2: Click to reveal!

The data is stored in seperate data files. Can you recall a method to combine them?

Hint 3: Click to reveal!

It might be a good idea to use create a function to read in each data file.

Hint 4: Click to reveal!

You can use ‘dir_ls()’ and ‘map_dfr()’ to read the data

Hint 5: Click to reveal!

Data can be read with ‘vroom()’ or ‘read_csv()’

Hint 6: Click to reveal!

You can use dplyr verbs such as filter and select to find the data you are interested in.


Solution

# Function to read data
read_crime_scene_report <- function(file_path) {
    vroom(
        here(file_path),
        col_types = cols(
            date = col_character(),
            type = col_character(),
            description = col_character(),
            city = col_character()
        )
    )
}
# Get a list of directories
crime_scene_report_files <- dir_ls(here("data-raw/crime_scene_report/"))
    
# Read data - This can take some time
crime_scene_report_df <- map_dfr(crime_scene_report_files, read_crime_scene_report)
# Find crime scene report
crime_scene_report_df %>% 
    filter(date == "20180115" & 
                      type == "murder" &
                      city == "R City") %>%
    select(description)

Part 2 - Find witnesses

Find the witness statement.

Hint 1: Click to reveal!

You will not be able to locate the witness statement before finding out which persons gave the statmenents.

Hint 2: Click to reveal!

Investigate the person data file, and look for a person that lives at the last house on Northwestern Dr and a person named Annabel that lives on Franklin Ave.

Hint 3: Click to reveal!

To find the person that lives on Northwestern dr you can use the dplyr function ‘arrange()’ to sort address numbers.

Hint 4: Click to reveal!

To find Annabel you can use a regular expresions and the stringr function ‘str_detect()’.


Solution

# Read person data
person <- vroom(here("data-raw/person.csv"))
# Find person living on the last house on Northwestern Dr
person %>% 
    filter(address_street_name == "Northwestern Dr") %>%
    arrange(desc(address_number))
# Find the person named Annabel living on Franklin Ave
person %>% 
    filter(address_street_name == "Franklin Ave" &
                     str_detect(name, "Annabel"))

Part 3 - Look at witness statement

Find the witness statements.

Hint 1: Click to reveal!

Look for a witness statement with a matching ‘id’

Hint 2: Click to reveal!

Vroom might not be able to guess the file delimited. Check the ‘?vroom’ help file to see if there is any arguments you can use to read the file.

Hint 3: Click to reveal!

Annabel Miller has a ‘id’ of ‘16371’

Hint 4: Click to reveal!

Morty Schapiro has a ‘id’ of ‘14887’

Hint 5: Click to reveal!

The ‘id’ variable from the person table is the same as the ‘person_id’ variable in the inteview table

Hint 6: Click to reveal!

To find the statement you can filter on ‘person_id’


Solution

# Read interview file
interview <- vroom(here("data-raw/interview.csv"), 
                          delim = ",")
# Find Annabel Miller's statement
interview %>% 
    filter(person_id == 16371) %>%
    select(transcript)
# Find Morty Schapiro's statement
interview %>% 
    filter(person_id == 14887) %>%
    select(transcript)

Part 4.1 - Who was at the gym? (Annabel)

Find out who Annabel saw

Hint 1: Click to reveal!

You need to find out who visited the gym at the time Annabel noticed someone.

Hint 2: Click to reveal!

Annabel also noticed that the last part of the members id was ‘48Z’.

Hint 3: Click to reveal!

You might need to use two datasets at the same time.

Hint 4: Click to reveal!

You need the ‘get_fit_now_member’ and ‘get_fit_now_check_in’ data sets.

Hint 5: Click to reveal!

You might be able to combine the data somehow.

Hint 6: Click to reveal!

Using a function such as ‘left_join’ could work.

Hint 7: Click to reveal!

The dplyr and stringr package has some great functions that can help you along.

Hint 8: Click to reveal!

Regex :)

Hint 9: Click to reveal!

A combination of ‘filter()’ and ‘str_detet()’ is always great to use when extracting data about gym members.


Solution

# Read data files
get_fit_now_member <- vroom(here("data-raw/get_fit_now_member.csv"))
get_fit_now_check_in <- vroom(here("data-raw/get_fit_now_check_in.csv"))
# Join data
fitness_person_checkin_combined <- left_join(x = get_fit_now_member, y = get_fit_now_check_in, by = c("id" = "membership_id"))
# Find people that were at the gym Annabel specified
fitness_person_checkin_combined %>%
    filter(check_in_date == 20180109 &
                      str_detect(id, "48Z"))

Part 4.2 - Who owns the car that Morty saw?

Figure out who of the two gym rats that own the car that Morty saw.

Hint 1: Click to reveal!

You can perhaps use the ‘licence_id’ to identify the perpetrator.

Hint 2: Click to reveal!

You might need to combine some tables in order to find the person of interest.

Hint 3: Click to reveal!

Morty saw a licence plate number that contained ‘H42W’


Solution

# Read data files
driver_licence <- vroom(here("data-raw/drivers_license.csv"))
    
# Find licence ID of Joe and Jeremy
person %>%
    filter(id %in% c(28819, 67318)) %>%
    select(c(name, license_id))
# Find car
driver_licence %>%
    filter(id %in% c(173289, 423327))
# Combine data - Easier to read
combined_data <- left_join(x = person, y = driver_licence, by = c("license_id" = "id")) %>%
    filter(license_id %in% c(173289, 423327) &
                      str_detect(plate_number, "H42W")) 
# Print data
combined_data

Congratulation! You have found the killer!

or have you?