Introduction

Column

Abstract

The purpose of this research is to determine the trends of homicide in the United States and, furthermore, in Ohio. Through the close observation of data frames and manipulation of the data, a “unsolved” homicide rate was calculated, which takes the number of unsolved homicides in a region over the total number of homicides there. While California, Texas, and New York are the states with the highest number of deaths, the states with the highest rate of unsolved homicides are New York, Maryland, and Illinois. A similar difference can be seen in Ohio’s counties: Cuyahoga, Franklin, and Hamilton have the highest count of homicides while Knox, Auglaize, and Paulding have the highest unsolved rates. Through graphical analysis, we see that African American males between the ages of 18 to 30 in Ohio face homicide cases that go unsolved more often than others. Finally, almost two-thirds of all homicides in Ohio are committed using some sort of firearm.

Variables considered throughout this analysis include:
• City & State of Homicide
• Year & Month of Homicide
• Whether the case was solved or not
• Victim Sex, Race, and Age
• Perpetrator Sex, Race, and Age
• Relation
• & Weapon Used

Column

Background Information

Every year, at least 5,000 perpetrators get away with murder; this means that, today, about a third of homicide cases go unsolved. National statistics on murder are estimates and projections based on incomplete police reports, as no agency’s are assigned to actually monitor failed cases.

The Murder Accountability Project is a non-profit organization that compiles data from federal, state, and local governments about unsolved homicides in order to educate Americans about the importance of accurately accounting for these failed cases. All of the data is gathered by Thomas Hargrove, a retired investigative journalist as well as former White House correspondent. The data used in this analysis was obtained from Kaggle.

Research Question(s)

  1. What is the distribution of unsolved homicides across the United States?
  2. More specifically, what is the distribution of unsolved homicides in Ohio?
  3. What qualities are most prevalent in unsolved murder victims?
    1. What is the distribution of Victim Sex?
    2. What is the distribution of Victim Race?
    3. What is the distribution of Victim Age?
    4. What is the distribution of murder weapons?

Exploring the Data

Column

US Unsolved Map

US Count Map

Glimpse of the Data

Raw Counts

Ohio Analysis

Column

Ohio Unsolved Map

Ohio Count Map

Ohio Raw Counts

Column

VicSex

VicRace

VicAge

Weapon

Weapon*

Discussion

Column

US Results

There are 628384 observations in our data once we drop Alaska, Hawaii, and the District of Columbia. Overall, there is about a rate of 29.4% for unsolved homicide cases in the United States. Specifically, New York, Maryland, Illinois, Massachusetts, and California have the highest state wide rates. On the other hand, we see the most solved cases coming out of North Dakota, Montana, South Dakota, South Carolina, and Idaho. I would say that these results make sense logically; we see more murders going unsolved in areas with denser populations. As sad as it is to say, more people would probably notice a person going missing in a small North Dakota town than bustling New York City. One might argue that smaller, spread out towns would make it easier to get away with murder, but our numbers seem to dismiss this belief.

There is a dramatic difference seem between the overall amount of homicides in California compared to any other state. With a total of 99,783 murders, they make up about 15.9% of the data. To see the difference, the next highest count is in Texas, with only 62,095 homicide cases appearing in the dataset. With a population already accounting for about 12% of the United States, it was not a surprise when California had the highest amount of homicides. For a bit of background on United States population density, most of the American population lives within a 100 miles radius from any coast, so we can infer that the inland states (such as the Dakotas) probably just have less people to commit crimes in general.

Ohio is rated number 11 for most counts of homicides in the United States. Further analysis will be applied to the Ohio data as a case study.

Ohio Results

The Ohio data contains 19158 observations. About 28.3% of homicide cases remain unsolved. We would suspect the highest number of unsolved cases to correspond to the counties with the highest population, just due to the sheer amount of people located there. These would be the counties with cities like Columbus, Toledo, Cleveland, Cincinnati, and Dayton; so, Franklin, Lucas, Cuyahoga, Hamilton, and Montgomery, respectively. However, we see that the counties with the highest rate of unsolved homicides are Knox, Auglaize, and Paulding. When observing the data more closely in order to determine why this is, we see that all three of the counties have less than ten homicides observed in general. So, this is making the rates look a lot larger than they are when comparing to counties with thousands of homicides.

Through the distribution of victim sex, we see that almost three times the amount of males than females were victims of homicide in Ohio. About 24.26% of female homicide cases remain unsolved, while cases with male victims are about 29.70% unsolved.

The majority of homicide cases in Ohio involve victims who are either White or Black. This is most likely because the majority of the population of the state identify as one of the two races, so let us consider primarily consider these. Based on the Ohio Census, we see that approximately 81% of the population is White and 13% is Black. However, about 41% of homicides in Ohio involve White victims and a whopping 58% of cases invovle Black victims.

The highest amount of homicides happen to victims between the age of 18-30, with the amount of cases decreasing exponentially as the victims get older. We do, however, see about 11% of cases occur to victims who are minors; though, these cases, at least visually, appear to be solved more often.

The overwhelming majority of murders in Ohio involve death by a firearm. In order to more accurately visualize the distribution of murder weapons, a pie chart is created. When firearms are removed, the next three highest categories are : Knives, Blunt Objects, and Unknown. Unfortunately, I am not sure if this says much about Ohio specifically, as firearms seem to be the weapon of choice all over the United States due to current gun regulations.

Column

Limitations

A major limitation of this data is the amount of crimes that remain unreported. As we have already seen, not every homicide is able to be solved. However, there are also plenty of homicides that just may never be reported, due to factors such as no body ever being found or person ever being reported missing.

Another limitation is that the rates of unsolved homicides were not calculated with respect to population, whether it be for state or for county. This makes some unsolved homicides rates appear inflated, when really, they could be better explained by the lack of homicides occurring there in general.

Finally, the “type” variable is so non-specific and could be better developed. There is one category that is “murder or manslaughter’, which I think could explain more homicide trends if split into two. Manslaughter is defined as the unintentional killing of another person and is often viewed as less severe than murder. By separating the two, we might be able to find other underlying trends. For example, there may be locations where murder is more prevalent than manslaughter, as well as different weapons used between the two.

Future Studies

A possible future study using this same data could be seeing if we can detect the presence of a serial killer. This could be potentially done through a closer analysis of the precise locations of the murders. While we do not have access to specific coordinates of where bodies may have been found, we can focus on cities with high counts of unsolved homicides.

Further, we could see which of these homicides had similar weapons used, as well as similar victim characteristics. Most serial killers target a specific type of person, so race/age/gender could help us see any trends. While we can keep the year and months of the incidents in mind, there are times where serial killers may remain dormant for years.

While we focused on unsolved homicides in our data, we were provided with characteristics of perpetrators for those crimes that were solved. This could assist us in further profiling potential killers, as we could see if there are similar trends between them. Also, there could be some perpetrators who were caught for one homicide, yet never confessed to another they did. We could use the same idea of seeing victim trends and times to see if they connect to any known perpetrators.

References

Dataset
Murder Accountability Project
Ohio Census
• Special shout out to Dr. Chen for all of her time and support ! :)

---
title: "Unsolved Homicides in the United States"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: pulse
      navbar-bg: "#F393AB"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

<style>
.chart-title {  /* chart_title  */
   font-size: 18px;
   font-family: Helvetica;
  }
body{ /* Normal  */
      font-size: 16px;
  }
</style>

```{r setup, include=FALSE}
library(flexdashboard)
```

Introduction
================================

Column {data-width=500}
-----------------------------------------------------------------------

### Abstract

```{r data}
library(pacman)
p_load(ggplot2, tidyverse, maps, plotly, viridis)

## reading the dataset and dropping irrelevant data
crime <- read.csv("/Users/erin/Desktop/R/data/crime.csv")
crime <- crime %>% 
  rename(Type = Crime.Type, Solved = Crime.Solved, VicSex = Victim.Sex, 
        VicAge = Victim.Age, VicRace = Victim.Race, PerpSex = Perpetrator.Sex,
        PerpAge = Perpetrator.Age, PerpRace = Perpetrator.Race) %>% 
  select(-c(Record.ID, Agency.Code, Agency.Name, Agency.Type, 
            Incident, Record.Source, Victim.Ethnicity, 
            Perpetrator.Ethnicity, Victim.Count, Perpetrator.Count, 
            Record.Source))

## remove hawaii alaska and dc for map purposes, still enough data points
crime <- crime[crime$State != "Hawaii" & crime$State != "Alaska" & crime$State !="District of Columbia",]

## combining firearm values
crime$Weapon[crime$Weapon=="Handgun"]<-"Firearm"
crime$Weapon[crime$Weapon=="Shotgun"]<-"Firearm"
crime$Weapon[crime$Weapon=="Rifle"]<-"Firearm"
crime$Weapon[crime$Weapon=="Gun"]<-"Firearm"
```

The purpose of this research is to determine the trends of homicide in the United States and, furthermore, in Ohio. Through the close observation of data frames and manipulation of the data, a "unsolved" homicide rate was calculated, which takes the number of unsolved homicides in a region over the total number of homicides there. While California, Texas, and New York are the states with the highest number of deaths, the states with the highest rate of unsolved homicides are New York, Maryland, and Illinois. A similar difference can be seen in Ohio's counties: Cuyahoga, Franklin, and Hamilton have the highest count of homicides while Knox, Auglaize, and Paulding have the highest unsolved rates. Through graphical analysis, we see that African American males between the ages of 18 to 30 in Ohio face homicide cases that go unsolved more often than others. Finally, almost two-thirds of all homicides in Ohio are committed using some sort of firearm. 
<br>
<br>
Variables considered throughout this analysis include:
<br>
• City & State of Homicide
<br>
• Year & Month of Homicide
<br>
• Whether the case was solved or not
<br>
• Victim Sex, Race, and Age
<br>
• Perpetrator Sex, Race, and Age
<br>
• Relation
<br>
• & Weapon Used


Column {data-width=650}
-----------------------------------------------------------------------

### Background Information

Every year, at least 5,000 perpetrators get away with murder; this means that, today, about a third of homicide cases go unsolved. National statistics on murder are estimates and projections based on incomplete police reports, as no agency's are assigned to actually monitor failed cases. 
<br>
<br>
The [Murder Accountability Project](https://www.murderdata.org) is a non-profit organization that compiles data from federal, state, and local governments about unsolved homicides in order to educate Americans about the importance of accurately accounting for these failed cases. All of the data is gathered by Thomas Hargrove, a retired investigative journalist as well as former White House correspondent. The data used in this analysis was obtained from [Kaggle](https://www.kaggle.com/datasets/murderaccountability/homicide-reports).

### Research Question(s)

1) What is the distribution of unsolved homicides across the United States?
2) More specifically, what is the distribution of unsolved homicides in Ohio?
3) What qualities are most prevalent in unsolved murder victims?
    a) What is the distribution of Victim Sex?
    b) What is the distribution of Victim Race?
    c) What is the distribution of Victim Age?
    d) What is the distribution of murder weapons?

Exploring the Data
================================

Column {data-width=500, .tabset}
-----------------------------------------------------------------------

### US Unsolved Map

```{r map1}
## creating us map
us_map <- map_data("state") %>% 
  filter(region != "district of columbia") %>% 
  select(-subregion)
us_map$region <- unname(sapply(us_map$region, str_to_title))

## finding total count of unsolved homicides
## 0 - solved, 1 - unsolved
crime$Solved <- crime$Solved %>% 
  recode("Yes" = 0,
         "No" = 1)
usUnsolved <- aggregate(crime$Solved, list(crime$State), FUN=sum)

## finding total count of homicides
## joining data and finding unsolved rate
us_count <- crime %>% 
  group_by(State) %>% 
  mutate(count = n()) %>% 
  select(State, count) %>% 
  distinct()

us_count <- us_count %>% 
  left_join(usUnsolved, by = c("State" = "Group.1"))

colnames(us_count)[colnames(us_count) == "x"] <- "unsolvCount"

us_count <- us_count %>% 
  mutate(unsolvRate = round((unsolvCount / count), 5))

## left joining datasets
crime_map <- us_count %>% 
  left_join(us_map, by = c("State" = "region"))

us_count <- crime_map %>% 
  group_by(State) %>% 
  summarise(long = mean(long), lat = mean(lat), count = mean(count),
            unsolvCount = mean(unsolvCount), unsolvRate = mean(unsolvRate))
us_count$abb <- state.abb[-c(2,11)]

## creating united states graph
g1 <- ggplot(crime_map, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = unsolvRate,
                   text = paste0("State: ", State, "\n",
                                 "Total Homicides: ", count, "\n",
                                 "Unsolved Homicides: ", unsolvCount, "\n",
                                 "Unsolved Homicide Rate: ", unsolvRate)), colour = "white") +
  scale_fill_viridis_c(option = "F") +
  theme_minimal() +
  theme(axis.title.x=element_blank(), axis.text.x=element_blank(),
        axis.ticks.x=element_blank(), axis.title.y=element_blank(),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(),
        panel.grid.major=element_blank(),
        panel.background=element_blank()) +
  labs(fill = "Unsolved Rate", title = "Distribution of US Homicide Cases")

ggplotly(g1, tooltip="text")
```

### US Count Map

```{r map2}
## bubble map
us_top15 <- us_count %>% 
  arrange(desc(count)) %>% 
  head(15)
us_top15 <- semi_join(crime_map, us_top15, by = "State")

top15_map <- us_top15 %>% 
  group_by(State) %>% 
  summarise(long = mean(long), lat = mean(lat), count = mean(count))

g3 <- crime_map %>% 
  ggplot() + 
  geom_polygon(aes(x = long, y = lat, group = group, text = State),
               fill = "grey", alpha = 0.5, colour = "white") +
  geom_point(data = top15_map,
             aes(x = long, y = lat, size = count, color = count, alpha = count,
                 text = paste0(State, ":", count))) +
  scale_size_continuous(range=c(1,25)) +
  scale_color_viridis(option="F", name = "Number of Homicides") +
  coord_map() + 
  theme_minimal() +
  theme(axis.title.x=element_blank(), axis.text.x=element_blank(),
        axis.ticks.x=element_blank(), axis.title.y=element_blank(),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(),
        panel.grid.major=element_blank(),
        panel.background=element_blank()) +
  labs(title = "Top 15 Counts of Homicides")

ggplotly(g3, tooltip = "text")
```


### Glimpse of the Data

```{r glimpse}
glimpse.crime <- crime %>% 
  select(State, Year, Month, Solved, VicSex, VicRace, VicAge, Weapon) %>% 
  arrange(Year)
glimpse.crime <- glimpse.crime[sample(1:nrow(glimpse.crime), 1000,
   replace=FALSE),]

DT::datatable(glimpse.crime, colnames = c("State", "Year", "Month", "Unsolved",
                                          "Victim Sex", "Victim Race",
                                          "Victim Age", "Weapon"))
```

### Raw Counts

```{r counts}
count_table <- us_count %>% 
  arrange(desc(count)) %>% 
  select(State, count, unsolvCount)

DT::datatable(count_table, colnames = c("State", "Homicide Count", "Unsolved Count"))
```


Ohio Analysis
================================

Column {data-width=500, .tabset}
-----------------------------------------------------------------------

### Ohio Unsolved Map

```{r map3}
## narrowing down to only ohio data
ohio <- crime[crime$State=="Ohio",]

## same process for map making as us map
ohio_county <- map_data("county", region = "ohio")
ohio_county$subregion <- str_to_title(ohio_county$subregion)

ohio_count <- ohio %>% 
  group_by(City) %>% 
  mutate(count = n()) %>%
  select(City, count) %>% 
  distinct() 

ohUnsolved <- aggregate(ohio$Solved, list(ohio$City), FUN=sum)

ohio_count <- ohio_count %>% 
  left_join(ohUnsolved, by = c("City" = "Group.1"))

colnames(ohio_count)[colnames(ohio_count) == "x"] <- "unsolvCount"

ohio_count <- ohio_count %>% 
  mutate(unsolvRate = round((unsolvCount / count), 5))

ohio_map <- ohio_count %>% 
  left_join(ohio_county, by = c("City" = "subregion"))

ohio_count <- ohio_map %>% 
  group_by(City) %>% 
  summarise(long = mean(long), lat = mean(lat), count = mean(count),
            unsolvCount = mean(unsolvCount), unsolvRate = mean(unsolvRate))

g2 <- ggplot(ohio_map, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = unsolvRate,
                   text = paste0("County: ", City, "\n",
                                 "Total Homicides: ", count, "\n",
                                 "Unsolved Homicides: ", unsolvCount, "\n",
                                 "Unsolved Homicide Rate: ", unsolvRate)), colour = "white") +
  scale_fill_viridis_c(option = "F") +
  theme_minimal() +
  theme(axis.title.x=element_blank(), axis.text.x=element_blank(),
        axis.ticks.x=element_blank(), axis.title.y=element_blank(),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(),
        panel.grid.major=element_blank(),
        panel.background=element_blank()) +
  labs(fill = "Unsolved Rate", title = "Distribution of Ohio Homicide Cases")

ggplotly(g2, tooltip="text")
```

### Ohio Count Map

```{r map4}
## bubble map
ohio_top10 <- ohio_count %>% 
  arrange(desc(count)) %>% 
  head(10)
ohio_top10 <- semi_join(ohio_map, ohio_top10, by = "City")

top10oh_map <- ohio_top10 %>% 
  group_by(City) %>% 
  summarise(long = mean(long), lat = mean(lat), count = mean(count))

g4 <- ohio_map %>% 
  ggplot() + 
  geom_polygon(aes(x = long, y = lat, group = group, text = City),
               fill = "grey", alpha = 0.5, colour = "white") +
  geom_point(data = top10oh_map,
             aes(x = long, y = lat, size = count, color = count, alpha = count,
                 text = paste0(City, ":", count))) +
  scale_size_continuous(range=c(1,25)) +
  scale_color_viridis(option="F", name = "Number of Homicides") +
  coord_map() + 
  theme_minimal() +
  theme(axis.title.x=element_blank(), axis.text.x=element_blank(),
        axis.ticks.x=element_blank(), axis.title.y=element_blank(),
        axis.text.y=element_blank(), axis.ticks.y=element_blank(),
        panel.grid.major=element_blank(),
        panel.background=element_blank()) +
  labs(title = "Top 10 Counts of Homicides")

ggplotly(g4, tooltip = "text")
```

### Ohio Raw Counts

```{r ohcounts}
ohcount_table <- ohio_count %>% 
  arrange(desc(count)) %>% 
  select(City, count, unsolvCount)

DT::datatable(ohcount_table, colnames = c("County", "Homicide Count", "Unsolved Count"))
```

Column {data-width=500, .tabset}
-----------------------------------------------------------------------

### VicSex

```{r vicsex}
## reverting solved back to character values for graph purposes
ohio$Solved[ohio$Solved == 0]<-"Yes"
ohio$Solved[ohio$Solved == 1]<-"No"

## bar charts of victim characteristics 
## dropping unknown in the graph just bc only 52 observations
p1 <- ggplot(ohio, aes(x = VicSex, fill = as.factor(Solved))) + geom_bar() +
  scale_fill_discrete(name = "Solved?", type = c("#ffcde3", "#aa6785")) + 
  scale_x_discrete(limits = c("Female", "Male")) +
  labs(title = "Distribution of Victim Sex in Ohio", x = "Victim Sex", y = "Number of Homicides") +
  theme_classic()
ggplotly(p1)
```

### VicRace

```{r vicrace}
## renaming some races to better fit graph
ohio$VicRace[ohio$VicRace=="Asian/Pacific Islander"]<-"Asian/PI"
ohio$VicRace[ohio$VicRace=="Native American/Alaska Native"]<-"Native Amer"

p2 <- ggplot(ohio, aes(x = VicRace, fill = as.factor(Solved))) + geom_bar() +
  scale_fill_discrete(name = "Solved?", type = c("#ffcde3", "#aa6785")) + 
  labs(title = "Distribution of Victim Race in Ohio", x = "Victim Race", y = "Number of Homicides") +
  theme_classic()
ggplotly(p2)
```

### VicAge

```{r age}
## some ages entered incorrectly as 998 so we need to fix this... 
## just gonna narrow down data
ohio_age <- ohio %>% 
  mutate(AgeGroup = case_when(
    VicAge < 18 ~ "18-",
    18 <= VicAge & VicAge < 30 ~ "18-30",
    30 <= VicAge & VicAge < 40 ~ "30-40",
    40 <= VicAge & VicAge < 50 ~ "40-50",
    50 <= VicAge & VicAge < 60 ~ "50-60",
    60 <= VicAge & VicAge < 70 ~ "60-70",
    70 <= VicAge & VicAge < 80 ~ "70-80",
    80 <= VicAge & VicAge < 90 ~ "80-90")) %>% 
  na.omit()

p3 <- ggplot(ohio_age, aes(x = AgeGroup, fill = as.factor(Solved))) + geom_bar() +
  scale_fill_discrete(name = "Solved?", type = c("#ffcde3", "#aa6785")) + 
  labs(title = "Distribution of Victim Age in Ohio", x = "Age Group", y = "Number of Homicides") +
  coord_flip() +
  theme_classic()
ggplotly(p3)
```

### Weapon

```{r weapon}
p4 <- ggplot(ohio, aes(x = Weapon, fill = as.factor(Solved))) + geom_bar() +
  scale_fill_discrete(name = "Solved?", type = c("#ffcde3", "#aa6785")) + 
  labs(title = "Distribution of Murder Weapons Used in Ohio", x = "Weapon", y = "Number of Homicides") +
  coord_flip() +
  theme_classic()
ggplotly(p4, tooltip = "text")
```

### Weapon*

```{r weapon2}
### pie chart to see more of the weapon distribution since so many types
weapon_count <- count(ohio, Weapon)
weapon_count$percent <- round(weapon_count$n/sum(weapon_count$n)*100,2)

colList = c("#FEF6F8","#FAD1DB","#B8143D","#F07594","#EB4770","#E6194C",
            "#F5A3B8","#A11236","#8A0F2E","#5C0A1F","#450817","#2E050F")

pie <- plot_ly(weapon_count, labels = ~Weapon, values = ~n, type = 'pie', marker=list(colors=colList))
pie <- pie %>% layout(title = 'Weapon Distribution in Ohio',
                      xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
                      yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
pie
```

Discussion
================================

Column {data-width=550, .tabset}
-----------------------------------------------------------------------

### US Results
There are 628384 observations in our data once we drop Alaska, Hawaii, and the District of Columbia. Overall, there is about a rate of 29.4% for unsolved homicide cases in the United States. Specifically, New York, Maryland, Illinois, Massachusetts, and California have the highest state wide rates. On the other hand, we see the most solved cases coming out of North Dakota, Montana, South Dakota, South Carolina, and Idaho. I would say that these results make sense logically; we see more murders going unsolved in areas with denser populations. As sad as it is to say, more people would probably notice a person going missing in a small North Dakota town than bustling New York City. One might argue that smaller, spread out towns would make it easier to get away with murder, but our numbers seem to dismiss this belief. 
<br>
<br>
There is a dramatic difference seem between the overall amount of homicides in California compared to any other state. With a total of 99,783 murders, they make up about 15.9% of the data. To see the difference, the next highest count is in Texas, with only 62,095 homicide cases appearing in the dataset. With a population already accounting for about 12% of the United States, it was not a surprise when California had the highest amount of homicides. For a bit of background on United States population density, most of the American population lives within a 100 miles radius from any coast, so we can infer that the inland states (such as the Dakotas) probably just have less people to commit crimes in general.
<br>
<br>
Ohio is rated number 11 for most counts of homicides in the United States. Further analysis will be applied to the Ohio data as a case study.

### Ohio Results

The Ohio data contains 19158 observations. About 28.3% of homicide cases remain unsolved. We would suspect the highest number of unsolved cases to correspond to the counties with the highest population, just due to the sheer amount of people located there. These would be the counties with cities like Columbus, Toledo, Cleveland, Cincinnati, and Dayton; so, Franklin, Lucas, Cuyahoga, Hamilton, and Montgomery, respectively. 
However, we see that the counties with the highest rate of unsolved homicides are Knox, Auglaize, and Paulding. When observing the data more closely in order to determine why this is, we see that all three of the counties have less than ten homicides observed in general. So, this is making the rates look a lot larger than they are when comparing to counties with thousands of homicides. 
<br>
<br>
Through the distribution of victim sex, we see that almost three times the amount of males than females were victims of homicide in Ohio. About 24.26% of female homicide cases remain unsolved, while cases with male victims are about 29.70% unsolved. 
<br>
<br>
The majority of homicide cases in Ohio involve victims who are either White or Black. This is most likely because the majority of the population of the state identify as one of the two races, so let us consider primarily consider these. Based on the [Ohio Census](https://www.census.gov/quickfacts/OH), we see that approximately 81% of the population is White and 13% is Black. However, about 41% of homicides in Ohio involve White victims and a whopping 58% of cases invovle Black victims. 
<br>
<br>
The highest amount of homicides happen to victims between the age of 18-30, with the amount of cases decreasing exponentially as the victims get older. We do, however, see about 11% of cases occur to victims who are minors; though, these cases, at least visually, appear to be solved more often. 
<br>
<br>
The overwhelming majority of murders in Ohio involve death by a firearm. In order to more accurately visualize the distribution of murder weapons, a pie chart is created. When firearms are removed, the next three highest categories are : Knives, Blunt Objects, and Unknown. Unfortunately, I am not sure if this says much about Ohio specifically, as firearms seem to be the weapon of choice all over the United States due to current gun regulations. 

Column {data-width=500}
-----------------------------------------------------------------------

### Limitations

A major limitation of this data is the amount of crimes that remain unreported. As we have already seen, not every homicide is able to be solved. However, there are also plenty of homicides that just may never be reported, due to factors such as no body ever being found or person ever being reported missing. 
<br>
<br>
Another limitation is that the rates of unsolved homicides were not calculated with respect to population, whether it be for state or for county. This makes some unsolved homicides rates appear inflated, when really, they could be better explained by the lack of homicides occurring there in general. 
<br>
<br>
Finally, the "type" variable is so non-specific and could be better developed. There is one category that is "murder or manslaughter', which I think could explain more homicide trends if split into two. Manslaughter is defined as the unintentional killing of another person and is often viewed as less severe than murder. By separating the two, we might be able to find other underlying trends. For example, there may be locations where murder is more prevalent than manslaughter, as well as different weapons used between the two. 

### Future Studies

A possible future study using this same data could be seeing if we can detect the presence of a serial killer. This could be potentially done through a closer analysis of the precise locations of the murders. While we do not have access to specific coordinates of where bodies may have been found, we can focus on cities with high counts of unsolved homicides. 
<br>
<br>
Further, we could see which of these homicides had similar weapons used, as well as similar victim characteristics. Most serial killers target a specific type of person, so race/age/gender could help us see any trends. While we can keep the year and months of the incidents in mind, there are times where serial killers may remain dormant for years. 
<br>
<br>
While we focused on unsolved homicides in our data, we were provided with characteristics of perpetrators for those crimes that were solved. This could assist us in further profiling potential killers, as we could see if there are similar trends between them. Also, there could be some perpetrators who were caught for one homicide, yet never confessed to another they did. We could use the same idea of seeing victim trends and times to see if they connect to any known perpetrators. 

### References
• [Dataset](https://www.kaggle.com/datasets/murderaccountability/homicide-reports)
<br>
• [Murder Accountability Project](https://www.murderdata.org)
<br>
• [Ohio Census](https://www.census.gov/quickfacts/OH)
<br>
• Special shout out to Dr. Chen for all of her time and support ! :)