Washington Post police shootings data analysis

This report analyzes demographic factors, including age distribution, race, mental health conditions, gender, and other relevant variables, among the individuals involved in police shootings. The data come from the Washington Post Police Shootings Database. The primary aim of the report is to shed light on the age demographics of people affected by police violence and to present the findings in careful detail.

The analysis is based on the Washington Post Police Shootings Database, which records police shootings in the United States from 2015 to 2023. To ensure the integrity of the analysis, missing age values were replaced with the median age, and rows with NaN or null values in the column under study were dropped before each analysis. These preprocessing steps prepared the dataset for the visualizations and analysis that follow.

The analysis is written in Python, using the pandas library to handle the dataset and matplotlib for graphical representation. The Washington Post Police Shootings Database, distributed as a CSV file, is loaded into a pandas DataFrame named ‘data’. The first analysis examines trends over time, specifically the number of fatal police shootings each year.

To ensure data integrity, rows with NaN values in the ‘date’ column are removed. The ‘date’ column is then converted to a datetime format, and the corresponding years are extracted and stored in a new column named ‘year.’ The number of fatal police shootings per year is computed using the groupby function, and a line plot is generated using matplotlib.

– Loading the CSV file into a pandas DataFrame
– Dropping rows with NaN values in the ‘date’ column for data integrity
– Converting ‘date’ to datetime and extracting the year
– Calculating the number of fatal police shootings per year
– Creating a line plot
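The steps above can be sketched as follows. This is a minimal illustration, not the author's original script: it assumes the ‘date’ column holds ISO-formatted date strings, and the function names shootings_per_year and plot_shootings_per_year are hypothetical.

```python
import pandas as pd


def shootings_per_year(data: pd.DataFrame) -> pd.Series:
    """Count fatal police shootings per calendar year.

    Assumes a 'date' column of date strings such as '2015-01-02'.
    """
    # Drop rows whose 'date' is missing, working on a copy of the data
    df = data.dropna(subset=["date"]).copy()
    # Convert 'date' to datetime and store the year in a new 'year' column
    df["date"] = pd.to_datetime(df["date"])
    df["year"] = df["date"].dt.year
    # One row per person shot, so the group size is the yearly count
    return df.groupby("year").size()


def plot_shootings_per_year(counts: pd.Series) -> None:
    # Imported here so the counting helper can be used without matplotlib
    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 5))
    plt.plot(counts.index, counts.values, marker="o")
    plt.title("Fatal police shootings per year")
    plt.xlabel("Year")
    plt.ylabel("Number of shootings")
    plt.grid(True)
    plt.tight_layout()
    plt.show()
```

For example, after loading the data with data = pd.read_csv("fatal-police-shootings-data.csv") (the file name is an assumption; use the name of the downloaded file), plot_shootings_per_year(shootings_per_year(data)) draws the yearly line plot.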

Findings

The plotted graph reveals a clear trend in the number of fatal police shootings over time. Between 2016 and 2022, the count of such incidents rose in a pronounced and consistent way. In 2023, however, the count drops sharply. This abrupt shift calls for cautious interpretation: it may reflect missing or incomplete data for that year rather than a genuine decline in incidents. The factors behind this unexpected decrease should be examined before drawing definitive conclusions about police shootings in 2023.
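One way to test the missing-data explanation is to look at the latest date in the dataset and at how many records fall in each month of that final year. If the data were retrieved partway through 2023, the later months will be empty or sparse, which would lower the annual total without implying fewer shootings. A minimal sketch, assuming a ‘date’ column of date strings; the function name last_year_coverage is hypothetical.

```python
from typing import Tuple

import pandas as pd


def last_year_coverage(data: pd.DataFrame) -> Tuple[pd.Timestamp, pd.Series]:
    """Return the latest recorded date and per-month counts for its year."""
    dates = pd.to_datetime(data["date"].dropna())
    latest = dates.max()
    # Keep only records from the most recent year present in the data
    recent = dates[dates.dt.year == latest.year]
    # Count records per month; months with no records appear as 0
    monthly = recent.dt.month.value_counts().reindex(range(1, 13), fill_value=0)
    return latest, monthly
```

Zero or unusually low counts in the final months would support the incomplete-data reading, though they would not prove it on their own.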

The next Python script examines the relationship between race and the number of individuals shot, again using pandas for data manipulation and matplotlib for visualization. Rows with NaN values in the ‘race’ column are removed to ensure data integrity. The number of people shot is then counted for each race, and a bar chart shows the distribution.

– Dropping rows with NaN values in the ‘race’ column for data integrity
– Counting the number of people shot for each race
– Creating a bar chart
– Rotating x-axis labels for readability
– Adjusting the layout to prevent overlapping
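A minimal sketch of these steps, assuming a ‘race’ column of category codes; shootings_by_race and plot_counts_bar are hypothetical names, not the author's code.

```python
import pandas as pd


def shootings_by_race(data: pd.DataFrame) -> pd.Series:
    """Count people shot per value of the 'race' column."""
    # Drop rows with a missing race (value_counts() would also skip NaN by
    # default; the explicit dropna mirrors the step listed above)
    df = data.dropna(subset=["race"])
    # value_counts() returns counts sorted from most to least frequent
    return df["race"].value_counts()


def plot_counts_bar(counts: pd.Series, title: str, xlabel: str) -> None:
    # Imported here so the counting helper can be used without matplotlib
    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 5))
    plt.bar(counts.index.astype(str), counts.values)
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel("Number of people shot")
    # Rotate the x-axis labels so category names stay readable
    plt.xticks(rotation=45, ha="right")
    # Adjust the layout so labels and title do not overlap
    plt.tight_layout()
    plt.show()
```

Usage: plot_counts_bar(shootings_by_race(data), "People shot by race", "Race").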

This Python code fills missing (NaN or null) values in the ‘age’ column with the median age. It then counts the occurrences of each unique age, extracts the unique age values, and generates a scatter plot of the ages of individuals shot by the police.

– Filling NaN values or null values in the ‘age’ column with the median
– Counting the occurrences of each unique age
– Extracting the unique age values
– Creating a scatter plot using the unique age values and their counts
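These steps can be sketched as follows, assuming an ‘age’ column that may contain missing values; age_counts and plot_age_scatter are hypothetical names.

```python
import pandas as pd


def age_counts(data: pd.DataFrame) -> pd.Series:
    """Fill missing ages with the median and count people at each age."""
    # Coerce to numbers; anything unparseable becomes NaN
    ages = pd.to_numeric(data["age"], errors="coerce")
    # Replace NaN/null ages with the median of the known ages
    ages = ages.fillna(ages.median())
    # Count occurrences of each unique age, ordered by age
    return ages.value_counts().sort_index()


def plot_age_scatter(counts: pd.Series) -> None:
    # Imported here so the counting helper can be used without matplotlib
    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 5))
    # x: unique age values, y: number of people with that age
    plt.scatter(counts.index, counts.values)
    plt.title("Ages of people shot by police")
    plt.xlabel("Age")
    plt.ylabel("Number of people shot")
    plt.tight_layout()
    plt.show()
```

Because every imputed record is placed at the same age, the scatter plot will show a spike at the median; counting the missing values first (for example with data["age"].isna().sum()) makes that spike easier to interpret. When the number of known ages is even, the median can also fall between two ages, such as 35.5.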

The following Python code analyzes the gender distribution of individuals involved in fatal police shootings, using pandas to handle the dataset and matplotlib for graphical representation. Rows with NaN values in the ‘gender’ column are dropped to ensure data integrity. The code then counts the number of people shot for each gender and generates a bar chart of the gender distribution.

– Dropping rows with NaN values in the ‘gender’ column for data integrity
– Counting the number of people shot for each gender
– Creating a bar chart
– Rotating x-axis labels for readability
– Adjusting the layout to prevent overlapping
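A minimal sketch of the gender count and bar chart, assuming a ‘gender’ column; count_by_category and plot_category_bar are hypothetical names. Because the helper takes the column name as a parameter, it could equally be applied to the ‘race’ column.

```python
import pandas as pd


def count_by_category(data: pd.DataFrame, column: str) -> pd.Series:
    """Drop rows missing `column`, then count people per category."""
    df = data.dropna(subset=[column])
    # Counts are sorted from most to least frequent
    return df[column].value_counts()


def plot_category_bar(counts: pd.Series, title: str) -> None:
    # Imported here so the counting helper can be used without matplotlib
    import matplotlib.pyplot as plt

    plt.figure(figsize=(6, 4))
    plt.bar(counts.index.astype(str), counts.values)
    plt.title(title)
    plt.ylabel("Number of people shot")
    plt.xticks(rotation=45, ha="right")  # rotate labels for readability
    plt.tight_layout()  # prevent labels from overlapping
    plt.show()
```

Usage: plot_category_bar(count_by_category(data, "gender"), "People shot by gender").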

The following Python code examines fatal police shootings based on geographical locations, specifically focusing on cities and states. Utilizing the pandas library for data manipulation and matplotlib for visualization, the code groups the data by city and state, counts the number of shootings for each, and extracts the top 10 cities and states with the highest number of fatal police shootings. Bar charts are then generated to illustrate these findings.

– Grouping the data by city and counting the number of shootings for each city
– Extracting the top 10 cities with the highest number of shootings
– Creating a bar chart for the top 10 cities
– Grouping the data by state and counting the number of shootings for each state
– Extracting the top 10 states with the most shootings
– Creating a bar chart for the top 10 states
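A minimal sketch of the grouping and top-10 extraction, assuming ‘city’ and ‘state’ columns; top_locations and plot_top_locations are hypothetical names, not the author's code.

```python
from typing import List, Union

import pandas as pd


def top_locations(data: pd.DataFrame, columns: Union[str, List[str]],
                  n: int = 10) -> pd.Series:
    """Return the n locations with the most fatal shootings."""
    # Group by location and count the rows (shootings) in each group
    counts = data.groupby(columns).size()
    # Keep the n largest counts, highest first
    return counts.nlargest(n)


def plot_top_locations(counts: pd.Series, title: str) -> None:
    # Imported here so the counting helper can be used without matplotlib
    import matplotlib.pyplot as plt

    # Tuple keys from multi-column grouping become "City, ST" labels
    labels = [", ".join(map(str, key)) if isinstance(key, tuple) else str(key)
              for key in counts.index]
    plt.figure(figsize=(10, 5))
    plt.bar(labels, counts.values)
    plt.title(title)
    plt.ylabel("Number of fatal shootings")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.show()
```

Grouping by ‘city’ alone merges cities that share a name across states. Passing ["city", "state"] keeps them apart, and the plotting helper turns each pair into a label such as "Springfield, IL".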

