jiyachachan committed on
Commit
c1fe312
·
verified ·
1 Parent(s): c2ad736

Rename pages/Global_map.py to pages/Child Mortality VS GDP.py

pages/Child Mortality VS GDP.py ADDED
@@ -0,0 +1,71 @@
+ import pandas as pd
+ import altair as alt
+ import streamlit as st
+
+ # Load the data
+ child_mortality = pd.read_csv("https://huggingface.co/spaces/jiyachachan/fp2/resolve/main/child_mortality_0_5_year_olds_dying_per_1000_born.csv")  # Wide format: one row per country, one column per year
+ gdp_per_capita = pd.read_csv("https://huggingface.co/spaces/jiyachachan/fp2/resolve/main/gdp_pcap.csv")  # Wide format: one row per country, one column per year
+
+ # Melt datasets to tidy format (one row per country-year pair)
+ child_mortality = child_mortality.melt(id_vars=["country"], var_name="year", value_name="child_mortality")
+ gdp_per_capita = gdp_per_capita.melt(id_vars=["country"], var_name="year", value_name="gdp_per_capita")
+
+ # Merge the datasets
+ merged_data = pd.merge(child_mortality, gdp_per_capita, on=["country", "year"])
+ merged_data["year"] = merged_data["year"].astype(int)  # Ensure 'year' is an integer
+
+ # Drop rows with missing or undefined country values
+ merged_data = merged_data.dropna(subset=["country"])
+ merged_data = merged_data[merged_data["country"] != "undefined"]
+
+ # Convert gdp_per_capita and child_mortality to numeric; unparseable values become NaN
+ merged_data["gdp_per_capita"] = pd.to_numeric(merged_data["gdp_per_capita"], errors="coerce")
+ merged_data["child_mortality"] = pd.to_numeric(merged_data["child_mortality"], errors="coerce")
+
+ # Drop rows with missing or invalid data
+ merged_data = merged_data.dropna(subset=["gdp_per_capita", "child_mortality"])
+
+ # Streamlit app
+ st.title("Interactive Visualization: GDP vs. Child Mortality")
+
+ st.text(" ")
+
+ st.text("The dataset represents global development indicators related to child mortality and GDP per capita for multiple countries over several years. Each row corresponds to a unique country-year combination, with the key fields being country (categorical, representing the country name), year (integer, indicating the year of data collection), child_mortality (numeric, showing the number of children under five dying per 1,000 live births), and gdp_per_capita (numeric, representing GDP per capita in constant 2017 international dollars). The dataset spans a wide range of years and countries, making it suitable for temporal and regional analyses. Missing values are present in some fields, particularly for earlier years or less-developed countries, and were handled during the data cleaning process. The values in child_mortality range from 2.24 to 756.0, while gdp_per_capita spans from $354.00 to $10,000.00, reflecting significant disparities in economic and health outcomes across countries and regions.")
+
+ st.text(" ")
+
+ # Filter data for a specific year
+ year = st.slider("Select Year", min_value=int(merged_data["year"].min()), max_value=int(merged_data["year"].max()), value=2020)
+ filtered_data = merged_data[merged_data["year"] == year]
+
+ # Select number of countries to display
+ num_countries = st.slider("Select Number of Countries to Display", min_value=5, max_value=50, value=10, step=5)
+
+ # Get top N countries by GDP per capita
+ top_countries = filtered_data.nlargest(num_countries, "gdp_per_capita")
+
+ # Create scatter plot (the regression line is layered on below)
+ scatter_plot = alt.Chart(top_countries).mark_circle(size=60).encode(
+     x=alt.X("gdp_per_capita:Q", scale=alt.Scale(type="log"), title="GDP per Capita (Log Scale)"),
+     y=alt.Y("child_mortality:Q", title="Child Mortality (per 1,000 live births)"),
+     color=alt.Color("country:N"),
+     tooltip=["country", "gdp_per_capita", "child_mortality"]
+ ).properties(
+     title=f"Relationship Between GDP Per Capita and Child Mortality ({year})",
+     width=800,
+     height=500
+ )
+
+ # Add regression line
+ regression_line = scatter_plot.transform_regression(
+     "gdp_per_capita", "child_mortality", method="linear"
+ ).mark_line(color="red")
+
+ # Combine scatter plot and regression line
+ final_chart = scatter_plot + regression_line
+
+ # Display chart in Streamlit
+ st.altair_chart(final_chart, use_container_width=True)
+
+
+ st.text("To build the observatory, I began by preparing the dataset, which involved merging child mortality and GDP per capita data based on common fields: country and year. I ensured that the data was cleaned and formatted correctly, converting numerical fields like child_mortality and gdp_per_capita to numeric types and handling missing values by dropping rows with invalid entries. Once the data was ready, I created initial static visualizations using Altair to explore the relationship between GDP per capita and child mortality. The chart shows the relationship between GDP per capita and child mortality rates, highlighting an inverse trend where higher GDP per capita generally corresponds to lower child mortality. Building on this foundation, I added interactivity through Streamlit, allowing users to dynamically filter the dataset by year and select the number of countries to display. To enhance the visual analysis, I overlaid a regression line on the scatter plot, which provides a clear representation of trends. The app's functionality was refined iteratively, incorporating sliders for user interaction and tooltips for exploring country-specific data points.")
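Review note on the regression overlay: the chart puts GDP per capita on a log axis, but the fit uses `method="linear"`, so the line is fitted against raw GDP rather than its logarithm. A fit against log10(GDP) would match the axis. The sketch below is not part of the commit; it shows that fit in plain Python, with a hypothetical helper name `fit_log_linear` and made-up sample values. In Altair, passing `method="log"` to `transform_regression` should give a comparable fit, though that has not been checked against this app.

```python
import math


def fit_log_linear(gdp_values, mortality_values):
    """Least-squares fit of mortality = intercept + slope * log10(gdp)."""
    xs = [math.log10(g) for g in gdp_values]  # regress on log10(GDP), not raw GDP
    ys = list(mortality_values)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative values, not taken from the dataset.
gdp = [1_000, 10_000, 100_000]
mortality = [90.0, 50.0, 10.0]

intercept, slope = fit_log_linear(gdp, mortality)
# A negative slope means mortality falls as GDP per capita rises.
print(f"mortality ~ {intercept:.1f} + {slope:.1f} * log10(gdp)")
```

A negative slope here corresponds to the inverse trend described in the app text: each tenfold increase in GDP per capita shifts the fitted mortality by `slope` deaths per 1,000 births.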
pages/Global_map.py DELETED
@@ -1,120 +0,0 @@
- # Hugging Face's logo
- # Hugging Face
- # Search models, datasets, users...
- # Models
- # Datasets
- # Spaces
- # Posts
- # Docs
- # Enterprise
- # Pricing
-
-
-
- # Spaces:
-
- # SmeetPatel
- # /
- # FP1
-
-
- # like
- # 0
-
- # App
- # Files
- # Community
- # Settings
- # FP1
- # /
- # app.py
-
- # SmeetPatel's picture
- # SmeetPatel
- # Update app.py
- # 9c77701
- # verified
- # less than a minute ago
- # raw
-
- # Copy download link
- # history
- # blame
- # edit
- # delete
-
- # 3.59 kB
- import os
- import subprocess
- import sys
-
- # Install plotly if not already installed
- try:
-     import plotly
- except ImportError:
-     subprocess.check_call([sys.executable, "-m", "pip", "install", "plotly"])
-
-
- # import subprocess
- # import sys
-
- import pandas as pd
- import streamlit as st
- import plotly.express as px
-
-
-
- # Load Dataset
- # Example: 'country' column for country names, other columns for years
- data = pd.read_csv("https://huggingface.co/spaces/jiyachachan/fp2/resolve/main/child_mortality_0_5_year_olds_dying_per_1000_born.csv")
-
- # Melt the data to long format for easier filtering
- data_melted = data.melt(id_vars=["country"], var_name="year", value_name="mortality_rate")
- data_melted["year"] = pd.to_numeric(data_melted["year"])
-
- # Streamlit App
- st.title("Global Child Mortality Rate (per 1000 children born)")
- st.write("""This interactive visualization provides an insightful overview of child mortality rates (number of deaths per 1,000 live births) across countries for a selected year.
- The data highlights disparities in healthcare, socioeconomic conditions, and development across the globe, making it a valuable tool for understanding global health challenges.""")
-
- # Add year selection
- # years = sorted(data_melted["year"].unique()) # Extract unique years from the dataset
- # selected_year = st.selectbox("Select Year", years)
- # Add year selection with a slider
- min_year = int(data_melted["year"].min())
- max_year = int(data_melted["year"].max())
-
- st.subheader("Child Mortality Trends around the Globe")
- st.write("""
- This chart reveals how child mortality rates have changed over the years.
- It shows how today's developed countries have successfully reduced the rate, while underdeveloped countries still face challenges in curbing child mortality.
- We can use the trends in the graph to understand the factors that might be responsible for high or low mortality.
- This can help policymakers in developing and underdeveloped countries craft data-driven policies to reduce child mortality.
- """)
-
- selected_year = st.slider("Select Year", min_value=min_year, max_value=max_year, value = 2024, step = 5)
-
-
- # Filter data for the selected year
- filtered_data = data_melted[data_melted["year"] == selected_year]
-
-
- # Create the map
- fig = px.choropleth(
-     filtered_data,
-     locations="country", # Country names or ISO 3166-1 Alpha-3 codes
-     locationmode="country names", # Use 'ISO-3' if you have country codes
-     color="mortality_rate",
-     title=f"Child Mortality Rate in {selected_year}",
-     color_continuous_scale=px.colors.sequential.OrRd, # Customize the color scale
-
- )
-
- # Display the map
- st.plotly_chart(fig)
-
- st.write("""I began by acquiring a dataset on child mortality rates, with countries as rows and years as columns. The dataset contained child mortality rates as the number of deaths per 1,000 live births.
- To make the dataset suitable for visualization, I transformed it into a long format using pandas.melt(), creating three columns: country, year, and mortality_rate. This step allowed for efficient filtering and visualization.
- I chose a choropleth map because it effectively communicates regional differences using a color gradient. Each country is color-coded based on its mortality rate for a selected year, offering immediate visual insights.
- I implemented a slider widget for year selection, enabling users to dynamically explore mortality rates over time.
- This required ensuring that the year column was properly formatted as numeric data, and filtering the dataset based on the slider’s value.""")
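Review note on the reshape described in the deleted file's closing text: the wide-to-long step and the year filter can be tried on a tiny frame without fetching the hosted CSV. This is a minimal sketch, not code from the commit; the country labels `A` and `B` and all values are made up, and `selected_year` stands in for the slider value.

```python
import pandas as pd

# Made-up wide-format frame: one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["A", "B"],
    "1990": [100.0, 50.0],
    "2000": [80.0, 40.0],
})

# Reshape to long format: one row per (country, year) pair.
long_df = wide.melt(id_vars=["country"], var_name="year", value_name="mortality_rate")

# Year labels come from column names, so they are strings until converted.
long_df["year"] = pd.to_numeric(long_df["year"])

selected_year = 2000  # stands in for the Streamlit slider value
filtered = long_df[long_df["year"] == selected_year]
print(filtered)
```

Without the `pd.to_numeric` conversion, comparing the string year labels against the integer slider value would match no rows, which is why the numeric cast comes before the filter.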