diff --git a/notebooks/README.md b/notebooks/README.md
index 60d6636b..8c744144 100644
--- a/notebooks/README.md
+++ b/notebooks/README.md
@@ -15,8 +15,6 @@ Notebook | Description
[`analyzing_genomic_data.ipynb`](https://colab.research.google.com/drive/1Io7EDr4LjfPLl_l2JYY8__WbfitfNlOf) | A notebook that analyzes genetic variants within RUNX1 (provided by multiple datasets from UCSC Genome Browser, NCBI/gene, and ClinVar).
[`Drug_Discovery_With_Data_Commons.ipynb`](https://colab.research.google.com/drive/1dSKYiRMn3mbDsInorQzYM0yk7sqv6fIV) | A notebook performing drug discovery by identifying novel applications of previously approved drugs using Biomedical Data Commons.
[`protein-charts.ipynb`](https://colab.research.google.com/drive/1Kh-ufqobdChZ2qQgEY0rdPA2_DBmOiSG) | A notebook summarizing various protein properties and interactions using graphical visualizations.
-[`Superfund sites (basic)`](Accessing_Superfund_data_from_Data_Commons.ipynb) | A notebook that illustrates basic access to [Superfund sites](https://en.wikipedia.org/wiki/List_of_Superfund_sites) data in Data Commons.
-[`Superfund sites (extended)`](Analyzing_SuperfundSites_with_Data_Commons.ipynb) | A notebook that includes extended analysis using [Superfund sites](https://en.wikipedia.org/wiki/List_of_Superfund_sites) data in Data Commons.
## Maintenance
diff --git a/notebooks/intro_data_science/Feature_Engineering.ipynb b/notebooks/intro_data_science/Feature_Engineering.ipynb
index 8c5adf3c..ae001859 100644
--- a/notebooks/intro_data_science/Feature_Engineering.ipynb
+++ b/notebooks/intro_data_science/Feature_Engineering.ipynb
@@ -5,7 +5,6 @@
"colab": {
"name": "Feature Engineering.ipynb",
"provenance": [],
- "collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
@@ -24,7 +23,7 @@
"colab_type": "text"
},
"source": [
- " "
+ " "
]
},
{
@@ -45,7 +44,7 @@
"source": [
"# Exploring Feature Engineering\n",
"\n",
- "Welcome! In this lesson, we'll be exploring various techniques for feature engineering. We'll be walking through the steps one takes to set up your data for your machine learning models, starting with acquiring and exploring the data, working through different transformations and feature representation choices, and analyzing how those design decisions affect our model's results. \n",
+ "Welcome! In this lesson, we'll be exploring various techniques for feature engineering. We'll be walking through the steps one takes to set up your data for your machine learning models, starting with acquiring and exploring the data, working through different transformations and feature representation choices, and analyzing how those design decisions affect our model's results.\n",
"\n",
"## Learning Objectives:\n",
"In this lesson, we'll be covering\n",
@@ -62,7 +61,7 @@
"\n",
"And for help with Pandas and manipulating data frames, take a look at the [Pandas Documentation](https://pandas.pydata.org/docs/reference/index.html).\n",
"\n",
- "We'll be using the scikit-learn library for implementing our models today. Documentation can be found [here](https://scikit-learn.org/stable/modules/classes.html). \n",
+ "We'll be using the scikit-learn library for implementing our models today. Documentation can be found [here](https://scikit-learn.org/stable/modules/classes.html).\n",
"\n",
"As usual, if you have any other questions, please reach out to your course staff!\n"
]
@@ -87,11 +86,7 @@
{
"cell_type": "code",
"metadata": {
- "id": "gUETYfc0EuGg",
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "outputId": "cd5c105a-79ab-4959-b511-d11b4ff99a3e"
+ "id": "gUETYfc0EuGg"
},
"source": [
"# We need to install the Data Commons API, since they don't ship natively with\n",
@@ -106,30 +101,8 @@
"# Import the two methods from heatmap library to make pretty correlation plots\n",
"!pip install heatmapz --upgrade --quiet"
],
- "execution_count": null,
- "outputs": [
- {
- "output_type": "stream",
- "text": [
- "Collecting heatmapz\n",
- " Downloading heatmapz-0.0.4-py3-none-any.whl (5.8 kB)\n",
- "Requirement already satisfied: seaborn in /usr/local/lib/python3.7/dist-packages (from heatmapz) (0.11.1)\n",
- "Requirement already satisfied: matplotlib>=3.0.3 in /usr/local/lib/python3.7/dist-packages (from heatmapz) (3.2.2)\n",
- "Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from heatmapz) (1.1.5)\n",
- "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.3->heatmapz) (0.10.0)\n",
- "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.3->heatmapz) (2.4.7)\n",
- "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.3->heatmapz) (2.8.2)\n",
- "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.3->heatmapz) (1.3.1)\n",
- "Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.3->heatmapz) (1.19.5)\n",
- "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from cycler>=0.10->matplotlib>=3.0.3->heatmapz) (1.15.0)\n",
- "Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas->heatmapz) (2018.9)\n",
- "Requirement already satisfied: scipy>=1.0 in /usr/local/lib/python3.7/dist-packages (from seaborn->heatmapz) (1.4.1)\n",
- "Installing collected packages: heatmapz\n",
- "Successfully installed heatmapz-0.0.4\n"
- ],
- "name": "stdout"
- }
- ]
+ "execution_count": 22,
+ "outputs": []
},
{
"cell_type": "code",
@@ -155,7 +128,7 @@
"import matplotlib.pyplot as plt\n",
"from heatmap import heatmap, corrplot"
],
- "execution_count": null,
+ "execution_count": 23,
"outputs": []
},
{
@@ -189,7 +162,7 @@
"colab": {
"base_uri": "https://localhost:8080/"
},
- "outputId": "996adc68-c555-4d2c-8a35-95b76d5f6bcc"
+ "outputId": "51662f19-aee2-43e2-a219-6d2d8a7a9f00"
},
"source": [
"# Choose your state:\n",
@@ -199,14 +172,14 @@
"county_dcids = dc.get_places_in([your_state_dcid], \"County\")[your_state_dcid]\n",
"print(county_dcids)"
],
- "execution_count": null,
+ "execution_count": 24,
"outputs": [
{
"output_type": "stream",
+ "name": "stdout",
"text": [
"['geoId/06001', 'geoId/06003', 'geoId/06005', 'geoId/06007', 'geoId/06009', 'geoId/06011', 'geoId/06013', 'geoId/06015', 'geoId/06017', 'geoId/06019', 'geoId/06021', 'geoId/06023', 'geoId/06025', 'geoId/06027', 'geoId/06029', 'geoId/06031', 'geoId/06033', 'geoId/06035', 'geoId/06037', 'geoId/06039', 'geoId/06041', 'geoId/06043', 'geoId/06045', 'geoId/06047', 'geoId/06049', 'geoId/06051', 'geoId/06053', 'geoId/06055', 'geoId/06057', 'geoId/06059', 'geoId/06061', 'geoId/06063', 'geoId/06065', 'geoId/06067', 'geoId/06069', 'geoId/06071', 'geoId/06073', 'geoId/06075', 'geoId/06077', 'geoId/06079', 'geoId/06081', 'geoId/06083', 'geoId/06085', 'geoId/06087', 'geoId/06089', 'geoId/06091', 'geoId/06093', 'geoId/06095', 'geoId/06097', 'geoId/06099', 'geoId/06101', 'geoId/06103', 'geoId/06105', 'geoId/06107', 'geoId/06109', 'geoId/06111', 'geoId/06113', 'geoId/06115']\n"
- ],
- "name": "stdout"
+ ]
}
]
},
@@ -253,7 +226,7 @@
"base_uri": "https://localhost:8080/",
"height": 1000
},
- "outputId": "b486589a-2b1b-4235-fb67-5ba0d1f43c02"
+ "outputId": "f39de835-7334-4918-b5ae-399916e1d07e"
},
"source": [
"# Create a pandas dataframe containing data for each of the features\n",
@@ -261,24 +234,210 @@
"# with one column per feature.\n",
"\n",
"stat_vars_to_query = [\n",
- " \"CumulativeCount_MedicalTest_ConditionCOVID_19_Positive\",\n",
+ " \"CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase\",\n",
" \"Count_Person\",\n",
" \"Count_Person_MarriedAndNotSeparated\",\n",
" \"Median_Income_Person\",\n",
" \"Count_Household_With4OrMorePerson\"\n",
- " \n",
+ "\n",
"]\n",
"\n",
"raw_df = dcp.build_multivariate_dataframe(county_dcids, stat_vars_to_query)\n",
"display(raw_df)"
],
- "execution_count": null,
+ "execution_count": 26,
"outputs": [
{
"output_type": "display_data",
"data": {
+ "text/plain": [
+ " CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase \\\n",
+ "place \n",
+ "geoId/06001 284054 \n",
+ "geoId/06003 126 \n",
+ "geoId/06005 9242 \n",
+ "geoId/06007 40181 \n",
+ "geoId/06009 7754 \n",
+ "geoId/06011 4549 \n",
+ "geoId/06013 209958 \n",
+ "geoId/06015 6381 \n",
+ "geoId/06017 30508 \n",
+ "geoId/06019 257611 \n",
+ "geoId/06021 6631 \n",
+ "geoId/06023 20952 \n",
+ "geoId/06025 66715 \n",
+ "geoId/06027 4636 \n",
+ "geoId/06029 244681 \n",
+ "geoId/06031 55429 \n",
+ "geoId/06033 11699 \n",
+ "geoId/06035 10751 \n",
+ "geoId/06037 2908425 \n",
+ "geoId/06039 43685 \n",
+ "geoId/06041 38685 \n",
+ "geoId/06043 3145 \n",
+ "geoId/06045 16568 \n",
+ "geoId/06047 72959 \n",
+ "geoId/06049 1000 \n",
+ "geoId/06051 3144 \n",
+ "geoId/06053 95140 \n",
+ "geoId/06055 27765 \n",
+ "geoId/06057 17503 \n",
+ "geoId/06059 600384 \n",
+ "geoId/06061 71527 \n",
+ "geoId/06063 3379 \n",
+ "geoId/06065 626695 \n",
+ "geoId/06067 314407 \n",
+ "geoId/06069 13636 \n",
+ "geoId/06071 597377 \n",
+ "geoId/06073 824586 \n",
+ "geoId/06075 143959 \n",
+ "geoId/06077 178501 \n",
+ "geoId/06079 57556 \n",
+ "geoId/06081 137238 \n",
+ "geoId/06083 92683 \n",
+ "geoId/06085 342015 \n",
+ "geoId/06087 52532 \n",
+ "geoId/06089 36988 \n",
+ "geoId/06091 324 \n",
+ "geoId/06093 7478 \n",
+ "geoId/06095 89419 \n",
+ "geoId/06097 90191 \n",
+ "geoId/06099 136645 \n",
+ "geoId/06101 23045 \n",
+ "geoId/06103 14753 \n",
+ "geoId/06105 1485 \n",
+ "geoId/06107 136125 \n",
+ "geoId/06109 13758 \n",
+ "geoId/06111 186062 \n",
+ "geoId/06113 41061 \n",
+ "geoId/06115 17944 \n",
+ "\n",
+ " Count_Person Count_Person_MarriedAndNotSeparated \\\n",
+ "place \n",
+ "geoId/06001 1662323 674824.699 \n",
+ "geoId/06003 1119 495.504 \n",
+ "geoId/06005 40083 17272.889 \n",
+ "geoId/06007 212744 76639.140 \n",
+ "geoId/06009 46308 21560.079 \n",
+ "geoId/06011 21558 8416.200 \n",
+ "geoId/06013 1152333 491223.639 \n",
+ "geoId/06015 27968 9762.150 \n",
+ "geoId/06017 192925 91421.044 \n",
+ "geoId/06019 1000918 337513.410 \n",
+ "geoId/06021 28283 12146.960 \n",
+ "geoId/06023 134977 43891.200 \n",
+ "geoId/06025 180267 56884.256 \n",
+ "geoId/06027 18046 7219.959 \n",
+ "geoId/06029 901362 312889.872 \n",
+ "geoId/06031 152692 53927.780 \n",
+ "geoId/06033 64479 24822.130 \n",
+ "geoId/06035 30016 11461.076 \n",
+ "geoId/06037 9943046 3520955.696 \n",
+ "geoId/06039 157761 56900.481 \n",
+ "geoId/06041 257332 113378.795 \n",
+ "geoId/06043 17160 7397.292 \n",
+ "geoId/06045 86061 33108.240 \n",
+ "geoId/06047 279252 94260.678 \n",
+ "geoId/06049 8763 3883.438 \n",
+ "geoId/06051 14534 5487.377 \n",
+ "geoId/06053 430906 162039.867 \n",
+ "geoId/06055 135965 58060.389 \n",
+ "geoId/06057 99606 45714.103 \n",
+ "geoId/06059 3166857 1306956.195 \n",
+ "geoId/06061 402950 185639.202 \n",
+ "geoId/06063 18967 8585.850 \n",
+ "geoId/06065 2489188 936495.840 \n",
+ "geoId/06067 1559146 581828.561 \n",
+ "geoId/06069 64055 25577.667 \n",
+ "geoId/06071 2189183 781869.830 \n",
+ "geoId/06073 3332427 1293264.554 \n",
+ "geoId/06075 866606 306862.042 \n",
+ "geoId/06077 767967 283515.990 \n",
+ "geoId/06079 282249 114640.775 \n",
+ "geoId/06081 758308 335088.173 \n",
+ "geoId/06083 444766 160367.004 \n",
+ "geoId/06085 1907105 834042.321 \n",
+ "geoId/06087 269925 103077.379 \n",
+ "geoId/06089 179027 73045.770 \n",
+ "geoId/06091 2920 1334.942 \n",
+ "geoId/06093 43245 17714.508 \n",
+ "geoId/06095 446935 178080.056 \n",
+ "geoId/06097 489819 201348.202 \n",
+ "geoId/06099 550081 203766.620 \n",
+ "geoId/06101 96385 39530.528 \n",
+ "geoId/06103 64494 26263.636 \n",
+ "geoId/06105 12216 4868.825 \n",
+ "geoId/06107 468680 163314.826 \n",
+ "geoId/06109 54515 23785.476 \n",
+ "geoId/06111 841387 335898.012 \n",
+ "geoId/06113 219728 78920.160 \n",
+ "geoId/06115 80160 29632.050 \n",
+ "\n",
+ " Median_Income_Person Count_Household_With4OrMorePerson \n",
+ "place \n",
+ "geoId/06001 56575 155852 \n",
+ "geoId/06003 35598 103 \n",
+ "geoId/06005 41581 2923 \n",
+ "geoId/06007 33600 16410 \n",
+ "geoId/06009 37043 3075 \n",
+ "geoId/06011 36820 2203 \n",
+ "geoId/06013 54178 118442 \n",
+ "geoId/06015 31929 2309 \n",
+ "geoId/06017 48876 16166 \n",
+ "geoId/06019 33875 107943 \n",
+ "geoId/06021 35120 2500 \n",
+ "geoId/06023 31657 9713 \n",
+ "geoId/06025 24717 18112 \n",
+ "geoId/06027 41594 1043 \n",
+ "geoId/06029 30912 97654 \n",
+ "geoId/06031 34210 16052 \n",
+ "geoId/06033 31565 5140 \n",
+ "geoId/06035 34293 1615 \n",
+ "geoId/06037 38111 967873 \n",
+ "geoId/06039 30316 16535 \n",
+ "geoId/06041 65068 20703 \n",
+ "geoId/06043 36299 1233 \n",
+ "geoId/06045 34707 6681 \n",
+ "geoId/06047 31343 30146 \n",
+ "geoId/06049 31521 465 \n",
+ "geoId/06051 46048 1105 \n",
+ "geoId/06053 37063 48002 \n",
+ "geoId/06055 47432 12018 \n",
+ "geoId/06057 40174 6898 \n",
+ "geoId/06059 46430 321300 \n",
+ "geoId/06061 53071 37963 \n",
+ "geoId/06063 39230 999 \n",
+ "geoId/06065 37929 268129 \n",
+ "geoId/06067 42351 154446 \n",
+ "geoId/06069 44611 7695 \n",
+ "geoId/06071 36178 242009 \n",
+ "geoId/06073 45463 298453 \n",
+ "geoId/06075 69260 58854 \n",
+ "geoId/06077 38674 88055 \n",
+ "geoId/06079 40720 21754 \n",
+ "geoId/06081 63325 68046 \n",
+ "geoId/06083 38787 43491 \n",
+ "geoId/06085 62532 185999 \n",
+ "geoId/06087 43988 23759 \n",
+ "geoId/06089 35503 14201 \n",
+ "geoId/06091 31696 114 \n",
+ "geoId/06093 31315 3024 \n",
+ "geoId/06095 45137 42824 \n",
+ "geoId/06097 48308 42270 \n",
+ "geoId/06099 36126 59134 \n",
+ "geoId/06101 34285 10428 \n",
+ "geoId/06103 33015 5817 \n",
+ "geoId/06105 30470 604 \n",
+ "geoId/06107 31326 56914 \n",
+ "geoId/06109 37688 3488 \n",
+ "geoId/06111 42693 82532 \n",
+ "geoId/06113 40567 21166 \n",
+ "geoId/06115 35459 8439 "
+ ],
"text/html": [
],
- "text/plain": [
- " Count_Person ... CumulativeCount_MedicalTest_ConditionCOVID_19_Positive\n",
- "place ... \n",
- "geoId/06001 1656754 ... 73771 \n",
- "geoId/06003 1039 ... 74 \n",
- "geoId/06005 38429 ... 3199 \n",
- "geoId/06007 225817 ... 10177 \n",
- "geoId/06009 45514 ... 1763 \n",
- "geoId/06011 21454 ... 1935 \n",
- "geoId/06013 1142251 ... 57337 \n",
- "geoId/06015 27495 ... 883 \n",
- "geoId/06017 188563 ... 8359 \n",
- "geoId/06019 984521 ... 88476 \n",
- "geoId/06021 27976 ... 2074 \n",
- "geoId/06023 135940 ... 2754 \n",
- "geoId/06025 180701 ... 26160 \n",
- "geoId/06027 17977 ... 1083 \n",
- "geoId/06029 887641 ... 94113 \n",
- "geoId/06031 150691 ... 20773 \n",
- "geoId/06033 64195 ... 2809 \n",
- "geoId/06035 30818 ... 5393 \n",
- "geoId/06037 10081570 ... 1116948 \n",
- "geoId/06039 155433 ... 14443 \n",
- "geoId/06041 259943 ... 12346 \n",
- "geoId/06043 17420 ... 367 \n",
- "geoId/06045 87224 ... 3434 \n",
- "geoId/06047 271382 ... 26230 \n",
- "geoId/06049 8907 ... 407 \n",
- "geoId/06051 14310 ... 1165 \n",
- "geoId/06053 433410 ... 39425 \n",
- "geoId/06055 139623 ... 8342 \n",
- "geoId/06057 99244 ... 3499 \n",
- "geoId/06059 3168044 ... 245978 \n",
- "geoId/06061 385512 ... 18430 \n",
- "geoId/06063 18660 ... 626 \n",
- "geoId/06065 2411439 ... 271910 \n",
- "geoId/06067 1524553 ... 86388 \n",
- "geoId/06069 60376 ... 5401 \n",
- "geoId/06071 2149031 ... 274429 \n",
- "geoId/06073 3316073 ... 238042 \n",
- "geoId/06075 874961 ... 31427 \n",
- "geoId/06077 742603 ... 61901 \n",
- "geoId/06079 282165 ... 17605 \n",
- "geoId/06081 767423 ... 35466 \n",
- "geoId/06083 444829 ... 28567 \n",
- "geoId/06085 1927470 ... 101964 \n",
- "geoId/06087 273962 ... 13383 \n",
- "geoId/06089 179212 ... 10333 \n",
- "geoId/06091 3040 ... 95 \n",
- "geoId/06093 43468 ... 1532 \n",
- "geoId/06095 441829 ... 27706 \n",
- "geoId/06097 499772 ... 26108 \n",
- "geoId/06099 543194 ... 50983 \n",
- "geoId/06101 96109 ... 8264 \n",
- "geoId/06103 63912 ... 4660 \n",
- "geoId/06105 12700 ... 307 \n",
- "geoId/06107 461898 ... 44518 \n",
- "geoId/06109 54045 ... 3549 \n",
- "geoId/06111 847263 ... 69931 \n",
- "geoId/06113 217352 ... 11698 \n",
- "geoId/06115 76360 ... 5324 \n",
- "\n",
- "[58 rows x 5 columns]"
- ]
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "raw_df",
+ "summary": "{\n \"name\": \"raw_df\",\n \"rows\": 58,\n \"fields\": [\n {\n \"column\": \"place\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 58,\n \"samples\": [\n \"geoId/06001\",\n \"geoId/06011\",\n \"geoId/06069\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 406565,\n \"min\": 126,\n \"max\": 2908425,\n \"num_unique_values\": 58,\n \"samples\": [\n 284054,\n 4549,\n 13636\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1457019,\n \"min\": 1119,\n \"max\": 9943046,\n \"num_unique_values\": 58,\n \"samples\": [\n 1662323,\n 21558,\n 64055\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person_MarriedAndNotSeparated\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 530767.1947848009,\n \"min\": 495.504,\n \"max\": 3520955.696,\n \"num_unique_values\": 58,\n \"samples\": [\n 674824.699,\n 8416.2,\n 25577.667\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Median_Income_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9434,\n \"min\": 24717,\n \"max\": 69260,\n \"num_unique_values\": 58,\n \"samples\": [\n 56575,\n 36820,\n 44611\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Household_With4OrMorePerson\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 142918,\n \"min\": 103,\n \"max\": 967873,\n \"num_unique_values\": 58,\n \"samples\": [\n 155852,\n 2203,\n 7695\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
},
- "metadata": {
- "tags": []
- }
+ "metadata": {}
}
]
},
@@ -870,10 +1232,10 @@
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
- "height": 689
+ "height": 1000
},
"id": "0O20_YeL95nn",
- "outputId": "f95c85e8-5f05-4d5d-a1ae-e2a1e249bff7"
+ "outputId": "1c4f82c5-c8a1-4f19-ab1e-d5cc0301e299"
},
"source": [
"# Generate a correlation matrix plot\n",
@@ -881,20 +1243,17 @@
"plt.figure(figsize=(8, 8))\n",
"corrplot(raw_df.corr(), size_scale=300);"
],
- "execution_count": null,
+ "execution_count": 27,
"outputs": [
{
"output_type": "display_data",
"data": {
- "image/png": "\n",
"text/plain": [
- ""
- ]
+ ""
+ ],
+ "image/png": "\n"
},
- "metadata": {
- "tags": [],
- "needs_background": "light"
- }
+ "metadata": {}
}
]
},
@@ -925,7 +1284,7 @@
"height": 1000
},
"id": "ivZK29Ho_PYN",
- "outputId": "b87d7843-86fd-4d6f-8e43-b100f202f7d0"
+ "outputId": "39f9d87d-1ae3-43f7-964d-36b5d39255b4"
},
"source": [
"filtered_stat_vars_to_query = [\n",
@@ -934,20 +1293,206 @@
" \"Count_Person_MarriedAndNotSeparated\",\n",
" \"Median_Income_Person\",\n",
" \"Count_Household_With4OrMorePerson\"\n",
- " \n",
+ "\n",
"]\n",
"\n",
"# Get data from Data Commons\n",
"filtered_df = dcp.build_multivariate_dataframe(county_dcids, stat_vars_to_query)\n",
"display(filtered_df)\n"
],
- "execution_count": null,
+ "execution_count": 28,
"outputs": [
{
"output_type": "display_data",
"data": {
+ "text/plain": [
+ " CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase \\\n",
+ "place \n",
+ "geoId/06001 284054 \n",
+ "geoId/06003 126 \n",
+ "geoId/06005 9242 \n",
+ "geoId/06007 40181 \n",
+ "geoId/06009 7754 \n",
+ "geoId/06011 4549 \n",
+ "geoId/06013 209958 \n",
+ "geoId/06015 6381 \n",
+ "geoId/06017 30508 \n",
+ "geoId/06019 257611 \n",
+ "geoId/06021 6631 \n",
+ "geoId/06023 20952 \n",
+ "geoId/06025 66715 \n",
+ "geoId/06027 4636 \n",
+ "geoId/06029 244681 \n",
+ "geoId/06031 55429 \n",
+ "geoId/06033 11699 \n",
+ "geoId/06035 10751 \n",
+ "geoId/06037 2908425 \n",
+ "geoId/06039 43685 \n",
+ "geoId/06041 38685 \n",
+ "geoId/06043 3145 \n",
+ "geoId/06045 16568 \n",
+ "geoId/06047 72959 \n",
+ "geoId/06049 1000 \n",
+ "geoId/06051 3144 \n",
+ "geoId/06053 95140 \n",
+ "geoId/06055 27765 \n",
+ "geoId/06057 17503 \n",
+ "geoId/06059 600384 \n",
+ "geoId/06061 71527 \n",
+ "geoId/06063 3379 \n",
+ "geoId/06065 626695 \n",
+ "geoId/06067 314407 \n",
+ "geoId/06069 13636 \n",
+ "geoId/06071 597377 \n",
+ "geoId/06073 824586 \n",
+ "geoId/06075 143959 \n",
+ "geoId/06077 178501 \n",
+ "geoId/06079 57556 \n",
+ "geoId/06081 137238 \n",
+ "geoId/06083 92683 \n",
+ "geoId/06085 342015 \n",
+ "geoId/06087 52532 \n",
+ "geoId/06089 36988 \n",
+ "geoId/06091 324 \n",
+ "geoId/06093 7478 \n",
+ "geoId/06095 89419 \n",
+ "geoId/06097 90191 \n",
+ "geoId/06099 136645 \n",
+ "geoId/06101 23045 \n",
+ "geoId/06103 14753 \n",
+ "geoId/06105 1485 \n",
+ "geoId/06107 136125 \n",
+ "geoId/06109 13758 \n",
+ "geoId/06111 186062 \n",
+ "geoId/06113 41061 \n",
+ "geoId/06115 17944 \n",
+ "\n",
+ " Count_Person Count_Person_MarriedAndNotSeparated \\\n",
+ "place \n",
+ "geoId/06001 1662323 674824.699 \n",
+ "geoId/06003 1119 495.504 \n",
+ "geoId/06005 40083 17272.889 \n",
+ "geoId/06007 212744 76639.140 \n",
+ "geoId/06009 46308 21560.079 \n",
+ "geoId/06011 21558 8416.200 \n",
+ "geoId/06013 1152333 491223.639 \n",
+ "geoId/06015 27968 9762.150 \n",
+ "geoId/06017 192925 91421.044 \n",
+ "geoId/06019 1000918 337513.410 \n",
+ "geoId/06021 28283 12146.960 \n",
+ "geoId/06023 134977 43891.200 \n",
+ "geoId/06025 180267 56884.256 \n",
+ "geoId/06027 18046 7219.959 \n",
+ "geoId/06029 901362 312889.872 \n",
+ "geoId/06031 152692 53927.780 \n",
+ "geoId/06033 64479 24822.130 \n",
+ "geoId/06035 30016 11461.076 \n",
+ "geoId/06037 9943046 3520955.696 \n",
+ "geoId/06039 157761 56900.481 \n",
+ "geoId/06041 257332 113378.795 \n",
+ "geoId/06043 17160 7397.292 \n",
+ "geoId/06045 86061 33108.240 \n",
+ "geoId/06047 279252 94260.678 \n",
+ "geoId/06049 8763 3883.438 \n",
+ "geoId/06051 14534 5487.377 \n",
+ "geoId/06053 430906 162039.867 \n",
+ "geoId/06055 135965 58060.389 \n",
+ "geoId/06057 99606 45714.103 \n",
+ "geoId/06059 3166857 1306956.195 \n",
+ "geoId/06061 402950 185639.202 \n",
+ "geoId/06063 18967 8585.850 \n",
+ "geoId/06065 2489188 936495.840 \n",
+ "geoId/06067 1559146 581828.561 \n",
+ "geoId/06069 64055 25577.667 \n",
+ "geoId/06071 2189183 781869.830 \n",
+ "geoId/06073 3332427 1293264.554 \n",
+ "geoId/06075 866606 306862.042 \n",
+ "geoId/06077 767967 283515.990 \n",
+ "geoId/06079 282249 114640.775 \n",
+ "geoId/06081 758308 335088.173 \n",
+ "geoId/06083 444766 160367.004 \n",
+ "geoId/06085 1907105 834042.321 \n",
+ "geoId/06087 269925 103077.379 \n",
+ "geoId/06089 179027 73045.770 \n",
+ "geoId/06091 2920 1334.942 \n",
+ "geoId/06093 43245 17714.508 \n",
+ "geoId/06095 446935 178080.056 \n",
+ "geoId/06097 489819 201348.202 \n",
+ "geoId/06099 550081 203766.620 \n",
+ "geoId/06101 96385 39530.528 \n",
+ "geoId/06103 64494 26263.636 \n",
+ "geoId/06105 12216 4868.825 \n",
+ "geoId/06107 468680 163314.826 \n",
+ "geoId/06109 54515 23785.476 \n",
+ "geoId/06111 841387 335898.012 \n",
+ "geoId/06113 219728 78920.160 \n",
+ "geoId/06115 80160 29632.050 \n",
+ "\n",
+ " Median_Income_Person Count_Household_With4OrMorePerson \n",
+ "place \n",
+ "geoId/06001 56575 155852 \n",
+ "geoId/06003 35598 103 \n",
+ "geoId/06005 41581 2923 \n",
+ "geoId/06007 33600 16410 \n",
+ "geoId/06009 37043 3075 \n",
+ "geoId/06011 36820 2203 \n",
+ "geoId/06013 54178 118442 \n",
+ "geoId/06015 31929 2309 \n",
+ "geoId/06017 48876 16166 \n",
+ "geoId/06019 33875 107943 \n",
+ "geoId/06021 35120 2500 \n",
+ "geoId/06023 31657 9713 \n",
+ "geoId/06025 24717 18112 \n",
+ "geoId/06027 41594 1043 \n",
+ "geoId/06029 30912 97654 \n",
+ "geoId/06031 34210 16052 \n",
+ "geoId/06033 31565 5140 \n",
+ "geoId/06035 34293 1615 \n",
+ "geoId/06037 38111 967873 \n",
+ "geoId/06039 30316 16535 \n",
+ "geoId/06041 65068 20703 \n",
+ "geoId/06043 36299 1233 \n",
+ "geoId/06045 34707 6681 \n",
+ "geoId/06047 31343 30146 \n",
+ "geoId/06049 31521 465 \n",
+ "geoId/06051 46048 1105 \n",
+ "geoId/06053 37063 48002 \n",
+ "geoId/06055 47432 12018 \n",
+ "geoId/06057 40174 6898 \n",
+ "geoId/06059 46430 321300 \n",
+ "geoId/06061 53071 37963 \n",
+ "geoId/06063 39230 999 \n",
+ "geoId/06065 37929 268129 \n",
+ "geoId/06067 42351 154446 \n",
+ "geoId/06069 44611 7695 \n",
+ "geoId/06071 36178 242009 \n",
+ "geoId/06073 45463 298453 \n",
+ "geoId/06075 69260 58854 \n",
+ "geoId/06077 38674 88055 \n",
+ "geoId/06079 40720 21754 \n",
+ "geoId/06081 63325 68046 \n",
+ "geoId/06083 38787 43491 \n",
+ "geoId/06085 62532 185999 \n",
+ "geoId/06087 43988 23759 \n",
+ "geoId/06089 35503 14201 \n",
+ "geoId/06091 31696 114 \n",
+ "geoId/06093 31315 3024 \n",
+ "geoId/06095 45137 42824 \n",
+ "geoId/06097 48308 42270 \n",
+ "geoId/06099 36126 59134 \n",
+ "geoId/06101 34285 10428 \n",
+ "geoId/06103 33015 5817 \n",
+ "geoId/06105 30470 604 \n",
+ "geoId/06107 31326 56914 \n",
+ "geoId/06109 37688 3488 \n",
+ "geoId/06111 42693 82532 \n",
+ "geoId/06113 40567 21166 \n",
+ "geoId/06115 35459 8439 "
+ ],
"text/html": [
],
- "text/plain": [
- " Median_Income_Person ... Count_Person_MarriedAndNotSeparated\n",
- "place ... \n",
- "geoId/06001 43583 ... 669429\n",
- "geoId/06003 30512 ... 432\n",
- "geoId/06005 31160 ... 17756\n",
- "geoId/06007 24067 ... 79755\n",
- "geoId/06009 29124 ... 19985\n",
- "geoId/06011 27103 ... 8858\n",
- "geoId/06013 42181 ... 480862\n",
- "geoId/06015 23493 ... 9727\n",
- "geoId/06017 36682 ... 88764\n",
- "geoId/06019 25238 ... 328577\n",
- "geoId/06021 24329 ... 12286\n",
- "geoId/06023 25114 ... 45201\n",
- "geoId/06025 18245 ... 57548\n",
- "geoId/06027 31566 ... 7116\n",
- "geoId/06029 25013 ... 310852\n",
- "geoId/06031 26841 ... 52666\n",
- "geoId/06033 25172 ... 24020\n",
- "geoId/06035 30814 ... 11778\n",
- "geoId/06037 29985 ... 3538490\n",
- "geoId/06039 23691 ... 58861\n",
- "geoId/06041 52866 ... 114950\n",
- "geoId/06043 27678 ... 8132\n",
- "geoId/06045 26784 ... 32946\n",
- "geoId/06047 24873 ... 93884\n",
- "geoId/06049 25106 ... 3725\n",
- "geoId/06051 30701 ... 4432\n",
- "geoId/06053 28207 ... 156955\n",
- "geoId/06055 37953 ... 58092\n",
- "geoId/06057 32359 ... 44923\n",
- "geoId/06059 36135 ... 1303187\n",
- "geoId/06061 41130 ... 176980\n",
- "geoId/06063 31351 ... 8287\n",
- "geoId/06065 28557 ... 915017\n",
- "geoId/06067 32275 ... 559487\n",
- "geoId/06069 33547 ... 24181\n",
- "geoId/06071 27235 ... 768273\n",
- "geoId/06073 34307 ... 1289394\n",
- "geoId/06075 52677 ... 312207\n",
- "geoId/06077 29535 ... 274992\n",
- "geoId/06079 31938 ... 115104\n",
- "geoId/06081 49128 ... 333988\n",
- "geoId/06083 29657 ... 161813\n",
- "geoId/06085 47584 ... 836748\n",
- "geoId/06087 31174 ... 102414\n",
- "geoId/06089 27000 ... 74306\n",
- "geoId/06091 28138 ... 1302\n",
- "geoId/06093 25576 ... 18281\n",
- "geoId/06095 36273 ... 174436\n",
- "geoId/06097 36623 ... 199770\n",
- "geoId/06099 27840 ... 202150\n",
- "geoId/06101 26334 ... 40060\n",
- "geoId/06103 23061 ... 25736\n",
- "geoId/06105 23901 ... 5342\n",
- "geoId/06107 22994 ... 162001\n",
- "geoId/06109 29458 ... 23563\n",
- "geoId/06111 33814 ... 339806\n",
- "geoId/06113 30331 ... 78786\n",
- "geoId/06115 27877 ... 29484\n",
- "\n",
- "[58 rows x 5 columns]"
- ]
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "filtered_df",
+ "summary": "{\n \"name\": \"filtered_df\",\n \"rows\": 58,\n \"fields\": [\n {\n \"column\": \"place\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 58,\n \"samples\": [\n \"geoId/06001\",\n \"geoId/06011\",\n \"geoId/06069\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 406565,\n \"min\": 126,\n \"max\": 2908425,\n \"num_unique_values\": 58,\n \"samples\": [\n 284054,\n 4549,\n 13636\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1457019,\n \"min\": 1119,\n \"max\": 9943046,\n \"num_unique_values\": 58,\n \"samples\": [\n 1662323,\n 21558,\n 64055\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person_MarriedAndNotSeparated\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 530767.1947848009,\n \"min\": 495.504,\n \"max\": 3520955.696,\n \"num_unique_values\": 58,\n \"samples\": [\n 674824.699,\n 8416.2,\n 25577.667\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Median_Income_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9434,\n \"min\": 24717,\n \"max\": 69260,\n \"num_unique_values\": 58,\n \"samples\": [\n 56575,\n 36820,\n 44611\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Household_With4OrMorePerson\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 142918,\n \"min\": 103,\n \"max\": 967873,\n \"num_unique_values\": 58,\n \"samples\": [\n 155852,\n 2203,\n 7695\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
},
- "metadata": {
- "tags": []
- }
+ "metadata": {}
}
]
},
@@ -1537,7 +2285,7 @@
"height": 1000
},
"id": "hKKVG1FJCrh1",
- "outputId": "50c9522b-b74c-4e2f-b95e-253003210eb0"
+ "outputId": "9d9f6942-3859-4669-ddf2-5b9c4bb14363"
},
"source": [
"# Produce a dictionary mapping dcids to county names\n",
@@ -1555,15 +2303,263 @@
"county_name_dict = {key:value[0] for key, value in county_name_dict.items()}\n",
"df['County'] = pd.Series(county_name_dict)\n",
"df.set_index('County', inplace=True)\n",
+ "df = df.drop(\"DCID\", axis=1)\n",
"display(df)"
],
- "execution_count": null,
+ "execution_count": 29,
"outputs": [
{
"output_type": "display_data",
"data": {
+ "text/plain": [
+ " CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase \\\n",
+ "County \n",
+ "Alameda County 284054 \n",
+ "Alpine County 126 \n",
+ "Amador County 9242 \n",
+ "Butte County 40181 \n",
+ "Calaveras County 7754 \n",
+ "Colusa County 4549 \n",
+ "Contra Costa County 209958 \n",
+ "Del Norte County 6381 \n",
+ "El Dorado County 30508 \n",
+ "Fresno County 257611 \n",
+ "Glenn County 6631 \n",
+ "Humboldt County 20952 \n",
+ "Imperial County 66715 \n",
+ "Inyo County 4636 \n",
+ "Kern County 244681 \n",
+ "Kings County 55429 \n",
+ "Lake County 11699 \n",
+ "Lassen County 10751 \n",
+ "Los Angeles County 2908425 \n",
+ "Madera County 43685 \n",
+ "Marin County 38685 \n",
+ "Mariposa County 3145 \n",
+ "Mendocino County 16568 \n",
+ "Merced County 72959 \n",
+ "Modoc County 1000 \n",
+ "Mono County 3144 \n",
+ "Monterey County 95140 \n",
+ "Napa County 27765 \n",
+ "Nevada County 17503 \n",
+ "Orange County 600384 \n",
+ "Placer County 71527 \n",
+ "Plumas County 3379 \n",
+ "Riverside County 626695 \n",
+ "Sacramento County 314407 \n",
+ "San Benito County 13636 \n",
+ "San Bernardino County 597377 \n",
+ "San Diego County 824586 \n",
+ "San Francisco County 143959 \n",
+ "San Joaquin County 178501 \n",
+ "San Luis Obispo County 57556 \n",
+ "San Mateo County 137238 \n",
+ "Santa Barbara County 92683 \n",
+ "Santa Clara County 342015 \n",
+ "Santa Cruz County 52532 \n",
+ "Shasta County 36988 \n",
+ "Sierra County 324 \n",
+ "Siskiyou County 7478 \n",
+ "Solano County 89419 \n",
+ "Sonoma County 90191 \n",
+ "Stanislaus County 136645 \n",
+ "Sutter County 23045 \n",
+ "Tehama County 14753 \n",
+ "Trinity County 1485 \n",
+ "Tulare County 136125 \n",
+ "Tuolumne County 13758 \n",
+ "Ventura County 186062 \n",
+ "Yolo County 41061 \n",
+ "Yuba County 17944 \n",
+ "\n",
+ " Count_Person Count_Person_MarriedAndNotSeparated \\\n",
+ "County \n",
+ "Alameda County 1662323 674824.699 \n",
+ "Alpine County 1119 495.504 \n",
+ "Amador County 40083 17272.889 \n",
+ "Butte County 212744 76639.140 \n",
+ "Calaveras County 46308 21560.079 \n",
+ "Colusa County 21558 8416.200 \n",
+ "Contra Costa County 1152333 491223.639 \n",
+ "Del Norte County 27968 9762.150 \n",
+ "El Dorado County 192925 91421.044 \n",
+ "Fresno County 1000918 337513.410 \n",
+ "Glenn County 28283 12146.960 \n",
+ "Humboldt County 134977 43891.200 \n",
+ "Imperial County 180267 56884.256 \n",
+ "Inyo County 18046 7219.959 \n",
+ "Kern County 901362 312889.872 \n",
+ "Kings County 152692 53927.780 \n",
+ "Lake County 64479 24822.130 \n",
+ "Lassen County 30016 11461.076 \n",
+ "Los Angeles County 9943046 3520955.696 \n",
+ "Madera County 157761 56900.481 \n",
+ "Marin County 257332 113378.795 \n",
+ "Mariposa County 17160 7397.292 \n",
+ "Mendocino County 86061 33108.240 \n",
+ "Merced County 279252 94260.678 \n",
+ "Modoc County 8763 3883.438 \n",
+ "Mono County 14534 5487.377 \n",
+ "Monterey County 430906 162039.867 \n",
+ "Napa County 135965 58060.389 \n",
+ "Nevada County 99606 45714.103 \n",
+ "Orange County 3166857 1306956.195 \n",
+ "Placer County 402950 185639.202 \n",
+ "Plumas County 18967 8585.850 \n",
+ "Riverside County 2489188 936495.840 \n",
+ "Sacramento County 1559146 581828.561 \n",
+ "San Benito County 64055 25577.667 \n",
+ "San Bernardino County 2189183 781869.830 \n",
+ "San Diego County 3332427 1293264.554 \n",
+ "San Francisco County 866606 306862.042 \n",
+ "San Joaquin County 767967 283515.990 \n",
+ "San Luis Obispo County 282249 114640.775 \n",
+ "San Mateo County 758308 335088.173 \n",
+ "Santa Barbara County 444766 160367.004 \n",
+ "Santa Clara County 1907105 834042.321 \n",
+ "Santa Cruz County 269925 103077.379 \n",
+ "Shasta County 179027 73045.770 \n",
+ "Sierra County 2920 1334.942 \n",
+ "Siskiyou County 43245 17714.508 \n",
+ "Solano County 446935 178080.056 \n",
+ "Sonoma County 489819 201348.202 \n",
+ "Stanislaus County 550081 203766.620 \n",
+ "Sutter County 96385 39530.528 \n",
+ "Tehama County 64494 26263.636 \n",
+ "Trinity County 12216 4868.825 \n",
+ "Tulare County 468680 163314.826 \n",
+ "Tuolumne County 54515 23785.476 \n",
+ "Ventura County 841387 335898.012 \n",
+ "Yolo County 219728 78920.160 \n",
+ "Yuba County 80160 29632.050 \n",
+ "\n",
+ " Median_Income_Person \\\n",
+ "County \n",
+ "Alameda County 56575 \n",
+ "Alpine County 35598 \n",
+ "Amador County 41581 \n",
+ "Butte County 33600 \n",
+ "Calaveras County 37043 \n",
+ "Colusa County 36820 \n",
+ "Contra Costa County 54178 \n",
+ "Del Norte County 31929 \n",
+ "El Dorado County 48876 \n",
+ "Fresno County 33875 \n",
+ "Glenn County 35120 \n",
+ "Humboldt County 31657 \n",
+ "Imperial County 24717 \n",
+ "Inyo County 41594 \n",
+ "Kern County 30912 \n",
+ "Kings County 34210 \n",
+ "Lake County 31565 \n",
+ "Lassen County 34293 \n",
+ "Los Angeles County 38111 \n",
+ "Madera County 30316 \n",
+ "Marin County 65068 \n",
+ "Mariposa County 36299 \n",
+ "Mendocino County 34707 \n",
+ "Merced County 31343 \n",
+ "Modoc County 31521 \n",
+ "Mono County 46048 \n",
+ "Monterey County 37063 \n",
+ "Napa County 47432 \n",
+ "Nevada County 40174 \n",
+ "Orange County 46430 \n",
+ "Placer County 53071 \n",
+ "Plumas County 39230 \n",
+ "Riverside County 37929 \n",
+ "Sacramento County 42351 \n",
+ "San Benito County 44611 \n",
+ "San Bernardino County 36178 \n",
+ "San Diego County 45463 \n",
+ "San Francisco County 69260 \n",
+ "San Joaquin County 38674 \n",
+ "San Luis Obispo County 40720 \n",
+ "San Mateo County 63325 \n",
+ "Santa Barbara County 38787 \n",
+ "Santa Clara County 62532 \n",
+ "Santa Cruz County 43988 \n",
+ "Shasta County 35503 \n",
+ "Sierra County 31696 \n",
+ "Siskiyou County 31315 \n",
+ "Solano County 45137 \n",
+ "Sonoma County 48308 \n",
+ "Stanislaus County 36126 \n",
+ "Sutter County 34285 \n",
+ "Tehama County 33015 \n",
+ "Trinity County 30470 \n",
+ "Tulare County 31326 \n",
+ "Tuolumne County 37688 \n",
+ "Ventura County 42693 \n",
+ "Yolo County 40567 \n",
+ "Yuba County 35459 \n",
+ "\n",
+ " Count_Household_With4OrMorePerson \n",
+ "County \n",
+ "Alameda County 155852 \n",
+ "Alpine County 103 \n",
+ "Amador County 2923 \n",
+ "Butte County 16410 \n",
+ "Calaveras County 3075 \n",
+ "Colusa County 2203 \n",
+ "Contra Costa County 118442 \n",
+ "Del Norte County 2309 \n",
+ "El Dorado County 16166 \n",
+ "Fresno County 107943 \n",
+ "Glenn County 2500 \n",
+ "Humboldt County 9713 \n",
+ "Imperial County 18112 \n",
+ "Inyo County 1043 \n",
+ "Kern County 97654 \n",
+ "Kings County 16052 \n",
+ "Lake County 5140 \n",
+ "Lassen County 1615 \n",
+ "Los Angeles County 967873 \n",
+ "Madera County 16535 \n",
+ "Marin County 20703 \n",
+ "Mariposa County 1233 \n",
+ "Mendocino County 6681 \n",
+ "Merced County 30146 \n",
+ "Modoc County 465 \n",
+ "Mono County 1105 \n",
+ "Monterey County 48002 \n",
+ "Napa County 12018 \n",
+ "Nevada County 6898 \n",
+ "Orange County 321300 \n",
+ "Placer County 37963 \n",
+ "Plumas County 999 \n",
+ "Riverside County 268129 \n",
+ "Sacramento County 154446 \n",
+ "San Benito County 7695 \n",
+ "San Bernardino County 242009 \n",
+ "San Diego County 298453 \n",
+ "San Francisco County 58854 \n",
+ "San Joaquin County 88055 \n",
+ "San Luis Obispo County 21754 \n",
+ "San Mateo County 68046 \n",
+ "Santa Barbara County 43491 \n",
+ "Santa Clara County 185999 \n",
+ "Santa Cruz County 23759 \n",
+ "Shasta County 14201 \n",
+ "Sierra County 114 \n",
+ "Siskiyou County 3024 \n",
+ "Solano County 42824 \n",
+ "Sonoma County 42270 \n",
+ "Stanislaus County 59134 \n",
+ "Sutter County 10428 \n",
+ "Tehama County 5817 \n",
+ "Trinity County 604 \n",
+ "Tulare County 56914 \n",
+ "Tuolumne County 3488 \n",
+ "Ventura County 82532 \n",
+ "Yolo County 21166 \n",
+ "Yuba County 8439 "
+ ],
"text/html": [
- "\n",
+ "\n"
],
- "text/plain": [
- " Median_Income_Person ... DCID\n",
- "County ... \n",
- "Alameda County 43583 ... geoId/06001\n",
- "Alpine County 30512 ... geoId/06003\n",
- "Amador County 31160 ... geoId/06005\n",
- "Butte County 24067 ... geoId/06007\n",
- "Calaveras County 29124 ... geoId/06009\n",
- "Colusa County 27103 ... geoId/06011\n",
- "Contra Costa County 42181 ... geoId/06013\n",
- "Del Norte County 23493 ... geoId/06015\n",
- "El Dorado County 36682 ... geoId/06017\n",
- "Fresno County 25238 ... geoId/06019\n",
- "Glenn County 24329 ... geoId/06021\n",
- "Humboldt County 25114 ... geoId/06023\n",
- "Imperial County 18245 ... geoId/06025\n",
- "Inyo County 31566 ... geoId/06027\n",
- "Kern County 25013 ... geoId/06029\n",
- "Kings County 26841 ... geoId/06031\n",
- "Lake County 25172 ... geoId/06033\n",
- "Lassen County 30814 ... geoId/06035\n",
- "Los Angeles County 29985 ... geoId/06037\n",
- "Madera County 23691 ... geoId/06039\n",
- "Marin County 52866 ... geoId/06041\n",
- "Mariposa County 27678 ... geoId/06043\n",
- "Mendocino County 26784 ... geoId/06045\n",
- "Merced County 24873 ... geoId/06047\n",
- "Modoc County 25106 ... geoId/06049\n",
- "Mono County 30701 ... geoId/06051\n",
- "Monterey County 28207 ... geoId/06053\n",
- "Napa County 37953 ... geoId/06055\n",
- "Nevada County 32359 ... geoId/06057\n",
- "Orange County 36135 ... geoId/06059\n",
- "Placer County 41130 ... geoId/06061\n",
- "Plumas County 31351 ... geoId/06063\n",
- "Riverside County 28557 ... geoId/06065\n",
- "Sacramento County 32275 ... geoId/06067\n",
- "San Benito County 33547 ... geoId/06069\n",
- "San Bernardino County 27235 ... geoId/06071\n",
- "San Diego County 34307 ... geoId/06073\n",
- "San Francisco County 52677 ... geoId/06075\n",
- "San Joaquin County 29535 ... geoId/06077\n",
- "San Luis Obispo County 31938 ... geoId/06079\n",
- "San Mateo County 49128 ... geoId/06081\n",
- "Santa Barbara County 29657 ... geoId/06083\n",
- "Santa Clara County 47584 ... geoId/06085\n",
- "Santa Cruz County 31174 ... geoId/06087\n",
- "Shasta County 27000 ... geoId/06089\n",
- "Sierra County 28138 ... geoId/06091\n",
- "Siskiyou County 25576 ... geoId/06093\n",
- "Solano County 36273 ... geoId/06095\n",
- "Sonoma County 36623 ... geoId/06097\n",
- "Stanislaus County 27840 ... geoId/06099\n",
- "Sutter County 26334 ... geoId/06101\n",
- "Tehama County 23061 ... geoId/06103\n",
- "Trinity County 23901 ... geoId/06105\n",
- "Tulare County 22994 ... geoId/06107\n",
- "Tuolumne County 29458 ... geoId/06109\n",
- "Ventura County 33814 ... geoId/06111\n",
- "Yolo County 30331 ... geoId/06113\n",
- "Yuba County 27877 ... geoId/06115\n",
- "\n",
- "[58 rows x 6 columns]"
- ]
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "df",
+ "summary": "{\n \"name\": \"df\",\n \"rows\": 58,\n \"fields\": [\n {\n \"column\": \"County\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 58,\n \"samples\": [\n \"Alameda County\",\n \"Colusa County\",\n \"San Benito County\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 406565,\n \"min\": 126,\n \"max\": 2908425,\n \"num_unique_values\": 58,\n \"samples\": [\n 284054,\n 4549,\n 13636\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1457019,\n \"min\": 1119,\n \"max\": 9943046,\n \"num_unique_values\": 58,\n \"samples\": [\n 1662323,\n 21558,\n 64055\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person_MarriedAndNotSeparated\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 530767.1947848009,\n \"min\": 495.504,\n \"max\": 3520955.696,\n \"num_unique_values\": 58,\n \"samples\": [\n 674824.699,\n 8416.2,\n 25577.667\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Median_Income_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9434,\n \"min\": 24717,\n \"max\": 69260,\n \"num_unique_values\": 58,\n \"samples\": [\n 56575,\n 36820,\n 44611\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Household_With4OrMorePerson\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 142918,\n \"min\": 103,\n \"max\": 967873,\n \"num_unique_values\": 58,\n \"samples\": [\n 155852,\n 2203,\n 7695\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
},
- "metadata": {
- "tags": []
- }
+ "metadata": {}
}
]
},
@@ -2221,9 +3360,9 @@
"id": "_EbvFd0EY7wS",
"colab": {
"base_uri": "https://localhost:8080/",
- "height": 297
+ "height": 320
},
- "outputId": "1a664747-9f2f-479c-f35f-a57917ae9265"
+ "outputId": "9a1821ba-be28-4d85-82b4-60ff3686ef99"
},
"source": [
"# Use this space to create scatter plots, histograms, etc.\n",
@@ -2236,13 +3375,46 @@
"# Get some basic statistics\n",
"display(df.describe())\n"
],
- "execution_count": null,
+ "execution_count": 34,
"outputs": [
{
"output_type": "display_data",
"data": {
+ "text/plain": [
+ " CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase \\\n",
+ "count 5.800000e+01 \n",
+ "mean 1.612350e+05 \n",
+ "std 4.065660e+05 \n",
+ "min 1.260000e+02 \n",
+ "25% 1.098800e+04 \n",
+ "50% 4.062100e+04 \n",
+ "75% 1.370898e+05 \n",
+ "max 2.908425e+06 \n",
+ "\n",
+ " Count_Person Count_Person_MarriedAndNotSeparated \\\n",
+ "count 5.800000e+01 5.800000e+01 \n",
+ "mean 6.787600e+05 2.584289e+05 \n",
+ "std 1.457020e+06 5.307672e+05 \n",
+ "min 1.119000e+03 4.955040e+02 \n",
+ "25% 4.835975e+04 2.211643e+04 \n",
+ "50% 1.865960e+05 7.484246e+04 \n",
+ "75% 7.062512e+05 2.635786e+05 \n",
+ "max 9.943046e+06 3.520956e+06 \n",
+ "\n",
+ " Median_Income_Person Count_Household_With4OrMorePerson \n",
+ "count 58.000000 58.000000 \n",
+ "mean 40144.172414 66565.879310 \n",
+ "std 9434.683907 142918.556168 \n",
+ "min 24717.000000 103.000000 \n",
+ "25% 33958.750000 3178.250000 \n",
+ "50% 37375.500000 16472.500000 \n",
+ "75% 44455.250000 59064.000000 \n",
+ "max 69260.000000 967873.000000 "
+ ],
"text/html": [
- "\n",
+ "\n"
],
- "text/plain": [
- " Median_Income_Person ... Count_Person_MarriedAndNotSeparated\n",
- "count 58.000000 ... 5.800000e+01\n",
- "mean 30963.620690 ... 2.568637e+05\n",
- "std 7354.083205 ... 5.313952e+05\n",
- "min 18245.000000 ... 4.320000e+02\n",
- "25% 25765.500000 ... 2.087950e+04\n",
- "50% 29496.500000 ... 7.654600e+04\n",
- "75% 33250.000000 ... 2.567815e+05\n",
- "max 52866.000000 ... 3.538490e+06\n",
- "\n",
- "[8 rows x 5 columns]"
- ]
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"display(df\",\n \"rows\": 8,\n \"fields\": [\n {\n \"column\": \"CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 999404.4848324301,\n \"min\": 58.0,\n \"max\": 2908425.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 161235.0,\n 40621.0,\n 58.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3397262.4772673785,\n \"min\": 58.0,\n \"max\": 9943046.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 678759.9655172414,\n 186596.0,\n 58.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person_MarriedAndNotSeparated\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1200810.9846878175,\n \"min\": 58.0,\n \"max\": 3520955.696,\n \"num_unique_values\": 8,\n \"samples\": [\n 258428.85063793106,\n 74842.455,\n 58.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Median_Income_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21459.53064249957,\n \"min\": 58.0,\n \"max\": 69260.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 40144.1724137931,\n 37375.5,\n 58.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Household_With4OrMorePerson\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 331261.5845253443,\n \"min\": 58.0,\n \"max\": 967873.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 66565.87931034483,\n 16472.5,\n 58.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
},
- "metadata": {
- "tags": []
- }
+ "metadata": {}
}
]
},
@@ -2361,40 +3731,35 @@
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
- "height": 282
+ "height": 447
},
"id": "GBJ4rOt73pI1",
- "outputId": "29fef41d-3789-4b3b-df19-65a416207cad"
+ "outputId": "0d0b93e4-fd8d-4bbe-cdd9-c364ac30b476"
},
"source": [
"# Plot histograms to get an idea of data spread\n",
- "display(features_df[\"Median_Income_Person\"].plot.hist(bins=100))\n"
+ "display(df[\"Median_Income_Person\"].plot.hist(bins=100))\n"
],
- "execution_count": null,
+ "execution_count": 35,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
- ""
+ ""
]
},
- "metadata": {
- "tags": []
- }
+ "metadata": {}
},
{
"output_type": "display_data",
"data": {
- "image/png": "iVBORw0KGgo[base64 PNG data truncated]\n",
"text/plain": [
- ""
- ]
+ ""
+ ],
+ "image/png": "\n"
},
- "metadata": {
- "tags": [],
- "needs_background": "light"
- }
+ "metadata": {}
}
]
},
@@ -2403,39 +3768,34 @@
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
- "height": 295
+ "height": 465
},
"id": "3j9THBI03tQA",
- "outputId": "a5523ad7-1cb6-44c6-c769-657c910cf9ca"
+ "outputId": "fc71c64f-f410-4cc8-c500-35f19a91257e"
},
"source": [
"display(df[\"Count_Household_With4OrMorePerson\"].plot.hist(bins=100))\n"
],
- "execution_count": null,
+ "execution_count": 36,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
- ""
+ ""
]
},
- "metadata": {
- "tags": []
- }
+ "metadata": {}
},
{
"output_type": "display_data",
"data": {
- "image/png": "iVBORw0KGgo[base64 PNG data truncated]\n",
"text/plain": [
- ""
- ]
+ ""
+ ],
+ "image/png": "\n"
},
- "metadata": {
- "tags": [],
- "needs_background": "light"
- }
+ "metadata": {}
}
]
},
@@ -2444,39 +3804,34 @@
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
- "height": 295
+ "height": 465
},
"id": "euPFSZvK3wGq",
- "outputId": "ea5ee84d-f96f-4165-ce12-b27ec2e01871"
+ "outputId": "e3abbab9-6856-4106-d2ae-bad1ad9d4d7d"
},
"source": [
- "display(df[\"CumulativeCount_MedicalTest_ConditionCOVID_19_Positive\"].plot.hist(bins=10))"
+ "display(df[\"CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase\"].plot.hist(bins=10))"
],
- "execution_count": null,
+ "execution_count": 33,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
- ""
+ ""
]
},
- "metadata": {
- "tags": []
- }
+ "metadata": {}
},
{
"output_type": "display_data",
"data": {
- "image/png": "<base64-encoded PNG data omitted>\n",
"text/plain": [
- ""
- ]
+ ""
+ ],
+ "image/png": "\n"
},
- "metadata": {
- "tags": [],
- "needs_background": "light"
- }
+ "metadata": {}
}
]
},
@@ -2499,7 +3854,7 @@
"\n",
"\n",
"**1.3B)** How would you approach handling any NaN or empty values in a dataframe? Should we remove that row? Remove the feature? Or should we replace NaNs with a particular value (and if so, how do you decide what value that should be)?\n",
- " \n",
+ "\n",
"\n",
"**1.3C)** Take a look at the dataframe output by the code box in section 1.2 above. Are there any values that need to be cleaned? If so, write code to implement your answers to the above questions using the code box below.\n",
"\n",
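The three options raised in question 1.3B (remove the row, remove the feature, or replace the NaN with a chosen value) can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the notebook's own solution: `toy_df`, its county labels, and its values are made up for the example, and the column median is only one of several reasonable fill values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (values invented for illustration only).
toy_df = pd.DataFrame(
    {
        "Count_Person": [1000.0, np.nan, 3000.0, 4000.0],
        "Median_Income_Person": [30000.0, 40000.0, np.nan, 50000.0],
        "Count_Household": [100, 200, 300, 400],
    },
    index=["A County", "B County", "C County", "D County"],
)

# Option 1: drop every row that contains at least one missing value.
dropped_rows = toy_df.dropna(axis=0)

# Option 2: drop every feature (column) that contains at least one missing value.
dropped_cols = toy_df.dropna(axis=1)

# Option 3: keep all rows and columns, filling each gap with its column's median.
imputed = toy_df.fillna(toy_df.median())

print(dropped_rows.shape)          # rows A and D survive
print(list(dropped_cols.columns))  # only the fully populated column survives
print(imputed)
```

Dropping rows is often acceptable when only a few places are affected, while imputation keeps every place in the dataset at the cost of introducing values that were never observed.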
@@ -2540,7 +3895,7 @@
"\n",
"Sometimes transforming the data can reveal interesting combinations, or better scale our data. Here are some things to look out for:\n",
"\n",
- "* If your data has a skewed distribution or large changes in magnitude, it may be helpful to take the $log()$ of your data to bring it closer to normal. \n",
+ "* If your data has a skewed distribution or large changes in magnitude, it may be helpful to take the $log()$ of your data to bring it closer to normal.\n",
"* Other times, it may be helpful to bin close values together (for example, create groupings by age 0-10, 11-20, 21-30, etc.).\n",
"* When working with population or demographic data, it's often also prudent to consider whether the features you are using should be scaled by population.\n",
"\n",
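The three transformations listed above (log, binning, per-capita scaling) might look like this in pandas. It is a sketch on invented data: `toy_df` and its `Age` column are hypothetical, and base-10 is chosen only for readability (`np.log` works the same way). The per-capita column follows the same idea as the notebook's `Household4orMore_percapita` feature.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data spanning several orders of magnitude.
toy_df = pd.DataFrame(
    {
        "Count_Person": [1_000, 10_000, 100_000, 1_000_000],
        "Count_Household_With4OrMorePerson": [90, 1_000, 11_000, 95_000],
        "Age": [8, 15, 27, 30],  # assumed column, used only for the binning example
    }
)

# Log transform: compresses large differences in magnitude.
# Requires strictly positive values; np.log1p is an option when zeros occur.
log_population = np.log10(toy_df["Count_Person"])

# Binning: group nearby values into labelled ranges (0-10, 11-20, 21-30).
age_group = pd.cut(
    toy_df["Age"],
    bins=[0, 10, 20, 30],
    labels=["0-10", "11-20", "21-30"],
    include_lowest=True,  # so that an age of exactly 0 lands in the first bin
)

# Per-capita scaling: divide a raw count by the population of the place.
per_capita = toy_df["Count_Household_With4OrMorePerson"] / toy_df["Count_Person"]

print(log_population.tolist())
print(age_group.astype(str).tolist())
print(per_capita.tolist())
```

Per-capita scaling matters here because raw counts mostly track population, so without it a model may mainly learn which counties are large.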
@@ -2555,7 +3910,7 @@
"base_uri": "https://localhost:8080/",
"height": 1000
},
- "outputId": "0891fdea-b0cf-4abd-ebf3-033aa383effb"
+ "outputId": "6f8dc2c3-258e-4ab2-b040-04c2c49af2a9"
},
"source": [
"# Try different transformations on your features like taking the log, binning, etc.\n",
@@ -2575,13 +3930,321 @@
"features_df_new = pd.concat([household_df,df], axis=1)\n",
"display(features_df_new)\n"
],
- "execution_count": null,
+ "execution_count": 37,
"outputs": [
{
"output_type": "display_data",
"data": {
+ "text/plain": [
+ " Household4orMore_percapita \\\n",
+ "place \n",
+ "Alameda County 0.093756 \n",
+ "Alpine County 0.092046 \n",
+ "Amador County 0.072924 \n",
+ "Butte County 0.077135 \n",
+ "Calaveras County 0.066403 \n",
+ "Colusa County 0.102189 \n",
+ "Contra Costa County 0.102785 \n",
+ "Del Norte County 0.082559 \n",
+ "El Dorado County 0.083794 \n",
+ "Fresno County 0.107844 \n",
+ "Glenn County 0.088392 \n",
+ "Humboldt County 0.071960 \n",
+ "Imperial County 0.100473 \n",
+ "Inyo County 0.057797 \n",
+ "Kern County 0.108340 \n",
+ "Kings County 0.105127 \n",
+ "Lake County 0.079716 \n",
+ "Lassen County 0.053805 \n",
+ "Los Angeles County 0.097342 \n",
+ "Madera County 0.104810 \n",
+ "Marin County 0.080452 \n",
+ "Mariposa County 0.071853 \n",
+ "Mendocino County 0.077631 \n",
+ "Merced County 0.107953 \n",
+ "Modoc County 0.053064 \n",
+ "Mono County 0.076029 \n",
+ "Monterey County 0.111398 \n",
+ "Napa County 0.088390 \n",
+ "Nevada County 0.069253 \n",
+ "Orange County 0.101457 \n",
+ "Placer County 0.094213 \n",
+ "Plumas County 0.052670 \n",
+ "Riverside County 0.107717 \n",
+ "Sacramento County 0.099058 \n",
+ "San Benito County 0.120131 \n",
+ "San Bernardino County 0.110548 \n",
+ "San Diego County 0.089560 \n",
+ "San Francisco County 0.067913 \n",
+ "San Joaquin County 0.114660 \n",
+ "San Luis Obispo County 0.077074 \n",
+ "San Mateo County 0.089734 \n",
+ "Santa Barbara County 0.097784 \n",
+ "Santa Clara County 0.097530 \n",
+ "Santa Cruz County 0.088021 \n",
+ "Shasta County 0.079323 \n",
+ "Sierra County 0.039041 \n",
+ "Siskiyou County 0.069927 \n",
+ "Solano County 0.095817 \n",
+ "Sonoma County 0.086297 \n",
+ "Stanislaus County 0.107501 \n",
+ "Sutter County 0.108191 \n",
+ "Tehama County 0.090194 \n",
+ "Trinity County 0.049443 \n",
+ "Tulare County 0.121435 \n",
+ "Tuolumne County 0.063982 \n",
+ "Ventura County 0.098090 \n",
+ "Yolo County 0.096328 \n",
+ "Yuba County 0.105277 \n",
+ "\n",
+ " CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase \\\n",
+ "place \n",
+ "Alameda County 284054 \n",
+ "Alpine County 126 \n",
+ "Amador County 9242 \n",
+ "Butte County 40181 \n",
+ "Calaveras County 7754 \n",
+ "Colusa County 4549 \n",
+ "Contra Costa County 209958 \n",
+ "Del Norte County 6381 \n",
+ "El Dorado County 30508 \n",
+ "Fresno County 257611 \n",
+ "Glenn County 6631 \n",
+ "Humboldt County 20952 \n",
+ "Imperial County 66715 \n",
+ "Inyo County 4636 \n",
+ "Kern County 244681 \n",
+ "Kings County 55429 \n",
+ "Lake County 11699 \n",
+ "Lassen County 10751 \n",
+ "Los Angeles County 2908425 \n",
+ "Madera County 43685 \n",
+ "Marin County 38685 \n",
+ "Mariposa County 3145 \n",
+ "Mendocino County 16568 \n",
+ "Merced County 72959 \n",
+ "Modoc County 1000 \n",
+ "Mono County 3144 \n",
+ "Monterey County 95140 \n",
+ "Napa County 27765 \n",
+ "Nevada County 17503 \n",
+ "Orange County 600384 \n",
+ "Placer County 71527 \n",
+ "Plumas County 3379 \n",
+ "Riverside County 626695 \n",
+ "Sacramento County 314407 \n",
+ "San Benito County 13636 \n",
+ "San Bernardino County 597377 \n",
+ "San Diego County 824586 \n",
+ "San Francisco County 143959 \n",
+ "San Joaquin County 178501 \n",
+ "San Luis Obispo County 57556 \n",
+ "San Mateo County 137238 \n",
+ "Santa Barbara County 92683 \n",
+ "Santa Clara County 342015 \n",
+ "Santa Cruz County 52532 \n",
+ "Shasta County 36988 \n",
+ "Sierra County 324 \n",
+ "Siskiyou County 7478 \n",
+ "Solano County 89419 \n",
+ "Sonoma County 90191 \n",
+ "Stanislaus County 136645 \n",
+ "Sutter County 23045 \n",
+ "Tehama County 14753 \n",
+ "Trinity County 1485 \n",
+ "Tulare County 136125 \n",
+ "Tuolumne County 13758 \n",
+ "Ventura County 186062 \n",
+ "Yolo County 41061 \n",
+ "Yuba County 17944 \n",
+ "\n",
+ " Count_Person Count_Person_MarriedAndNotSeparated \\\n",
+ "place \n",
+ "Alameda County 1662323 674824.699 \n",
+ "Alpine County 1119 495.504 \n",
+ "Amador County 40083 17272.889 \n",
+ "Butte County 212744 76639.140 \n",
+ "Calaveras County 46308 21560.079 \n",
+ "Colusa County 21558 8416.200 \n",
+ "Contra Costa County 1152333 491223.639 \n",
+ "Del Norte County 27968 9762.150 \n",
+ "El Dorado County 192925 91421.044 \n",
+ "Fresno County 1000918 337513.410 \n",
+ "Glenn County 28283 12146.960 \n",
+ "Humboldt County 134977 43891.200 \n",
+ "Imperial County 180267 56884.256 \n",
+ "Inyo County 18046 7219.959 \n",
+ "Kern County 901362 312889.872 \n",
+ "Kings County 152692 53927.780 \n",
+ "Lake County 64479 24822.130 \n",
+ "Lassen County 30016 11461.076 \n",
+ "Los Angeles County 9943046 3520955.696 \n",
+ "Madera County 157761 56900.481 \n",
+ "Marin County 257332 113378.795 \n",
+ "Mariposa County 17160 7397.292 \n",
+ "Mendocino County 86061 33108.240 \n",
+ "Merced County 279252 94260.678 \n",
+ "Modoc County 8763 3883.438 \n",
+ "Mono County 14534 5487.377 \n",
+ "Monterey County 430906 162039.867 \n",
+ "Napa County 135965 58060.389 \n",
+ "Nevada County 99606 45714.103 \n",
+ "Orange County 3166857 1306956.195 \n",
+ "Placer County 402950 185639.202 \n",
+ "Plumas County 18967 8585.850 \n",
+ "Riverside County 2489188 936495.840 \n",
+ "Sacramento County 1559146 581828.561 \n",
+ "San Benito County 64055 25577.667 \n",
+ "San Bernardino County 2189183 781869.830 \n",
+ "San Diego County 3332427 1293264.554 \n",
+ "San Francisco County 866606 306862.042 \n",
+ "San Joaquin County 767967 283515.990 \n",
+ "San Luis Obispo County 282249 114640.775 \n",
+ "San Mateo County 758308 335088.173 \n",
+ "Santa Barbara County 444766 160367.004 \n",
+ "Santa Clara County 1907105 834042.321 \n",
+ "Santa Cruz County 269925 103077.379 \n",
+ "Shasta County 179027 73045.770 \n",
+ "Sierra County 2920 1334.942 \n",
+ "Siskiyou County 43245 17714.508 \n",
+ "Solano County 446935 178080.056 \n",
+ "Sonoma County 489819 201348.202 \n",
+ "Stanislaus County 550081 203766.620 \n",
+ "Sutter County 96385 39530.528 \n",
+ "Tehama County 64494 26263.636 \n",
+ "Trinity County 12216 4868.825 \n",
+ "Tulare County 468680 163314.826 \n",
+ "Tuolumne County 54515 23785.476 \n",
+ "Ventura County 841387 335898.012 \n",
+ "Yolo County 219728 78920.160 \n",
+ "Yuba County 80160 29632.050 \n",
+ "\n",
+ " Median_Income_Person \\\n",
+ "place \n",
+ "Alameda County 56575 \n",
+ "Alpine County 35598 \n",
+ "Amador County 41581 \n",
+ "Butte County 33600 \n",
+ "Calaveras County 37043 \n",
+ "Colusa County 36820 \n",
+ "Contra Costa County 54178 \n",
+ "Del Norte County 31929 \n",
+ "El Dorado County 48876 \n",
+ "Fresno County 33875 \n",
+ "Glenn County 35120 \n",
+ "Humboldt County 31657 \n",
+ "Imperial County 24717 \n",
+ "Inyo County 41594 \n",
+ "Kern County 30912 \n",
+ "Kings County 34210 \n",
+ "Lake County 31565 \n",
+ "Lassen County 34293 \n",
+ "Los Angeles County 38111 \n",
+ "Madera County 30316 \n",
+ "Marin County 65068 \n",
+ "Mariposa County 36299 \n",
+ "Mendocino County 34707 \n",
+ "Merced County 31343 \n",
+ "Modoc County 31521 \n",
+ "Mono County 46048 \n",
+ "Monterey County 37063 \n",
+ "Napa County 47432 \n",
+ "Nevada County 40174 \n",
+ "Orange County 46430 \n",
+ "Placer County 53071 \n",
+ "Plumas County 39230 \n",
+ "Riverside County 37929 \n",
+ "Sacramento County 42351 \n",
+ "San Benito County 44611 \n",
+ "San Bernardino County 36178 \n",
+ "San Diego County 45463 \n",
+ "San Francisco County 69260 \n",
+ "San Joaquin County 38674 \n",
+ "San Luis Obispo County 40720 \n",
+ "San Mateo County 63325 \n",
+ "Santa Barbara County 38787 \n",
+ "Santa Clara County 62532 \n",
+ "Santa Cruz County 43988 \n",
+ "Shasta County 35503 \n",
+ "Sierra County 31696 \n",
+ "Siskiyou County 31315 \n",
+ "Solano County 45137 \n",
+ "Sonoma County 48308 \n",
+ "Stanislaus County 36126 \n",
+ "Sutter County 34285 \n",
+ "Tehama County 33015 \n",
+ "Trinity County 30470 \n",
+ "Tulare County 31326 \n",
+ "Tuolumne County 37688 \n",
+ "Ventura County 42693 \n",
+ "Yolo County 40567 \n",
+ "Yuba County 35459 \n",
+ "\n",
+ " Count_Household_With4OrMorePerson \n",
+ "place \n",
+ "Alameda County 155852 \n",
+ "Alpine County 103 \n",
+ "Amador County 2923 \n",
+ "Butte County 16410 \n",
+ "Calaveras County 3075 \n",
+ "Colusa County 2203 \n",
+ "Contra Costa County 118442 \n",
+ "Del Norte County 2309 \n",
+ "El Dorado County 16166 \n",
+ "Fresno County 107943 \n",
+ "Glenn County 2500 \n",
+ "Humboldt County 9713 \n",
+ "Imperial County 18112 \n",
+ "Inyo County 1043 \n",
+ "Kern County 97654 \n",
+ "Kings County 16052 \n",
+ "Lake County 5140 \n",
+ "Lassen County 1615 \n",
+ "Los Angeles County 967873 \n",
+ "Madera County 16535 \n",
+ "Marin County 20703 \n",
+ "Mariposa County 1233 \n",
+ "Mendocino County 6681 \n",
+ "Merced County 30146 \n",
+ "Modoc County 465 \n",
+ "Mono County 1105 \n",
+ "Monterey County 48002 \n",
+ "Napa County 12018 \n",
+ "Nevada County 6898 \n",
+ "Orange County 321300 \n",
+ "Placer County 37963 \n",
+ "Plumas County 999 \n",
+ "Riverside County 268129 \n",
+ "Sacramento County 154446 \n",
+ "San Benito County 7695 \n",
+ "San Bernardino County 242009 \n",
+ "San Diego County 298453 \n",
+ "San Francisco County 58854 \n",
+ "San Joaquin County 88055 \n",
+ "San Luis Obispo County 21754 \n",
+ "San Mateo County 68046 \n",
+ "Santa Barbara County 43491 \n",
+ "Santa Clara County 185999 \n",
+ "Santa Cruz County 23759 \n",
+ "Shasta County 14201 \n",
+ "Sierra County 114 \n",
+ "Siskiyou County 3024 \n",
+ "Solano County 42824 \n",
+ "Sonoma County 42270 \n",
+ "Stanislaus County 59134 \n",
+ "Sutter County 10428 \n",
+ "Tehama County 5817 \n",
+ "Trinity County 604 \n",
+ "Tulare County 56914 \n",
+ "Tuolumne County 3488 \n",
+ "Ventura County 82532 \n",
+ "Yolo County 21166 \n",
+ "Yuba County 8439 "
+ ],
"text/html": [
- "\n",
+ "\n"
],
- "text/plain": [
- " Household4orMore_percapita ... DCID\n",
- "place ... \n",
- "Alameda County 0.095091 ... geoId/06001\n",
- "Alpine County 0.058710 ... geoId/06003\n",
- "Amador County 0.071665 ... geoId/06005\n",
- "Butte County 0.078413 ... geoId/06007\n",
- "Calaveras County 0.058004 ... geoId/06009\n",
- "Colusa County 0.107439 ... geoId/06011\n",
- "Contra Costa County 0.099008 ... geoId/06013\n",
- "Del Norte County 0.070777 ... geoId/06015\n",
- "El Dorado County 0.076197 ... geoId/06017\n",
- "Fresno County 0.104936 ... geoId/06019\n",
- "Glenn County 0.112918 ... geoId/06021\n",
- "Humboldt County 0.069509 ... geoId/06023\n",
- "Imperial County 0.089175 ... geoId/06025\n",
- "Inyo County 0.056294 ... geoId/06027\n",
- "Kern County 0.108111 ... geoId/06029\n",
- "Kings County 0.101970 ... geoId/06031\n",
- "Lake County 0.076501 ... geoId/06033\n",
- "Lassen County 0.048316 ... geoId/06035\n",
- "Los Angeles County 0.096681 ... geoId/06037\n",
- "Madera County 0.109108 ... geoId/06039\n",
- "Marin County 0.082283 ... geoId/06041\n",
- "Mariposa County 0.073077 ... geoId/06043\n",
- "Mendocino County 0.075667 ... geoId/06045\n",
- "Merced County 0.112358 ... geoId/06047\n",
- "Modoc County 0.059953 ... geoId/06049\n",
- "Mono County 0.036548 ... geoId/06051\n",
- "Monterey County 0.103129 ... geoId/06053\n",
- "Napa County 0.089219 ... geoId/06055\n",
- "Nevada County 0.061122 ... geoId/06057\n",
- "Orange County 0.100142 ... geoId/06059\n",
- "Placer County 0.091717 ... geoId/06061\n",
- "Plumas County 0.063934 ... geoId/06063\n",
- "Riverside County 0.100117 ... geoId/06065\n",
- "Sacramento County 0.092931 ... geoId/06067\n",
- "San Benito County 0.112180 ... geoId/06069\n",
- "San Bernardino County 0.108604 ... geoId/06071\n",
- "San Diego County 0.090031 ... geoId/06073\n",
- "San Francisco County 0.067674 ... geoId/06075\n",
- "San Joaquin County 0.108223 ... geoId/06077\n",
- "San Luis Obispo County 0.074013 ... geoId/06079\n",
- "San Mateo County 0.094532 ... geoId/06081\n",
- "Santa Barbara County 0.095592 ... geoId/06083\n",
- "Santa Clara County 0.100123 ... geoId/06085\n",
- "Santa Cruz County 0.087122 ... geoId/06087\n",
- "Shasta County 0.080938 ... geoId/06089\n",
- "Sierra County 0.063816 ... geoId/06091\n",
- "Siskiyou County 0.065358 ... geoId/06093\n",
- "Solano County 0.094776 ... geoId/06095\n",
- "Sonoma County 0.084332 ... geoId/06097\n",
- "Stanislaus County 0.107750 ... geoId/06099\n",
- "Sutter County 0.108044 ... geoId/06101\n",
- "Tehama County 0.081268 ... geoId/06103\n",
- "Trinity County 0.045039 ... geoId/06105\n",
- "Tulare County 0.117961 ... geoId/06107\n",
- "Tuolumne County 0.071792 ... geoId/06109\n",
- "Ventura County 0.097425 ... geoId/06111\n",
- "Yolo County 0.095785 ... geoId/06113\n",
- "Yuba County 0.106155 ... geoId/06115\n",
- "\n",
- "[58 rows x 7 columns]"
- ]
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "features_df_new",
+ "summary": "{\n \"name\": \"features_df_new\",\n \"rows\": 58,\n \"fields\": [\n {\n \"column\": \"place\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 58,\n \"samples\": [\n \"Alameda County\",\n \"Colusa County\",\n \"San Benito County\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Household4orMore_percapita\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.019056969002914145,\n \"min\": 0.03904109589041096,\n \"max\": 0.12143466757702484,\n \"num_unique_values\": 58,\n \"samples\": [\n 0.0937555457032117,\n 0.10218944243436312,\n 0.12013113730387948\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 406565,\n \"min\": 126,\n \"max\": 2908425,\n \"num_unique_values\": 58,\n \"samples\": [\n 284054,\n 4549,\n 13636\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1457019,\n \"min\": 1119,\n \"max\": 9943046,\n \"num_unique_values\": 58,\n \"samples\": [\n 1662323,\n 21558,\n 64055\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person_MarriedAndNotSeparated\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 530767.1947848009,\n \"min\": 495.504,\n \"max\": 3520955.696,\n \"num_unique_values\": 58,\n \"samples\": [\n 674824.699,\n 8416.2,\n 25577.667\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Median_Income_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9434,\n \"min\": 24717,\n \"max\": 69260,\n \"num_unique_values\": 58,\n \"samples\": [\n 56575,\n 36820,\n 44611\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Household_With4OrMorePerson\",\n \"properties\": {\n 
\"dtype\": \"number\",\n \"std\": 142918,\n \"min\": 103,\n \"max\": 967873,\n \"num_unique_values\": 58,\n \"samples\": [\n 155852,\n 2203,\n 7695\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
},
- "metadata": {
- "tags": []
- }
+ "metadata": {}
}
]
},
@@ -3281,11 +5087,11 @@
},
"source": [
"### 2.2) Feature Representations\n",
- "If any of your data is discrete, getting a good encoding of discrete features is particularly important. You want to create “opportunities” for your model to find the underlying regularities. \n",
+ "If any of your data is discrete, getting a good encoding of discrete features is particularly important. You want to create “opportunities” for your model to find the underlying regularities.\n",
"\n",
"**2.2A) For each of the following encodings, name an example of data the encoding would work well on, as well as an example of data it would not work as well for. Explain your answers.**\n",
"\n",
- "* *Numeric* Assign each of these values a number, say 1.0/k, 2.0/k, . . . , 1.0. \n",
+ "* *Numeric* Assign each of these values a number, say 1.0/k, 2.0/k, . . . , 1.0.\n",
"\n",
"* *Thermometer code* Use a vector of length k binary variables, where we convert discrete input value $0 < j < k$ into a vector in which the first j values are 1.0 and the rest are 0.0.\n",
"\n",
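As a concrete illustration of the numeric and thermometer encodings described above, here is a minimal pure-Python sketch. It assumes the discrete value `j` is an integer from 1 to `k` inclusive, so that the numeric code ends at 1.0 as in the text; the function names are hypothetical and not part of the notebook.

```python
from typing import List


def numeric_code(j: int, k: int) -> float:
    """Map the j-th of k ordered values to the single number j / k."""
    if not 1 <= j <= k:
        raise ValueError("j must satisfy 1 <= j <= k")
    return j / k


def thermometer_code(j: int, k: int) -> List[float]:
    """Return a length-k vector whose first j entries are 1.0 and the rest 0.0."""
    if not 1 <= j <= k:
        raise ValueError("j must satisfy 1 <= j <= k")
    return [1.0 if i < j else 0.0 for i in range(k)]


print(numeric_code(2, 4))
print(thermometer_code(3, 5))
```

Both encodings preserve the ordering of the values. The thermometer code gives the model one weight per threshold instead of forcing a single linear relationship with `j / k`, which may help when the effect of the feature is monotonic but not linear.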
@@ -3306,7 +5112,7 @@
"\n",
"# YOUR CODE HERE\n"
],
- "execution_count": null,
+ "execution_count": 32,
"outputs": []
},
{
@@ -3317,7 +5123,7 @@
"source": [
"### 2.3) Standardization\n",
"It is typically useful to scale numeric data, so that it tends to be in the range [−1, +1]. Without performing this transformation, if you have\n",
- "one feature with much larger values than another, it will take the learning algorithm a lot of work to find parameters that can put them on an equal basis. \n",
+ "one feature with much larger values than another, it will take the learning algorithm a lot of work to find parameters that can put them on an equal basis.\n",
"\n",
"Typically, we use the transformation\n",
"$$ \\phi(x) = \\frac{x - \\bar{x}}{\\sigma} $$\n",
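A small worked example of standardization, using only the standard library. It subtracts the mean from each value and divides by the standard deviation, which is the same convention as the notebook's `(features_df_new - features_df_new.mean())/features_df_new.std()`. The `standardize` helper and the income values are hypothetical. `statistics.stdev` is the sample standard deviation (dividing by n - 1), which matches the pandas default for `.std()`.

```python
from statistics import mean, stdev
from typing import List


def standardize(values: List[float]) -> List[float]:
    """Return (x - mean) / std for each x, using the sample standard deviation."""
    x_bar = mean(values)
    sigma = stdev(values)  # sample std (n - 1 in the denominator)
    return [(x - x_bar) / sigma for x in values]


# Hypothetical incomes: mean 40000, sample standard deviation 10000.
incomes = [30000.0, 40000.0, 50000.0]
z_scores = standardize(incomes)
print(z_scores)  # values below the mean come out negative, above it positive
```

After this transformation each feature has mean 0 and standard deviation 1. Most values then fall roughly within [-1, +1], though outliers such as Los Angeles County in the notebook's output can land well outside that range.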
@@ -3338,7 +5144,7 @@
"height": 1000
},
"id": "kd1jAmAwpr-j",
- "outputId": "ef50d1f0-c1ee-41d1-93bc-e8ac0704285e"
+ "outputId": "06a1c0f1-85e0-46c2-f1a2-4094fa9cd2d8"
},
"source": [
"# Create a new dataframe with each of the features standardized.\n",
@@ -3349,13 +5155,321 @@
"standardized_df = (features_df_new - features_df_new.mean())/features_df_new.std()\n",
"display(standardized_df)"
],
- "execution_count": null,
+ "execution_count": 38,
"outputs": [
{
"output_type": "display_data",
"data": {
+ "text/plain": [
+ " Household4orMore_percapita \\\n",
+ "place \n",
+ "Alameda County 0.301893 \n",
+ "Alpine County 0.212211 \n",
+ "Amador County -0.791243 \n",
+ "Butte County -0.570259 \n",
+ "Calaveras County -1.133400 \n",
+ "Colusa County 0.744456 \n",
+ "Contra Costa County 0.775682 \n",
+ "Del Norte County -0.285656 \n",
+ "El Dorado County -0.220820 \n",
+ "Fresno County 1.041174 \n",
+ "Glenn County 0.020462 \n",
+ "Humboldt County -0.841790 \n",
+ "Imperial County 0.654396 \n",
+ "Inyo County -1.585018 \n",
+ "Kern County 1.067227 \n",
+ "Kings County 0.898584 \n",
+ "Lake County -0.434828 \n",
+ "Lassen County -1.794500 \n",
+ "Los Angeles County 0.490074 \n",
+ "Madera County 0.881990 \n",
+ "Marin County -0.396175 \n",
+ "Mariposa County -0.847419 \n",
+ "Mendocino County -0.544231 \n",
+ "Merced County 1.046877 \n",
+ "Modoc County -1.833364 \n",
+ "Mono County -0.628314 \n",
+ "Monterey County 1.227660 \n",
+ "Napa County 0.020361 \n",
+ "Nevada County -0.983867 \n",
+ "Orange County 0.706024 \n",
+ "Placer County 0.325881 \n",
+ "Plumas County -1.854017 \n",
+ "Riverside County 1.034534 \n",
+ "Sacramento County 0.580139 \n",
+ "San Benito County 1.685932 \n",
+ "San Bernardino County 1.183045 \n",
+ "San Diego County 0.081748 \n",
+ "San Francisco County -1.054163 \n",
+ "San Joaquin County 1.398832 \n",
+ "San Luis Obispo County -0.573469 \n",
+ "San Mateo County 0.090865 \n",
+ "Santa Barbara County 0.513283 \n",
+ "Santa Clara County 0.499929 \n",
+ "Santa Cruz County 0.000964 \n",
+ "Shasta County -0.455432 \n",
+ "Sierra County -2.569206 \n",
+ "Siskiyou County -0.948483 \n",
+ "Solano County 0.410070 \n",
+ "Sonoma County -0.089479 \n",
+ "Stanislaus County 1.023151 \n",
+ "Sutter County 1.059388 \n",
+ "Tehama County 0.115027 \n",
+ "Trinity County -2.023355 \n",
+ "Tulare County 1.754334 \n",
+ "Tuolumne County -1.260431 \n",
+ "Ventura County 0.529362 \n",
+ "Yolo County 0.436891 \n",
+ "Yuba County 0.906470 \n",
+ "\n",
+ " CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase \\\n",
+ "place \n",
+ "Alameda County 0.302089 \n",
+ "Alpine County -0.396268 \n",
+ "Amador County -0.373846 \n",
+ "Butte County -0.297747 \n",
+ "Calaveras County -0.377506 \n",
+ "Colusa County -0.385389 \n",
+ "Contra Costa County 0.119840 \n",
+ "Del Norte County -0.380883 \n",
+ "El Dorado County -0.321539 \n",
+ "Fresno County 0.237049 \n",
+ "Glenn County -0.380268 \n",
+ "Humboldt County -0.345044 \n",
+ "Imperial County -0.232484 \n",
+ "Inyo County -0.385175 \n",
+ "Kern County 0.205246 \n",
+ "Kings County -0.260243 \n",
+ "Lake County -0.367803 \n",
+ "Lassen County -0.370134 \n",
+ "Los Angeles County 6.757058 \n",
+ "Madera County -0.289129 \n",
+ "Marin County -0.301427 \n",
+ "Mariposa County -0.388842 \n",
+ "Mendocino County -0.355827 \n",
+ "Merced County -0.217126 \n",
+ "Modoc County -0.394118 \n",
+ "Mono County -0.388845 \n",
+ "Monterey County -0.162569 \n",
+ "Napa County -0.328286 \n",
+ "Nevada County -0.353527 \n",
+ "Orange County 1.080142 \n",
+ "Placer County -0.220648 \n",
+ "Plumas County -0.388267 \n",
+ "Riverside County 1.144857 \n",
+ "Sacramento County 0.376746 \n",
+ "San Benito County -0.363038 \n",
+ "San Bernardino County 1.072746 \n",
+ "San Diego County 1.631595 \n",
+ "San Francisco County -0.042492 \n",
+ "San Joaquin County 0.042468 \n",
+ "San Luis Obispo County -0.255011 \n",
+ "San Mateo County -0.059024 \n",
+ "Santa Barbara County -0.168612 \n",
+ "Santa Clara County 0.444651 \n",
+ "Santa Cruz County -0.267369 \n",
+ "Shasta County -0.305601 \n",
+ "Sierra County -0.395781 \n",
+ "Siskiyou County -0.378185 \n",
+ "Solano County -0.176640 \n",
+ "Sonoma County -0.174742 \n",
+ "Stanislaus County -0.060482 \n",
+ "Sutter County -0.339896 \n",
+ "Tehama County -0.360291 \n",
+ "Trinity County -0.392925 \n",
+ "Tulare County -0.061761 \n",
+ "Tuolumne County -0.362738 \n",
+ "Ventura County 0.061065 \n",
+ "Yolo County -0.295583 \n",
+ "Yuba County -0.352442 \n",
+ "\n",
+ " Count_Person Count_Person_MarriedAndNotSeparated \\\n",
+ "place \n",
+ "Alameda County 0.675051 0.784517 \n",
+ "Alpine County -0.465087 -0.485963 \n",
+ "Amador County -0.438345 -0.454354 \n",
+ "Butte County -0.319842 -0.342504 \n",
+ "Calaveras County -0.434072 -0.446276 \n",
+ "Colusa County -0.451059 -0.471040 \n",
+ "Contra Costa County 0.325029 0.438601 \n",
+ "Del Norte County -0.446660 -0.468504 \n",
+ "El Dorado County -0.333444 -0.314654 \n",
+ "Fresno County 0.221108 0.149000 \n",
+ "Glenn County -0.446443 -0.464011 \n",
+ "Humboldt County -0.373216 -0.404203 \n",
+ "Imperial County -0.342132 -0.379723 \n",
+ "Inyo County -0.453469 -0.473294 \n",
+ "Kern County 0.152779 0.102608 \n",
+ "Kings County -0.361058 -0.385293 \n",
+ "Lake County -0.421601 -0.440130 \n",
+ "Lassen County -0.445254 -0.465303 \n",
+ "Los Angeles County 6.358380 6.146813 \n",
+ "Madera County -0.357578 -0.379693 \n",
+ "Marin County -0.289240 -0.273284 \n",
+ "Mariposa County -0.454078 -0.472960 \n",
+ "Mendocino County -0.406789 -0.424519 \n",
+ "Merced County -0.274195 -0.309304 \n",
+ "Modoc County -0.459841 -0.479580 \n",
+ "Mono County -0.455880 -0.476558 \n",
+ "Monterey County -0.170110 -0.181603 \n",
+ "Napa County -0.372538 -0.377507 \n",
+ "Nevada County -0.397492 -0.400768 \n",
+ "Orange County 1.707662 1.975494 \n",
+ "Placer County -0.189297 -0.137140 \n",
+ "Plumas County -0.452837 -0.470721 \n",
+ "Riverside County 1.242556 1.277522 \n",
+ "Sacramento County 0.604238 0.609306 \n",
+ "San Benito County -0.421892 -0.438707 \n",
+ "San Bernardino County 1.036652 0.986197 \n",
+ "San Diego County 1.821298 1.949698 \n",
+ "San Francisco County 0.128925 0.091251 \n",
+ "San Joaquin County 0.061226 0.047266 \n",
+ "San Luis Obispo County -0.272138 -0.270906 \n",
+ "San Mateo County 0.054596 0.144431 \n",
+ "Santa Barbara County -0.160598 -0.184755 \n",
+ "Santa Clara County 0.843053 1.084493 \n",
+ "Santa Cruz County -0.280597 -0.292692 \n",
+ "Shasta County -0.342983 -0.349274 \n",
+ "Sierra County -0.463851 -0.484382 \n",
+ "Siskiyou County -0.436175 -0.453522 \n",
+ "Solano County -0.159109 -0.151382 \n",
+ "Sonoma County -0.129676 -0.107544 \n",
+ "Stanislaus County -0.088317 -0.102987 \n",
+ "Sutter County -0.399703 -0.412419 \n",
+ "Tehama County -0.421591 -0.437414 \n",
+ "Trinity County -0.457471 -0.477724 \n",
+ "Tulare County -0.144185 -0.179201 \n",
+ "Tuolumne County -0.428440 -0.442083 \n",
+ "Ventura County 0.111616 0.145957 \n",
+ "Yolo County -0.315049 -0.338206 \n",
+ "Yuba County -0.410839 -0.431068 \n",
+ "\n",
+ " Median_Income_Person \\\n",
+ "place \n",
+ "Alameda County 1.741535 \n",
+ "Alpine County -0.481857 \n",
+ "Amador County 0.152292 \n",
+ "Butte County -0.693629 \n",
+ "Calaveras County -0.328699 \n",
+ "Colusa County -0.352335 \n",
+ "Contra Costa County 1.487472 \n",
+ "Del Norte County -0.870742 \n",
+ "El Dorado County 0.925503 \n",
+ "Fresno County -0.664481 \n",
+ "Glenn County -0.532522 \n",
+ "Humboldt County -0.899571 \n",
+ "Imperial County -1.635155 \n",
+ "Inyo County 0.153670 \n",
+ "Kern County -0.978535 \n",
+ "Kings County -0.628974 \n",
+ "Lake County -0.909323 \n",
+ "Lassen County -0.620177 \n",
+ "Los Angeles County -0.215500 \n",
+ "Madera County -1.041707 \n",
+ "Marin County 2.641724 \n",
+ "Mariposa County -0.407557 \n",
+ "Mendocino County -0.576296 \n",
+ "Merced County -0.932853 \n",
+ "Modoc County -0.913986 \n",
+ "Mono County 0.625758 \n",
+ "Monterey County -0.326579 \n",
+ "Napa County 0.772451 \n",
+ "Nevada County 0.003161 \n",
+ "Orange County 0.666247 \n",
+ "Placer County 1.370139 \n",
+ "Plumas County -0.096895 \n",
+ "Riverside County -0.234790 \n",
+ "Sacramento County 0.233906 \n",
+ "San Benito County 0.473448 \n",
+ "San Bernardino County -0.420382 \n",
+ "San Diego County 0.563753 \n",
+ "San Francisco County 3.086042 \n",
+ "San Joaquin County -0.155826 \n",
+ "San Luis Obispo County 0.061033 \n",
+ "San Mateo County 2.456980 \n",
+ "Santa Barbara County -0.143849 \n",
+ "Santa Clara County 2.372928 \n",
+ "Santa Cruz County 0.407415 \n",
+ "Shasta County -0.491927 \n",
+ "Sierra County -0.895438 \n",
+ "Siskiyou County -0.935821 \n",
+ "Solano County 0.529199 \n",
+ "Sonoma County 0.865300 \n",
+ "Stanislaus County -0.425894 \n",
+ "Sutter County -0.621025 \n",
+ "Tehama County -0.755634 \n",
+ "Trinity County -1.025384 \n",
+ "Tulare County -0.934655 \n",
+ "Tuolumne County -0.260334 \n",
+ "Ventura County 0.270155 \n",
+ "Yolo County 0.044816 \n",
+ "Yuba County -0.496590 \n",
+ "\n",
+ " Count_Household_With4OrMorePerson \n",
+ "place \n",
+ "Alameda County 0.624734 \n",
+ "Alpine County -0.465040 \n",
+ "Amador County -0.445309 \n",
+ "Butte County -0.350940 \n",
+ "Calaveras County -0.444245 \n",
+ "Colusa County -0.450347 \n",
+ "Contra Costa County 0.362977 \n",
+ "Del Norte County -0.449605 \n",
+ "El Dorado County -0.352648 \n",
+ "Fresno County 0.289515 \n",
+ "Glenn County -0.448268 \n",
+ "Humboldt County -0.397799 \n",
+ "Imperial County -0.339031 \n",
+ "Inyo County -0.458463 \n",
+ "Kern County 0.217523 \n",
+ "Kings County -0.353445 \n",
+ "Lake County -0.429796 \n",
+ "Lassen County -0.454461 \n",
+ "Los Angeles County 6.306439 \n",
+ "Madera County -0.350066 \n",
+ "Marin County -0.320902 \n",
+ "Mariposa County -0.457134 \n",
+ "Mendocino County -0.419014 \n",
+ "Merced County -0.254830 \n",
+ "Modoc County -0.462507 \n",
+ "Mono County -0.458029 \n",
+ "Monterey County -0.129891 \n",
+ "Napa County -0.381671 \n",
+ "Nevada County -0.417496 \n",
+ "Orange County 1.782373 \n",
+ "Placer County -0.200134 \n",
+ "Plumas County -0.458771 \n",
+ "Riverside County 1.410336 \n",
+ "Sacramento County 0.614897 \n",
+ "San Benito County -0.411919 \n",
+ "San Bernardino County 1.227574 \n",
+ "San Diego County 1.622512 \n",
+ "San Francisco County -0.053960 \n",
+ "San Joaquin County 0.150359 \n",
+ "San Luis Obispo County -0.313548 \n",
+ "San Mateo County 0.010356 \n",
+ "Santa Barbara County -0.161455 \n",
+ "Santa Clara County 0.835673 \n",
+ "Santa Cruz County -0.299519 \n",
+ "Shasta County -0.366397 \n",
+ "Sierra County -0.464963 \n",
+ "Siskiyou County -0.444602 \n",
+ "Solano County -0.166122 \n",
+ "Sonoma County -0.169998 \n",
+ "Stanislaus County -0.052001 \n",
+ "Sutter County -0.392796 \n",
+ "Tehama County -0.425059 \n",
+ "Trinity County -0.461535 \n",
+ "Tulare County -0.067534 \n",
+ "Tuolumne County -0.441355 \n",
+ "Ventura County 0.111715 \n",
+ "Yolo County -0.317663 \n",
+ "Yuba County -0.406713 "
+ ],
"text/html": [
],
- "text/plain": [
- " Count_Household_With4OrMorePerson ... Median_Income_Person\n",
- "place ... \n",
- "Alameda County 0.645561 ... 1.715969\n",
- "Alpine County -0.457721 ... -0.061411\n",
- "Amador County -0.438854 ... 0.026703\n",
- "Butte County -0.334097 ... -0.937795\n",
- "Calaveras County -0.439653 ... -0.250150\n",
- "Colusa County -0.442000 ... -0.524963\n",
- "Contra Costa County 0.334148 ... 1.525327\n",
- "Del Norte County -0.444515 ... -1.015847\n",
- "El Dorado County -0.357489 ... 0.777579\n",
- "Fresno County 0.265631 ... -0.778563\n",
- "Glenn County -0.436017 ... -0.902168\n",
- "Humboldt County -0.391951 ... -0.795425\n",
- "Imperial County -0.345257 ... -1.729464\n",
- "Inyo County -0.451058 ... 0.081911\n",
- "Kern County 0.214153 ... -0.809159\n",
- "Kings County -0.350498 ... -0.560589\n",
- "Lake County -0.423743 ... -0.787538\n",
- "Lassen County -0.447717 ... -0.020345\n",
- "Los Angeles County 6.370316 ... -0.133072\n",
- "Madera County -0.339337 ... -0.988923\n",
- "Marin County -0.308302 ... 2.978261\n",
- "Mariposa County -0.449230 ... -0.446775\n",
- "Mendocino County -0.411910 ... -0.568340\n",
- "Merced County -0.244528 ... -0.828196\n",
- "Modoc County -0.454407 ... -0.796513\n",
- "Mono County -0.454484 ... -0.035711\n",
- "Monterey County -0.145012 ... -0.374842\n",
- "Napa County -0.370877 ... 0.950408\n",
- "Nevada County -0.415651 ... 0.189742\n",
- "Orange County 1.764452 ... 0.703198\n",
- "Placer County -0.210438 ... 1.382413\n",
- "Plumas County -0.449790 ... 0.052675\n",
- "Riverside County 1.233233 ... -0.327250\n",
- "Sacramento County 0.534415 ... 0.178320\n",
- "San Benito County -0.410698 ... 0.351285\n",
- "San Bernardino County 1.176956 ... -0.507014\n",
- "San Diego County 1.633423 ... 0.454629\n",
- "San Francisco County -0.043323 ... 2.952561\n",
- "San Joaquin County 0.104884 ... -0.194262\n",
- "San Luis Obispo County -0.311840 ... 0.132495\n",
- "San Mateo County 0.050092 ... 2.469972\n",
- "Santa Barbara County -0.160249 ... -0.177673\n",
- "Santa Clara County 0.893860 ... 2.260021\n",
- "Santa Cruz County -0.290935 ... 0.028607\n",
- "Shasta County -0.356530 ... -0.538969\n",
- "Sierra County -0.456789 ... -0.384225\n",
- "Siskiyou County -0.438245 ... -0.732603\n",
- "Solano County -0.164782 ... 0.721963\n",
- "Sonoma County -0.162876 ... 0.769556\n",
- "Stanislaus County -0.048108 ... -0.424746\n",
- "Sutter County -0.385400 ... -0.629531\n",
- "Tehama County -0.421760 ... -1.074590\n",
- "Trinity County -0.454141 ... -0.960367\n",
- "Tulare County -0.076432 ... -1.083700\n",
- "Tuolumne County -0.430966 ... -0.204733\n",
- "Ventura County 0.120143 ... 0.387591\n",
- "Yolo County -0.312295 ... -0.086023\n",
- "Yuba County -0.401360 ... -0.419715\n",
- "\n",
- "[58 rows x 7 columns]"
- ]
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "standardized_df",
+ "summary": "{\n \"name\": \"standardized_df\",\n \"rows\": 58,\n \"fields\": [\n {\n \"column\": \"place\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 58,\n \"samples\": [\n \"Alameda County\",\n \"Colusa County\",\n \"San Benito County\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Household4orMore_percapita\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.9999999999999999,\n \"min\": -2.5692060457103576,\n \"max\": 1.7543341601856828,\n \"num_unique_values\": 58,\n \"samples\": [\n 0.3018932253454933,\n 0.7444555619826428,\n 1.6859323974003189\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.9999999999999999,\n \"min\": -0.3962677646621369,\n \"max\": 6.7570578949790265,\n \"num_unique_values\": 58,\n \"samples\": [\n 0.3020887137778709,\n -0.3853888421742521,\n -0.3630382275128438\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.0,\n \"min\": -0.46508698263926235,\n \"max\": 6.3583801117983265,\n \"num_unique_values\": 58,\n \"samples\": [\n 0.6750512250892897,\n -0.4510590337372829,\n -0.42189196370022525\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Person_MarriedAndNotSeparated\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.0,\n \"min\": -0.4859632418362064,\n \"max\": 6.14681328729229,\n \"num_unique_values\": 58,\n \"samples\": [\n 0.7845169265423353,\n -0.4710401341576856,\n -0.4387068114342304\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Median_Income_Person\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.9999999999999999,\n \"min\": -1.6351551960601571,\n \"max\": 3.0860416600135863,\n 
\"num_unique_values\": 58,\n \"samples\": [\n 1.7415345069413768,\n -0.35233532427198566,\n 0.473447508174651\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Count_Household_With4OrMorePerson\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.9999999999999999,\n \"min\": -0.4650402375487558,\n \"max\": 6.306438749858123,\n \"num_unique_values\": 58,\n \"samples\": [\n 0.624734275826978,\n -0.45034655426289816,\n -0.41191907397435995\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
},
- "metadata": {
- "tags": []
- }
+ "metadata": {}
}
]
},
@@ -4073,7 +6330,7 @@
"colab": {
"base_uri": "https://localhost:8080/"
},
- "outputId": "5682f9f7-8a57-4d2d-b043-a0cc4700d271"
+ "outputId": "43e87498-7ae9-4a50-f9ba-08a8ba53a6a0"
},
"source": [
"# Run me!\n",
@@ -4081,7 +6338,7 @@
"# Convert Dataframes into data and labels for the model\n",
"target_df = standardized_df\n",
"X = target_df[['Household4orMore_percapita','Median_Income_Person']]\n",
- "Y = target_df[['CumulativeCount_MedicalTest_ConditionCOVID_19_Positive']]\n",
+ "Y = target_df[['CumulativeCount_MedicalConditionIncident_COVID_19_ConfirmedOrProbableCase']]\n",
"\n",
"# Split into training and test sets\n",
"x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)\n",
@@ -4092,15 +6349,15 @@
"print('Model Intercept: {}'.format(model.intercept_))\n",
"print('Model Coefficients: {}'.format(model.coef_))"
],
- "execution_count": null,
+ "execution_count": 19,
"outputs": [
{
"output_type": "stream",
+ "name": "stdout",
"text": [
- "Model Intercept: [0.01562292]\n",
- "Model Coefficients: [[0.21038206 0.08063848]]\n"
- ],
- "name": "stdout"
+ "Model Intercept: [-0.139239]\n",
+ "Model Coefficients: [[0.17036494 0.06717593]]\n"
+ ]
}
]
},
@@ -4120,7 +6377,7 @@
"base_uri": "https://localhost:8080/"
},
"id": "1fI_htJPKjqS",
- "outputId": "5ed9993b-28b1-44e3-b7fb-5099b7e1157b"
+ "outputId": "4c45f4d3-7a8a-40db-97ec-736ce77bf1d2"
},
"source": [
"train_pred = model.predict(x_train)\n",
@@ -4129,15 +6386,15 @@
"print('Training Error: {}'.format(mse(train_pred, y_train)))\n",
"print('Test Error: {}'.format(mse(test_pred, y_test)))\n"
],
- "execution_count": null,
+ "execution_count": 20,
"outputs": [
{
"output_type": "stream",
+ "name": "stdout",
"text": [
- "Training Error: 1.1252156802381956\n",
- "Test Error: 0.19184932363718019\n"
- ],
- "name": "stdout"
+ "Training Error: 0.11404453554282165\n",
+ "Test Error: 4.153066573871609\n"
+ ]
}
]
},