Reproducible Research: Course Project 2 (Storm Data Analysis)

Author: Benedict Neo Yao En
Date Created: 18th November 2020

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

  • Storm Data [47Mb]

Some documentation of the database is also available. It explains how some of the variables are constructed and defined.

  • National Weather Service Storm Data Documentation
  • National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Your data analysis must address the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.

Software Environment Information
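A minimal sketch of what this section typically records: base R's `sessionInfo()` reports the R version, platform, and attached packages, which lets a reader reproduce the environment. The original output is not shown here, so nothing about the author's setup is implied.

```r
# Record the R version, platform and attached packages used for the analysis.
info <- sessionInfo()
print(info)
```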

Load the packages, then download and read the bz2 file.
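The sketch below shows the reading step under stated assumptions. Because the real file is about 47 MB, it writes a tiny hypothetical sample to a bzip2-compressed CSV instead of calling `download.file()` on the course URL; the file name `StormData_sample.csv.bz2` and the sample rows are illustrative only. What it demonstrates is that `read.csv()` reads a `.bz2` file directly, because the underlying `file()` connection detects bzip2 compression.

```r
# Hypothetical stand-in for the real StormData.csv.bz2, which would normally
# be fetched with download.file() from the course web site.
storm_path <- file.path(tempdir(), "StormData_sample.csv.bz2")

sample_rows <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TSTM WIND"),
  FATALITIES = c(5, 0, 1),
  INJURIES   = c(30, 2, 0),
  PROPDMG    = c(25, 1.5, 10),
  PROPDMGEXP = c("K", "B", "K"),
  CROPDMG    = c(0, 500, 0),
  CROPDMGEXP = c("", "M", ""),
  stringsAsFactors = FALSE
)

# Write the sample as a bzip2-compressed CSV.
con <- bzfile(storm_path, open = "w")
write.csv(sample_rows, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which recognises bzip2 compression,
# so the archive does not need to be unpacked first.
storm <- read.csv(storm_path, stringsAsFactors = FALSE)
str(storm)
```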

Looking at the column names of the data, we see there are 37 variables. For this analysis we won't need all of them, so I use dplyr to select the relevant columns and convert their names to lowercase.
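A base-R sketch of the same selection step (the original used dplyr, which is not reproduced here). The two rows in `storm_raw` are hypothetical and carry only a few of the 37 columns; `keep` lists the seven variables described below.

```r
# Hypothetical rows with a subset of the raw columns (upper-case names as in the CSV).
storm_raw <- data.frame(
  STATE__    = c(1, 1),
  BGN_DATE   = c("4/18/1950 0:00:00", "4/18/1950 0:00:00"),
  EVTYPE     = c("TORNADO", "TORNADO"),
  FATALITIES = c(0, 0),
  INJURIES   = c(15, 0),
  PROPDMG    = c(25, 2.5),
  PROPDMGEXP = c("K", "K"),
  CROPDMG    = c(0, 0),
  CROPDMGEXP = c("", ""),
  stringsAsFactors = FALSE
)

# Keep only the columns needed for the analysis, then lowercase their names.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm <- storm_raw[, keep]
names(storm) <- tolower(names(storm))
names(storm)
```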

Based on the information above, the data table now has 902,297 rows and 7 columns. Below is a brief description of each variable.

  • evtype: storm event type
  • fatalities: number of fatalities per event
  • injuries: number of injuries per event
  • propdmg: property damage amount
  • propdmgexp: magnitude (exponent) code for propdmg, e.g. K (thousands), M (millions), B (billions)
  • cropdmg: crop damage amount
  • cropdmgexp: magnitude (exponent) code for cropdmg

Data Processing

Processing data for population health analysis

First, I select the columns needed for the bar plot, group the data by event type, and sum the fatalities and injuries for each type. I then sort the totals in descending order, keep the top 10 rows, and gather the fatalities and injuries columns into long format, with the measure stored as a categorical variable, so that the result can be drawn as a grouped bar plot.
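The steps above can be sketched in base R as follows, as an alternative to the dplyr/tidyr pipeline the author describes. The toy data are hypothetical, and ranking by the sum of fatalities and injuries is an assumption, since the text does not say which column the descending sort uses.

```r
# Hypothetical event records.
storm <- data.frame(
  evtype     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  fatalities = c(5, 2, 1, 10, 0),
  injuries   = c(40, 10, 3, 5, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type.
health <- aggregate(cbind(fatalities, injuries) ~ evtype, data = storm, FUN = sum)

# Sort by combined harm (assumed ordering key) and keep the top 10 event types.
health <- health[order(-(health$fatalities + health$injuries)), ]
top10 <- head(health, 10)

# Gather into long format: one row per (event type, measure) pair,
# with the measure as a factor for a grouped bar plot.
health_long <- data.frame(
  evtype  = rep(top10$evtype, times = 2),
  measure = factor(rep(c("fatalities", "injuries"), each = nrow(top10))),
  count   = c(top10$fatalities, top10$injuries)
)
health_long
```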

Processing data for economic consequences analysis

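The PROPDMG and CROPDMG variables hold the property and crop damage amounts, while PROPDMGEXP and CROPDMGEXP give the magnitude (exponent) of each amount. Combined, they give the damage cost of each event and can be used to identify the events with the greatest economic consequences.

The exponent values for property and crop damage are messy (mixed case, digits, and symbols such as "+", "-" and "?" alongside K, M and B), so I wrote a function that cleans them and calculates each cost with its corresponding multiplier, expressed in millions of dollars.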
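The author's function is not shown, so the following is a hypothetical sketch (`exp_multiplier()` and `damage_millions()` are illustrative names). It assumes H = hundred, K = thousand, M = million and B = billion (the NWS documentation defines K, M and B), treats a single digit d as a multiplier of 10^d, and treats blanks and the symbols "+", "-" and "?" as a multiplier of 1. Other analyses handle these ambiguous codes differently.

```r
# Hypothetical helper: map messy exponent codes to numeric multipliers.
exp_multiplier <- function(exp_code) {
  code <- toupper(trimws(as.character(exp_code)))
  code[is.na(code)] <- ""
  mult <- rep(1, length(code))          # "", "+", "-", "?" assumed to mean x1
  mult[code == "H"] <- 1e2
  mult[code == "K"] <- 1e3
  mult[code == "M"] <- 1e6
  mult[code == "B"] <- 1e9
  is_digit <- grepl("^[0-9]$", code)    # assumption: digit d means 10^d
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult
}

# Hypothetical helper: damage cost in millions of dollars.
damage_millions <- function(amount, exp_code) {
  amount * exp_multiplier(exp_code) / 1e6
}

damage_millions(c(25, 1.5, 500, 10), c("K", "B", "m", ""))
# -> 0.025, 1500, 500 and 1e-05 (millions of dollars)
```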

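Apart from the cost function, the rest of the processing follows the same steps as the population health analysis: group by event type, sum the property and crop damage costs, sort in descending order, keep the top 10 rows, and reshape the data for a grouped bar plot.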
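A sketch of those remaining steps, assuming the damage amounts have already been converted to millions of dollars. The toy values in `econ` are hypothetical; this version uses `rowsum()` for the grouped sums and `stack()` for the reshape, as one base-R alternative to the dplyr approach described above.

```r
# Hypothetical per-event costs, already in millions of dollars.
econ <- data.frame(
  evtype   = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "FLOOD", "HAIL"),
  prop_mil = c(1000, 800, 300, 500, 20),
  crop_mil = c(200, 100, 5, 50, 30),
  stringsAsFactors = FALSE
)

# Grouped sums; rowsum() returns one row per event type, named by the group.
totals <- rowsum(econ[, c("prop_mil", "crop_mil")], group = econ$evtype)

# Sort by combined property + crop cost and keep the top 10 event types.
totals <- totals[order(-rowSums(totals)), , drop = FALSE]
top10_econ <- head(totals, 10)

# Long format: stack() yields `values` plus a factor `ind` naming the cost type.
econ_long <- data.frame(
  evtype = rep(rownames(top10_econ), times = 2),
  stack(top10_econ)
)
econ_long
```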

With the data processed and ready for creating plots, we can now answer both questions.

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Based on the bar plot, tornadoes clearly have the greatest impact on population health, since they cause the most fatalities and injuries.
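The report's figures are not reproduced here. As a hedged sketch of how a grouped bar plot like this can be drawn in base R (the author's plotting library is not stated), the example uses hypothetical totals for three event types and writes the plot to a temporary PDF file.

```r
# Hypothetical top-3 totals standing in for the real top-10 table.
top <- data.frame(
  evtype     = c("TORNADO", "HEAT", "FLOOD"),
  fatalities = c(7, 10, 1),
  injuries   = c(50, 5, 5)
)

# barplot(beside = TRUE) draws one cluster per matrix column,
# so measures go in rows and event types in columns.
harm <- t(as.matrix(top[, c("fatalities", "injuries")]))
colnames(harm) <- top$evtype

pdf(file = tempfile(fileext = ".pdf"))  # draw to a temporary file
mids <- barplot(harm, beside = TRUE, legend.text = rownames(harm),
                ylab = "Count", main = "Fatalities and injuries by event type")
invisible(dev.off())
```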

2. Across the United States, which types of events have the greatest economic consequences?

From the bar plot, floods and hurricanes/typhoons have the highest property and crop damage costs, and therefore the greatest economic consequences.

Based on this analysis, resources for protecting population health should be directed towards tornadoes, for example by building better infrastructure or early warning systems. For hurricanes and typhoons, more funding should go towards developing better systems and infrastructure that safeguard property and crops and minimize damage as much as possible.


kpacharya08/REPRODUCIBLE-RESEARCH-Course-project-2


Requirements

For this assignment you will need some specific tools:

  • RStudio: You will need RStudio to publish your completed analysis document to RPubs. You can also use RStudio to edit and write your analysis.
  • knitr: You will need the knitr package to compile your R Markdown document and convert it to HTML.

Document Layout

  • Language: Your document should be written in English.
  • Title: Your document should have a title that briefly summarizes your data analysis.
  • Synopsis: Immediately after the title, there should be a synopsis that describes and summarizes your analysis in at most 10 complete sentences.
  • There should be a section titled Data Processing that describes (in words and code) how the data were loaded into R and processed for analysis. In particular, your analysis must start from the raw CSV file containing the data. You cannot do any preprocessing outside the document. If preprocessing is time-consuming, you may consider using the cache = TRUE option for certain code chunks.
  • There should be a section titled Results in which your results are presented.
  • You may have other sections in your analysis, but Data Processing and Results are required.
  • The analysis document must have at least one figure containing a plot.
  • Your analysis must have no more than three figures. Figures may have multiple plots in them (i.e. panel plots), but there cannot be more than three figures total.
  • You must show all your code for the work in your analysis document. This may make the document a bit verbose, but that is okay. In general, you should ensure that echo = TRUE for every code chunk (this is the default setting in knitr).
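One way to meet the caching and echo requirements is a setup chunk at the top of the .Rmd file, sketched below. `knitr::opts_chunk$set()` changes the defaults for all later chunks; caching every chunk is an assumption, and a single chunk can instead be cached through its header, e.g. `{r read-data, cache=TRUE}` (the chunk label is hypothetical). The `requireNamespace()` guard only lets the snippet run where knitr is not installed.

```r
# Setup chunk: show the code of every chunk (echo = TRUE, knitr's default)
# and cache chunk results so the slow bz2 read is not repeated on each knit.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
}
```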

Publishing Your Analysis

For this assignment you will need to publish your analysis on RPubs.com. If you do not already have an account, you will have to create one. After you have finished writing your analysis in RStudio, you can publish it to RPubs as follows:

  1. In RStudio, make sure your R Markdown (.Rmd) document is loaded in the editor.
  2. Click the Knit HTML button in the doc toolbar to preview your document.
  3. In the preview window, click the Publish button.

Once your document is published to RPubs, you should get a unique URL to that document. Make a note of this URL, as you will need it to submit your assignment.

NOTE: If you are having trouble connecting to RPubs due to proxy-related or other issues, you can upload your final analysis document as a PDF to Coursera instead.

Submitting Your Assignment

To submit this assignment, copy the RPubs URL of your completed data analysis document into the peer assessment question.


Reproducible Research, Week 2: Course Project 1

John Jairo Prado Piñeres

  0. Turn off scientific notation.
  0.1. Load the libraries that we are going to use.

Loading and preprocessing the data

Show any code that is needed to

General exploratory analysis and data type; basic statistics; conclusions: both the mean and the median of the interval variable equal 1177.5, which suggests that its distribution is symmetric. The steps variable has 2304 missing values.

What is the mean total number of steps taken per day?

  1. Calculate the total number of steps taken per day.
  2. Make a histogram of the total number of steps taken each day.
  3. Calculate the mean and median of the total number of steps taken per day. The mean is 10766.19 and the median is 10765.

What is the average daily activity pattern?

  1. Make a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis).
  2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?

Imputing missing values

  1. Calculate and report the total number of missing values in the dataset (i.e. the total number of rows with NAs).
  2. Devise a strategy for filling in all of the missing values in the dataset. The strategy does not need to be sophisticated. For example, you could use the mean/median for that day, or the mean for that 5-minute interval, etc.
  3. Create a new dataset that is equal to the original dataset but with the missing data filled in.
  4. Make a histogram of the total number of steps taken each day and calculate and report the mean and median total number of steps taken per day. Do these values differ from the estimates from the first part of the assignment? What is the impact of imputing missing data on the estimates of the total daily number of steps?

Imputation lowered the mean from 10766.19 to 10765.64 and the median from 10765 to 10762. The histogram shows that the only bin that changed is the one between 10,000 and 12,500 steps, whose frequency increased from 18 to 26. Filling missing values with the mean has the disadvantage of altering the distribution of the variable, making it narrower by reducing its variance; its advantage is that the method is easy to apply.

Are there differences in activity patterns between weekdays and weekends?

For this part the weekdays() function may be of some help. Use the dataset with the filled-in missing values for this part.

  1. Create a new factor variable in the dataset with two levels, "weekday" and "weekend", indicating whether a given date is a weekday or weekend day.
  2. Make a panel plot containing a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis). See the README file in the GitHub repository for an example of what this plot should look like using simulated data.
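The interval-mean strategy and the weekday/weekend factor can be sketched in base R as below. The six-row `activity` data frame is hypothetical (the real dataset is not included here). The sketch uses the `wday` field of `as.POSIXlt()` rather than `weekdays()`, because `weekdays()` returns day names in the current locale, which makes string comparisons fragile.

```r
# Hypothetical activity data: two days, three 5-minute intervals, two NAs.
activity <- data.frame(
  date     = as.Date(c("2012-10-06", "2012-10-06", "2012-10-06",
                       "2012-10-08", "2012-10-08", "2012-10-08")),
  interval = c(0, 5, 10, 0, 5, 10),
  steps    = c(10, NA, 30, 20, 40, NA)
)

# Total number of missing values.
n_missing <- sum(is.na(activity$steps))

# Replace each NA with the mean of its 5-minute interval across all days.
interval_mean <- ave(activity$steps, activity$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))
filled <- activity
filled$steps <- ifelse(is.na(filled$steps), interval_mean, filled$steps)

# Weekday/weekend factor from the numeric day of week (0 = Sunday, 6 = Saturday).
wday <- as.POSIXlt(filled$date)$wday
filled$day_type <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                          levels = c("weekday", "weekend"))
filled
```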
