Data Science in Practice
Data Wrangling ¶
In this notebook, we will focus on loading different types of data files. Other aspects of ‘wrangling’ such as combining different datasets will be covered in future tutorials, and are explored in the assignments.
Note: Throughout this notebook, we will be using ! to run the shell command cat to print out the contents of example data files.
Python I/O ¶
Let’s start with basic Python utilities for reading and loading data files.
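As a minimal sketch of this low-level approach, the snippet below opens a file, reads it line by line, and closes it by hand. The file name `dat.txt` and its contents are hypothetical, and the file is created in a temporary directory only so that the example runs on its own.

```python
import os
import tempfile

# Hypothetical example file, written here so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "dat.txt")
f = open(path, "w")
f.write("First line of data\nSecond line of data\n")
f.close()

# Open the file for reading and collect its lines
f = open(path, "r")
lines = []
for line in f:
    # Each line still ends with a newline character, so remove it explicitly
    lines.append(line.rstrip("\n"))

# The file object has to be closed by hand
f.close()

print(lines)
```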
Since opening and closing files almost always go together, there is a shortcut that does both: the with keyword.
By using with, file objects are opened and then automatically closed at the end of the code block.
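A short sketch of the same pattern using with is shown here. The file name `dat.txt` and its contents are again hypothetical, chosen only for illustration.

```python
import os
import tempfile

# Hypothetical example file, written here so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "dat.txt")
with open(path, "w") as f:
    f.write("First line of data\nSecond line of data\n")

# The file is open inside the block, and closed once the block ends
with open(path, "r") as f:
    # The newline character at the end of each line is still removed explicitly
    lines = [line.rstrip("\n") for line in f]

print(f.closed)  # The file object reports that it has been closed
print(lines)
```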
Using the input/output functionality of the Python standard library is a fairly ‘low-level’ way to read data files. This strategy often takes a lot of work, since you have to spell out how files are organized and how to read them. For example, in the simple example above, we had to deal with the newline character explicitly.
As long as you have reasonably well-structured data files in standardized file types, you can use higher-level functions that take care of many of these details, for example by loading data straight into pandas data objects.
Pandas I/O ¶
File Types ¶
There are many different file types in which data may be stored.
Here, we will start by examining CSV and JSON files.
CSV Files ¶
CSV Files with Python ¶
CSV Files with Pandas ¶
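To compare the two approaches, here is a minimal sketch that loads the same CSV file first with the standard library `csv` module and then with `pd.read_csv`. The file name `dat.csv` and its contents are hypothetical, and the file is assumed to have no header row.

```python
import csv
import os
import tempfile

import pandas as pd

# Hypothetical example file, written here so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "dat.csv")
with open(path, "w", newline="") as f:
    f.write("1,2,3,4\n5,6,7,8\n")

# Standard library: every row is a list of strings, converted by hand
rows = []
with open(path, newline="") as f:
    for row in csv.reader(f):
        rows.append([int(value) for value in row])

# Pandas: one call parses the file and infers the integer type
df = pd.read_csv(path, header=None)

print(rows)
print(df)
```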
As we can see, using Pandas saves us from having to do more work (write more code) to load the file.
JSON Files ¶
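As a rough illustration, the sketch below loads one JSON file with the standard library `json` module and with `pd.read_json`. The file name `dat.json` and the records in it are hypothetical, chosen only so that the example is self-contained.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical example records, written to a temporary file
records = [{"name": "John", "age": 53}, {"name": "Ana", "age": 41}]
path = os.path.join(tempfile.mkdtemp(), "dat.json")
with open(path, "w") as f:
    json.dump(records, f)

# Standard library: the file is parsed into plain Python lists and dictionaries
with open(path) as f:
    data = json.load(f)

# Pandas: a list of records is loaded directly into a DataFrame
df = pd.read_json(path)

print(data)
print(df)
```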
JSON Files with Python ¶
JSON Files with Pandas ¶
Conclusion ¶
As a general guideline, storing data in standardized file types and loading it with ‘higher-level’ tools such as Pandas makes data files easier to load and wrangle.
Data Wrangling, Analysis and AB Testing with SQL
mbaradas/Data_Wrangling_Analysis_and_AB_Testing_with_SQL
This course allows you to apply the SQL skills taught in “SQL for Data Science” to four increasingly complex and authentic data science inquiry case studies. We'll learn how to convert timestamps of all types to common formats and perform date/time calculations. We'll select and perform the optimal JOIN for a data science inquiry and clean data within an analysis dataset by deduping, running quality checks, backfilling, and handling nulls. We'll learn how to segment and analyze data per segment using windowing functions and use case statements to execute conditional logic to address a data science inquiry. We'll also describe how to convert a query into a scheduled job and how to insert data into a date partition. Finally, given a predictive analysis need, we'll engineer a feature from raw data using the tools and skills we've built over the course. The real-world application of these skills will give you the framework for performing the analysis of an AB test.