coursera applied machine learning in python assignment 4


Course: Coursera, Applied Machine Learning in Python, Module 4, Assignment 4: Predicting and understanding viewer engagement with educational videos.

About the prediction problem

One critical property of a video is engagement: how interesting or "engaging" it is for viewers, so that they decide to keep watching. Engagement is critical for learning, whether the instruction is coming from a video or any other source. There are many ways to define engagement with video, but one common approach is to estimate it by measuring how much of the video a user watches. If the video is not interesting and does not engage a viewer, they will typically abandon it quickly, e.g. only watch 5 or 10% of the total.

A first step towards providing the best-matching educational content is to understand which features of educational material make it engaging for learners in general. This is where predictive modeling can be applied, via supervised machine learning. For this assignment, your task is to predict how engaging an educational video is likely to be for viewers, based on a set of features extracted from the video's transcript, audio track, hosting site, and other sources.

We chose this prediction problem for several reasons:

It combines a variety of features derived from a rich set of resources connected to the original data;

The manageable dataset size means the dataset and supervised models for it can be easily explored on a wide variety of computing platforms;

Predicting popularity or engagement for a media item, especially combined with understanding which features contribute to its success with viewers, is a fun problem but also a practical representative application of machine learning in a number of business and educational sectors.

About the dataset

We extracted training and test datasets of educational video features from the VLE Dataset put together by researcher Sahan Bulathwela at University College London.

We provide you with two data files for use in training and validating your models: train.csv and test.csv. Each row in these two files corresponds to a single educational video, and includes information about diverse properties of the video content as described further below. The target variable is engagement, which is defined as True if the median percentage of the video watched across all viewers was at least 30%, and False otherwise.
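To make the label definition concrete, here is a minimal sketch of how such a target could be derived from per-viewer watch data. The `views` table, its column names, and its values are hypothetical and are not part of the assignment files; only the 30% median threshold comes from the description above.

```python
import pandas as pd

# Hypothetical per-viewer watch data: one row per (video, viewer) pair.
views = pd.DataFrame({
    "video_id": [1, 1, 1, 2, 2, 2],
    "fraction_watched": [0.10, 0.45, 0.60, 0.05, 0.08, 0.90],
})

# Median fraction watched per video, then threshold at 30%.
median_watched = views.groupby("video_id")["fraction_watched"].median()
engagement = median_watched >= 0.30

print(engagement)
```

Video 2 has one viewer who watched 90% of it, but its median is only 8%, so it is labeled False. Using the median keeps a few enthusiastic viewers from making a video look engaging.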

Note: Any extra variables that may be included in the training set are simply for your interest if you want an additional source of data for visualization, or to enable unsupervised and semi-supervised approaches. However, they are not included in the test set and thus cannot be used for prediction. Only the data already included in your Coursera directory can be used for training the model for this assignment.

For this final assignment, you will bring together what you've learned across all four weeks of this course by exploring different prediction models for this new dataset. We also encourage you to apply what you've learned about model selection: tune hyperparameters using training/validation splits of the training data to optimize the model and further increase its performance. In addition to a basic evaluation of model accuracy, we've provided a utility function to visualize which features contribute most and least to the overall model performance.
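The course's utility function is not reproduced here. As a rough stand-in, the sketch below ranks features by a random forest's impurity-based importances on synthetic data. The data, the feature names `f0` to `f5`, and the choice of random forest are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (not the assignment dataset): with shuffle=False the
# first two columns are informative and the remaining four are noise.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sorting shows which features the
# forest relied on most and least.
importances = pd.Series(model.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```

Impurity-based importances are quick to compute but can favor high-cardinality features, so treat the ranking as a first look rather than a final verdict.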

File descriptions

assets/train.csv - the training set (Use only this data for training your model!)

assets/test.csv - the test set

Data fields

train.csv & test.csv:

train.csv only:

Your predictions will be given as the probability that the corresponding video will be engaging to learners.

The evaluation metric for this assignment is the Area Under the ROC Curve (AUC).

Your grade will be based on the AUC score computed for your classifier. A model with an AUC (area under ROC curve) of at least 0.8 passes this assignment, and over 0.85 will receive full points.
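Because the grade depends on AUC, it matters that the submission contains probabilities rather than hard True/False labels. The toy example below (made-up labels and scores, using scikit-learn's `roc_auc_score`) sketches why: thresholding throws away the ranking information that AUC measures.

```python
from sklearn.metrics import roc_auc_score

# Toy labels and predicted probabilities (illustrative values only).
y_true = [0, 0, 1, 1]
proba = [0.1, 0.2, 0.3, 0.9]

# Hard labels obtained by thresholding at 0.5, as predict() would return.
hard = [int(p >= 0.5) for p in proba]

auc_proba = roc_auc_score(y_true, proba)  # every positive outranks every negative
auc_hard = roc_auc_score(y_true, hard)    # ties after thresholding lose information

print(auc_proba, auc_hard)
```

Here the probabilities rank both positives above both negatives, giving an AUC of 1.0. The thresholded labels tie one positive with the negatives, and the AUC drops to 0.75.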

For this assignment, create a function that trains a model to predict significant learner engagement with a video using assets/train.csv. Using this model, return a Pandas Series object of length 2309 whose values are the probabilities that each corresponding video from assets/test.csv will be engaging (according to a model learned from the 'engagement' label in the training set), and whose index is the video's id field.
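The return format can be sketched with a tiny stand-in for the test set. The ids, the `feature` column, and the probability values below are made up. The second half shows a pitfall worth knowing about: if the data passed to `pd.Series` is itself a Series, pandas aligns it to the new index by label instead of by position.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the test set (hypothetical ids and feature values).
test_df = pd.DataFrame({"id": [9240, 9241, 9242], "feature": [0.2, 0.5, 0.9]})

# Stand-in for model.predict_proba(X_test)[:, 1], which is a NumPy array.
proba = np.array([0.12, 0.55, 0.93])

# Values are assigned by position; the index is the video id.
result = pd.Series(proba, index=test_df["id"], name="engagement")
print(result)

# Pitfall: a Series passed as data is re-aligned by label, so labels 0..2
# do not match the ids and every value becomes NaN.
misaligned = pd.Series(pd.Series(proba), index=test_df["id"])
print(misaligned.isna().all())
```

Passing a plain NumPy array (or calling `.to_numpy()` on a Series first) avoids the alignment problem.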

Make sure your code is working before submitting it to the autograder.

Print out and check your result to see whether there is anything weird (e.g., all probabilities are the same).
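One way to automate this check is a small helper like the one below. The name `check_predictions` and the specific checks are assumptions, not part of the assignment; the expected length of 2309 would come from the task description.

```python
import pandas as pd


def check_predictions(pred, expected_len):
    """Return a list of human-readable problems found in a prediction Series."""
    if not isinstance(pred, pd.Series):
        return ["result is not a pandas Series"]

    problems = []
    if len(pred) != expected_len:
        problems.append(f"expected {expected_len} rows, got {len(pred)}")
    if pred.isna().any():
        problems.append("contains NaN values")
    if ((pred < 0) | (pred > 1)).any():
        problems.append("values outside [0, 1]")
    if not pred.index.is_unique:
        problems.append("index contains duplicate ids")

    distinct = set(pred.dropna().unique())
    if len(distinct) <= 1:
        problems.append("all probabilities are the same")
    elif distinct <= {0.0, 1.0}:
        problems.append("only 0/1 values: looks like hard labels, not probabilities")
    return problems


good = pd.Series([0.1, 0.7, 0.4], index=[10, 11, 12])
constant = pd.Series([0.5, 0.5, 0.5], index=[10, 11, 12])
labels = pd.Series([0.0, 1.0, 1.0], index=[10, 11, 12])

print(check_predictions(good, 3))
print(check_predictions(constant, 3))
print(check_predictions(labels, 3))
```

For the actual submission, the call would be something like `check_predictions(engagement_model(), 2309)`.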

Generally the total runtime should be less than 10 mins.

Try to avoid global variables. If you have other functions besides engagement_model, you should move those functions inside the scope of engagement_model.
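The scoping advice can be illustrated with a stripped-down sketch. The function name `engagement_model_sketch`, the helper `drop_columns`, and the example row are hypothetical; the point is only that the helper is defined inside the outer function, so nothing needs to exist at module level.

```python
def engagement_model_sketch():
    # The helper is defined inside the function, so it does not depend on
    # (or add to) the module's global namespace.
    def drop_columns(row, names):
        return {key: value for key, value in row.items() if key not in names}

    # Hypothetical single row standing in for a record from the training data.
    row = {"id": 1, "feature_a": 2.5, "engagement": True}
    return drop_columns(row, {"id", "engagement"})


features = engagement_model_sketch()
print(features)
```

An autograder that calls only the outer function can still reach the helper this way, because the helper is created each time the outer function runs.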

Be sure to first check the pinned threads in Week 4's discussion forum if you run into a problem you can't figure out.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split


def engagement_model():
    train_df = pd.read_csv('assets/train.csv')
    test_df = pd.read_csv('assets/test.csv')

    # use only the columns that also exist in the test set; any extra
    # train-only variables cannot be used for prediction
    feature_cols = [col for col in test_df.columns if col != 'id']

    # split the training dataset into training and validation sets
    train_set, val_set = train_test_split(train_df, test_size=0.2, random_state=42)

    # separate the target variable from the features
    X_train, y_train = train_set[feature_cols], train_set['engagement']
    X_val, y_val = val_set[feature_cols], val_set['engagement']

    # set up the parameter grid for GridSearchCV
    param_grid = {
        'n_estimators': [100, 200, 300],
        'max_depth': [10, 20, 30],
        'min_samples_leaf': [1, 2, 4],
    }

    # create a Random Forest classifier and tune it with 5-fold CV on AUC
    rf_clf = RandomForestClassifier(random_state=42)
    grid_search = GridSearchCV(rf_clf, param_grid, cv=5, scoring='roc_auc', n_jobs=-1)
    grid_search.fit(X_train, y_train)

    # print the best hyperparameters
    print(grid_search.best_params_)

    # get the best model found by GridSearchCV
    best_rf_clf = grid_search.best_estimator_

    # check the AUC on the held-out validation set
    y_val_pred = best_rf_clf.predict_proba(X_val)[:, 1]
    print('Validation AUC score:', roc_auc_score(y_val, y_val_pred))

    # predict the probability of the positive class (engaging) on the test set;
    # predict() would return hard labels instead of probabilities
    y_test_pred = best_rf_clf.predict_proba(test_df[feature_cols])[:, 1]

    # create a series of predicted probabilities indexed by video id
    ans = pd.Series(y_test_pred, index=test_df['id'], name='engagement')
    return ans


# call the function and print the predictions
ans = engagement_model()
print(ans)

Getting this error:

Please help to resolve the problem.

It looks like the indentation in your code is inconsistent. Python relies on indentation to define t...
