Top 12 Real Estate Data Science Projects (Updated for 2024)

In the KPMG Global PropTech Survey 2018, 49% of respondents identified artificial intelligence, big data, and data analytics as the technologies expected to have the most significant long-term impact on the real estate industry.

Are you interested in participating in the data-driven development of the real estate industry? Do you want to discover patterns in the real estate market? Here are 12 awesome real estate machine learning projects to get you started.

Doma: Property Risk Evaluator Take-Home

We would like you to use a Jupyter (Python) notebook to work with a slice of this data. You’ll get a sense of the type of questions that we deal with at States Title, and we’ll get a sense of your data science approach.

How you can do it:

Write Python code that allows you to stand up a nationwide title insurance company:

  • It should read the files default_notices.csv, train_property_data.csv, and test_property_data.csv, described below.
  • It should append a new column, risk, to the test_property_data.csv file, which represents your prediction of the overall title risk for the property. This column should behave in such a way that properties with lower risk are predicted to be more profitable than properties with higher risk.
  • You are at complete freedom to set the method for measuring risk, and the column itself can contain any real-valued number that satisfies part.
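One way to approach the steps above is sketched below. It is only an illustration under stated assumptions: the feature columns (`loan_to_value`, `prior_notices`) and the label column (`had_default`) are hypothetical, the data is synthetic, and the step of joining default_notices.csv to the training data to build a label is left out. With the real files you would load each one with `pd.read_csv` first.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def add_risk_column(train, test, feature_cols, label_col):
    """Return a copy of `test` with a real-valued `risk` column (higher = riskier)."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(train[feature_cols], train[label_col])
    scored = test.copy()
    # The predicted probability of the positive class serves as the risk score.
    scored["risk"] = model.predict_proba(test[feature_cols])[:, 1]
    return scored


# Synthetic stand-ins for the real files; every column name here is hypothetical.
train = pd.DataFrame({
    "loan_to_value": [0.5, 0.6, 0.7, 0.9, 0.95, 1.1],
    "prior_notices": [0, 0, 1, 2, 3, 4],
    "had_default": [0, 0, 0, 1, 1, 1],
})
test = pd.DataFrame({"loan_to_value": [0.55, 1.0], "prior_notices": [0, 3]})

scored = add_risk_column(train, test, ["loan_to_value", "prior_notices"], "had_default")
print(scored)
```

Any monotone score would satisfy the brief; a probability is used here only because it is easy to interpret.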

Real Estate Machine Learning Project For House Price Prediction

Want to learn how to build and evaluate a model’s performance and predictive power using machine learning regression algorithms? Developing a house price prediction model is a great way to start.

There’s a ton of accessible housing data online, e.g., sites like Zillow and Airbnb, and these datasets are perfect for executing this type of project. Zillow’s free datasets are a popular choice; the Zillow Home Value Index (ZHVI) is a smoothed, seasonally adjusted average of housing market values by region and housing type. There are also datasets on rentals, housing inventories, and price forecasts.

The project consists of two phases: developing a model and training it on the data, then applying different regression algorithms and testing for the best fit.

How you can do it: This tutorial by Victor Roman takes you through all the steps of collecting, cleaning, and exploring housing data, then developing a machine learning model and applying different regression algorithms, and evaluating the model’s performance.

A more straightforward approach can be building a linear regression model and using K-fold cross-validation to measure the model’s accuracy. VarunSonavni uses this method with Python to examine the Bengaluru House price dataset on Kaggle in this tutorial.
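As a rough illustration of that second approach, the sketch below fits a linear regression and scores it with 5-fold cross-validation using scikit-learn. The data is synthetic and the features (`area`, `bedrooms`) are assumptions made for the example; it does not reproduce the linked tutorial or the Bengaluru dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic housing data: price depends linearly on area and bedrooms plus noise.
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=200)
bedrooms = rng.integers(1, 6, size=200)
price = 50 * area + 10_000 * bedrooms + rng.normal(0, 5_000, size=200)
X = np.column_stack([area, bedrooms])

# Shuffle before splitting so each fold sees a mix of the data.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, price, cv=cv, scoring="r2")
mean_r2 = scores.mean()
print(f"R^2 per fold: {scores.round(3)}, mean: {mean_r2:.3f}")
```

Reporting the per-fold scores alongside the mean shows how much the estimate varies between splits.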

WanderJaunt: Rental Price Analysis Take-Home

Data on short-term rental prices and occupancy is very important to WanderJaunt. It helps inform us how competitors are pricing, which influences our own pricing strategy and helps us benchmark our own occupancy and revenue per available room against similar properties.

In addition, it provides key inputs to the decision of what locations and markets we enter and what types of properties can be the most profitable.

Questions to answer:

  • What data would you exclude from analysis for being unreliable or potentially a block instead of an actual booking?
  • What is a good approach to estimating occupancy and revenue per unit?
  • Which month appears to be more profitable? April or May?
  • How much more revenue do places with 3 bedrooms make vs. places with 2 bedrooms?
  • What are any other interesting insights you may have found?

Real Estate Data Science Capstone Project

Real estate developers and investors have always sought to understand where to acquire property and when to trigger development. They look for places where the housing prices are low, and the facilities (shops, restaurants, parks, hotels, etc.) and social venues are nearby.

According to a report by the consulting firm McKinsey, big data and data analytics make it possible to analyze the many nontraditional variables that affect house prices and to quickly identify potential investment opportunities.

How you can do it: This real estate data science capstone project tutorial by Muhammad Taha Khan uses publicly available data from Wikipedia and Foursquare API to develop a machine learning model that can cluster the data mentioned above visually for the large city of London.

The model uses the unsupervised K-means algorithm to cluster the boroughs and the folium Python library to visualize the resulting clusters. The project includes housing datasets, and you can also check the code in its GitHub repository.
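The clustering step can be sketched as follows, assuming each borough has already been summarized as a vector of venue-category frequencies. The borough names and numbers are made up for illustration, and the folium mapping step is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows are boroughs; columns are the share of venues in three hypothetical
# categories (for example cafes, parks, and pubs). Values are illustrative only.
boroughs = ["Borough A", "Borough B", "Borough C", "Borough D"]
venue_freq = np.array([
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.20, 0.70],
    [0.15, 0.15, 0.70],
])

# n_init restarts the algorithm several times and keeps the best result.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(venue_freq)
clusters = dict(zip(boroughs, km.labels_))
print(clusters)
```

The cluster labels themselves are arbitrary; only the grouping of boroughs matters.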

House Price Forecasting Using Zillow Economics Dataset

Clients, real estate agents, home trading firms, and other investors often have biased assumptions about whether home values in a particular area will rise or fall. Recent UK- and Australia-based studies suggest valuations between two professionals can differ by up to 40%.

So instead of making potentially biased or inaccurate assumptions, it’s better to use statistical methods to predict the value of homes over time.

A recent application that combined an extensive database of traditional and nontraditional data was used to forecast the three-year rent per square foot for multifamily buildings in Seattle. These machine learning models predicted rents with an accuracy rate that exceeded 90 percent.

How you can do it: Follow Uma Gajendragadkar’s tutorial Using the Zillow Economic Dataset and Time Series Modeling with ARIMA to see how this project performs.

Identifying Real Estate Opportunities Using Machine Learning

In 2018, Skyline AI, a New York-based commercial real estate investment startup that uses machine learning algorithms to identify possible investment opportunities, acquired two multifamily residential complexes in Philadelphia for $26 million.

In its press release, the company claims it closed the deal at a price 12% below the expected value. “We saw that similar assets that had already been renovated were able to increase their rents by about $300 per unit,” said Skyline AI CEO Guy Zipori.

Such a remarkable performance convinced lots of real estate investors that maybe they should be increasingly relying on machine learning. But developing machine learning algorithms that can accurately identify these opportunities is not easy, as the variables that affect pricing are not always easy to recognize.

How you can do it: This project develops a property price classification model using a dataset covering the current decade, drawn from publicly available data on the Volusia County, Florida, Real Estate Appraisers website.

The project applies several machine learning algorithms, namely logistic regression, random forest, a voting classifier, and XGBoost. The resulting model can help real estate investors, mortgage lenders, and financial institutions make informed decisions.
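To show how such an ensemble fits together, here is a minimal sketch of a soft-voting classifier that combines logistic regression and a random forest in scikit-learn. It runs on synthetic data from `make_classification`, not the Volusia County dataset, and XGBoost is left out so the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary labels, e.g. "priced above" vs. "priced below" some threshold.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Soft voting averages the predicted class probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```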

You can use the study by Alejandro Baldominos to learn more about accomplishing such a daunting task. Published case studies are also available at Cornell for more in-depth analysis.

Exploratory Data Analysis Of House Prices

Exploratory data analysis is a core skill for any aspiring data scientist. Learning how to explore and analyze data is a necessary process not only for training a particular model but also for various other purposes.

Advantages of performing an EDA:

  • Significantly improves one’s understanding of the dataset.
  • It helps to identify distribution, unique characteristics, or patterns in the dataset.
  • It enables one to find outliers, duplicates, or null values.
  • It represents the data visually in a more understandable manner.
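The checks listed above can be sketched with pandas as follows. The small table is invented for illustration, and the 1.5 × IQR rule used for outliers is one common convention, chosen here as an assumption rather than taken from the linked project.

```python
import pandas as pd

# Toy dataset with one missing price, one duplicated row, and one extreme price.
homes = pd.DataFrame({
    "price": [200_000, 210_000, 195_000, 205_000, 210_000, 1_500_000, None],
    "bedrooms": [3, 3, 2, 3, 3, 4, 2],
})

null_counts = homes.isna().sum()                # missing values per column
duplicate_rows = int(homes.duplicated().sum())  # fully repeated rows

# Flag prices outside 1.5 * IQR of the quartiles as outliers.
q1, q3 = homes["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = homes[(homes["price"] < q1 - 1.5 * iqr) | (homes["price"] > q3 + 1.5 * iqr)]

print(null_counts, duplicate_rows, outliers, sep="\n\n")
```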

How you can do it: This project uses a house prices dataset from Kaggle to perform such analysis in a simple and easy-to-understand way. You can also complete your research using this weekly updated USA housing dataset.

California Housing Price Prediction Machine Learning Project

Experimenting with accurate data is always the best way to learn about the fundamental challenges you face in the workplace. In this real-data project tutorial, Gurupratap S Matharu goes through an end-to-end real estate machine learning project to predict house prices in California using advanced regression.

How you can do it: The tutorial covers every step: understanding the business goals, acquiring the dataset, processing the data, experimenting with different ML models to find the best fit, and finally launching, monitoring, and maintaining the system.

If you like it, you can try to recreate the same project using different housing datasets from Kaggle.

Predicting Crimes And Creating A Safety District Index

Living in a safe community is something everyone seeks. The Seattle Open Data Project provides access to the Seattle City Police Department’s 911 emergency response data.

Using this data, you can cluster and map different types of crime and organize them by severity. Then overlay them on a population density-based crime density map to construct a model that predicts crimes and groups regions based on a safety index.

The cleaned dataset and code used to build this project are available in Jay Feng’s GitHub repository, and you can follow his blog post for more details on how to perform this type of analysis.

Airbnb Market Analysis & Real Estate Sales Data

This dataset offers an extensive collection of information related to the Airbnb rental market and property sales in two distinct regions in California: Big Bear and Joshua Tree, complete with their corresponding zip codes (92314, 92315, 92284, and 92252).

By using this dataset, you can gain insights into the real estate market in CA.

Building Permit Classifier Take-Home

We would like you to use a Jupyter (Python) notebook to build a classifier model for predicting the type of building permits. By working with this dataset, you will gain insight into practical data science tasks and showcase your approach to solving classification problems.

Develop a Python script that performs the following tasks:

  • Read the files train_data.csv and xtest_data.csv, which contain building permit data with various features.
  • Build a classifier to predict whether a building permit’s type is “ELECTRICAL” or not using the training data.
  • Apply the trained classifier to the xtest_data.csv file to generate predictions for each permit.
  • Output your predictions in a file named ytest_pred.csv, where each row corresponds to a permit in xtest_data.csv and contains either a 1 (for “ELECTRICAL”) or a 0 (for not “ELECTRICAL”).
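The tasks above could be sketched as follows. The brief does not say which features the files contain, so this example assumes a free-text `description` column and a `type` label column; both names, and the sample rows, are hypothetical. The predictions are written to an in-memory buffer here; with the real data you would pass the path ytest_pred.csv to `to_csv` instead.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for train_data.csv and xtest_data.csv.
train = pd.DataFrame({
    "description": [
        "install electrical panel", "rewire kitchen outlets", "replace roof shingles",
        "new plumbing for bathroom", "upgrade electrical service", "build backyard deck",
    ],
    "type": ["ELECTRICAL", "ELECTRICAL", "ROOFING", "PLUMBING", "ELECTRICAL", "BUILDING"],
})
xtest = pd.DataFrame({"description": ["electrical panel upgrade", "replace roof"]})

# Binary target: 1 for ELECTRICAL permits, 0 for everything else.
is_electrical = (train["type"] == "ELECTRICAL").astype(int)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train["description"], is_electrical)
predictions = pd.Series(model.predict(xtest["description"]), name="prediction")

# One 0/1 value per row, in the same order as xtest.
buffer = io.StringIO()
predictions.to_csv(buffer, index=False, header=False)
print(buffer.getvalue())
```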

BONUS: House Prices – Advanced Regression Techniques Competition

Kaggle, the well-known data science community, hosted a competition for data science students who have completed an online machine learning course and want to expand their skills before trying out featured competitions.

Joining competitions is an excellent opportunity to build and test all data scientist skills and expand your portfolio.

More Project Ideas from Interview Query

If you want more projects to develop your skills further, try our new Takehomes, where you solve longer problems step-by-step with notebooks from different companies.

Takehomes will help you build your data science skills, including Python, SQL, and machine learning, and try out projects used in high-profile companies.

Additionally, you can look at other data science project lists and datasets from Interview Query:

  • Top 10 Regression Datasets and Projects
  • 31 Free Datasets for Your Next Project
  • 12 Machine Learning Projects (Beginner to Advanced)
  • 10 Python Projects with Source Code
  • 21 Data Analytics Project Ideas and Datasets
  • Top 12 Classification Machine Learning Projects

Applying Data Science and Machine Learning to Real Estate

We offer a hands-on course on applying data science, machine learning, and GIS to real estate. This is an in-depth version of a master's-level course our instructors ran at a top global university. Previous course runs have attracted global participants with backgrounds in institutional real estate, appraisals, PropTech, data science, and more.

Master's-Level Real Estate Data Science Course

Course Overview

We are proud to offer one of the world’s first data science, machine learning, and GIS courses dedicated to analyzing, investing in and forecasting property globally.

Through a series of interactive online lectures, hands-on learning and the completion of a key capstone project, you will gain the knowledge and expertise to construct indexes, automate valuations, analyze clusters and forecast time series.

You will gain the necessary skills to utilize large datasets to determine fair transaction prices, forecast future returns and how to analyze locations with geographic information systems (GIS).

Course Modules

What you will learn

  • Data Science Fundamentals (as applicable to all industries), including Python, Pandas, and Scikit-Learn.
  • Geographic Information Systems.
  • Data Science Methods for Real Estate, including index construction, automated valuation, cluster analysis, and time series forecasting (ARIMA, VAR, and VECM).
  • The ability to utilize large datasets to determine fair transaction prices and forecast future returns.

Course Segments & Time Commitment

The program is made up of three (3) main segments. Segments can be taken individually, pending your interest and level of expertise.

Course segments and modules (course bundles: A, B, C, D):

  • Bootcamp (20-30 hrs): modules B1, B2, B3
  • Real Estate Course (25-35 hrs): modules 1, 2, 3, 4, 5, 6, 7
  • GIS Courses (6-8 hrs): modules G1, G2

Classes consist of interactive, online video conferences. Office hours are one-on-one online video conferences. The capstone project is presented online – students are invited to (but not required to) attend the project presentations of their classmates.

Total Course Time Commitment: ~60 hours - 5 hours/week

Meet your instructors.

Nelson Lau, PhD, CFA

Nelson is the CEO of PropertyQuants Pte. Ltd., a PropTech startup bringing quantitative methods to global real estate. He has a PhD in Decision Sciences from INSEAD, is a CFA Charterholder, and completed his undergraduate work at Columbia University, double majoring in Economics and Mathematics-Statistics. Nelson holds adjunct faculty instructor roles at the Singapore Management University and National University of Singapore's Asian Institute of Digital Finance.

He has published papers in Management Science, Decision Support Systems, and Decision Analysis, one of which received a special recognition award. Nelson started his career as a trader/researcher at R G Niederhoffer Capital Management, an award-winning US hedge fund deploying systematic data-driven medium and low frequency strategies to global markets, and also spent significant time as lead trader at KCG, a leading global high frequency algorithmic trading firm.

He was also a Quantitative Macro Strategist at GIC and Managing Director at a proprietary trading firm (Acceletrade Technologies). Nelson has been investing in international residential real estate in a personal capacity for 10 years, and has a deep interest in bringing more systematic, quantitative, and data-driven approaches to real estate practice.

Xingzhi Cheng, PhD

Xingzhi is CTO of PropertyQuants and has a PhD in Statistical Physics from the National University of Singapore (NUS) and a B.S. in Computer Science from Peking University, with papers published in Physical Review Letters and elsewhere.

He was a postdoctoral research fellow at the Santa Fe Institute and NUS before moving to quantitative trading, where he has 5 years of experience as a researcher, trader, and quantitative developer.

Xingzhi enjoys architecting and developing software and frameworks for systematic and automated research. He’s also developed mobile apps and several different websites in his free time, one of which focused on tracking SGX-listed REITs, and another which analyzed which properties were best to buy or rent for parents in Singapore looking to maximize primary school admission priority for their children. He’s currently excited about building the PropertyQuants platform enabling quantitative and systematic approaches to be applied to real estate investing globally.

Who is this course for?

Participants with a basic foundation in mathematics / statistics at a high school level (GCE ‘A’ level, International Baccalaureate, or equivalent) or higher.

Participants without a background in Python, Pandas, and Scikit-Learn are required to participate in the bootcamp prior to the course.

The topics we cover are novel and constitute an extension to typical data science courses. Experienced data scientists will gain significant value from participating in all sections of the program.

Questions? Read our FAQ.

UK participants: You can pay with Knoma! See our FAQ for more details.

Complete the form below to receive our full program brochure containing additional course information and pricing.

PropertyQuants Achievements

  • Certified FinTech: AI & Data Analytics Provider (SFA)
  • Class of 2019, Colliers Techstars PropTech Accelerator
  • Grant Recipient, MAS FSTI POC Scheme
  • Shortlist, Innoleap Call for Solutions (CFS) 2019 (SLA)
  • Finalist, Built World Technology (BWT) Asia 2019

Learn more about PropertyQuants.

Data Science capstone projects batch #23

by Ekaterina Butyugina

Understanding energy consumption in real estate

Multiple myeloma: a survival story

NYC Data Science Academy

Demographic-Based Real Estate Investing

Introduction

Organizations are constantly seeking new ways of leveraging data to guide strategic decision-making and increase returns on investment (ROI). One of the areas of investment that benefits from a data-driven approach is real estate. To explore the impact data science can have on this form of investment, we partnered with Haystacks, a real estate investment strategy company. This blog post dives into our innovative approach, specifically focusing on the interplay between investor portfolios and Points of Interest (POIs) in real estate areas. We also detail the obstacles we encountered and the crucial insights acquired along the journey.

Project Context and Challenges

The task at hand was to explore how POIs affect a real estate portfolio, specifically using a non-time-series approach to automate correlation tests and visualize potential correlations between POIs and real estate returns. POIs are of significant interest in real estate for several reasons. First, POIs provide insights into the amenities and resources available in a neighborhood. For potential homeowners or investors, the proximity of schools, hospitals, restaurants, parks, and other essential facilities can greatly influence decision-making. Access to quality education, healthcare, and recreational areas is often an important factor in the desirability and value of a property. Additionally, POIs can provide a sense of the overall infrastructure and development in an area, indicating its potential for future growth and investment opportunities. By analyzing the relationship between POIs and real estate portfolios, investors can gain valuable insights into the attractiveness and livability of a location, helping them make informed investment decisions.

Our journey commenced with extensive explorations around Atlanta, GA, employing data from Google Places and correlating it with real estate listings data in the same area. However, this initial approach led to overfitting issues. Consequently, despite the sizable sample, our models yielded low accuracy scores.

Upon realizing that our initial approach, which focused solely on linear relationships, did not yield satisfactory results, we needed to reassess our understanding of the interplay between POIs and home values. Further analysis showed that the relationship between POIs and home values is more complex than a simple linear correlation. Scatter plots and correlation matrices reinforced this insight, revealing that the influence of POIs on real estate returns is not strictly linear. While proximity to POIs can be a contributing factor, other variables such as neighborhood demographics, local market trends, and property characteristics also play significant roles. By acknowledging the multifaceted nature of the relationship between POIs and home values, we were able to adjust our approach and explore alternative methods to capture the true impact of POIs on real estate portfolios.

So instead, we opted for a shift in perspective.

Reframing the Question

Our exploration led us to an intriguing alternative approach. Instead of correlating POIs directly to home value, we aimed to correlate the target demographics of businesses to zip-code and tenant demographics. The premise was simple: Businesses, large or small, conduct meticulous market research before selecting a location. This choice can offer valuable demographic insights to real estate investors.

To give some broader context, real estate investment invariably requires a comprehensive understanding of numerous factors influencing the success of an investment. These include market dynamics, supply and demand, economic conditions, and demographics such as population growth, age distribution, and income levels. Traditional mortgage data, despite its value, is often limited by the frequency of updates. In rapidly changing markets, this sluggishness in data collection and delayed updates can impede investors' ability to make timely, accurate investment decisions.

Our Novel Approach

To mitigate these challenges, we propose a novel methodology, leveraging points of interest (POIs) and their correlation with zip codes as proxies for demographic insights to match investors' portfolio preferences. POIs, which can range from businesses and amenities to landmarks and entertainment venues, reflect the characteristics and preferences of the local population dictated by market demands. Analyzing the distribution and types of POIs in a given area can give us a thorough understanding of the target audience and their preferences.

The rationale behind using POIs as proxies for demographic insights is that businesses invest considerable time, money, and research into understanding their target market before deciding where to open new locations. In the case of larger businesses, they possess proprietary data like customer profiles, buying patterns, and market trends, which help them identify areas with the highest potential for success.

Businesses select locations in proximity to their target demographics. If these target demographics line up with an investor's portfolio demographics, then that area could be a viable investment. Conversely, if a business closes, it may indicate shifts in demographics or market conditions. By piggybacking on businesses' extensive research and expertise, we can tap into this valuable knowledge, creating a symbiotic relationship with an investor's profile.

As an illustrative example, let's consider a real estate investor heavily invested in an area 20 minutes outside of Atlanta, GA. They own an apartment complex and various rental properties scattered throughout a specific zip code, which have particular demographics. This investor aims to expand their portfolio into other areas with similar returns.

In parallel, Starbucks, which conducts thorough market research to identify areas with high success probabilities, shares a similar interest in these demographics. It follows that if the presence of certain POIs like Starbucks, Michaels, and Jimmy John's aligns with the investor's profile, we can infer that the demographics and market conditions are conducive to successful real estate investments for the investor.

Data Sources and Application

The data utilized for this project comprises a combination of Census data, HMDA data, and Google Places data. These datasets provide valuable insights into the demographics, housing market, and amenities of different areas within the Atlanta Metro region. While we focused on over 200 zip codes in Atlanta as a proof of concept, the methodology can be applied universally to other regions.

The Census data, summarized by zip code, offered a wealth of demographic details such as population density, income levels, education levels, and housing statistics. This information helps paint a comprehensive picture of the characteristics and composition of each neighborhood or area under analysis. Understanding the demographics of an area is crucial for real estate investors as it can provide insights into the target market, rental demand, and potential property value appreciation.

The HMDA, or Home Mortgage Disclosure Act, data is another essential dataset for our project. This data, also summarized by zip codes, provides information about loan applications, approvals, rates, and types. The HMDA dataset offers a wealth of information on mortgage activity within different areas, enabling us to assess the lending environment and creditworthiness of borrowers. The HMDA data is particularly useful for many real estate use cases as it allows us to analyze loan approval rates, which can serve as an indicator of economic stability and creditworthiness of borrowers in a specific area.
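As a minimal sketch of how a per-zip-code approval rate could be derived from application-level records, consider the example below. The column names `zip_code` and `approved` and the sample rows are assumptions for illustration; they are not actual HMDA field names, and the project's own preprocessing is not shown in this post.

```python
import pandas as pd

# Hypothetical application-level records: one row per loan application.
applications = pd.DataFrame({
    "zip_code": ["zip_a", "zip_a", "zip_a", "zip_b", "zip_b"],
    "approved": [True, True, False, True, False],
})

# The mean of a boolean column is the share of True values,
# which gives the approval rate for each zip code.
approval_rate = applications.groupby("zip_code")["approved"].mean()
print(approval_rate)
```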

To illustrate the significance of HMDA data, let's examine the map below, which displays the HMDA approval rates in the Atlanta Metro region. Lighter shades represent higher approval rates. Upon analysis, a discernable pattern emerges, especially in areas north of Atlanta, such as Buckhead, which exhibits notably higher approval rates compared to the rest of the region.

[Figure: map of HMDA approval rates in the Atlanta Metro region]

Now, let's consider another map below, which showcases the percentage of people earning more than $200,000 per year in each zip code. Remarkably, we observe a striking overlap with the HMDA approval rates map. Once again, the zip codes north of Atlanta exhibit lighter shades, indicating both higher approval rates and a higher percentage of the wealthy population.

[Figure: map of the percentage of people earning more than $200,000 per year in each zip code]

This correlation suggests that these specific areas not only have a relatively affluent population but also signify a robust economic environment. The combination of high income levels and high approval rates indicates a healthy real estate market and potentially lucrative investment opportunities.

So, how does this information serve an investor? High approval rates are often indicative of stable economic conditions and lower credit risk, which could suggest higher property values. Furthermore, the knowledge that high approval rates align with areas where a larger percentage of the population earns over $200,000 adds an additional layer of confidence for an investor. It indicates that the area attracts a wealthier demographic that is likely to maintain steady rent payments and have a lower risk of defaulting on their mortgages. Thus, incorporating HMDA data into our strategy further refines our approach and allows for a more nuanced understanding of investment potentials.

Given the importance of the HMDA data for our project, it is essential to delve into its significance. The HMDA dataset provides a comprehensive view of mortgage activity and lending patterns within different areas, making it a valuable resource for various real estate use cases. By analyzing HMDA data, investors and real estate professionals can gain insights into the lending environment, creditworthiness of borrowers, and the overall economic stability of specific regions or neighborhoods. This information is invaluable for identifying areas with potential for investment, assessing market conditions, and making informed decisions based on credit risk and loan approval rates. Therefore, leveraging the HMDA data can significantly enhance the accuracy and effectiveness of real estate investment strategies.

Gross Rental Yield and POI Categories: Untapped Opportunities

Continuing our exploration, we turned our attention to the Gross Rental Yield. This crucial metric provides an idea of how much an investor could make on an investment property before considering expenses like property management, taxes, and insurance. It enables investors to evaluate the potential return on investment based solely on rental income.

To calculate gross rental yield, we divided the property's annual rental income by its purchase price or market value, then expressed it as a percentage. For instance, a gross rental yield of 7% means the rental income is approximately 7% of the property's value. Since we didn't have the sale and rental price of individual properties, we used the mean prices for each zip code.
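The calculation described above can be written as a small helper. The function name and the sample figures (a monthly rent of 1,750 and a value of 300,000) are illustrative assumptions, not numbers from the project.

```python
def gross_rental_yield(annual_rent, property_value):
    """Gross rental yield as a percentage: annual rent divided by property value."""
    if property_value <= 0:
        raise ValueError("property_value must be positive")
    return 100.0 * annual_rent / property_value


# Example with made-up zip-code averages: mean monthly rent and mean home value.
mean_monthly_rent = 1_750
mean_home_value = 300_000
yield_pct = gross_rental_yield(12 * mean_monthly_rent, mean_home_value)
print(f"Gross rental yield: {yield_pct:.1f}%")
```

Because expenses are ignored, this figure is an upper bound on what an investor would actually earn.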

After establishing the gross rental yield, we explored various POI categories with at least 100 locations across the Atlanta Metro Region. The bottom ten categories, displayed in red on the right of the bar chart below, are most prevalent in zip codes with a low gross rental yield. These categories include industries such as real estate, which thrive in areas with high home values and high rental prices.

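[Figure: bar chart of POI categories by average gross rental yield, with the top ten in green on the left and the bottom ten in red on the right]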

Conversely, the top ten categories found in zip codes with a high gross rental yield are displayed in green on the left of the chart. The 'Trucking Company' category emerged as a standout, with an impressive 7.5% average gross rental yield across locations. Interestingly, other POI categories such as 'Pawn Shops,' 'Warehouses,' and 'Laundromats' were also among the top ten.
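A ranking of this kind can be sketched with a pandas group-by: attach each POI's zip-code yield, average by category, and keep only categories with enough locations. All names and numbers below are made up, and the minimum count is set to 2 instead of 100 so the toy data is not filtered away entirely.

```python
import pandas as pd

# Hypothetical POI table: one row per business location.
pois = pd.DataFrame({
    "category": ["Trucking Company", "Trucking Company",
                 "Real Estate Agency", "Real Estate Agency", "Laundromat"],
    "zip_code": ["zip_a", "zip_b", "zip_c", "zip_d", "zip_a"],
})

# Hypothetical gross rental yield (in percent) per zip code.
zip_yield = pd.Series({"zip_a": 7.5, "zip_b": 7.0, "zip_c": 3.5, "zip_d": 4.0})

pois["gross_yield"] = pois["zip_code"].map(zip_yield)
stats = pois.groupby("category")["gross_yield"].agg(["mean", "count"])

# Drop rare categories, then rank the rest from highest to lowest average yield.
min_locations = 2
ranked = stats[stats["count"] >= min_locations].sort_values("mean", ascending=False)
print(ranked)
```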

This data implies that these areas, despite potentially being seen as temporary living spaces, may offer lucrative investment opportunities. The presence of these business types may indicate an underserved or transient demographic. While they may not be long-term residents, they still represent a sector with housing needs. In other words, high rental yields are not necessarily linked to high-end POIs like luxury retail or gourmet restaurants. Instead, practical and essential services seem to dominate the list.

Understanding the distribution and types of POIs in these high yield areas can serve as a strong indicator of the kind of tenant an investor can expect. Consequently, it will enable investors to tailor their properties to cater to these specific demographics, leading to higher occupancy rates, stable rental income, and, ultimately, higher returns on their investment.

It is worth mentioning that while gross rental yield is a valuable metric for assessing investment potential, it is essential to consider other factors such as property expenses, market trends, and local regulations to make well-informed investment decisions. In our analysis, we focused on gross rental yield as a starting point, and by not accounting for expenses, we aimed to highlight the untapped opportunities in certain POI categories. However, in practice, investors should thoroughly evaluate all relevant aspects before finalizing their investment strategies.

Exploring Business Correlations with Census and Mortgage Data

Continuing our analysis with Atlanta, GA as a proof of concept, we delve into how various businesses in the area correlate with an assortment of census and mortgage data. This investigation helps us identify the demographics of the customers these businesses serve in the zip codes they occupy and align these with investor preferences.

For instance, let's consider Dollar General. The below heatmap, which displays the POI name on the X-axis and census data on the Y-axis, reveals that Dollar General is most strongly correlated with areas that have low rental/property value, a large percentage of car commuters (particularly those traveling 60 min or more to work), and a household income of $35-50k. Conversely, Dollar General stores do not typically show up in areas with incomes higher than $200k or high median property values.

On the other hand, a store like 'Hollywood Feed,' a pet store chain, correlates highly with households that make over $200k. These correlations allow investors to match their interests with specific demographics served by these businesses.
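Each cell of a heatmap like the one described above is a correlation coefficient between a business's presence and a census variable across zip codes. The sketch below shows one way such a value could be computed with a plain Pearson correlation. The store counts and census figures are invented for illustration, and the project may well have used a library routine instead.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical per-zip-code data: store counts for one POI and two census features.
store_count = [0, 1, 2, 3, 4]
median_home_value = [500, 400, 300, 200, 100]  # in $1,000s, illustrative
pct_long_commute = [2, 4, 6, 8, 10]            # percent of commuters, illustrative

# One column of the heatmap: this POI against each census variable.
corr_column = {
    "median_home_value": pearson(store_count, median_home_value),
    "pct_long_commute": pearson(store_count, pct_long_commute),
}
print(corr_column)
```

Repeating this for every POI and census variable pair fills in the full grid that the heatmap colors.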

Applying K Nearest Neighbor for Strategic Recommendations

To streamline this concept, we have created a function using K Nearest Neighbor (KNN). This function accepts an investor's current zip code and desired demographic profile, and it outputs a recommendation of K zip codes that share a similar demographic profile. Additionally, it suggests prevalent POIs within those zip codes.

The process begins with the input of a zip code and the features an investor is interested in. These inputs are then passed to a K-Nearest Neighbors (KNN) algorithm, which identifies the k most similar zip codes based on the profiles in the selected columns. The function is also designed to scale to large datasets, making it suitable for real-world scenarios.
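The lookup described above can be sketched as a brute-force nearest-neighbor search. This is a minimal plain-Python illustration, not the project's implementation: the function names, the z-score scaling step, and the example profiles are all assumptions made for the example.

```python
import math


def standardize(profiles):
    """Z-score each feature so no single column dominates the distance."""
    keys = list(profiles)
    dims = len(profiles[keys[0]])
    cols = [[profiles[k][j] for k in keys] for j in range(dims)]
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return {
        k: [(profiles[k][j] - means[j]) / stds[j] for j in range(dims)]
        for k in keys
    }


def nearest_zip_codes(profiles, target_zip, k=3):
    """Return the k zip codes whose scaled profiles are closest to target_zip."""
    scaled = standardize(profiles)
    target = scaled[target_zip]
    distances = [
        (math.dist(target, vec), zip_code)
        for zip_code, vec in scaled.items()
        if zip_code != target_zip
    ]
    return [zip_code for _, zip_code in sorted(distances)[:k]]


# Hypothetical profiles: [median income ($1,000s), % car commuters, median home value ($1,000s)]
profiles = {
    "zip_a": [42, 85, 150],
    "zip_b": [45, 82, 160],
    "zip_c": [210, 40, 900],
    "zip_d": [48, 88, 140],
    "zip_e": [190, 45, 850],
}

print(nearest_zip_codes(profiles, "zip_a", k=2))
```

A brute-force scan like this compares the target against every other zip code, which is fine for a few thousand rows. A tree-based or approximate index would be the usual next step for much larger datasets.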

Visualizing Results with PCA and KNN

To make these concepts more digestible, we have visualized the results of the KNN algorithm using Principal Component Analysis (PCA) in a 2D space. PCA helps us capture the most important patterns and variances in the data. This visual aid confirms that we are identifying the “closest” relationships between zip codes. The gray dots represent all the zip codes in our dataset, the red dot is the selected zip code, and the blue dots represent the “closest” zip codes.
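To show what the 2D projection involves, here is a small sketch that reduces multi-feature zip code profiles to two coordinates. It swaps in a plain-Python power iteration for whatever PCA routine the project actually used, and the function names and sample rows are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]


def covariance(centered):
    n, dims = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def top_eigenpair(matrix, iters=500):
    """Approximate the dominant eigenvector and eigenvalue by power iteration."""
    dims = len(matrix)
    v = [float(i + 1) for i in range(dims)]  # arbitrary non-zero starting vector
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # matrix has (almost) no variance left in this direction
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return v, eigenvalue


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    pc1, lam1 = top_eigenpair(cov)
    dims = len(cov)
    # Deflation: remove the first component's variance, then find the next one.
    deflated = [
        [cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(dims)]
        for i in range(dims)
    ]
    pc2, _ = top_eigenpair(deflated)
    return [(dot(r, pc1), dot(r, pc2)) for r in centered]


# Hypothetical zip code profiles (three features each).
rows = [[42, 85, 150], [45, 82, 160], [210, 40, 900], [48, 88, 140], [190, 45, 850]]
for x, y in pca_2d(rows):
    print(f"{x:9.2f} {y:9.2f}")
```

Each output pair is one dot in the scatter plot. In practice the features would usually be standardized first so that a large-valued column such as home value does not dominate the projection.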

A New Tool for Real Estate Investors

We used Plotly Dash to develop a tool that leverages businesses' proprietary information and demographic analysis expertise for the benefit of real estate investors. It is an inexpensive solution to identify areas that align with investor profiles and demonstrate significant investment potential. It also uncovers areas of future growth potential by identifying areas that match the preferences of successful businesses.

Our approach is cost-efficient, as it capitalizes on businesses' extensive data, reducing the need for expensive data acquisition. Furthermore, by using POIs as proxies for demographic information, we overcome the limitations of slow and infrequent data updates, enabling investors to stay ahead of market trends.  

Our project underscores the power of data science in revolutionizing real estate investment. By aligning data from various sources and correlating them in novel ways, we have unearthed valuable insights that promise to redefine real estate investment strategies.

We have not only drawn connections between seemingly unrelated variables but also created a powerful, user-friendly tool that offers real-time, granular insights to real estate investors. This interactive dashboard allows investors to explore various aspects of zip codes, from housing and commuting attributes to demographic and earnings attributes.

Special Thanks

  • Joe Lee from Haystacks.ai for his sponsorship and mentorship on this project.
  • Cole Ingraham from NYC Data Science Academy for his mentorship on this project.

About Authors

Brian Ralston

Jason Phillip

NYC Data Science Academy teaches data science, trains companies and their employees to better profit from data, excels at big data project consulting, and connects trained Data Scientists to our industry.

NYC Data Science Academy is licensed by New York State Education Department.

Data Science: Capstone

Show what you’ve learned from the Professional Certificate Program in Data Science.

Associated Schools

Harvard T.H. Chan School of Public Health

What you'll learn

How to apply the knowledge base and skills learned throughout the series to a real-world problem

Independently work on a data analysis project

Course description

To become an expert data scientist you need practice and experience. By completing this capstone project you will get an opportunity to apply the knowledge and skills in R data analysis that you have gained throughout the series. This final project will test your skills in data visualization, probability, inference and modeling, data wrangling, data organization, regression, and machine learning.

Unlike the rest of our Professional Certificate Program in Data Science, in this course, you will receive much less guidance from the instructors. When you complete the project you will have a data product to show off to potential employers or educational programs, a strong indicator of your expertise in the field of data science.

Instructors

Rafael Irizarry

You may also like

Data Science: Probability

Learn probability theory, essential for a data scientist, using a case study on the financial crisis of 2007–2008.

Data Science: Inference and Modeling

Learn inference and modeling: two of the most widely used statistical tools in data analysis.

High-Dimensional Data Analysis

A focus on several techniques that are widely used in the analysis of high-dimensional data.

The School of Information is UC Berkeley’s newest professional school. Located in the center of campus, the I School is a graduate research and education community committed to expanding access to information and to improving its usability, reliability, and credibility while preserving security and privacy.

The School of Information offers four degrees:

The Master of Information Management and Systems (MIMS) program educates information professionals to provide leadership for an information-driven world.

The Master of Information and Data Science (MIDS) is an online degree preparing data science professionals to solve real-world problems. The 5th Year MIDS program is a streamlined path to a MIDS degree for Cal undergraduates.

The Master of Information and Cybersecurity (MICS) is an online degree preparing cybersecurity leaders for complex cybersecurity challenges.

Our Ph.D. in Information Science is a research program for next-generation scholars of the information age.

The School of Information's courses bridge the disciplines of information and computer science, design, social sciences, management, law, and policy. We welcome interest in our graduate-level Information classes from current UC Berkeley graduate and undergraduate students and community members.  More information about signing up for classes.

Research by faculty members and doctoral students keeps the I School on the vanguard of contemporary information needs and solutions.

The I School is also home to several active centers and labs, including the Center for Long-Term Cybersecurity (CLTC), the Center for Technology, Society & Policy, and the BioSENSE Lab.

I School graduate students and alumni have expertise in data science, user experience design & research, product management, engineering, information policy, cybersecurity, and more. Learn more about hiring I School students and alumni.

Data Science Spring 2023 Capstone Project Showcase

Capstone projects are the culmination of the MIDS students’ work in the School of Information’s Master of Information and Data Science program.

Over the course of their final semester, teams of students propose and select project ideas, conduct and communicate their work, receive and provide feedback, and deliver compelling presentations along with a web-based final deliverable.

Join us for an online presentation of these capstone projects. Six teams will present for twenty minutes each, including Q&A.

A panel of judges will select an outstanding project for the Hal R. Varian MIDS Capstone Award.

Join the online showcase

Dr. Julia Meo holds a Ph.D. in physics from the University of Pennsylvania. Her professional experience ranges from creating recommender systems using natural language processing to time series analysis, and forecasting. During her time at Zillow, Dr. Meo’s research has focused on creating data products that characterize the housing economy. Her in-depth analysis of housing market dynamics, pricing trends, and investment opportunities has provided valuable insights to inform policymakers, real estate professionals, and investors. Her work has been instrumental in shaping strategies and decision-making in the real estate industry, driving growth and innovation.

Ryan Neo, MIDS ’20, is a data science leader with over a decade of experience in finance, consulting and tech. He currently leads a data science team at Meta focusing on advertiser products. He enjoys consulting on academic and nonprofit research projects in his spare time. Ryan notes that his favorite course within the MIDS program is DATASCI 241. He currently lives in San Francisco with his wife, daughter, and cat.

Pauline Wang

Pauline Wang is a data science manager with extensive expertise in machine learning and data analytics. In her current role as manager of machine learning triage at IBM Watson Orders, she leads the triage data analytics capability to guide and prioritize the development of AI-powered voice agents for quick service restaurants (QSRs).  She has been instrumental in building tools and processes that have expanded the team’s capability from just 5 to over 100 live restaurants. Prior to joining IBM, Pauline served as director of research at Elliott Management Corporation, where she leveraged unstructured data to identify investment opportunities. When she is not busy training robots, Pauline enjoys dancing, cooking, and keeping up with the latest trends in AI technologies.

More information

Spring 2023 MIDS Project Descriptions

If you have questions about this event, please contact the Student Affairs team at [email protected] .

Residential properties for sale in Saint Petersburg, Russia

1 room apartment in Nevsky District, Russia

Property types in Saint Petersburg

Property features in Saint Petersburg, Russia

Frequently asked questions about real estate in St. Petersburg

  • In what areas of St. Petersburg is real estate purchased most often?
  • What documents do I need to have to buy property in St. Petersburg?
  • What are the restrictions and special conditions for foreigners buying apartments and houses in St. Petersburg?

St. Petersburg, FL Housing Market

The St. Petersburg housing market is somewhat competitive. Homes in St. Petersburg receive 2 offers on average and sell in around 41 days. The median sale price of a home in St. Petersburg was $435K last month, down 4.4% since last year. The median sale price per square foot in St. Petersburg is $345, down 0.43% since last year.

St. Petersburg Housing Market Trends

In August 2024, St. Petersburg home prices were down 4.4% compared to last year, selling for a median price of $435K. On average, homes in St. Petersburg sell after 41 days on the market compared to 21 days last year. There were 397 homes sold in August this year, down from 465 last year.
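The year-over-year comparisons quoted above are simple percentage changes. As a small worked example, the sketch below applies that arithmetic to the two pairs of raw figures given in the paragraph (homes sold and days on market). The helper name is made up for the example.

```python
def pct_change(previous, current):
    """Percentage change from a previous value to a current value."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100


# Figures quoted in the paragraph above: last year's value, then this year's.
homes_sold_change = pct_change(465, 397)
days_on_market_change = pct_change(21, 41)

print(f"Homes sold: {homes_sold_change:+.1f}%")
print(f"Days on market: {days_on_market_change:+.1f}%")
```

The same function would recover last year's median price from the quoted 4.4% decline only approximately, since the published figures are rounded.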

How hot is the St. Petersburg housing market?

The Redfin Compete Score rates how competitive an area is on a scale of 0 to 100, where 100 is the most competitive.

Calculated over the last 3 months

  • Some homes get multiple offers.
  • The average home sells for about 3% below list price and goes pending in around 35 days.
  • Hot homes can sell for around list price and go pending in around 9 days.
  • The average home sells for about 3% below list price and goes pending in around 26 days.
  • Hot homes can sell for around list price and go pending in around 7 days.
  • The average home sells for about 4% below list price and goes pending in around 46 days.
  • Hot homes can sell for around list price and go pending in around 14 days.

St. Petersburg Migration & Relocation Trends

  • Across the nation, 2% of homebuyers searched to move into St. Petersburg from outside metros.
  • New York homebuyers searched to move into St. Petersburg more than any other metro, followed by Washington and Chicago.
  • 70% of St. Petersburg homebuyers searched to stay within the St. Petersburg metropolitan area.
  • Sarasota was the most popular destination among St. Petersburg homebuyers, followed by Orlando and Homosassa Springs.

Climate's impact on St. Petersburg housing

Learn about natural hazards and environmental risks, such as floods, fires, wind, and heat that could impact homes in St. Petersburg.

  • Flood Factor: Major
  • Fire Factor: Moderate
  • Wind Factor: Extreme
  • Heat Factor: Extreme

Homes for Sale Saint Petersburg, FL

  • 5521 80th ST N #310 St Petersburg, FL 33709 (Listed by CENTURY 21 Jim White & Associates)
  • 300 Beach Drive NE Unit 1101 Saint Petersburg, FL 33701 (Courtesy of Coldwell Banker St Petersburg NE)
  • 1401 5th Street N Saint Petersburg, FL 33704 (Courtesy of Coldwell Banker Winter Park)
  • 423 55th Avenue St Pete Beach, FL 33706 (Courtesy of The Toni Everett Company)
  • 12400 Capri Circle N Unit A Treasure Island, FL 33706 (Courtesy of Coldwell Banker St Pete Beach)
  • 7850 2 Avenue S St Petersburg, FL 33707 (Courtesy of Keller Williams St. Pete Realty)
  • 3500 35th Street N Saint Petersburg, FL 33713 (Courtesy of Berkshire Hathaway Florida Properties Group)
  • 6083 Bahia Del Mar Circle 361 Saint Petersburg, FL 33715
  • 7510 Sunshine Skyway Lane S P4 Saint Petersburg, FL 33711 (Courtesy of ADDvantage Real Estate)
  • 2960 59th Street S 102 Gulfport, FL 33707 (Courtesy of GULFPORT REALTY)
  • 400 150th Ave # 304 Madeira Beach, FL 33708 (Listed by CENTURY 21 Beggins Enterprises)
  • 130 34th Ave. N. St. Petersburg, FL 33704 (Listed by CENTURY 21 Real Estate Champions)
  • 6129 Leeland Street S Saint Petersburg, FL 33715
  • 237 7th Ave N St Petersburg, FL 33701
  • 1330 Cherry Street NE Saint Petersburg, FL 33701
  • 400 150th Ave # 302 Madeira Beach, FL 33708
  • 400 150th Ave # 305 Madeira Beach, FL 33708
  • 6495 Shoreline Dr. #8202 St. Petersburg, FL 33708
  • 3146 29th Avenue N 203 Saint Petersburg, FL 33713 (Courtesy of Dalton Wade, Inc.)
  • 18053 2nd St. E. Redington Shores, FL 33708
  • 200 45th Avenue Ne Saint Petersburg, FL 33703 (Courtesy of Coastal Properties Group International, LLC)

utkarsh1412/-Data-Science-Capstone-Real-Estate

Data-science-capstone-real-estate

Problem Statement

A banking institution requires actionable insights into mortgage-backed securities, geographic business investment, and real estate analysis. The mortgage bank would like to identify potential monthly mortgage expenses for each region based on monthly family income and rental of the real estate. A statistical model needs to be created to predict the potential demand, in dollar amount of loans, for each region in the USA. A dashboard is also needed, which would refresh periodically after data is retrieved from the agencies. The dashboard must demonstrate relationships and trends for the following key metrics: number of loans, average rental income, monthly mortgage and owner’s cost, and family income vs. mortgage cost comparison across different regions. The dashboard is not limited to these metrics.

Dataset Description

Second mortgage: Households with a second mortgage statistics

Home equity: Households with a home equity loan statistics

Debt: Households with any type of debt statistics

Mortgage Costs: Statistics regarding mortgage payments, home equity loans, utilities, and property taxes

Home Owner Costs: Sum of utilities, and property taxes statistics

Gross Rent: Contract rent plus the estimated average monthly cost of utility features

High school Graduation: High school graduation statistics

Population Demographics: Population demographics statistics

Age Demographics: Age demographic statistics

Household Income: Total income of people residing in the household

Family Income: Total income of people related to the householder

Project Task: Week 1

Data import and preparation:

1. Import data.

2. Figure out the primary key and look for the requirement of indexing.

3. Gauge the fill rate of the variables and devise plans for missing value treatment. Please explain explicitly the reason for the treatment chosen for each variable.

Exploratory Data Analysis (EDA):

4. Perform debt analysis. You may take the following steps:

a) Explore the top 2,500 locations where the percentage of households with a second mortgage is the highest and percent ownership is above 10 percent. Visualize using geo-map. You may keep the upper limit for the percent of households with a second mortgage to 50 percent

b) Use the following bad debt equation:

Bad Debt = P (Second Mortgage ∩ Home Equity Loan)

Bad Debt = second_mortgage + home_equity - home_equity_second_mortgage

c) Create pie charts to show overall debt and bad debt

d) Create box-and-whisker plots and analyze the distribution of second mortgage, home equity, good debt, and bad debt for different cities

e) Create a collated income distribution chart for family income, household income, and remaining income
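The bad debt equation in step b) can be sketched in a few lines. The three columns in the formula come from the task statement. The `debt` column name and the definition of good debt as the remainder of all debt are assumptions made for this example, as are the sample values. Note that the formula as written has the inclusion-exclusion form, so it counts households with a second mortgage, a home equity loan, or both.

```python
def debt_breakdown(row):
    """Return (bad_debt, good_debt) as fractions of households for one location."""
    bad_debt = (
        row["second_mortgage"]
        + row["home_equity"]
        - row["home_equity_second_mortgage"]  # households counted in both groups
    )
    # Assumption: good debt is whatever share of overall debt is not bad debt.
    good_debt = row["debt"] - bad_debt
    return bad_debt, good_debt


# Hypothetical location; every value is a fraction of households.
row = {
    "second_mortgage": 0.05,
    "home_equity": 0.10,
    "home_equity_second_mortgage": 0.03,
    "debt": 0.60,
}

bad_debt, good_debt = debt_breakdown(row)
print(f"bad debt: {bad_debt:.2f}, good debt: {good_debt:.2f}")
```

The two values returned here are the slices that the pie charts in step c) and the box plots in step d) would draw on.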

Project Task: Week 2

  • Perform EDA and come up with insights into population density and age. You may have to derive new fields (make sure to weight averages for accurate measurements):

a) Use pop and ALand variables to create a new field called population density

b) Use male_age_median, female_age_median, male_pop, and female_pop to create a new field called median age

c) Visualize the findings using appropriate chart type

  • Create bins for population into a new variable by selecting an appropriate class interval so that the number of categories doesn’t exceed 5, for ease of analysis.

a) Analyze the married, separated, and divorced population for these population brackets

b) Visualize using appropriate chart type

Please detail your observations for rent as a percentage of income at an overall level, and for different states.

Perform correlation analysis for all the relevant variables by creating a heatmap. Describe your findings.
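The derived fields in the Week 2 tasks above can be sketched as three small helpers. The input names (pop, ALand, male_age_median, female_age_median, male_pop, female_pop) come from the task statement. The bin edges and labels are hypothetical, and the population-weighted average is only an approximation of a true combined median.

```python
import bisect


def population_density(pop, aland):
    """People per unit of land area (ALand), or NaN when the area is zero."""
    return pop / aland if aland else float("nan")


def weighted_median_age(male_age_median, female_age_median, male_pop, female_pop):
    """Population-weighted average of the male and female median ages."""
    total = male_pop + female_pop
    if total == 0:
        return float("nan")
    return (male_age_median * male_pop + female_age_median * female_pop) / total


def population_bin(pop, edges=(1000, 2500, 5000, 10000)):
    """Assign a population to one of five brackets (edges are illustrative)."""
    labels = ("very low", "low", "medium", "high", "very high")
    return labels[bisect.bisect_right(edges, pop)]


# Hypothetical location.
print(population_density(2000, 4_000_000))
print(weighted_median_age(30.0, 40.0, 100, 300))
print(population_bin(3000))
```

Four edges produce exactly five categories, which keeps the bracket count within the limit the task asks for.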

Project Task: Week 3

Data pre-processing:

1. The economic multivariate data has a significant number of measured variables. The goal is to find where the measured variables depend on a smaller number of unobserved common factors or latent variables.

2. Each variable is assumed to be dependent upon a linear combination of the common factors, and the coefficients are known as loadings. Each measured variable also includes a component due to independent random variability, known as “specific variance” because it is specific to one variable. Obtain the common factors and then plot the loadings. Use factor analysis to find latent variables in our dataset and gain insight into the linear relationships in the data. The latent variables are:

• Highschool graduation rates

• Median population age

• Second mortgage statistics

• Percent own

• Bad debt expense

Project Task: Week 4

Data modeling :.

  • Build a linear Regression model to predict the total monthly expenditure for home mortgages loan. Please refer ‘deplotment_RE.xlsx’. Column hc_mortgage_mean is predicted variable. This is the mean monthly mortgage and owner costs of specified geographical location. Note: Exclude loans from prediction model which have NaN (Not a Number) values for hc_mortgage_mean.

a) Run a model at a Nation level. If the accuracy levels and R square are not satisfactory proceed to below step.

b) Run another model at State level. There are 52 states in USA.

c) Keep the following considerations in mind while building the linear regression model:

• Variables should have a significant impact on predicting monthly mortgage and owner costs

• Start with all predictor variables as the initial hypothesis

• An R-squared of 60 percent or above should be achieved

• Ensure multicollinearity does not exist among the independent (predictor) variables

• Test whether the predicted variable is normally distributed
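A minimal sketch of the modeling considerations above, assuming scikit-learn and synthetic data: only `hc_mortgage_mean` is named in the task, while the predictor columns (`family_mean`, `rent_mean`, `pct_own`) and the `vif` helper are hypothetical. It drops rows with a NaN target, fits a linear regression, reports R-squared on held-out data, and computes variance inflation factors as a multicollinearity check.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in; the predictor names are hypothetical.
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "family_mean": rng.normal(70_000, 20_000, n),
    "rent_mean": rng.normal(1_100, 300, n),
    "pct_own": rng.uniform(0.2, 0.9, n),
})
df["hc_mortgage_mean"] = (
    0.015 * df["family_mean"] + 0.5 * df["rent_mean"] + rng.normal(0, 150, n)
)
# Blank out some targets to mimic missing values in the real file.
df.loc[df.sample(50, random_state=0).index, "hc_mortgage_mean"] = np.nan

# Exclude rows where the target is NaN, as the task requires.
df = df.dropna(subset=["hc_mortgage_mean"])
features = ["family_mean", "rent_mean", "pct_own"]
X, y = df[features], df["hc_mortgage_mean"]

# Hold out 20 percent of the rows to measure R-squared on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))


def vif(frame: pd.DataFrame) -> pd.Series:
    """Variance inflation factor: 1 / (1 - R^2) from regressing each predictor on the rest."""
    out = {}
    for col in frame.columns:
        others = frame.drop(columns=col)
        r2_col = LinearRegression().fit(others, frame[col]).score(others, frame[col])
        out[col] = 1.0 / (1.0 - r2_col)
    return pd.Series(out)


vifs = vif(X_train)
print(f"R^2 on held-out data: {r2:.3f}")
print(vifs.round(2))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity. For the state-level model, the same fit could be repeated per group, for example by iterating over `df.groupby("state")` if the data has such a column (an assumption here).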

Data Reporting:

  • Create a dashboard in Tableau by choosing appropriate chart types and metrics useful for the business. The dashboard must include the following:

a) Box plot of distribution of average rent by type of place (village, urban, town, etc.).

b) Pie charts to show overall debt and bad debt.

c) Explore the top 2,500 locations where the percentage of households with a second mortgage is the highest and percent ownership is above 10 percent. Visualize using geo-map.

d) Heat map for correlation matrix.

e) Pie chart to show the population distribution across different types of places (village, urban, town etc.)
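The selection in item c) can be prepared in pandas before it is loaded into the dashboard tool. The sketch below is one possible reading of that item, on synthetic data: the column names (`place`, `lat`, `lng`, `second_mortgage`, `pct_own`) are assumptions, and `pct_own` is assumed to be stored as a fraction between 0 and 1.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the location-level data; column names are hypothetical.
rng = np.random.default_rng(1)
n = 10_000
df = pd.DataFrame({
    "place": [f"place_{i}" for i in range(n)],
    "lat": rng.uniform(25, 49, n),
    "lng": rng.uniform(-124, -67, n),
    "second_mortgage": rng.beta(2, 30, n),
    "pct_own": rng.uniform(0, 1, n),
})

# Keep locations with ownership above 10 percent,
# then take the 2,500 rows with the highest second-mortgage share.
eligible = df[df["pct_own"] > 0.10]
top = eligible.nlargest(2500, "second_mortgage")

print(top[["place", "lat", "lng", "second_mortgage", "pct_own"]].head())
```

The resulting `top` table keeps the latitude and longitude columns, so it can be exported with `top.to_csv(...)` and plotted on a geo-map.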
