The Metro City Transit Dilemma: Predicting Breakdowns Before They Happen

The Case

The alert on your screen flashes for the third time this morning: "Service Disruption - Route 12." For Metro City Transit (MCT), this means another unscheduled bus breakdown, another group of stranded passengers, and another expensive emergency repair. As the newest Data Analyst on the team, you were hired to help solve this exact problem. The city manager is tired of budget overruns and angry citizen emails.

Your new boss is Anya Sharma, the Head of Fleet Maintenance. Anya is a legend at MCT, with 20 years of experience keeping the city's aging fleet on the road. She trusts her gut and the instincts of her veteran mechanics far more than any spreadsheet. She sees the breakdowns as an unavoidable cost of doing business with older vehicles and is skeptical that your data models can tell her anything her team doesn't already know.

"The city manager wants us to be 'data-driven'," she told you yesterday, her skepticism barely concealed. "He thinks a computer can predict when a 12-year-old bus is going to have its transmission fail. I think that's wishful thinking." The fleet is a mix of diesel, hybrid, and a few new electric buses, and the problems seem to pop up everywhere.

To prove her point, she's given you a challenge. She's forwarded you a raw data dump from the fleet management system, containing a mix of operational stats and maintenance logs from the past few years. "Here's the data you wanted," her email read. "It's messy. It's probably incomplete. See if you can find anything in there that will actually help us stop a breakdown before it happens. Show me something that can help my team decide which bus to bring in for service on Monday morning."

This is your first major project and your chance to demonstrate the value of a predictive approach. The data is in your hands. Your task is to turn this jumble of numbers into a clear, actionable strategy that can win over a skeptical expert and get MCT's buses running on time.

Resources and Data

Anya has provided you with the following files to begin your analysis. You'll need to examine them carefully to formulate your plan.

Metro City Transit - Fleet Data (2020-2023)

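| Bus ID | Age Years | Mileage | Last Service Date | Engine Temp Avg | Oil Pressure Avg | Vibration Level | Fuel Type | Major Failure Next 30 Days |
|---|---|---|---|---|---|---|---|---|
| MTA-7811 | 12 | 482105 | 2022-11-20 | 102.5 | 38.2 | 1.85 | Diesel | 1 |
| MTA-8204 | 8 | 315678 | 2023-01-15 | 99.8 | 45.1 | 1.22 | Diesel | 0 |
| MTA-6559 | 14 | 598432 | 2022-10-05 | 104.1 | -55.0 | 2.15 | Diesel | 1 |
| MTA-9130 | 5 | 198750 | 2023-03-22 | None | 52.8 | 0.76 | Hybrid | 0 |
| MTA-E101 | 2 | 65430 | 2023-05-10 | None | None | 0.25 | Electric | 0 |
| MTA-7902 | 11 | 450111 | 2023-01-08 | 101.9 | 41.5 | 1.67 | Diesel | 0 |
| MTA-8845 | 6 | 240500 | 2023-04-01 | 98.2 | 5230.1 | 0.95 | Hybrid | 0 |
| MTA-6988 | 13 | 530888 | 2022-12-18 | 105.3 | 36.4 | 2.01 | Diesel | 1 |
| MTA-E108 | 1 | 32100 | 2023-06-01 | None | None | 0.18 | Electric | 0 |
| MTA-9210 | 4 | 150245 | 2023-02-28 | None | 55.6 | 0.65 | Hybrid | 0 |
| MTA-7734 | 12 | 501345 | 2023-01-25 | 103.8 | 37.9 | 1.95 | Diesel | 0 |
| MTA-8150 | 9 | 355480 | 2022-11-30 | 100.5 | -25.8 | 1.45 | Diesel | 1 |
| MTA-9005 | 5 | 210830 | 2023-05-18 | 97.4 | 61.2 | 0.88 | Hybrid | 0 |
| MTA-8521 | 7 | 289123 | 2023-03-10 | 99.1 | 48.7 | 1.1 | Hybrid | 0 |
| MTA-7699 | 13 | 545210 | 2022-09-14 | None | 35.1 | 2.3 | Diesel | 1 |
| MTA-E115 | 3 | 98500 | 2023-04-20 | None | None | 0.45 | Electric | 0 |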

Data is Rarely Perfect

Real-world datasets are often incomplete or contain errors. Before any analysis, a critical first step is to inspect the data for missing values, outliers, and inconsistencies. This process, known as data cleansing, is essential for building a reliable model.
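A first inspection pass can be quite small. The sketch below, in plain Python, counts missing readings and reports the range of a numeric column for three rows copied from the fleet table. The underscore column names and the `profile` helper are illustrative choices, not part of the fleet management system.

```python
# Three rows copied from the fleet table; None marks a missing reading.
rows = [
    {"Bus_ID": "MTA-7811", "Engine_Temp_Avg": 102.5, "Oil_Pressure_Avg": 38.2},
    {"Bus_ID": "MTA-6559", "Engine_Temp_Avg": 104.1, "Oil_Pressure_Avg": -55.0},
    {"Bus_ID": "MTA-9130", "Engine_Temp_Avg": None, "Oil_Pressure_Avg": 52.8},
]


def profile(rows, column):
    """Summarize one numeric column: row count, missing count, min and max."""
    present = [r[column] for r in rows if r[column] is not None]
    return {
        "rows": len(rows),
        "missing": len(rows) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


for column in ("Engine_Temp_Avg", "Oil_Pressure_Avg"):
    print(column, profile(rows, column))
```

A minimum or maximum that is physically impossible, or far from every other reading, is a cue to look at the individual rows before trusting the column.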

Regression Analysis

Your Task

You are the new Data Analyst for Metro City Transit. Your first assignment is to respond to Anya Sharma's challenge. Using the provided narrative and resources, your task is to develop a plan to predict bus maintenance needs. You are not required to build a full statistical model yourself, but you must outline the complete process you would take.

Your final output should be a brief, professional recommendation for Anya that explains your initial findings from the data and proposes a clear path forward. Your goal is to convince her that a data-driven approach is not only possible but valuable for her team.

How to Structure Your Response

A strong analysis follows a clear structure.

  1. Define the problem: Briefly state the core business problem.
  2. Identify core issues: Analyze the provided data. What are the immediate issues with the dataset? What initial relationships do you observe?
  3. Identify possible solutions: Describe the steps you would take to clean the data and perform a regression analysis.
  4. Recommend a best solution: Propose a clear plan of action and explain how your proposed model will help Anya's team move toward predictive maintenance.

Guiding Questions

Use these questions to help structure your thinking as you analyze the case and prepare your recommendation.

  1. What is the primary business problem Metro City Transit is facing, according to the narrative?
  2. After reviewing the mct-bus-maintenance-history dataset, what specific data quality issues (missing values, errors, outliers) can you identify?
  3. Which variables in the dataset do you hypothesize might be strong predictors of Major_Failure_Next_30_Days? Why?
  4. What specific steps would you need to take to clean and prepare this dataset before you could use it for regression analysis?
  5. In the context of this problem, what is the dependent variable? What are the independent variables?
  6. If your regression model showed that Age_Years and Engine_Temp_Avg have a strong, positive association with Major_Failure_Next_30_Days, how would you explain this finding to Anya Sharma in simple, non-technical terms?
  7. Based on your complete analysis, what is your final recommendation for MCT? What are the immediate next steps you would propose to Anya?

An Expert Response

An Expert's Approach

This is one example of a strong response. Your own analysis might have different points of emphasis or a unique structure. The goal is to apply the core principles of data analysis to the problem, so other excellent responses are possible.

To: Anya Sharma

From: [Your Name], Data Analyst

Subject: Plan for Predictive Maintenance Analysis

Anya,

Thank you for providing the fleet data. After an initial review, I'm confident we can develop a useful tool to help your team prioritize maintenance and prevent breakdowns. Here is a brief outline of my findings and a proposed plan.

1. The Business Problem

MCT is currently in a reactive maintenance mode, leading to service disruptions and high costs. Our goal is to shift to a predictive model, using data to identify at-risk buses for proactive service.

2. Initial Data Analysis & Core Issues

The historical dataset is a great starting point. My initial review confirms your suspicion that the data is "messy." Specifically:

  * The Engine_Temp_Avg column is missing for 6 of the 16 buses, including all three electric buses.
  * The Oil_Pressure_Avg column contains impossible data points: two negative values and one extreme outlier (MTA-8845). It is also blank for the three electric buses.

These data quality issues are common and must be resolved before we can build a reliable model.
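One way these issues might be resolved is sketched below in plain Python, using eight rows from the fleet table. It rests on two assumptions that would need checking with the maintenance team: the plausible oil pressure window (`OIL_MIN`, `OIL_MAX`) is a placeholder, since the data does not state units or limits, and fuel type is used as the grouping for averages because the table has no bus model column. The `clean` helper is a hypothetical name.

```python
from statistics import mean

# Rows from the fleet table: (bus_id, fuel_type, engine_temp_avg, oil_pressure_avg)
ROWS = [
    ("MTA-7811", "Diesel", 102.5, 38.2),
    ("MTA-8204", "Diesel", 99.8, 45.1),
    ("MTA-6559", "Diesel", 104.1, -55.0),
    ("MTA-7699", "Diesel", None, 35.1),
    ("MTA-9005", "Hybrid", 97.4, 61.2),
    ("MTA-8521", "Hybrid", 99.1, 48.7),
    ("MTA-9130", "Hybrid", None, 52.8),
    ("MTA-E101", "Electric", None, None),
]

# Assumed plausibility window for oil pressure (placeholder values).
OIL_MIN, OIL_MAX = 0.0, 150.0


def clean(rows):
    """Blank out implausible oil readings, then fill gaps with the fuel-type mean."""
    staged = []
    for bus_id, fuel, temp, oil in rows:
        if oil is not None and not (OIL_MIN <= oil <= OIL_MAX):
            oil = None  # treat an impossible reading as missing
        staged.append((bus_id, fuel, temp, oil))

    def fuel_mean(fuel, index):
        values = [r[index] for r in staged if r[1] == fuel and r[index] is not None]
        return mean(values) if values else None  # no peer readings: leave the gap

    cleaned = {}
    for bus_id, fuel, temp, oil in staged:
        if temp is None:
            temp = fuel_mean(fuel, 2)
        if oil is None:
            oil = fuel_mean(fuel, 3)
        cleaned[bus_id] = (temp, oil)
    return cleaned


cleaned = clean(ROWS)
for bus_id, (temp, oil) in cleaned.items():
    print(bus_id, temp, oil)
```

The electric bus keeps its gaps because no other electric bus has a reading to average. Whether those blanks mean "not applicable" rather than "lost" is a question for the mechanics, not something the code can decide.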

3. Proposed Process: From Raw Data to Actionable Insights

I propose a four-step process:

  * Step 1: Data Cleansing. I will address the data quality issues. Missing temperature readings can be filled with the average for comparable buses, such as those of the same fuel type (imputation), where a comparable reading exists. The erroneous oil pressure data points will be corrected or removed so they don't skew the results.
  * Step 2: Exploratory Analysis. I will visualize the cleaned data to identify trends, for example with scatter plots that show whether bus age is visibly linked to failures.
  * Step 3: Model Building. I will use logistic regression to model the relationship between operational data (age, mileage, temperature, etc.) and the likelihood of a major failure. This technique fits because our target, Major_Failure_Next_30_Days, is a "yes/no" outcome.
  * Step 4: Interpretation. The model will produce a "risk score" for each bus. It will tell us how much each factor, like an extra year of age or a 5-degree increase in temperature, raises the probability of a failure.
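To make the model-building and interpretation steps concrete, here is a minimal sketch of a logistic regression fitted by hand-written gradient descent in plain Python. It is an illustration under stated assumptions, not the final model: it uses only Age_Years and Vibration_Level because those two columns are complete for all 16 buses, the learning rate and iteration count are arbitrary choices, and 16 rows are far too few to trust the result. In practice a statistics library would do the fitting.

```python
import math

# Transcribed from the fleet table:
# (bus_id, age_years, vibration_level, major_failure_next_30_days)
DATA = [
    ("MTA-7811", 12, 1.85, 1), ("MTA-8204", 8, 1.22, 0),
    ("MTA-6559", 14, 2.15, 1), ("MTA-9130", 5, 0.76, 0),
    ("MTA-E101", 2, 0.25, 0), ("MTA-7902", 11, 1.67, 0),
    ("MTA-8845", 6, 0.95, 0), ("MTA-6988", 13, 2.01, 1),
    ("MTA-E108", 1, 0.18, 0), ("MTA-9210", 4, 0.65, 0),
    ("MTA-7734", 12, 1.95, 0), ("MTA-8150", 9, 1.45, 1),
    ("MTA-9005", 5, 0.88, 0), ("MTA-8521", 7, 1.10, 0),
    ("MTA-7699", 13, 2.30, 1), ("MTA-E115", 3, 0.45, 0),
]


def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1."""
    m = sum(column) / len(column)
    sd = math.sqrt(sum((v - m) ** 2 for v in column) / len(column))
    return [(v - m) / sd for v in column]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Estimated probability of failure for one standardized feature row."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, k = len(labels), len(features[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(features, labels):
            err = predict(weights, bias, x) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


ages = standardize([row[1] for row in DATA])
vibrations = standardize([row[2] for row in DATA])
features = list(zip(ages, vibrations))
labels = [row[3] for row in DATA]

weights, bias = fit_logistic(features, labels)
risk = {row[0]: predict(weights, bias, x) for row, x in zip(DATA, features)}

for bus_id, p in sorted(risk.items(), key=lambda item: item[1], reverse=True):
    print(f"{bus_id}: {p:.2f}")
```

Because the features are standardized, each weight describes the effect of a one-standard-deviation change in that factor. Age and vibration rise together in this sample, so the individual weights should be read with caution.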

4. Recommendation & Next Steps

I recommend we proceed with this plan as a pilot project. The outcome will not be a magic box that replaces your mechanics' expertise. Instead, it will be a "Bus Risk List" that we can give your team every week. This list will rank every bus in the fleet by its probability of failure.

Your team can then use their deep knowledge to focus their attention on the top 5 or 10 buses on that list, using this tool to guide their expert inspections. This approach combines the power of data with the invaluable experience of your team to catch problems before they happen.
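Producing the weekly list is a simple sort once each bus has a probability. The sketch below shows the idea in plain Python. The scores are made-up placeholders for illustration only, not model output, and `top_risk_buses` is a hypothetical helper name.

```python
def top_risk_buses(risk_scores, n=5):
    """Return the n (bus_id, probability) pairs with the highest probability."""
    ranked = sorted(risk_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Placeholder probabilities, invented purely to show the ranking step.
example_scores = {
    "MTA-8204": 0.12,
    "MTA-6559": 0.81,
    "MTA-E108": 0.02,
    "MTA-7699": 0.77,
}

for bus_id, probability in top_risk_buses(example_scores, n=2):
    print(f"{bus_id}: {probability:.0%}")
```

The cutoff `n` is a staffing decision rather than a statistical one: it should match how many inspections the team can realistically fit into a week.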

I am ready to begin the data cleansing process immediately and can provide a preliminary risk list for a subset of the fleet within a week.

Assess Yourself

Reflect on Your Analysis

Use the following criteria to evaluate your own response to the task. Consider where your analysis was strongest and where you could provide more detail or a clearer rationale.

Learning Progress

By working through this case, you have practiced the essential skills of a data-driven asset manager. You evaluated a raw dataset for quality issues, outlined the steps for conducting a regression analysis, and considered how you would interpret the model's results to make a concrete business recommendation.

Next Steps

Excellent work applying your analytical skills to this real-world challenge. You've successfully outlined a plan to move Metro City Transit toward a more predictive, efficient maintenance strategy. Please navigate back to the course to continue your learning journey.