ISYE 6501 Homework 12: Power Company Case

In approaching the power company case study, I find it best to break the problem into three smaller, more manageable parts. For each part we can then better define the specific data needs, model usage, and results. These individually addressed pieces can then be brought together into a cohesive analysis of the case. The three primary parts are the following:
1) Classifying a non-paying household that never intends to pay
2) Predicting the benefits and costs of shutoffs
3) Optimizing the shutoff process within the company's operational capacity

Part 1: Classification of households

For this portion of the case it is critical to have a broad database of customer/household records, information, and usage, collected within legal boundaries. If the company has not yet established such a centralized data source, building a customer database and a system to collect and clean the data would be the first priority. Assuming such a database is in place, we would begin the classification model by aggregating a dataset of features that could help separate the classes.
Some examples of features that would likely aid model development include the following:
- Unique Customer ID (numeric)
- Address (tuple; latitude, longitude) *use latitude/longitude with the Python Kepler mapping library*
- Account Status (binary; 1 = non-current account, 0 = current account)
- Cumulative Months Served (numeric)
- Cumulative Amount Past Due (numeric)
- Days Since Last Payment (numeric)
- Credit Score (numeric)
- Residential Address (binary; 1 = household/residential address, 0 = commercial address)
- 12 Month Usage History, kWh (numeric time series) (non-customer/missing = NaN)
- 12 Month Payment History (numeric time series) (non-customer/missing = NaN)
- 12 Month On-Peak % Usage (numeric time series) (non-customer/missing = NaN)
- Number of Residents (numeric)
- Economic Stress Indicator (binary; 0 = non-contractionary, 1 = contractionary)
- Labeled Historical Classes

Since we do not have actual data, this list is only a back-of-the-envelope set of factors that seem analytically important, at a surface level, for classifying customers. Some of them would likely yield insignificant coefficients or be removed by variable shrinkage techniques.

I also find it important to split the dataset before moving on to classification, using the binary current vs. non-current variable and whether days since last payment is greater than 30. Doing this simplifies the classification task. There are three groups:
1) Customers who consistently pay on time
2) Customers who are late but intend to pay
3) Customers who have no intention of paying
It would therefore be greatly preferable to limit our model to distinguishing between the latter two classes.

This study source was downloaded by 100000834091502 from CourseHero.com on 05-16-2022 06:48:56 GMT -05:00 https://www.coursehero.com/file/63361583/POWER-CASE-SOLNdocx/
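The feature list and the pre-filtering step above can be sketched as a record type plus a filter. This is a minimal sketch: the field names are hypothetical stand-ins for the listed features, and the filter simply keeps non-current accounts more than 30 days past their last payment, so that the classifier only has to separate groups 2 and 3.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class CustomerRecord:
    """One row of a hypothetical customer table (field names are illustrative)."""
    customer_id: int
    latitude: float                  # address as (latitude, longitude) for mapping
    longitude: float
    non_current: int                 # 1 = non-current account, 0 = current account
    months_served: int
    amount_past_due: float
    days_since_last_payment: int
    credit_score: float
    residential: int                 # 1 = residential address, 0 = commercial
    usage_kwh: list                  # 12 monthly values, math.nan where missing
    payments: list                   # 12 monthly values, math.nan where missing
    on_peak_pct: list                # 12 monthly values, math.nan where missing
    residents: int
    economic_stress: int             # 1 = contractionary, 0 = non-contractionary
    label: Optional[int] = None      # historical class if known: 1 = never pays, 0 = late but pays

def late_accounts(records, min_days_late=30):
    """Keep only non-current accounts more than `min_days_late` days past their last payment.

    Timely payers (group 1) are dropped, leaving a binary problem:
    'late but intends to pay' vs. 'never intends to pay'.
    """
    return [r for r in records
            if r.non_current == 1 and r.days_since_last_payment > min_days_late]
```

The 30-day cutoff is exposed as a parameter because the right grace period is a business decision rather than something the data fixes.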
Removing the first class lets us stick to binary classification models, which reduces complexity. Moreover, if a customer is current, the company would be unlikely to be justified in cutting off their power before they become late.

I would also engineer an additional feature by running a CUSUM model over each customer's activity time series. The result would be a binary variable indicating whether the CUSUM model for that customer has flagged a significant change, for example in the customer's payment activity or usage. At this stage it is also important that missing values have been handled with proper methods. Next, I would divide the dataset into training, validation, and test sets.

With the data established and separated, we move on to choosing the binary classification model. Here I find it best to use logistic regression, over a reduced feature space if necessary (PCA, Lasso, Elastic Net, or Ridge), to yield models that fit the training data reasonably well. These models are then evaluated on the validation data, and the best-performing one is selected. Finally, that model is applied to the test set, where its out-of-sample performance is evaluated and, ideally, proves satisfactory at predicting customer classes.

Logistic regression lets us monitor changes in customer class probabilities. However, it also means we must choose a cutoff threshold. The cutoff threshold lets us adjust for the costs of misclassification, which we will quantify in Part 2 of the case by modeling the costs and benefits of shutoffs.
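The CUSUM feature described above can be sketched as a one-sided CUSUM per customer series. This is a sketch under assumptions: `mu` is taken from the customer's own baseline (for example the average of earlier months), and `c` and `threshold` are tuning values that would be chosen on validation data. For a drop (e.g. payments) it uses $S_t = \max(0, S_{t-1} + (\mu - x_t - c))$; for a rise (e.g. usage) it uses $S_t = \max(0, S_{t-1} + (x_t - \mu - c))$, and it flags the customer once $S_t \ge T$.

```python
import math

def cusum_flag(series, mu, c, threshold, direction="decrease"):
    """Return 1 if a one-sided CUSUM on `series` reaches `threshold`, else 0.

    direction="decrease": S_t = max(0, S_{t-1} + (mu - x_t - c))  (e.g. payments dropping)
    direction="increase": S_t = max(0, S_{t-1} + (x_t - mu - c))  (e.g. usage jumping)
    NaN months are skipped rather than treated as zero.
    """
    if direction not in ("increase", "decrease"):
        raise ValueError("direction must be 'increase' or 'decrease'")
    s = 0.0
    for x in series:
        if math.isnan(x):
            continue
        step = (x - mu - c) if direction == "increase" else (mu - x - c)
        s = max(0.0, s + step)
        if s >= threshold:
            return 1  # significant change detected: this becomes the binary feature
    return 0
```

The slack `c` absorbs normal month-to-month noise, so a steady payer never accumulates a positive sum; only a sustained shift pushes `S_t` over the threshold.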
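For the missing-data step mentioned above, one simple option is per-customer mean imputation with a population-level fallback, keeping the count of missing months as an extra feature. This is only one of several reasonable imputation methods and is an assumption, not the method the case prescribes; the function names are hypothetical.

```python
import math

def impute_series(series):
    """Mean-impute a monthly series; return (filled_series, n_missing).

    If every month is missing (e.g. not yet a customer), the series is
    returned unchanged so a population-level value can be used instead.
    """
    observed = [x for x in series if not math.isnan(x)]
    n_missing = len(series) - len(observed)
    if not observed:
        return list(series), n_missing
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(x) else x for x in series], n_missing

def impute_with_fallback(series, population_mean):
    """Customer-level mean first, then a population mean for fully missing series."""
    filled, n_missing = impute_series(series)
    filled = [population_mean if math.isnan(x) else x for x in filled]
    return filled, n_missing
```

Keeping `n_missing` matters here because missingness in payment history may itself carry signal about which class a customer belongs to.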
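The train/validation/test workflow described above can be sketched in plain Python with a ridge-penalized logistic regression fit by batch gradient descent. This is an illustrative sketch, not a production implementation: features are assumed to be already standardized, the penalty grid and learning rate are arbitrary, and only the Ridge variant is shown. In practice scikit-learn's `LogisticRegression` (which supports L1, L2, and elastic-net penalties) would be the usual choice.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(model, X):
    # P(class 1 = "never intends to pay") for each feature row
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for x in X]

def fit_logistic(X, y, lam=0.0, lr=0.1, epochs=500):
    # Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2 (ridge);
    # the intercept b is not penalized.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errs = [p - yi for p, yi in zip(predict_proba((w, b), X), y)]
        grad_w = [sum(e * x[j] for e, x in zip(errs, X)) / n + lam * w[j] for j in range(d)]
        b -= lr * sum(errs) / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return w, b

def log_loss(y, p, eps=1e-12):
    return -sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
                for yi, pi in zip(y, p)) / len(y)

def split(rows, fracs=(0.6, 0.2), seed=0):
    # Shuffle once, then cut into train / validation / test (test gets the remainder)
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_tr, n_va = int(fracs[0] * len(rows)), int(fracs[1] * len(rows))
    return rows[:n_tr], rows[n_tr:n_tr + n_va], rows[n_tr + n_va:]

def select_on_validation(train, val, lambdas=(0.0, 0.01, 0.1, 1.0)):
    # Fit one model per penalty strength; keep the lowest validation log-loss
    Xtr, ytr = zip(*train)
    Xva, yva = zip(*val)
    best = None
    for lam in lambdas:
        model = fit_logistic(list(Xtr), list(ytr), lam=lam)
        loss = log_loss(yva, predict_proba(model, Xva))
        if best is None or loss < best[0]:
            best = (loss, lam, model)
    return best  # (validation loss, chosen lambda, model)
```

Only the selected model would then be scored once on the held-out test rows, which keeps the final performance estimate unbiased.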
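The cost-adjusted cutoff described above can be sketched as a grid search over thresholds on validation predictions. The two cost figures are placeholders for the values Part 2 would estimate: `cost_fp` is the cost of shutting off a customer who would have paid, and `cost_fn` is the cost of continuing to serve a customer who never pays. For reference, if the predicted probabilities were perfectly calibrated, the cost-minimizing cutoff would be $t^* = c_{fp} / (c_{fp} + c_{fn})$; the empirical search below does not assume calibration.

```python
def misclassification_cost(y_true, probs, threshold, cost_fp, cost_fn):
    """Total cost of predicting class 1 (shut off) whenever prob >= threshold."""
    cost = 0.0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 0:
            cost += cost_fp   # shut off a customer who would have paid
        elif pred == 0 and y == 1:
            cost += cost_fn   # kept serving a customer who never pays
    return cost

def best_threshold(y_true, probs, cost_fp, cost_fn, grid=None):
    """Cutoff on `grid` with the lowest total cost (smallest cutoff wins ties)."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda t: misclassification_cost(y_true, probs, t, cost_fp, cost_fn))

def bayes_threshold(cost_fp, cost_fn):
    """Cost-minimizing cutoff if the probabilities are well calibrated."""
    return cost_fp / (cost_fp + cost_fn)
```

Raising `cost_fp` relative to `cost_fn` pushes the chosen cutoff up, so the model flags a customer only when it is more certain they will never pay.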
In summary:
Given:
- Wide range of customer data
Use:
- CUSUM
- Imputation methods
- Logistic regression models
- Dimensionality reduction (PCA) and/or variable shrinkage methods
- Validation techniques
To:
- Engineer binary features
- Properly adjust NaN values
- Accurately predict customer classes
- Reduce model complexity and correlation, and/or highlight non-correlated variance
- Select the best model without bias

Part 2: Modeling Benefits and Cost of Shutoffs