Unpickle Best Model
Database Insert
Database Query
Pickle Best Model
I-Tags Represent Risk of Future Missed Payments
Store Predictions as I-Tags
Prediction Task
Load Unseen Data Task
Airflow Prediction DAG
Notify
Model Development
- Explore multiple algorithms, compare their performance with cross-validation, and tune hyperparameters with grid or random search
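The model comparison described above could be sketched as follows. This is a minimal illustration, not the project's actual setup: the synthetic data, the two candidate algorithms, the parameter grids, and the ROC AUC metric are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for borrower features and a missed-payment label (assumption).
X, y = make_classification(n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0)

# Hypothetical candidate algorithms and hyperparameter grids.
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [25, 50], "max_depth": [3, None]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validation over every combination in the grid.
    search = GridSearchCV(estimator, grid, cv=5, scoring="roc_auc")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Keep the algorithm with the highest mean cross-validated score.
best_name = max(results, key=lambda n: results[n][0])
print(best_name, results[best_name])
```

For random search, scikit-learn's `RandomizedSearchCV` can be swapped in for `GridSearchCV`; it samples a fixed number of combinations instead of trying them all.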
Database Query
Transformation Pipeline Development
- Develop data transformation functions to be implemented by Airflow
- Implement the transformation functions in a transformation pipeline that can be run on training and test data separately
- Separate execution keeps training and test data isolated during future training-data refreshes (prevents training data from leaking into test data)
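One way to get the isolation described above is to fit the pipeline on training rows only and then reuse the fitted pipeline on test rows. The sketch below assumes a simple impute-then-scale pipeline and random numeric data; the real transformation steps are not specified here.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline():
    """Hypothetical pipeline: fill missing values, then standardize."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])


# Random stand-in data with one missing value (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
X_test = rng.normal(loc=55.0, scale=10.0, size=(50, 3))
X_train[0, 0] = np.nan

pipeline = build_pipeline()
# Medians, means, and variances are learned from the training rows only.
X_train_t = pipeline.fit_transform(X_train)
# The test rows are transformed with the training statistics; nothing is refit.
X_test_t = pipeline.transform(X_test)
```

Because `transform` never updates the fitted statistics, the test data cannot influence how the training data was scaled, and the same holds after a refresh if the pipeline is refit on the new training set alone.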
Exploratory Data Analysis
Jupyter Notebook
Evaluate & Select Best Model
- un-pickle trained models
- predict test data labels
- evaluate performance
- select and pickle best model for production
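The four steps above could look roughly like this. The function name `select_best_model`, the balanced-accuracy metric, the file names, and the two demo models are hypothetical choices for illustration.

```python
import pickle
import shutil
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split


def select_best_model(model_paths, X_test, y_test, best_path):
    """Score each pickled model on the test set and copy the winner to best_path."""
    scores = {}
    for path in model_paths:
        with open(path, "rb") as f:
            model = pickle.load(f)                     # un-pickle trained model
        y_pred = model.predict(X_test)                 # predict test data labels
        scores[str(path)] = balanced_accuracy_score(y_test, y_pred)  # evaluate
    winner = max(scores, key=scores.get)
    shutil.copyfile(winner, best_path)                 # pickled best model for production
    return winner, scores


# Demo with synthetic data and two stand-in models (assumptions).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
work_dir = Path(tempfile.mkdtemp())
model_paths = []
for name, model in [
    ("dummy", DummyClassifier(strategy="most_frequent")),
    ("logreg", LogisticRegression(max_iter=1000)),
]:
    path = work_dir / f"{name}.pkl"
    with open(path, "wb") as f:
        pickle.dump(model.fit(X_train, y_train), f)
    model_paths.append(path)

best_path = work_dir / "best_model.pkl"
winner, scores = select_best_model(model_paths, X_test, y_test, best_path)
print(winner, scores)
```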
Train Model N
Train Model 2
Train Model 1
- un-pickle training data frame
- implement training function to produce model
- pickle model
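A single training task following the steps above might be written as below. The function name `train_model_task`, the column names, and the toy data frame are assumptions; each "Train Model" task would pass in its own estimator.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LogisticRegression


def train_model_task(train_path, model_path, estimator, label_col="missed_payment"):
    """Un-pickle the training frame, fit the estimator, and pickle the fitted model."""
    train_df = pd.read_pickle(train_path)
    X = train_df.drop(columns=[label_col])
    y = train_df[label_col]
    estimator.fit(X, y)
    with open(model_path, "wb") as f:
        pickle.dump(estimator, f)
    return str(model_path)


# Demo with a tiny, made-up training frame (assumption).
work_dir = Path(tempfile.mkdtemp())
train_path = work_dir / "train.pkl"
pd.DataFrame({
    "utilization": [0.10, 0.90, 0.20, 0.80, 0.30, 0.95, 0.15, 0.85],
    "late_count": [0, 3, 0, 2, 1, 4, 0, 3],
    "missed_payment": [0, 1, 0, 1, 0, 1, 0, 1],
}).to_pickle(train_path)

model_path = train_model_task(
    train_path, work_dir / "model_1.pkl", LogisticRegression(max_iter=1000)
)
```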
Transform Task
- Receive path from x-com and un-pickle data frame
- Execute Scikit-Learn transformation pipeline
- Pickle transformed training and test data frames
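The transform task could be sketched as a plain Python callable. In Airflow the task instance exposes `xcom_pull(task_ids=...)`; here a small stub stands in for it so the sketch runs without Airflow. The task id `load_task`, the label column, the 75/25 split, and the scaling-only pipeline are assumptions.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

LABEL = "missed_payment"  # hypothetical label column


def transform_task(ti, out_dir):
    """Pull the pickled frame's path from XCom, transform it, and pickle the results."""
    raw_path = ti.xcom_pull(task_ids="load_task")  # path pushed by the load task
    df = pd.read_pickle(raw_path)
    train_df, test_df = train_test_split(df, test_size=0.25, random_state=0)
    features = [c for c in df.columns if c != LABEL]

    pipeline = Pipeline([("scale", StandardScaler())])
    # Fit on training rows only, then apply the fitted pipeline to the test rows.
    train_out = pd.DataFrame(
        pipeline.fit_transform(train_df[features]), columns=features, index=train_df.index
    )
    test_out = pd.DataFrame(
        pipeline.transform(test_df[features]), columns=features, index=test_df.index
    )
    train_out[LABEL] = train_df[LABEL]
    test_out[LABEL] = test_df[LABEL]

    paths = {
        "train": str(Path(out_dir) / "train_transformed.pkl"),
        "test": str(Path(out_dir) / "test_transformed.pkl"),
    }
    train_out.to_pickle(paths["train"])
    test_out.to_pickle(paths["test"])
    return paths


class StubTaskInstance:
    """Stand-in for Airflow's task instance, for local runs only."""

    def __init__(self, values):
        self._values = values

    def xcom_pull(self, task_ids):
        return self._values[task_ids]


# Demo with a made-up frame (assumption).
work_dir = Path(tempfile.mkdtemp())
raw_path = work_dir / "raw.pkl"
pd.DataFrame({
    "utilization": [0.10, 0.90, 0.20, 0.80, 0.30, 0.95, 0.15, 0.85],
    "late_count": [0, 3, 0, 2, 1, 4, 0, 3],
    LABEL: [0, 1, 0, 1, 0, 1, 0, 1],
}).to_pickle(raw_path)

paths = transform_task(StubTaskInstance({"load_task": str(raw_path)}), work_dir)
```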
Load Task
Database-Prototype: Local Postgres Container
Database-Production: Hyundai Card Hive DB
Pickle Pandas data frame and transfer its path to next task
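A load task along these lines might look like the sketch below. An in-memory SQLite database stands in for the Postgres prototype container purely so the example is self-contained; the table name, columns, and the function name `load_task` are hypothetical. When a function like this is used as a PythonOperator callable, Airflow pushes its return value to XCom by default, which is how the path reaches the next task.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd


def load_task(conn, query, out_path):
    """Run the query, pickle the resulting data frame, and return the pickle's path."""
    df = pd.read_sql(query, conn)
    df.to_pickle(out_path)
    return str(out_path)  # the returned path is what the next task receives


# Demo: SQLite stand-in for the prototype database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE borrowers (borrower_id INTEGER, utilization REAL, missed_payment INTEGER)"
)
conn.executemany(
    "INSERT INTO borrowers VALUES (?, ?, ?)",
    [(1, 0.10, 0), (2, 0.90, 1), (3, 0.35, 0)],
)
conn.commit()

out_path = Path(tempfile.mkdtemp()) / "borrowers.pkl"
pickled_path = load_task(conn, "SELECT * FROM borrowers", out_path)
loaded = pd.read_pickle(pickled_path)
conn.close()
```

Passing only the path keeps the XCom payload small; the data frame itself stays on disk.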
Hyundai Card Borrower Data
Find tags similar to those in the prototype to implement a real prediction model
Mock Borrower Data
Develop prototype while awaiting permission to access Hyundai Card database
Idea: Identify Potential Future Missed Payments
Predict late credit card payments to help Hyundai Card with:
- marketing repayment plans, before a payment is missed, to high-risk customers who may be struggling to make monthly payments
- capturing changes in borrowers' financial circumstances over time
- repayment forecasting
Airflow Training DAG
Develop an I-tag that indicates a borrower's risk of a future missed credit card payment