This is a great video presentation on a fraud analytics use case with RapidMiner. See my notes below.
Some key concepts
- The more complex the model, the lower the training error, but the higher the test error can become (overfitting).
- Simple models are better, try explaining them to children.
- Data scientists understand the technical side, but they need to communicate results to analysts.
- Sell results to businesses. Tie $ to the results.
- Speak the same language as the business. Map performance metrics to business-related figures.
- AUC and recall don't necessarily mean $ to the business, so show how they translate.
- A/B testing is widely used in marketing. Also consider a "do nothing" model and compare it with implementing the data science solution.
- Don't fear sharing best practices and ideas with similar businesses.
- The fraud model follows a traditional validation method: 80% training and 20% holdout.
- Both the training and holdout sets are taken from the same time period.
- Handy trick: use the sum of transactions as example weights. (this is cool)
- Apply a $ value to your true/false positives and negatives.
- Compare with the default model (no model).
- Generate a money plot!
- How does this relate to regression?
- If a simple model is not good enough, how do you sell a complex one like Deep Learning?
- Is it better to have the Data Science team be embedded in the Business Unit or as a separate team?
- How do you try to explain the uncertainty of prediction intervals to business stakeholders?
- How do you account for seasonal drift?
- The model will drift over time; should it be updated or retrained periodically?
- Do you build a model to optimize business results or is it a byproduct of the prediction?
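
To make the "tie $ to the results" notes concrete, here is a minimal sketch in Python (not the RapidMiner process from the video). It prices each outcome in dollars, compares the model with the "no model" default, weights recall by transaction amount, and computes the points of a money plot. The holdout rows, the scores, and `REVIEW_COST` are all made-up assumptions for illustration, and the cost model (a missed fraud loses the full amount, every flag costs one review) is a simplification.

```python
# Hypothetical holdout rows: (transaction amount in $, is_fraud, model score).
HOLDOUT = [
    (1200.0, True, 0.91),
    (40.0, True, 0.35),
    (300.0, False, 0.80),
    (25.0, False, 0.10),
    (60.0, False, 0.45),
    (900.0, True, 0.72),
    (15.0, False, 0.05),
    (500.0, False, 0.30),
]
REVIEW_COST = 25.0  # assumed cost of investigating one flagged transaction


def total_cost(rows, threshold):
    """Dollar cost of flagging every row whose score is >= threshold."""
    cost = 0.0
    for amount, is_fraud, score in rows:
        if score >= threshold:
            cost += REVIEW_COST  # true and false positives both cost a review
        elif is_fraud:
            cost += amount  # false negative: the fraud amount is lost
    return cost


def recall(rows, threshold, weighted=False):
    """Share of fraud caught, by count or (weighted) by transaction amount."""
    caught = total = 0.0
    for amount, is_fraud, score in rows:
        if not is_fraud:
            continue
        weight = amount if weighted else 1.0
        total += weight
        if score >= threshold:
            caught += weight
    return caught / total


def money_curve(rows, thresholds):
    """Savings versus the default 'no model' policy at each threshold."""
    baseline = total_cost(rows, float("inf"))  # default model: flag nothing
    return [(t, baseline - total_cost(rows, t)) for t in thresholds]


baseline_cost = total_cost(HOLDOUT, float("inf"))
curve = money_curve(HOLDOUT, [0.2, 0.4, 0.6, 0.8])
best_threshold, best_savings = max(curve, key=lambda point: point[1])

print(f"No-model cost: ${baseline_cost:,.2f}")
for threshold, savings in curve:
    print(f"threshold {threshold:.1f}: saves ${savings:,.2f}")
print(f"Recall by count at {best_threshold}: {recall(HOLDOUT, best_threshold):.2f}")
print(f"Recall by dollars at {best_threshold}: "
      f"{recall(HOLDOUT, best_threshold, weighted=True):.2f}")
```

The (threshold, savings) pairs in `curve` are what you would draw as the money plot. On this toy data the model catches two of the three fraud cases, but those two carry almost all of the fraud dollars, which is the point of weighting by transaction amount.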