RiskSentry ML

Python · Flask · scikit-learn · Machine Learning · Risk Analysis · Pattern Detection

Project Overview

RiskSentry ML is a demonstration project that applies machine learning to transaction risk analysis. Built as a learning exercise, it serves a Random Forest classifier through a Flask API and covers the full workflow from data processing to model deployment.

Core ML

Random Forest
Feature Engineering
Pattern Detection

Implementation

Flask API
Real-time Processing
Interactive UI

Data Focus

Transaction Analysis
Risk Scoring
Pattern Recognition

Risk Assessment Interface

Transaction Analysis

Comprehensive risk analysis with multiple detection patterns:

Risk Patterns

  • Location anomalies
  • Time-based risks
  • Amount patterns
  • Transaction combinations

Detection Features

  • High-risk country detection
  • Night transaction analysis
  • Suspicious amount ranges
  • Pattern correlation

Model Analytics

Performance Metrics

  • Cross-validation scores
  • Feature importance
  • Confusion matrix
  • ROC curves
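
As a rough illustration of how the metrics listed above could be produced with scikit-learn, the sketch below trains a Random Forest on generated data and reports a confusion matrix, ROC AUC, and feature importances. The `make_classification` dataset, the class balance, and every parameter value are assumptions for the example, not the project's actual data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data: 15 features, roughly 10% positive (fraud) class. Assumed values.
X, y = make_classification(
    n_samples=1000, n_features=15, n_informative=6,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# The positive-class probability feeds the ROC AUC; hard predictions feed the matrix
fraud_proba = model.predict_proba(X_test)[:, 1]
cm = confusion_matrix(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, fraud_proba)

# Rank feature indices from most to least important
ranked = sorted(
    enumerate(model.feature_importances_), key=lambda pair: pair[1], reverse=True
)

print("Confusion matrix (rows = actual, columns = predicted):")
print(cm)
print(f"ROC AUC: {auc:.3f}")
print("Top 5 feature indices:", [index for index, _ in ranked[:5]])
```

On imbalanced fraud data, the confusion matrix and ROC AUC are usually more informative than plain accuracy, which is why the sketch reports them together.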

Real-time Monitoring

  • Prediction confidence
  • Processing latency
  • Model drift detection
  • Error tracking

Technical Architecture

ML Pipeline

  • Model: Random Forest Classifier
  • Features: 15 engineered features
  • Processing: Real-time scoring
  • Validation: Cross-validation setup

API Layer

  • Framework: Flask with REST endpoints
  • Processing: Async prediction handling
  • Security: Request validation
  • Format: JSON response structure
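
The request validation and JSON response structure listed above might look roughly like the following Flask sketch. The `/predict` route, the `validate_transaction` helper, and the error format are hypothetical; only the field names (`amount`, `hour`, `location`, `transaction_type`, `fraud_probability`) are taken from the snippets later in this document.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Expected fields and their JSON types (assumed schema)
REQUIRED_FIELDS = {
    "amount": (int, float),
    "hour": int,
    "location": str,
    "transaction_type": str,
}


def validate_transaction(payload):
    """Return a list of validation error messages (empty when the payload is valid)."""
    if not isinstance(payload, dict):
        return ["request body must be a JSON object"]
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly
        elif isinstance(payload[field], bool) or not isinstance(payload[field], expected):
            errors.append(f"wrong type for field: {field}")
    if not errors:
        if payload["amount"] <= 0:
            errors.append("amount must be positive")
        if not 0 <= payload["hour"] <= 23:
            errors.append("hour must be between 0 and 23")
    return errors


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True yields None instead of raising on a malformed or missing body
    payload = request.get_json(silent=True)
    errors = validate_transaction(payload)
    if errors:
        return jsonify({"errors": errors}), 400
    # Placeholder: a real handler would fill this in from the trained model
    return jsonify({"fraud_probability": None, "transaction": payload}), 200
```

Rejecting malformed input with a 400 before it reaches the model keeps prediction errors separate from client errors.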

Implementation Details

Feature Engineering

  • Time-based pattern extraction
  • Transaction amount analysis
  • Location risk scoring
  • Behavioral pattern detection

Model Development

  • Synthetic data generation
  • Hyperparameter optimization
  • Cross-validation implementation
  • Performance metrics tracking
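
One way the hyperparameter optimization and cross-validation steps could be combined is scikit-learn's `GridSearchCV` over a stratified split. This is a minimal sketch on generated data; the parameter grid, fold count, and scoring choice are assumptions, not the project's tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Stand-in for the synthetic transaction data: 15 features, about 15% fraud (assumed)
X, y = make_classification(
    n_samples=400, n_features=15, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)

# Small illustrative grid; a real search would likely cover more values
param_grid = {"n_estimators": [25, 50], "max_depth": [4, None]}

# Stratified folds keep the fraud ratio similar in every split
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=cv,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Mean cross-validated ROC AUC:", round(search.best_score_, 3))
```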

Data Processing Pipeline

Feature Engineering

  • Time-based feature extraction
  • Geographic risk analysis
  • Amount pattern detection
  • Transaction type classification

Model Processing

  • Random Forest classification
  • Multi-factor risk scoring
  • Pattern correlation analysis
  • Real-time prediction pipeline

Technical Implementation

Core Technologies

  • Python with Flask web framework
  • scikit-learn ML implementation
  • NumPy/Pandas data processing
  • Bootstrap responsive design

Machine Learning Pipeline

  • Random Forest Classifier model
  • Feature engineering system
  • Pattern detection algorithms
  • Probability-based scoring

API Architecture

  • RESTful endpoint design
  • CORS security configuration
  • Input validation system
  • Error handling protocols

Data Engineering

  • Synthetic data generation
  • Pattern simulation
  • Time series processing
  • Data normalization

Performance Features

  • Memory optimization
  • Response time tuning
  • Resource management
  • Efficient data handling

Testing Framework

  • Unit test implementation
  • API endpoint validation
  • Pattern detection testing
  • Response verification
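
Pattern detection tests could take a form like the pytest-style sketch below. The `is_amount_anomaly` helper is hypothetical; it mirrors the amount thresholds (below 10, above 500) from the pattern detection snippet later in this document so that the boundary cases can be checked in isolation.

```python
def is_amount_anomaly(amount):
    """Hypothetical helper: flag amounts below 10 or above 500."""
    return amount < 10 or amount > 500


def test_small_amount_is_flagged():
    # Card-testing style micro-transactions
    assert is_amount_anomaly(4.99)


def test_large_amount_is_flagged():
    # Also covers the 900 to 999.99 range called out as suspicious
    assert is_amount_anomaly(950.00)


def test_boundary_values_are_not_flagged():
    # Both thresholds are strict, so exactly 10 and exactly 500 pass
    assert not is_amount_anomaly(10)
    assert not is_amount_anomaly(500)


def test_typical_amount_is_not_flagged():
    assert not is_amount_anomaly(60.00)
```

Saved in a file such as `test_patterns.py` (an assumed name), these functions would be collected by running `pytest`.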

Code Implementation Highlights

Pattern Detection System

Multi-factor risk analysis with weighted scoring:


# Rule-based pattern flags for a single transaction (the first row of df)
row = df.iloc[0]
night_hours = 0 <= row['hour'] <= 5

pattern_risks = {
    'location_anomaly': bool(row['location'] in ['RU', 'BR', 'CN', 'UK']),
    'time_anomaly': bool(night_hours),
    'amount_anomaly': bool(
        row['amount'] < 10 or  # Small amounts (card testing)
        row['amount'] > 500 or  # Large amounts
        900 <= row['amount'] <= 999.99  # Suspicious range (already covered by the > 500 check)
    ),
    'transaction_type_risk': bool(
        (row['transaction_type'] == 'atm' and night_hours) or
        (row['transaction_type'] == 'online' and row['amount'] < 10)
    )
}

risk_weights = {
    'location_anomaly': 1.5,
    'time_anomaly': 1.0,
    'amount_anomaly': 2.0,
    'transaction_type_risk': 1.5
}
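
The snippet above defines the flags and weights but does not show how they are combined. A minimal sketch of one possible combination follows: sum the weights of the triggered patterns and divide by the total weight. The `weighted_risk_score` function and the normalization to a 0 to 1 range are assumptions, not the project's confirmed formula.

```python
def weighted_risk_score(pattern_risks, risk_weights):
    """Sum the weights of triggered patterns and normalize by the total weight."""
    total_weight = sum(risk_weights.values())
    triggered_weight = sum(
        weight
        for name, weight in risk_weights.items()
        if pattern_risks.get(name, False)
    )
    return triggered_weight / total_weight


# Example flags: a night-time transaction from a flagged location
pattern_risks = {
    'location_anomaly': True,
    'time_anomaly': True,
    'amount_anomaly': False,
    'transaction_type_risk': False,
}
risk_weights = {
    'location_anomaly': 1.5,
    'time_anomaly': 1.0,
    'amount_anomaly': 2.0,
    'transaction_type_risk': 1.5,
}

score = weighted_risk_score(pattern_risks, risk_weights)
print(f"{score:.3f}")  # (1.5 + 1.0) / 6.0 = 0.417
```

Normalizing by the total weight keeps the score comparable if weights are later retuned or new patterns are added.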
                    

Feature Engineering

Complex risk scoring and indicator generation:


# Composite risk indicators with multiple factors
df['risk_score'] = (
    ((df['hour'] < 6).astype(int)) + 
    df['amount_999'].astype(int) * 2 +
    df['amount_high'].astype(int) * 2 +
    df['is_high_risk_location'].astype(int) * 3
)

df['card_testing_risk'] = (df['small_amount'] + df['high_velocity'] + df['is_online']) / 3
df['location_time_risk'] = (df['is_high_risk_location'] + (df['hour'] < 6).astype(int)) / 2
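
The indicator columns used above (`amount_999`, `amount_high`, `small_amount`, `is_high_risk_location`, `is_online`) are not defined in the snippet. The sketch below shows one way they might be derived with pandas, assuming the thresholds and country list from the pattern detection code. The `add_indicator_columns` helper is hypothetical, and `high_velocity` is left out because the source gives no definition for it.

```python
import pandas as pd

# Same location list as the pattern detection snippet
HIGH_RISK_LOCATIONS = {'RU', 'BR', 'CN', 'UK'}


def add_indicator_columns(df):
    """Return a copy of df with 0/1 indicator columns (assumed thresholds)."""
    out = df.copy()
    out['small_amount'] = (out['amount'] < 10).astype(int)
    out['amount_high'] = (out['amount'] > 500).astype(int)
    # between() is inclusive on both ends by default
    out['amount_999'] = out['amount'].between(900, 999.99).astype(int)
    out['is_high_risk_location'] = out['location'].isin(HIGH_RISK_LOCATIONS).astype(int)
    out['is_online'] = (out['transaction_type'] == 'online').astype(int)
    return out


# Three made-up transactions for illustration
df = add_indicator_columns(pd.DataFrame({
    'amount': [4.50, 950.00, 60.00],
    'hour': [2, 14, 12],
    'location': ['RU', 'US', 'US'],
    'transaction_type': ['online', 'pos', 'online'],
}))

# Composite score exactly as in the snippet above
df['risk_score'] = (
    (df['hour'] < 6).astype(int)
    + df['amount_999'] * 2
    + df['amount_high'] * 2
    + df['is_high_risk_location'] * 3
)
print(df[['amount', 'hour', 'location', 'risk_score']])
```

The first row scores 4 (night hour plus flagged location) and the second also scores 4, because a 950.00 amount triggers both `amount_999` and `amount_high`.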
                    

Synthetic Data Generation

Pattern-based generation of synthetic fraudulent transactions:


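import numpy as np
from datetime import datetime

def generate_fraudulent_pattern(user_profile, weights=(0.5, 0.5)):
    # NOTE: `weights` was undefined in the original snippet; the equal split is an assumed default
    patterns = [
        # Card testing pattern
        lambda: {
            'amount': np.random.uniform(1, 5),
            'location': np.random.choice(['UK', 'RU', 'BR', 'CN']),
            'is_fraud': 1,
            'is_card_testing': True
        },
        # After hours + location change
        lambda: {
            'amount': user_profile['avg_amount'] * np.random.uniform(3, 6),
            'location': np.random.choice(['UK', 'RU', 'BR', 'CN']),
            # randint's upper bound is exclusive, so this draws an hour from 1 to 4
            'timestamp': datetime.now().replace(hour=int(np.random.randint(1, 5))),
            'is_fraud': 1
        }
    ]
    # Choose a pattern by index, then call it to build the transaction
    return patterns[np.random.choice(len(patterns), p=weights)]()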
                    

Risk Display Algorithm

Piecewise-linear rescaling of the model probability for the UI:


// Piecewise-linear rescaling of the raw model probability for display
const rawProbability = result.fraud_probability * 100;
let displayProbability;

if (rawProbability <= 11) {
    // Normal range (0-11% maps to 0-30%)
    displayProbability = (rawProbability / 11) * 30;
} else if (rawProbability <= 29) {
    // Elevated range (11-29% maps to 30-70%)
    displayProbability = 30 + ((rawProbability - 11) / 18) * 40;
} else {
    // High risk range (29-36% maps to 70-100%)
    displayProbability = 70 + ((rawProbability - 29) / 7) * 30;
}

// Raw values above 36% would otherwise exceed 100%, so clamp the result
displayProbability = Math.min(displayProbability, 100);
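
To make the mapping easier to check, here is a Python port of the same three segments with a few worked values. The `display_probability` name is hypothetical, and the cap at 100 is an added assumption to keep raw probabilities above 36% inside the display range.

```python
def display_probability(fraud_probability):
    """Map a model probability in [0, 1] to a 0-100 display value."""
    raw = fraud_probability * 100
    if raw <= 11:
        # Normal range: 0-11% is stretched to 0-30%
        scaled = raw / 11 * 30
    elif raw <= 29:
        # Elevated range: 11-29% is stretched to 30-70%
        scaled = 30 + (raw - 11) / 18 * 40
    else:
        # High risk range: 29-36% is stretched to 70-100%
        scaled = 70 + (raw - 29) / 7 * 30
    # Assumed cap so values above 36% do not exceed 100
    return min(scaled, 100.0)


for probability in (0.055, 0.20, 0.36, 0.50):
    print(probability, round(display_probability(probability), 1))
```

A raw probability of 0.20 sits halfway through the elevated segment, so it displays as 30 + (9 / 18) * 40 = 50.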
                    

Development Challenges

Model Training

Creating realistic synthetic training data

Solution:

  • Generated balanced pattern sets
  • Implemented varied risk scenarios
  • Validated pattern distributions

Real-time Processing

Optimizing prediction response time

Solution:

  • Implemented feature caching
  • Optimized model loading
  • Added response streaming
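
The "optimized model loading" step could be as simple as deserializing the model once per process and reusing it for every request. The sketch below does this with `functools.lru_cache`. The `load_model` helper, the pickle format, and the stand-in object are assumptions made for illustration.

```python
import os
import pickle
import tempfile
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model(path):
    """Deserialize the model once; later calls with the same path reuse the object."""
    # Only unpickle files from a trusted source
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in "model" written to a temporary file for the demonstration
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as handle:
    pickle.dump({"name": "stand-in model"}, handle)

first = load_model(model_path)   # reads from disk
second = load_model(model_path)  # served from the cache
print(first is second)
```

Because the cache is keyed on the path, pointing the loader at a new model file triggers a fresh load without restarting the process.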

Credits

This project serves as a demonstration of machine learning applications in risk analysis. Built with scikit-learn for the ML components and Flask for the API layer, it showcases the potential of automated risk assessment in financial transactions.

Special thanks to the open-source community for the powerful libraries and tools that made this project possible. The project emphasizes the practical application of machine learning techniques for pattern recognition and risk analysis.