System.Return('/syslog')
Data EngineeringPythonProphet AIMultiprocessingSQLiteSupply Chain

Building an AI Inventory Forecasting System: From 18 Hours to 4 Hours

A technical deep-dive into how I optimized supply chain operations for 7,800+ SKUs by replacing crashing Excel models with a Python multiprocessing pipeline using Meta's Prophet AI.

Maryana Group manages approximately 7,800 SKUs across 4 branches in Oman and the UAE. Our product range is highly volatile, encompassing rapid-trend cosmetics, steady skincare lines, and seasonal personal care items.

Previously, the procurement team was spending 18+ grueling hours weekly on manual inventory planning using Excel spreadsheets.

This is the technical log of how I replaced that broken process with a fully automated, multiprocessing Python pipeline powered by Meta’s Prophet AI algorithm.


The Breaking Point

Our primary ERP is Tally Prime. While it is an excellent, robust system for raw financial accounting, extracting granular, historical sales velocity for 7,800 individual items forces the system to generate massive XML files. These exports frequently exceeded 1.5GB.

The legacy manual workflow was a nightmare:

  1. Exporting: Waiting hours for branch-wise sales reports to export as raw XML.
  2. Cleaning: Trudging through the data to fix encoding issues and filter out returns/damages. Excel would routinely freeze or crash (Not Responding) when attempting to load the 3-year historical dataset.
  3. Forecasting: Applying basic arithmetic averages for future stock predictions.
  4. Guesswork: Manually attempting to estimate massive seasonal swings (like Ramadan or Eid) based on “gut feeling” rather than data.

This approach was heavily error-prone, completely unsustainable, and resulted in millions of dollars of capital being trapped in “Dead Stock”.


The Solution: A Python ETL & AI Pipeline

I engineered a custom ETL (Extract, Transform, Load) and forecasting pipeline from scratch to eliminate human intervention.

The Technical Stack

  • Core Engine: Python 3.11
  • Data Manipulation: Pandas & NumPy
  • Forecasting AI: Prophet (Open-sourced by Meta)
  • Data Persistence: SQLite (Lightweight, robust, file-based)
  • Performance: concurrent.futures (Multiprocessing)

Resolving the 18-Hour Bottleneck

The biggest engineering challenge wasn’t writing the forecasting logic—it was executing it fast enough to be useful.

Prophet is incredibly accurate because it builds a unique statistical model for each individual time series. Generating 7,800 independent models sequentially in a classic for loop on a standard workstation took over 18 hours.

To bring this down to a usable timeframe (e.g., an overnight cron job), I had to architect a parallel processing solution that bypassed Python’s Global Interpreter Lock (GIL). Enter ProcessPoolExecutor.

The Code Architecture

By mapping the forecasting function across a pool of worker processes, I forced the server to utilize 100% of all available physical CPU cores simultaneously.

import pandas as pd
from prophet import Prophet
from concurrent.futures import ProcessPoolExecutor, as_completed
import logging

# Configure basic logging for visibility
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(processName)s - %(message)s')

def forecast_single_item(item_id: str, history_df: pd.DataFrame) -> dict:
    """
    Isolated worker function to train a Prophet model for a single SKU.
    Must be fully self-contained for safe multiprocessing.
    """
    try:
        # 1. Isolate the specific SKU's history
        df = history_df[history_df['item_id'] == item_id][['date', 'quantity']]
        
        # Prophet strictly requires columns to be named 'ds' (datestamp) and 'y' (metric)
        df.columns = ['ds', 'y']
        
        # 2. Skip items with insufficient history (Prophet needs at least ~30 data points)
        if len(df) < 30:
            return {'item_id': item_id, 'status': 'insufficient_data', 'forecast': 0}
        
        # 3. Initialize Model with strict seasonality
        model = Prophet(yearly_seasonality=True, weekly_seasonality=True, daily_seasonality=False)
        
        # 4. Critical: Add specific regional holidays to account for major retail spikes
        model.add_country_holidays(country_name='OM')  # Oman holidays (Eid, National Day, etc.)
        
        # 5. Train the model
        model.fit(df)
        
        # 6. Predict the next 30 days
        future = model.make_future_dataframe(periods=30)
        forecast = model.predict(future)
        
        # Extract the final predicted scalar value for the 30-day horizon
        predicted_demand = forecast[['ds', 'yhat']].iloc[-1]['yhat']
        
        return {
            'item_id': item_id, 
            'status': 'success', 
            'forecast': round(max(0, predicted_demand)) # Ensure no negative forecasts
        }
        
    except Exception as e:
        return {'item_id': item_id, 'status': f'error: {str(e)}', 'forecast': 0}

def execute_parallel_pipeline(all_item_ids: list, global_sales_data: pd.DataFrame):
    """
    Distributes the workload across all available CPU cores.
    """
    results = []
    
    # max_workers=None automatically defaults to the number of processors on the machine
    with ProcessPoolExecutor(max_workers=None) as executor:
        logging.info(f"Starting parallel execution pool...")
        
        # Submit all tasks to the process pool
        futures = {
            executor.submit(forecast_single_item, item_id, global_sales_data): item_id 
            for item_id in all_item_ids
        }
        
        # Gather results as they complete, out-of-order
        for future in as_completed(futures):
            results.append(future.result())
            if len(results) % 500 == 0:
                logging.info(f"Processed {len(results)} / {len(all_item_ids)} SKUs...")
                
    return pd.DataFrame(results)

By leveraging ProcessPoolExecutor, the execution time plummeted from 18+ hours to just ~4.5 hours, turning a weekend-destroying task into a simple automated overnight process.


Handling Hijri Seasonality (Oman/UAE)

A standard moving average algorithm fails spectacularly in the Middle East. Retail demand spikes drastically during Ramadan and Eid, which trace the lunar Hijri calendar and shift backward by roughly 11 days every Gregorian year.

Prophet natively solves this. By invoking model.add_country_holidays(country_name='OM'), the AI automatically identifies these floating holidays in the historical dataset and maps massive demand multipliers to the correct future dates, preventing devastating stockouts during peak shopping seasons.


The Output & Results

The final output is no longer a massive Excel file. It is a strictly filtered, decision-ready CSV that explicitly tells the purchasing manager exactly what to reorder, when, and exactly how many units are required to breach the statistical safety stock threshold.

  • Time reduced: 18 hours of manual Excel wrangling → 4 hours of automated backend processing.
  • Forecast accuracy: Improved by a measured 22% compared to the legacy Moving Average method.
  • Process: 100% automated from the moment the XML is exported from Tally to the generation of the final procurement list.

🧠 Key Engineering Takeaways

  1. Data Quality determines everything: Approx. 60% of the project engineering time went into writing resilient lxml parsers to seamlessly handle the corrupted XML strings generated by the legacy ERP. If the ETL layer fails, the AI layer is useless.
  2. Prophet’s Limitations: Prophet is incredibly powerful, but it requires a solid baseline of history to identify trends (minimum 30+ transactions). For brand new SKUs with zero history, the system must elegantly fallback to a simple generic moving average.
  3. Right-sized architecture: It’s tempting to throw Postgres or MongoDB at everything. However, for an offline, chron-job ETL pipeline spanning ~1 million rows, SQLite is phenomenally fast, has zero network overhead, and requires zero maintenance. It was the perfect tool for the job.