Automated AI Inventory Forecasting: Python & Prophet

Overview

The supply chain team at Maryana Group managed replenishment for roughly 7,800 SKUs across four retail branches. Every week, staff had to work through years of historical transaction data exported from the legacy Tally Prime ERP as XML dumps totalling about 1.8 GB. The process relied on Excel spreadsheets that routinely crashed and on manual guesswork. Stock imbalances were the norm: fast-moving items ran out while slow-moving inventory tied up capital.

I built a Python-based ETL and forecasting engine that automates the whole process. It ingests the large XML exports from Tally, sanitizes the data, and runs time-series predictions with Meta’s Prophet library. What used to take days of manual spreadsheet work now completes in about 4.5 hours with no human intervention.

1. The Challenge: Big Data vs. Manual Processes

This manual approach was breaking down in several ways:

System Limitations: Excel frequently crashed when handling the 3-year historical dataset, which weighed in at nearly 1.8 GB of raw XML data.
Data Corruption: The raw XML exports contained missing tags, malformed nodes, and inconsistent character encodings (often a mix of UTF-8 and UTF-16) that broke standard import scripts.
Human Error & Bias: Without a systematic forecasting method, order quantities were based on branch manager estimates. This led to overstocking of slow-moving items and stockouts of high-velocity “A-Class” items.

2. System Architecture: The Automated Pipeline

I built a custom Python application designed to handle the full data lifecycle autonomously: Ingest → Sanitize → Model → Forecast → Report.

Architecture flow: Heavy XML Export → lxml parsing → Pandas dataframe sanitization → SQLite indexed storage → Multiprocessed Prophet AI modeling → Final Actionable CSV.
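
The "SQLite indexed storage" stage can be pictured with a small sketch built on Python's standard sqlite3 module. The article does not describe the actual schema, so the table name, column names, and index below are assumptions chosen only for illustration.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open a SQLite database and create a hypothetical sales table with an index."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sales (
            sku_code  TEXT NOT NULL,
            sale_date TEXT NOT NULL,  -- ISO date string, e.g. 2024-01-31
            quantity  REAL NOT NULL
        )
        """
    )
    # Composite index so one SKU's history can be read without a full table scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sales_sku_date ON sales (sku_code, sale_date)"
    )
    return conn


def load_rows(conn, rows):
    """Insert (sku_code, sale_date, quantity) tuples in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO sales (sku_code, sale_date, quantity) VALUES (?, ?, ?)", rows
        )


def daily_history(conn, sku_code):
    """Return [(sale_date, total_quantity), ...] for one SKU, oldest first."""
    cursor = conn.execute(
        "SELECT sale_date, SUM(quantity) FROM sales "
        "WHERE sku_code = ? GROUP BY sale_date ORDER BY sale_date",
        (sku_code,),
    )
    return cursor.fetchall()
```

Storing the cleaned rows once and reading them back per SKU is one way the modeling stage could avoid re-parsing the XML for every item.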

Key Technical Subsystems

High-Speed ETL Engine: Uses lxml, a Python binding to the libxml2 C library, for fast parsing of large XML files. Custom exception handling skips corrupted XML nodes without halting the multi-gigabyte ingestion process.
Prophet AI Forecasting: Uses Meta’s (formerly Facebook’s) Prophet library to generate an independent time-series model for each of more than 7,500 items. Each model accounts for weekly seasonality and for demand spikes around specific UAE and Oman national holidays (e.g., Eid surges).
Dynamic ABC Classification: Automatically segments inventory (e.g., “A — Ultra-Fast Moving”, “B — Steady”, “C — Dead Stock”) based on configurable velocity thresholds stored in an external settings.py file. This lets management adjust business logic without touching application code.
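
As a rough illustration of the threshold-driven classification, the sketch below keeps the cut-offs in a plain dictionary that stands in for the values held in settings.py. The threshold numbers, the units-per-day definition of velocity, and the short class codes are all assumptions; the article does not give the real settings.

```python
# Stand-in for values that would live in settings.py (numbers are hypothetical).
ABC_THRESHOLDS = {
    "A": 10.0,  # at least 10 units sold per day
    "B": 1.0,   # at least 1 unit sold per day
}
DEFAULT_CLASS = "C"  # anything slower than the lowest threshold


def daily_velocity(units_sold, days):
    """Average units sold per day over the observation window."""
    if days <= 0:
        raise ValueError("days must be positive")
    return units_sold / days


def classify(velocity, thresholds=ABC_THRESHOLDS, default=DEFAULT_CLASS):
    """Return the first class whose minimum velocity is met, highest threshold first."""
    ordered = sorted(thresholds.items(), key=lambda item: item[1], reverse=True)
    for label, minimum in ordered:
        if velocity >= minimum:
            return label
    return default
```

Because the function only reads the mapping it is given, changing a cut-off means editing the settings values and nothing else.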

3. The Interface: Generated Output

While it’s primarily a backend tool, its output drives real decisions.

Console output: The Python console displaying parallel-processing initialization and progress tracking.

The system condenses millions of rows of history into a single decision-ready CSV report. The rows below illustrate the structure of that report:

SKU_Code,Product_Name,Category,Current_Stock,Safety_Stock,Forecast_Next_30d,Recommended_Reorder_Qty
"104882","L'Oreal Hyaluron Serum 50ml","Fast-Moving",120,50,450.5,381
"99231","Nivea Soft Cream 300ml","Steady",85,20,60.2,0
"44391","Discontinued Brand X Toner","Dead-Stock",200,0,0,0
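
The article does not state how Recommended_Reorder_Qty is derived. The three sample rows are, however, consistent with a common rule: order enough to cover the 30-day forecast plus safety stock, minus stock on hand, rounded up and never negative. The sketch below assumes that rule; the function name is hypothetical.

```python
import math


def recommended_reorder_qty(current_stock, safety_stock, forecast_next_30d):
    """Units to order so stock covers forecast demand plus the safety buffer.

    Assumed rule, inferred from the sample rows rather than stated in the article.
    """
    shortfall = forecast_next_30d + safety_stock - current_stock
    # Round partial units up and never recommend a negative order.
    return max(0, math.ceil(shortfall))
```

For the first sample row this gives 450.5 + 50 - 120 = 380.5, rounded up to 381; for the other two rows the shortfall is negative, so the recommendation is 0.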

4. Engineering Challenges & Solutions

Challenge 1: The Malformed XML Problem

Tally ERP generates legacy XML files (often 500 MB+ per file) with inconsistent tags, unescaped characters, and unpredictable UTF-16 encoding errors, which caused Python’s standard decoders to raise UnicodeDecodeError repeatedly.

The Fix: I bypassed standard library tools and wrote a hardened parser using lxml.etree.iterparse(). This let the script stream the file piece-by-piece from disk instead of loading 1.8 GB into RAM (which caused MemoryError crashes on standard office PCs). I combined this with a text pre-processor that stripped invalid Unicode characters before passing the stream to the XML parser.
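
The general shape of "clean the text, then stream it into an incremental parser" can be sketched as follows. To stay dependency-free, this sketch swaps in the standard library's xml.etree.ElementTree.XMLPullParser; it is not the lxml.etree.iterparse() parser described above. The record tag, the field names, and the choice to silently drop undecodable bytes are assumptions for illustration.

```python
import codecs
import re
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml in this sketch

# Control characters and non-characters that XML 1.0 does not allow.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def sanitized_chunks(raw, encoding="utf-8", chunk_size=1024 * 1024):
    """Decode a binary stream chunk by chunk, dropping bytes and characters XML cannot hold."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="ignore")
    while True:
        chunk = raw.read(chunk_size)
        # An empty read marks end of stream, so flush the decoder on that pass.
        text = decoder.decode(chunk, final=not chunk)
        cleaned = _INVALID_XML_CHARS.sub("", text)
        if cleaned:
            yield cleaned
        if not chunk:
            break


def iter_records(raw, record_tag, encoding="utf-8", chunk_size=1024 * 1024):
    """Yield one dict per record element without loading the whole file into memory."""
    parser = ET.XMLPullParser(events=("end",))
    for text in sanitized_chunks(raw, encoding, chunk_size):
        parser.feed(text)
        for _event, elem in parser.read_events():
            if elem.tag == record_tag:
                yield {child.tag: (child.text or "") for child in elem}
                elem.clear()  # release the record's children once it has been read
```

A call such as iter_records(open(path, "rb"), "ITEM") would stream a file from disk. The standard-library parser still raises on structurally broken markup; lxml parsers accept a recover=True option intended for that situation, which may be closer to what the production parser relies on.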

Challenge 2: Performance Bottlenecks

Generating individual forecasting models is computationally expensive. Running Prophet sequentially across 7,800 items on a standard desktop CPU was projected to take over 18 hours.
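
To make the unit of work concrete, a single per-SKU fit might look roughly like the sketch below. It assumes the prophet and pandas packages are installed; the function names, the input layout, and the decision to clip negative predictions to zero are illustrative assumptions, not the author's code.

```python
def holiday_rows(name, dates, lower_window=0, upper_window=0):
    """Build rows in the column layout Prophet expects for its `holidays` frame."""
    return [
        {
            "holiday": name,
            "ds": day.isoformat(),
            "lower_window": lower_window,  # days before the date to include (<= 0)
            "upper_window": upper_window,  # days after the date to include (>= 0)
        }
        for day in dates
    ]


def forecast_next_30d(history_rows, holidays=None):
    """Fit one Prophet model on (date, units_sold) rows and sum the next 30 days."""
    # Imported lazily because Prophet is a heavy, optional dependency.
    import pandas as pd
    from prophet import Prophet

    history = pd.DataFrame(history_rows, columns=["ds", "y"])
    history["ds"] = pd.to_datetime(history["ds"])

    holidays_df = None
    if holidays:
        holidays_df = pd.DataFrame(holidays)
        holidays_df["ds"] = pd.to_datetime(holidays_df["ds"])

    model = Prophet(
        weekly_seasonality=True,
        daily_seasonality=False,
        holidays=holidays_df,
    )
    model.fit(history)  # needs at least two dated observations

    future = model.make_future_dataframe(periods=30, include_history=False)
    forecast = model.predict(future)
    # Demand cannot be negative, so clip before summing (an assumption of this sketch).
    return float(forecast["yhat"].clip(lower=0).sum())
```

Each call fits a separate model from scratch, which is why repeating it thousands of times in a single process adds up to many hours.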

The Fix: I refactored the application to run the forecasts in parallel across multiple processes.

I used Python’s concurrent.futures.ProcessPoolExecutor.
Instead of processing one item at a time, the system divides the 7,800 SKUs into batches and spawns worker processes sized to the number of physical CPU cores available.
Result: With near 100% CPU utilization across 8 cores, total runtime dropped from about 18 hours to about 4.5 hours (roughly a 4x speedup), so reports can be generated overnight.
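
A minimal sketch of the batching pattern, assuming a placeholder worker: forecast_batch below just projects a 30-day total from the average daily sales and stands in for the real per-SKU model fit. The batch size, function names, and input layout are hypothetical.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def forecast_batch(batch):
    """Placeholder worker: a real one would fit one forecasting model per SKU."""
    results = []
    for sku, daily_sales in batch:
        average = sum(daily_sales) / len(daily_sales) if daily_sales else 0.0
        results.append((sku, 30 * average))
    return results


def run_parallel(items, worker=forecast_batch, batch_size=100,
                 max_workers=None, executor_cls=ProcessPoolExecutor):
    """Fan batches out to a pool of workers and merge the results into one dict."""
    # os.cpu_count() reports logical CPUs; pass max_workers to match physical cores.
    workers = max_workers or os.cpu_count() or 1
    merged = {}
    with executor_cls(max_workers=workers) as pool:
        # map() returns results in the same order as the submitted batches.
        for batch_result in pool.map(worker, chunked(items, batch_size)):
            merged.update(batch_result)
    return merged
```

The worker must be a module-level function so it can be pickled and sent to the child processes, and on platforms that start workers by spawning a fresh interpreter the call to run_parallel should sit under an `if __name__ == "__main__":` guard.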

5. Final Outcome & Business Value

This system replaced manual estimates with data-driven ordering logic.

Efficiency: Fully automated the weekend work of data cleaning and consolidation, saving dozens of human-hours per month.
Capital Optimization: With fast- and slow-moving items identified and their demand forecast, purchasing budgets were reallocated. Capital previously tied up in slow-moving stock was redirected toward fast-moving essentials.
Scalability: The lxml and SQLite foundation handles years of gigabyte-scale historical data without the memory crashes that plagued the legacy Excel workflow.