Hamid Vakilzadeh

Ph.D. · CFE

Applied AI/ML builder — LLM evaluation, observability & agentic systems. I ship production AI, eval frameworks, and open-source infrastructure for high-stakes financial workflows.

Chicago, IL · U.S. Permanent Resident · No sponsorship required

Profile

Applied AI/ML engineer focused on LLM evaluation, observability, and agentic systems. Builds production AI applications, multi-agent architectures, MCP servers, and evaluation frameworks for high-stakes, high-accuracy workflows, including anti-hallucination pipelines, LLM-as-judge calibration, and deterministic citation-grounding audits. Recently designed correctness monitoring for an AI tax-preparation agent and built financialreport.ing, an AI equity-research platform with full Langfuse / OpenTelemetry observability. Combines production AI engineering, evaluation rigor, open-source systems, and financial-domain depth.

Selected Work — AI, Evaluation & Open Source

financialreport.ing · Creator & Lead Engineer

2026
AI-native equity-research platform · Production
  • Architected a multi-agent LLM system — lead orchestrator, parallel specialist sub-agents, research planner, and an LLM-as-judge verifier — with tiered model routing for ~40% cost savings and an evaluator-optimizer loop that revises until every figure is verified.
  • Built an anti-hallucination pipeline pairing an LLM verifier with a deterministic, pure-code citation-grounding audit that re-matches every cited number against source SEC/XBRL facts at any unit scale — making each figure traceable to its filing.
  • Engineered a custom LLM evaluation harness with a numeric hallucination guard, enforced by an automated test suite with CI gating on every change.
  • Established LLM observability with Langfuse + OpenTelemetry span trees (per-run, per-tool, per-generation token / cost tracking) on a serverless Cloudflare Workers + React 19 stack.
LLM evaluation · LLM-as-judge · Hallucination detection · Citation grounding · Multi-agent · Observability · TypeScript · Cloudflare Workers · Langfuse · OpenTelemetry · Hono · React · Vitest

LLM Correctness Monitoring · Accrual Inc.

May – Jun 2026
Contract Data Scientist / ML Engineer
  • Designed a correctness-monitoring framework for an agentic AI workflow, separating model-owned input quality from downstream system behavior.
  • Defined SLIs, a multi-label error taxonomy, human labeling workflow, and an LLM-as-judge calibration process for high-risk outputs.
  • Built dashboard-ready metric definitions for evaluation coverage, review triage, and judge-vs-human alignment.
LLM-as-judge · LLM evaluation · SLIs · Observability · Python · TypeScript · SQL · Temporal · Metabase

AI Research Assistant (AIRA) · Creator & Lead Developer

2023 – Present
Production AI platform
  • Evolved from RAG-based literature Q&A into an MCP-enabled agentic system for source-grounded search, citation-network analysis, and full-text extraction across Semantic Scholar, arXiv, and Wiley.
  • Scaled to 100,000+ queries/month across hosted and local workflows — Smithery.ai, Claude, Cursor, Claude Code, and npm installs.
  • Published system architecture and evaluation results in the Journal of Information Systems (ABDC A).
Agentic Workflows · Vector DBs · Vector search · TypeScript · MCP · RAG · Smithery.ai · NPM

VictorAI.bot · Founder & Developer

2024 – Present
AI Teaching Assistant · Production
  • Built an AI teaching assistant integrated with Canvas LMS, automating student support and curriculum-aligned Q&A for university courses.
  • Developed multi-turn conversational AI with context-aware retrieval across course materials, syllabi, and assignment databases.
Agentic Workflows · Conversational AI · Python · TypeScript · Next.js · Firebase · Google Cloud · Canvas LMS

XBRL-US Python Package · Developer

2023 – Present
Open-source · PyPI
  • Created and maintain the official Python API wrapper for XBRL.US with 30,000+ downloads, enabling programmatic access to structured financial data.
  • Designed clean API abstractions for complex SEC filing queries, cutting analyst data-extraction time from hours to minutes.
API design · Open source · Python · PyPI · SEC EDGAR · XBRL

XBRL-US MCP Server · Co-Developer

2025 – Present
Agentic AI system · Open Source
  • Co-built an MCP server for querying as-filed SEC, FERC, and ESEF reports with structured XBRL context — labels, periods, units, dimensions, and provenance; ranks on page 1 of Smithery with 26,000+ connections.
Agentic Workflows · Provenance · Python · MCP · SEC/FERC/ESEF

Academic Appointments

Associate / Assistant Professor of Accounting · UW–Whitewater

2019 – Present
  • Lead AI, ML, and NLP research applied to financial reporting, audit quality, and corporate governance; published 9 peer-reviewed articles (600+ citations).
  • Serve on the Chancellor's AI Advisory Committee as Business School Dean's Liaison; teach Accounting Information Systems (mean eval 4.48/5.00); mentor 10+ doctoral students.
  • Delivered 20+ invited talks and keynotes on generative AI for the Michigan CPA Association, AAA, and XBRL.

Selected Publications

Is it All Hype? ChatGPT's Performance and Disruptive Potential in Accounting and Auditing

Eulerich, Sanatizadeh, Vakilzadeh & Wood · Review of Accounting Studies, 2024

Best Paper · 180 citations · Bloomberg

The Development of a RAG-Based Artificial Intelligence Research Assistant (AIRA)

Vakilzadeh & Wood · Journal of Information Systems, 2025

First production-RAG architecture paper in accounting

Corporate Culture Similarity between Audit Firms and Clients on Reporting Quality

Golden, Mammadov & Vakilzadeh · The British Accounting Review, 2026

ABDC A*

Does Corporate Culture Impact Tax Avoidance: A Machine Learning Approach

Golden, Mammadov & Vakilzadeh · International Review of Financial Analysis, 2024

Novel ML methodology

The Implications of AI Agents for the Accounting Profession

Vakilzadeh & Wood · Working Paper

9 peer-reviewed articles (2 A*) · 2 under review · 4 working papers · 600+ total citations