
How Boards Can Shape AI Strategy
Generative AI (GenAI) continues to capture the enterprise imagination, sparking widespread experimentation with its potential to reshape productivity and innovation. Initial pilots often deliver impressive, headline-grabbing results, but a significant disconnect has emerged: many promising GenAI initiatives become ensnared in “pilot purgatory,” failing to transition from controlled experiments into scalable, production-ready systems that deliver real business value. The problem isn’t the technology, but rather the enterprise’s lack of readiness for operationalization.
Businesses now demand accountability and demonstrable return on investment (ROI) from GenAI, not just intriguing demos. This shift from experimentation to execution puts pressure on technology leaders to bridge the gap between pilot potential and production reality.
Remaining stuck in the pilot phase isn’t just frustrating; it’s risky. It represents wasted resources and opens the door for competitors to gain an advantage. Internal momentum fades, and significant potential value—productivity gains, cost savings, and new revenue streams—remains untapped. The initial ease of piloting, often using generic tools, can mask the profound infrastructure, data, security, and organizational hurdles that arise when attempting to scale. If you want to successfully operationalize GenAI in your enterprise, you must understand why so many pilot programs stall.
Why promising GenAI pilots hit a wall
The journey from a successful GenAI pilot to a scaled production deployment is proving difficult for most organizations. Harvard Business Review notes that 80% of AI projects fail. In one IDC study, only 12% of AI proof-of-concept projects launched into production, while Gartner predicts that 30% of GenAI projects will be abandoned after proof of concept by the end of 2025 due to poor data quality, inadequate risk controls, escalating costs, or unclear business value. Several interconnected technical and organizational factors constrain GenAI’s enterprise potential.
You likely have already experienced some technical hurdles. Production systems, especially real-time applications, demand low, predictable latency that generic APIs often can’t guarantee and that pilots rarely need. Scaling inference infrastructure efficiently is a major challenge, especially since production requires processing continuous streams of real-world enterprise data, which is often messy, siloed, and inconsistent.
Poor data quality, inadequate preparation pipelines, and integration difficulties are primary reasons for failure. Scaling also demands rigorous security protocols, data sovereignty controls, granular access management, and compliance with regulations (GDPR, HIPAA, etc.). These risk concerns are often disregarded during pilots, but quickly become barriers in production. General-purpose models used in pilots often lack the nuanced understanding of specific business logic, leading to inaccurate or irrelevant outputs (such as hallucinations). Production AI often must integrate seamlessly with existing enterprise systems (ERP, CRM) and many organizations lack mature Machine Learning Operations (MLOps) for deployment, monitoring, retraining, and ensuring reliability at scale.
You may also experience plenty of organizational friction. Pilots often lack a clear business case, defined KPIs tied to strategic goals, and methods for measuring impact, hindering funding for scaling. Significant and unpredictable costs are major inhibitors, especially because hiring across MLOps, platform engineering, data engineering, and AI security can be expensive. Additionally, pilots originating in isolated labs struggle to scale without cross-functional collaboration (IT, data science, business lines, legal, compliance, and ops) and clear ownership. Finally, if you don’t have AI governance frameworks, risk management protocols, and ethical guidelines, employees can struggle with uncertainty around deployment.
These blockers are interconnected. Poor data quality undermines ROI demonstration, security issues heighten risk aversion, and lack of MLOps talent prevents overcoming technical scaling challenges. Successfully navigating these obstacles requires a holistic strategy addressing technology, data, security, process, people, and governance. Generic tools often fail under this complexity, highlighting the need for solutions that offer greater control.
Achieving enterprise-ready GenAI
Production-grade GenAI requires building a comprehensive system designed for the enterprise environment, going far beyond simply accessing an LLM. An enterprise-ready GenAI system must possess:
- Models that are fine-tuned on relevant data, or task-specific models optimized for enterprise functions, reducing hallucinations and ensuring accuracy
- Consistent, low-latency response times meeting defined SLAs for real-time applications
- Virtual Private Clouds (VPCs), private cloud, or on-premises solutions for data sovereignty, regulatory compliance (GDPR, HIPAA), and to protect sensitive data
- Security, privacy, responsible AI principles, audit trails, and risk management
- Mature MLOps capabilities for automating deployment, monitoring performance and cost, retraining, QA loops, versioning, and system observability
In many cases, easy connection with existing enterprise systems (CRM, ERP, etc.) via well-documented APIs/SDKs is also vital. In all cases, your architecture must handle enterprise-scale demands reliably and cost-effectively, ensuring high availability and fault tolerance.
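To make the latency and observability requirements above concrete, here is a minimal sketch of an SLA monitor that wraps model calls and reports how often a latency budget is breached. The `SLAMonitor` class and the 500 ms budget are illustrative assumptions, not a prescribed implementation; a production system would export these metrics to its observability stack rather than keep them in memory.

```python
import time
from dataclasses import dataclass, field

@dataclass
class SLAMonitor:
    """Tracks whether inference calls stay within a latency budget (hypothetical sketch)."""
    sla_ms: float
    latencies_ms: list = field(default_factory=list)

    def record(self, elapsed_ms: float) -> None:
        self.latencies_ms.append(elapsed_ms)

    def timed_call(self, fn, *args, **kwargs):
        # Wrap any callable (here, a stand-in for a model API call) and time it.
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        self.record((time.perf_counter() - start) * 1000)
        return result

    def breach_rate(self) -> float:
        """Fraction of recorded calls that exceeded the SLA."""
        if not self.latencies_ms:
            return 0.0
        return sum(l > self.sla_ms for l in self.latencies_ms) / len(self.latencies_ms)

monitor = SLAMonitor(sla_ms=500)
monitor.timed_call(lambda prompt: prompt.upper(), "hello")  # stand-in for a model call
```

Tracking breach rate against a defined SLA, rather than average latency alone, is what turns “fast in the demo” into an enforceable production commitment.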
Achieving all the above requires significant engineering and strategic effort after the pilot. This means a slew of considerations: architectural decisions, data pipeline construction, infrastructure provisioning, security implementation, integration development, MLOps workflow creation, and change management.
Levers for scaling successfully
Successfully scaling GenAI requires mastering both technical capabilities and organizational dynamics. In other words, intelligent rollout strategies encompass both tools and people.
Technical levers
Key technical levers include domain-specific customization, human-in-the-loop (HITL), and modular architecture.
Domain-specific customization is critical, no matter your enterprise’s sector. This can involve techniques like fine-tuning foundation models on proprietary enterprise data or employing task-specific models explicitly designed for functions like summarization or grounded Q&A. This customization is essential for helping achieve the accuracy, relevance, and adherence to business logic required for reliable enterprise use.
HITL refers to integrating human oversight, not merely as a stopgap, but as a strategic necessity for ensuring quality, building trust, and handling the inevitable edge cases where AI might falter. HITL processes should be designed for specific purposes, such as validating critical outputs, providing feedback for continuous model improvement, ensuring fairness, and maintaining control in high-risk applications.
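One common way to design HITL for a specific purpose is a confidence gate: high-confidence outputs pass through automatically, while the rest are routed to a human reviewer. The sketch below is a hypothetical illustration, and it assumes the model (or a separate scorer) attaches a confidence value to each draft; the `Draft`, `hitl_gate`, and `review` names are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Draft:
    text: str
    confidence: float  # assumed to come from the model or a separate scorer

def hitl_gate(draft: Draft, threshold: float, review: Callable[[Draft], str]) -> str:
    """Auto-approve high-confidence outputs; route the rest to a human reviewer."""
    if draft.confidence >= threshold:
        return draft.text
    # In production this would enqueue the draft to a review tool; here the
    # reviewer callback stands in for that workflow.
    return review(draft)

result = hitl_gate(Draft("Refund approved.", 0.42), threshold=0.8,
                   review=lambda d: "Escalated: " + d.text)
```

The threshold becomes a governance lever: lowering it sends more outputs to humans in high-risk applications, and the reviewer’s corrections can feed back into model improvement.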
Designing your AI system with modularity allows for greater flexibility and resilience. This enables you to swap out models as better ones become available, integrate new tools or data sources, and adapt the system to evolving business needs without requiring a complete rebuild. This approach also helps avoid vendor lock-in and promotes long-term adaptability.
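In code, this modularity often takes the form of a thin model interface that business logic depends on, so adapters for different vendors or local models can be swapped freely. The sketch below is a minimal illustration using a Python `Protocol`; the `TextModel`, `VendorAModel`, and `LocalModel` names are hypothetical, and a real adapter would call the vendor’s SDK rather than return a canned string.

```python
from typing import Protocol

class TextModel(Protocol):
    """Minimal interface every model adapter must satisfy."""
    def generate(self, prompt: str) -> str: ...

class VendorAModel:
    def generate(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"  # a real adapter would call the vendor SDK

class LocalModel:
    def generate(self, prompt: str) -> str:
        return f"[local] {prompt}"  # e.g. an on-prem model behind the same interface

def summarize(model: TextModel, document: str) -> str:
    # Business logic depends only on the interface, so models can be swapped
    # without touching the calling code.
    return model.generate(f"Summarize: {document}")
```

Because `summarize` never imports a specific vendor, replacing `VendorAModel` with `LocalModel` (or a newer model) is a one-line change at the call site, which is precisely the flexibility that avoids vendor lock-in.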
Organizational levers
Key organizational levers include cross-functional teams, business value focus, and change management.
Siloed development is a recipe for failure during scaling up. You need dedicated, cross-functional teams that bring together expertise from IT, data science, the relevant business units, product, legal, compliance, and operations from the outset. Clear roles, shared objectives, and robust governance structures increase the chances of success.
You should also maintain relentless focus on tangible business value, continuously prioritizing use cases based on their potential to deliver results—be it increased efficiency, reduced costs, improved customer satisfaction, or new revenue streams. Define clear, measurable KPIs that go beyond technical metrics like model accuracy and track progress against these business goals.
Scaling GenAI requires a workforce equipped with the right skills. This means investing in upskilling and reskilling programs to build AI literacy and proficiency across relevant roles. Equally important is managing the human impact of change: communicating transparently about AI’s role, addressing fears, fostering a culture that embraces experimentation within safe boundaries, and ensuring user adoption through training and support.
A phased rollout strategy, from pilot to limited production to full scale, helps manage risk and complexity, allowing for validation, integration testing, feedback gathering, and refinement before enterprise-wide deployment. Deploying GenAI is not just a tech project, but a strategic transformation requiring expert guidance. Building organizational capacity is as critical as building the technology itself.
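A phased rollout can be sketched mechanically as a staged exposure gate: each stage enables the feature for a larger share of users, keyed on a stable hash so a given user’s experience doesn’t flip between requests. The stage names and percentages below are illustrative assumptions, not a recommended schedule.

```python
import hashlib

# Hypothetical rollout stages: fraction of users exposed at each stage.
STAGES = {"pilot": 0.01, "limited": 0.10, "full": 1.0}

def enabled_for(user_id: str, stage: str) -> bool:
    """Deterministically bucket a user and check against the stage's exposure share."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return bucket / 10_000 < STAGES[stage]
```

Each stage boundary is a natural checkpoint for the validation, integration testing, and feedback gathering described above before widening exposure.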
Patterns of scalable GenAI impact
Despite the challenges, organizations in demanding sectors like finance and technology are successfully scaling GenAI. Their approaches reveal common patterns:
- Prioritizing secure deployment architectures (VPCs, on-prem) from the start ensures data sovereignty, compliance, and trust, avoiding costly retrofitting.
- Integrated project teams involving all key stakeholders (IT, data science, business, legal, compliance, and ops) ensure holistic solutions and shared ownership.
- Moving beyond generic APIs, leaders form strategic partnerships with vendors offering enterprise-grade, secure, scalable solutions and expertise, accelerating deployment and bridging skill gaps.
- Initial successes often involve using GenAI to augment human capabilities and drive efficiency in specific processes like AI-assisted coding, customer service support, content creation, data analysis, risk assessment, or claims processing, often with HITL validation.
- Starting with a few high-impact use cases allows for quicker ROI demonstration, building confidence and momentum for broader rollouts.
Below are a few real-world examples that illustrate these patterns especially well.
Persistent Systems developed ContractAssIst, an AI-powered agent built on generative AI and Microsoft 365 Copilot, to enhance collaboration, streamline workflows, and accelerate decision-making. As a result, ContractAssIst reduced emails during negotiations by 95% and cut navigation and negotiation time by 70%.
Morgan Stanley embedded GPT‑4 into its workflows, improving how financial advisors access the firm’s knowledge base and respond to client needs. Over 98% of the firm’s advisor teams actively use the Morgan Stanley Assistant—its internal chatbot for answering financial advisors’ questions—to seamlessly retrieve internal information.
These successes prove scalable GenAI is achievable now, often underpinned by purpose-built, controllable AI systems and strategic partnerships that provide enterprise-grade foundations and expertise.
Think AI systems, not just models
Transitioning GenAI from pilot to production is a strategic transformation, demanding a shift from a fascination with models to the disciplined engineering of robust AI systems. Technology leaders must move beyond “Can it work?” to address critical operationalization questions:
- Are we designing for the enterprise (security, latency, and integration) from Day 1?
- Do we have a clear path to demonstrating business value and ROI?
- Is the necessary cross-functional structure, ownership, and governance in place?
- Are we prepared for long-term MLOps, maintenance, and optimization?
- Are we tailoring solutions to our unique context, ensuring control and reliability?
The era of scattered GenAI experiments is ending. High pilot failure rates signal the need for a deliberate, strategic production approach. Enterprises need a holistic strategy addressing challenges grounded in business realities.
Success requires moving beyond hype, embracing operational complexity, and investing in both technology and organizational readiness. It demands proactive planning to build GenAI systems designed not just to impress, but to scale reliably and securely.