Podcast interview with an anonymous in‑house General Counsel at a mid‑market organization. Names and identifying details have been modified or omitted.
Theo: Welcome back to The Clause & Current Show, where we talk to the people actually shipping change in legal. Today’s episode is a special one: How We Went Live with AI CLM in Six Weeks. Our guest is a General Counsel from a mid‑market company who led a fast, pragmatic rollout of AI‑powered contract lifecycle management. GC, thanks for joining us.
GC: Thanks for having me. Excited to share what we did, and what we’d do differently.
Theo: Let’s set the stage. Industry, size, and what “AI CLM” meant in your context.
GC: We’re mid‑market—around 700 employees—operating in B2B software with a global footprint across North America and Europe. Before the project, we had a traditional CLM for storage and basic workflows, but negotiations lived in inboxes and redlines in Word. “AI CLM” for us meant three things: (1) AI‑assisted intake and triage, (2) draft and review copilots grounded in our playbook, and (3) automated extraction of key terms for reporting and renewals. No magic—just faster, safer execution.
Theo: Why six weeks? That’s brisk by any standard.
GC: We tied the deadline to a revenue‑critical quarter. Sales leadership asked, “What would it take to remove legal as a perceived bottleneck without lowering our standards?” We scoped a minimum lovable product: cover our top three contract types—NDAs, vendor MSAs, and customer order forms—plus must‑have workflows. We deliberately said no to edge cases. The six‑week clock created focus.
Theo: How did you win buy‑in? AI programs can trigger skepticism.
GC: Two angles. First, we framed it as risk‑reduction by standardization. AI helped us enforce playbooks consistently. Second, we promised measurable business outcomes: cycle time and touchpoint reduction, and better renewal hygiene. We presented a simple model: “If we shave 3 days off 200 deals per quarter, what’s the revenue pull‑forward worth at our average ACV?” That got attention.
Theo: Let’s walk the timeline. Week by week?
GC: Sure.
- Week 1 – Discovery & guardrails. We ran a two‑hour workshop with sales ops, security, procurement, and finance. We agreed on data boundaries—no customer data leaving our tenant, zero retention with the LLM provider where available, encryption at rest, SSO via Okta, and role‑based access. We also defined human‑in‑the‑loop checkpoints: AI could draft and propose, but legal made final calls on non‑standard clauses.
- Week 2 – Playbook codification. We took our tacit knowledge and turned it into explicit instructions. We wrote positive and negative rules: “Prefer X; if counterparty pushes Y, accept if thresholds A/B/C; else escalate.” We attached sample clauses, fallback options, and annotated rationales. That became the backbone for the AI.
- Week 3 – Template & metadata alignment. We standardized variables and clause IDs across NDA, MSA, and order form. We mapped the metadata we needed—renewal dates, price caps, liability limits—then defined how the system should extract and validate them, including confidence thresholds.
- Week 4 – Integration & intake. We connected the CLM to Salesforce, Slack, and DocuSign. We built a single intake form with five required fields and conditional logic to keep it short. Requests via email were auto‑responded with a link to the form, which cut triage time.
- Week 5 – Pilot & evals. We trained a pilot group—three sales reps, two procurement specialists, and two legal counsels. We used an evaluation set of 50 historical agreements and a small red‑teaming exercise. We tracked precision/recall on extraction, variance from playbook for drafting, and user satisfaction.
- Week 6 – Launch & change management. We announced a clear go‑live date, hosted office hours, and implemented a “fast lane” for compliant deals. Post‑launch, we did daily standups for two weeks to squash issues quickly.
Theo: You mentioned guardrails up front. What did those look like in practice?
GC: Three layers. Technical: SSO, least‑privilege roles, tenant‑scoped data stores, and audit logs on every AI action—who asked what, what context was used, what changed. Policy: A written AI usage policy mapped to our code of conduct and information security standards; no uploading of third‑party sensitive data unless necessary and approved. Operational: Human review thresholds. For example, if the AI suggested a deviation on liability caps beyond our Tier‑2 fallback, it auto‑flagged for counsel review. We also implemented “reason tracing”—the AI had to cite which playbook rule or precedent informed its suggestion.
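To make that operational layer concrete, here is a minimal sketch of the kind of threshold check involved, assuming a hypothetical CapSuggestion record and an illustrative Tier‑2 multiplier; the guest did not share actual thresholds or field names.

```python
from dataclasses import dataclass

# Hypothetical fallback ladder for liability caps (illustrative value only).
TIER_2_CAP_MULTIPLIER = 2.0  # e.g., 2x annual fees as the last pre-approved fallback

@dataclass
class CapSuggestion:
    clause_id: str            # playbook rule the AI cited ("reason tracing")
    proposed_multiplier: float
    rationale: str

def needs_counsel_review(suggestion: CapSuggestion) -> bool:
    """Flag any AI-proposed liability cap beyond the Tier-2 fallback for human review."""
    return suggestion.proposed_multiplier > TIER_2_CAP_MULTIPLIER

suggestion = CapSuggestion("LIA-004", 3.0, "Counterparty insists on a 3x cap for data breach")
if needs_counsel_review(suggestion):
    print(f"Escalate to counsel: {suggestion.clause_id} — {suggestion.rationale}")
```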
Theo: Let’s talk about the playbook. How did you turn narrative policy into something an AI could follow?
GC: We created a structured schema. Each clause entry included: (1) Intent—what the clause protects; (2) Preferred language; (3) Acceptable alternatives with conditions; (4) Red lines; (5) Negotiation notes for humans; and (6) Examples from our corpus with ratings. We wrote prompts as functions: “Given incoming clause X and contract type Y, return classification {Accept/Modify/Reject}, proposed language, and cite rules by ID.” That allowed consistent behavior and easier audits.
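A rough sketch of what that schema and prompt‑as‑function pattern could look like in practice; the PLAYBOOK entry, rule IDs, and the build_review_prompt helper are illustrative assumptions, not the guest's actual implementation.

```python
import json

# Illustrative playbook entry following the six-part schema described above.
PLAYBOOK = {
    "LIA-001": {
        "intent": "Limit aggregate liability to protect against outsized exposure",
        "preferred": "Liability capped at fees paid in the 12 months preceding the claim.",
        "alternatives": [
            {"language": "Cap at 2x annual fees",
             "condition": "deal ACV > $250k and security review passed"},
        ],
        "red_lines": ["Uncapped liability", "Carve-out for ordinary breach of contract"],
        "negotiation_notes": "Offer the 2x cap only after the preferred cap is rejected.",
    }
}

def build_review_prompt(incoming_clause: str, contract_type: str) -> str:
    """Compose a prompt that forces a classification plus rule citations by ID."""
    return (
        f"Contract type: {contract_type}\n"
        f"Incoming clause: {incoming_clause}\n"
        f"Playbook (JSON): {json.dumps(PLAYBOOK)}\n"
        "Return JSON with keys: classification (Accept|Modify|Reject), "
        "proposed_language, cited_rule_ids. Cite only rule IDs present in the playbook."
    )

print(build_review_prompt("Supplier's liability shall be unlimited.", "Vendor MSA"))
```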
Theo: Sounds like you treated prompts like product.
GC: Exactly. We version‑controlled prompts and playbooks like code. When a change request came—say, new data residency requirements—we logged an issue, updated the rule, re‑ran evals, and noted the effective date. That discipline prevented “prompt drift.”
Theo: You chose three contract types. Why those?
GC: Volume and business impact. NDAs were high‑volume and low‑risk—perfect for early wins. Customer order forms directly affect revenue and touch many teams. Vendor MSAs were strategic for security and cost control. Together, they represented about 70% of our monthly legal touchpoints.
Theo: What did success look like post‑launch?
GC: We set four KPIs:
- Median cycle time from intake to signature per contract type. We targeted a 30% reduction for NDAs and 20% for order forms.
- Legal touchpoints per contract, measured as the number of human interventions. Target: cut one touchpoint on average.
- Playbook adherence, using diff analysis between final language and preferred templates (a rough sketch of the diff approach follows below). Target: >85% adherence without legal escalation on standard deals.
- Extraction accuracy, measured on required metadata fields with confidence scores. Target: >95% accuracy on renewal dates and counterparties; >90% on liability caps.
Within eight weeks post‑launch, we were at or ahead of target on three of four. Liability cap extraction lagged, so we improved clause labeling and added a second pass confirmation for caps expressed as formulas.
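For the adherence KPI, here is a minimal sketch of diff‑based scoring using Python's standard difflib; the similarity measure and its interpretation against the 85% target are assumptions, since the show did not specify the exact diff method.

```python
import difflib

def adherence_score(final_text: str, preferred_template: str) -> float:
    """Approximate playbook adherence as token-level similarity (0.0-1.0)
    between executed language and the preferred template."""
    matcher = difflib.SequenceMatcher(None, preferred_template.split(), final_text.split())
    return matcher.ratio()

preferred = "Liability is capped at fees paid in the twelve months preceding the claim."
final = "Liability is capped at two times the fees paid in the twelve months preceding the claim."
print(f"Adherence: {adherence_score(final, preferred):.0%}")
```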
Theo: Let’s dig into intake. What changed for business users?
GC: We standardized the front door. Previously, sales would email whatever they had. Now, the intake form asks: contract type, counterparty, urgency, link to opportunity, and whether the counterparty paper will be used. If they select “counterparty paper,” the system auto‑creates a review task and uses AI to pre‑classify clauses against our playbook, flagging likely deviations. If they select “our paper,” the AI drafts the agreement with the right template, pre‑fills variables, and posts back to the requestor in Slack for quick review.
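A simplified sketch of that intake routing logic, with hypothetical field names standing in for the five required fields; the real form and automation were not shown on the episode.

```python
def route_intake(request: dict) -> str:
    """Route a contract request based on a five-field intake form.
    Field names are illustrative, not the actual form schema."""
    required = ["contract_type", "counterparty", "urgency", "opportunity_link", "paper_source"]
    missing = [f for f in required if not request.get(f)]
    if missing:
        return f"Reject intake: missing {', '.join(missing)}"
    if request["paper_source"] == "counterparty":
        # Counterparty paper: create a review task and pre-classify clauses against the playbook.
        return "Create review task + AI clause pre-classification"
    # Our paper: draft from the right template and post back to the requestor in Slack.
    return "Auto-draft from template + Slack notification to requestor"

print(route_intake({
    "contract_type": "Order Form",
    "counterparty": "Acme Corp",
    "urgency": "High",
    "opportunity_link": "https://crm.example/opp/123",
    "paper_source": "our",
}))
```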
Theo: What about redlines? Where did AI fit in the negotiation loop?
GC: Two places. First, first‑pass review: when we received a redlined MSA from a customer, the AI annotated each changed clause with our policy position and recommended accepts or counter‑proposals. Second, counter drafting: for counter‑proposals, the AI generated language from our acceptable alternatives and prepared a rationale paragraph our counsel could tweak. We kept the lawyer as the decision maker, but the busywork was greatly reduced.
Theo: The security piece still makes legal and IT folks nervous. What due diligence did you run?
GC: We treated it like any vendor with access to sensitive data. Security questionnaire, SOC 2 Type II, penetration testing summary, data flow diagrams. We validated data residency and encryption. For the AI components, we asked about model providers, data retention, and the ability to disable training on our prompts and outputs. We also validated that PII detection was available—we run a pre‑processing step to mask personal data where possible before prompts are constructed.
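A toy sketch of a regex‑based masking pass of the kind described; the patterns and placeholder labels are illustrative only, and a production setup would rely on a dedicated PII detection service rather than two regexes.

```python
import re

# Illustrative patterns only; a real detector would cover far more PII categories.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def mask_pii(text: str) -> str:
    """Replace likely personal data with placeholders before the prompt is constructed."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask_pii("Contact Jane Doe at jane.doe@example.com or +1 415 555 0100."))
```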
Theo: Let’s talk training the humans. What did enablement look like?
GC: We separated it by audience. Sales/Procurement got a 30‑minute session focused on the new intake, what to expect, and the “fast lane” criteria. Legal got hands‑on labs: five realistic scenarios, open the AI suggestion, compare with our playbook citations, accept or edit, then log any change requests to the playbook. We also created a one‑page “skeptic’s guide” explaining limitations and how to spot hallucinations. That built trust.
Theo: Did you run into resistance?
GC: Absolutely. A common refrain was, “Will AI make the wrong call and we’ll miss it?” Our response: we’re not removing humans; we’re re‑positioning them at the highest‑value checkpoints. We showed before/after timelines. In the old world, humans spent time formatting, hunting for terms, and doing repetitive edits. In the new world, humans reviewed exceptions, assured alignment to policy, and handled novel issues. After two weeks, even the initial skeptics were submitting playbook improvement ideas.
Theo: You mentioned evals. How did you actually measure the AI components during pilot?
GC: We built a small evaluation harness. For extraction, we used 200 clauses labeled by two reviewers; we measured precision, recall, and disagreement rates. For drafting, we compared AI‑suggested language to our preferred and acceptable alternatives, scoring 0 (unusable), 1 (needs major edits), 2 (minor edits), 3 (ready). We tracked average scores by clause type. For classification (is this clause acceptable?), we measured F1 by clause category. Anything below target got a prompt or playbook tweak and a re‑run.
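A minimal sketch of the classification portion of such a harness, computing per‑label precision, recall, and F1 over a labeled set; the labels and example data are hypothetical.

```python
def precision_recall_f1(gold: list, predicted: list, label: str):
    """Compute per-label precision, recall, and F1 for clause classification results."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold      = ["accept", "reject", "accept", "modify", "accept"]
predicted = ["accept", "accept", "accept", "modify", "reject"]
p, r, f = precision_recall_f1(gold, predicted, "accept")
print(f"accept: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```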
Theo: Any surprises in the data?
GC: Yes—acceptance criteria matter more than we expected. Early on, the AI produced plausible variations that were technically acceptable but not identical to our preferred wording. Lawyers wanted consistency to reduce review time later. We added a “strictness knob” in the prompt: default to exact preferred language when available; only propose variations when the counterparty pushes. That improved perceived quality overnight.
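One way such a strictness knob might be expressed, sketched here as a prompt‑building helper with assumed wording; the guest's actual prompt text was not shared.

```python
def drafting_instructions(strict: bool = True) -> str:
    """Build the drafting instruction block with a 'strictness knob'.
    Wording is illustrative, not the actual production prompt."""
    if strict:
        return (
            "When the playbook contains preferred language for this clause, "
            "return it verbatim. Propose variations ONLY if the counterparty "
            "has explicitly pushed back on the preferred wording."
        )
    return "Propose the closest acceptable alternative, citing the playbook rule ID."

print(drafting_instructions(strict=True))
```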
Theo: What about cross‑functional governance? Who owned what?
GC: We created a small AI CLM Working Group: Legal (me plus our contracts lead), Sales Ops, Security, and RevOps. Legal owned the playbook and exception policy. Sales Ops owned intake and CRM integration. Security owned the vendor assessment and periodic audits. RevOps owned reporting. We met twice weekly during rollout, then weekly post‑launch.
Theo: How did you avoid scope creep?
GC: We kept a “parking lot” of good ideas. For example, people wanted supplier risk scoring and automated third‑party DPIAs in Phase 1. Great ideas—but out of scope for a six‑week sprint. We promised a Phase 2/3 roadmap and shipped the MLP first. That credibility made later phases smoother.
Theo: Can we double‑click on the metadata layer? What fields did you extract, and how did you use them?
GC: For NDAs: effective date, term, jurisdiction, unilateral vs. mutual, and non‑solicit presence. For MSAs: governing law, liability cap structure and amount, insurance requirements, data processing addendum link, termination rights, assignment. For order forms: renewal term, auto‑renewal flag, price caps or uplifts, service levels, and special terms. We used the data to drive alerts—renewal notices 90/60/30 days out, reporting on deviations for QBRs, and Salesforce widgets that showed legal status and risk flags.
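A small sketch of how 90/60/30‑day renewal alerts could be computed from an extracted renewal date; the dates and helper name are illustrative.

```python
from datetime import date, timedelta

ALERT_OFFSETS_DAYS = (90, 60, 30)  # notice windows mentioned on the show

def renewal_alerts_due(renewal_date: date, today: date) -> list:
    """Return which 90/60/30-day alerts fall due today for a given renewal date."""
    return [offset for offset in ALERT_OFFSETS_DAYS
            if renewal_date - timedelta(days=offset) == today]

# Example: an order form renewing on 2025-12-01 triggers the 90-day alert on 2025-09-02.
print(renewal_alerts_due(date(2025, 12, 1), date(2025, 9, 2)))
```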
Theo: How did you handle counterparty paper, which is always messy?
GC: We pre‑processed PDFs with OCR, normalized headings, and used a clause classifier to map sections to our taxonomy. The AI then did a side‑by‑side with our policy positions. We learned to prefer line‑by‑line diffs rather than paragraph‑level summaries; it made it easier for lawyers to accept/reject. We also added an “unknown” bucket—if confidence was below threshold, the system said, “Not sure; please review.” Honesty beat false precision.
Theo: Did you change the way you store and search contracts?
GC: Yes. We moved from folder‑based storage to a record‑centric model: each agreement is a record with structured metadata, the latest file, and a conversation log. Search became semantic and filterable by metadata. We also implemented document lineage—you can see the birth of a clause from template to final executed language. That’s been valuable for post‑mortems.
Theo: What about outside counsel? Did they use the system?
GC: We didn’t onboard them in Phase 1, but we gave them read‑only access to specific matters via secure links. In Phase 2, we plan to provide them the same playbook‑guided annotation workspace to keep their output aligned with ours.
Theo: Let’s go tactical. What are three things you’d repeat and three you’d change?
GC: Repeat:
- Start with a hard date and a narrow scope. It forces trade‑offs.
- Make the playbook the product. Most AI success came from clarity, not cleverness.
- Daily standups post‑launch. Fixing real user pain quickly builds momentum.
Change:
- We’d invest earlier in clause labeling—clean training examples improved extraction more than any prompt trick.
- We’d involve finance sooner on order form terms; the uplift logic touched billing in subtle ways.
- We’d define a formal feedback taxonomy: bug, model miss, playbook gap, or UX friction. It helped later but would have helped from day one.
Theo: Let’s address the elephant in the room: hallucinations. Did you see any?
GC: Rarely, because we constrained the system. We used retrieval with a tight context window—only our templates, playbook, and the active document. We blocked the model from inventing citations by requiring it to quote the clause ID or abstain. When the model abstained, it triggered a human task. It’s boring, but boring is good in legal.
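A minimal sketch of a cite‑or‑abstain gate of that kind, assuming a hypothetical set of playbook IDs and a simple output shape; the actual enforcement mechanism was not described.

```python
KNOWN_CLAUSE_IDS = {"LIA-001", "LIA-004", "GOV-002"}  # illustrative playbook IDs

def handle_model_output(output: dict) -> str:
    """Accept a suggestion only if it cites known clause IDs; otherwise route to a human."""
    cited = set(output.get("cited_rule_ids", []))
    if output.get("abstained") or not cited:
        return "Create human review task (model abstained or cited nothing)"
    unknown = cited - KNOWN_CLAUSE_IDS
    if unknown:
        return f"Reject suggestion: unknown citations {sorted(unknown)}"
    return "Suggestion accepted for counsel review"

print(handle_model_output({"cited_rule_ids": ["LIA-001"], "abstained": False}))
print(handle_model_output({"cited_rule_ids": [], "abstained": True}))
```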
Theo: How did you think about ethics and transparency with counterparties?
GC: We don’t disclose the tool stack in negotiations, but we are transparent about process. If a counterparty asks why we propose a change, the rationale comes from the playbook—not “the AI said so.” Internally, we log AI involvement for auditability. If a dispute arose, we could show the human decision points.
Theo: Let’s quantify impact. What did your dashboards say after six weeks in production?
GC: For NDAs, median cycle time dropped from 2.4 days to 0.8 days. For order forms, from 7.1 to 5.2 days. Legal touchpoints per contract fell from 3.2 to 2.0 on standard deals. Playbook adherence climbed from 62% baseline to 89%—a huge consistency win. And we hit 96–98% extraction accuracy on renewal dates and jurisdictions. Liability caps improved to 92% after we added formula handling.
Theo: Any anecdotes from the field that numbers don’t capture?
GC: Two. First, a sales director told me, “I don’t ping legal at 7 p.m. anymore—I can see status and next steps in Salesforce.” That’s culture change. Second, our junior counsel said, “I’m spending time on escalations and novel issues instead of formatting.” That’s career development.
Theo: How did you handle change requests from leadership once they saw the speed gains?
GC: We clarified that speed follows clarity. If leadership wanted faster approvals, we needed tighter definitions of “standard” vs. “non‑standard.” We created deal tiers with explicit thresholds on things like liability caps and SLAs. Tier 1 could auto‑approve; Tier 2 required counsel sign‑off; Tier 3 needed my review. That prevented accidental rule erosion.
Theo: You mentioned a “fast lane.” What qualified?
GC: For customer order forms: no changes to our MSA, standard liability caps, standard DPA, no unusual security asks, and price within the discount band. If all checkboxes were green, the system pushed the draft to DocuSign automatically after business approval, with legal getting an FYI notification. If any box was yellow or red, it routed to counsel.
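A compact sketch of that all‑green gating logic, with assumed check names mirroring the criteria listed; the production routing surely involves more context than five booleans.

```python
from dataclasses import dataclass

@dataclass
class OrderFormChecks:
    msa_unmodified: bool
    standard_liability_cap: bool
    standard_dpa: bool
    no_unusual_security_asks: bool
    price_within_discount_band: bool

def fast_lane_eligible(checks: OrderFormChecks) -> bool:
    """All boxes must be 'green' for the deal to skip straight to signature."""
    return all(vars(checks).values())

checks = OrderFormChecks(True, True, True, True, False)
if fast_lane_eligible(checks):
    print("Push to e-signature after business approval; notify legal FYI")
else:
    print("Route to counsel for review")
```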
Theo: Did you automate approvals too?
GC: Yes, with conditionals. For example, if a discount exceeded threshold or if a data residency exception was requested, it triggered finance or security approvals respectively. The key was transparency: approvers saw context—contract snapshot, flagged clauses, and the rationale—so they could approve quickly instead of asking for more info.
Theo: Looking ahead, what’s on your Phase 2/3 roadmap?
GC: Phase 2: expand to SOWs and partner agreements; onboard outside counsel to our annotation workspace; add clause lineage analytics. Phase 3: supplier risk scoring integrated with procurement, and multilingual clause equivalence for EU deals. We’ll also experiment with active learning: when lawyers correct the AI, the system proposes playbook updates with diffs.
Theo: For teams earlier in the journey, what’s the minimum they need in place to start?
GC: A clear playbook, even if it’s a Google Doc; one or two clean templates; an intake form; and a named owner who can make decisions quickly. The rest can be layered in. Don’t wait for perfect data.
Theo: What about companies in regulated industries—financial services, healthcare—does the six‑week timeline still apply?
GC: You’ll add more security review, and your playbook may be stricter, but the cadence holds. The trick is to right‑size the first release: pick lower‑risk contracts and lock the guardrails. Simplicity is your friend.
Theo: Last technical question: did you use one LLM for everything?
GC: No. We used a router: smaller, faster models for classification and extraction; larger models for drafting. The routing was invisible to users. We also kept a deterministic fallback: if the model confidence was low, we defaulted to exact template language or escalated. The aim wasn’t to “use AI”; it was to deliver reliable outcomes.
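A minimal sketch of such a router with a deterministic low‑confidence fallback; the model tiers and the 0.8 threshold are assumptions, not the guest's configuration.

```python
from typing import Optional

def route_task(task_type: str, confidence: Optional[float] = None) -> str:
    """Pick a model tier per task, with a deterministic fallback on low confidence.
    Model names and the 0.8 threshold are illustrative."""
    if confidence is not None and confidence < 0.8:
        return "deterministic-fallback"  # exact template language or human escalation
    if task_type in {"classification", "extraction"}:
        return "small-fast-model"
    if task_type == "drafting":
        return "large-model"
    return "human-escalation"

print(route_task("extraction", confidence=0.95))  # small-fast-model
print(route_task("drafting"))                     # large-model
print(route_task("extraction", confidence=0.42))  # deterministic-fallback
```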
Theo: Were there any metrics you decided not to chase?
GC: We didn’t chase “AI acceptance rate” as a vanity metric. Sometimes the right call is to ignore the suggestion and escalate. We focused on business outcomes and quality.
Theo: If you could give your six‑weeks‑ago self one piece of advice, what would it be?
GC: Timebox the playbook creation, but invest in examples. Ten high‑quality, annotated examples per clause beat a hundred mediocre ones. And appoint a playbook editor who says “no” to complexity.
Theo: Let’s do a lightning round. Short answers.
GC: Let’s do it.
Theo: Biggest myth about AI in legal?
GC: That it’s either magic or dangerous. It’s a tool. The risk comes from ambiguity.
Theo: Most under‑appreciated enabler?
GC: Clean intake. Garbage in, garbage out.
Theo: Favorite quality‑of‑life improvement since launch?
GC: Clause lineage and rationale citations. Future me can see why we agreed to something.
Theo: One thing you banned?
GC: “Paste the whole contract into a chatbot and see what happens.” We require scoped context.
Theo: Best moment in the rollout?
GC: When a skeptical account executive said, “I thought this would slow me down, and it didn’t.”
Theo: Worst moment?
GC: Day two, the liability cap extractor misread an aggregate vs. per‑incident cap. We caught it, added an example, re‑ran evals, and moved on. The system got better.
Theo: If a GC is listening and thinking, “We don’t have time for this,” what would you say?
GC: You’re already spending the time—just in ad hoc triage. A focused six‑week sprint pays back quickly. Start small, pick measurable wins, and ship.
Theo: Any closing thoughts on leading change as a GC?
GC: Your credibility comes from protecting the business and enabling the business. AI CLM lets you do both if you’re deliberate. Anchor in policy, measure outcomes, and infuse transparency. Finally, celebrate the humans—good tools make teams better.
Theo: GC, this has been fantastic. Thanks for being so candid and practical.
GC: Thanks for having me. Good luck to everyone building their own version of this.
Theo: And that’s a wrap on How We Went Live with AI CLM in Six Weeks. If this was helpful, share it with your sales ops or legal ops partner and steal the playbook. See you next time on The Clause & Current Show.
Appendix (verbal summary referenced on the show):
- Six‑Week Plan: (1) Guardrails & discovery, (2) Playbook codification, (3) Template/metadata alignment, (4) Integrations & intake, (5) Pilot & evals, (6) Launch & change management.
- Core Guardrails: SSO/least privilege; no retention; audit logs; human‑in‑the‑loop thresholds; “reason tracing.”
- Top KPIs: Cycle time, touchpoints, adherence, extraction accuracy.
- Phase 2/3 Ideas: SOWs, partner agreements, outside counsel workspace, clause lineage analytics, supplier risk scoring, multilingual equivalence, active learning.
End of transcript.

