Methodology

Benchmarked against your own trajectory, not against other users or models.

TVCRpro evaluates a single human-to-AI interaction across eleven dimensions in two suites — five for the human input, six for the AI-generated output — and combines them into a composite score that accounts for token efficiency. Pre-pilot, design-intent.

Prompt engineering is a craft applied before an interaction. TVCRpro is the measurement architecture applied after one. The first asks how to phrase a prompt; the second asks whether the answer was worth the tokens it cost. Different question, different output.

The scoring architecture below is the above-the-line, publishable version. Dimension names are documented in patent application 64/045,951; numerical weights, rubric anchors, and the math of the 90-day trajectory framework are trade-secret and not published — see What is not on this page.

What we have today. What we do not claim yet.

TVCRpro is an early-stage measurement architecture for AI interactions, designed for enterprise governance and coaching. It is not a public ranking of users, teams, or models, and it is not a scientifically validated standard. Honest framing follows.

What we have today

Documented rubrics for all eleven dimensions, named verbatim from the filed patent application.
An analyzer in production at api.tvcrpro.com/score, with a version-locked evaluator and the published scoring shape.
Synthetic validation that the engine behaves as designed across representative interaction types.
An expert-judged seed set used to calibrate the evaluator at each scoring epoch.
Eleven captured pairs on /comparisons showing the same scoring shape applied to real responses from public AI systems.
A four-tier BVS data-source hierarchy, with tier (a) reserved for licensee system-of-record linkage.

What we do not claim yet

No published inter-rater reliability metrics.
No peer-reviewed studies linking scores to long-run business outcomes.
No claim that TVCR is an industry standard or scientifically established benchmark.
No longitudinal outcome data from production pilots yet.
No claim that any individual score is portable across other users, other teams, or other models as a relative ranking.

The next phase is pilot engagements that link tier (a) outcomes to scores and produce the longitudinal data the architecture is designed to support.

The composite — TVCR

TVCR is a dimensionless ratio expressing value per token. It has three components:

PQS — Prompt Quality Score. Evaluates the quality of human-authored input to an AI system across five trainable competencies. Scored 0.0–1.0 per dimension; PQS is a weighted average across all five.
BVS — Business Value Score. Evaluates the business value generated by AI output across up to six outcome-anchored dimensions. Scored 0.0–1.0 per dimension; BVS is a weighted average across applicable dimensions only — see the applicability matrix below.
Token efficiency normalization. TVCR is normalized by token consumption such that a high-quality result achieved with fewer tokens scores higher than the same result achieved with more tokens.
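
The token-efficiency property can be illustrated with a toy ratio. This is only a sketch: the formula and the `per_tokens` constant are assumptions made up for illustration, since the actual TVCR formula and its normalization constants are not published.

```python
def toy_value_per_token(value_score: float, tokens_used: int, per_tokens: int = 1000) -> float:
    """Toy value-per-token ratio. NOT the TVCR formula, which is unpublished.

    value_score: a 0.0-1.0 quality/value score (hypothetical input).
    tokens_used: total tokens consumed by the interaction.
    per_tokens:  hypothetical scaling constant that keeps results readable.
    """
    if tokens_used <= 0:
        raise ValueError("tokens_used must be positive")
    return value_score * per_tokens / tokens_used


# Same result quality, different token cost: the cheaper interaction scores higher.
concise = toy_value_per_token(0.8, tokens_used=500)
verbose = toy_value_per_token(0.8, tokens_used=2000)
print(concise > verbose)  # True
```

Any normalization that decreases as token count grows would show the same ordering; the linear division here is simply the shortest way to demonstrate it.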

For display, the TVCR composite is mapped to a 0–100 scale and placed into one of five buckets. The bucket label is always shown alongside the color so meaning never depends on color alone.

Severe · Emerging · Partial · Competent · Optimal

PQS — five trainable prompt-quality dimensions

PQS evaluates the human-authored input to an AI system. The five dimensions below are filed names, used verbatim everywhere on the site, in the Analyzer, and in the API. No marketing aliases, no shortened forms outside space-constrained UI badges.

#	Dimension	Abbr.	Trainable competency
1	Specificity	`SP`	Task precision and clarity of intent.
2	Contextual Completeness	`CC`	Context-setting and information packaging.
3	Strategic Framing	`SF`	Business-decision orientation.
4	Constraint Definition	`CD`	Output boundary specification and token efficiency.
5	Iteration Efficiency	`IE`	Productive use of multi-turn context.

PQS calculation (¶0049): weighted average of all five criterion scores. Default weighting is equal (each = 0.20).
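
A minimal sketch of the PQS calculation as described: a weighted average over the five dimensions with the default weight of 0.20 each. The function name `pqs`, the dictionary layout, and the example scores are illustrative assumptions, not the production API.

```python
from typing import Dict, Optional

PQS_DIMENSIONS = ("SP", "CC", "SF", "CD", "IE")


def pqs(scores: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the five PQS criterion scores (each 0.0-1.0)."""
    if weights is None:
        # Default from the text: equal weighting, 0.20 per dimension.
        weights = {dim: 0.20 for dim in PQS_DIMENSIONS}
    for dim in PQS_DIMENSIONS:
        if not 0.0 <= scores[dim] <= 1.0:
            raise ValueError(f"{dim} score must be in [0.0, 1.0]")
    total_weight = sum(weights[dim] for dim in PQS_DIMENSIONS)
    # Dividing by the weight sum keeps the result in 0.0-1.0 even if custom
    # weights do not sum to exactly 1.
    return sum(weights[dim] * scores[dim] for dim in PQS_DIMENSIONS) / total_weight


# Made-up scores, for illustration only.
example = {"SP": 0.8, "CC": 0.6, "SF": 0.4, "CD": 0.7, "IE": 0.5}
print(round(pqs(example), 2))  # 0.6
```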

BVS — six outcome-anchored business-value dimensions

BVS evaluates the AI-generated output, anchored to observable business outcomes. The dimensions below are filed names.

#	Dimension	Abbr.	Anchored to
1	Decision Advancement	`DA`	Specific, verifiable business decisions in systems of record.
2	Information Quality	`IQ`	Accuracy, relevance, and actionability of output.
3	Action Generation	`AG`	Concrete assignable action items tracked to completion.
4	Efficiency Gain	`EG`	Measurable time / cost reduction vs. historical baselines.
5	Organizational Value	`OV`	Named OKRs, KPIs, or strategic objectives.
6	Reusability	`RU`	Repeated application across similar future tasks.

BVS calculation (¶0060): BVS = Σ(vₙ × criterion_scoreₙ) / Σ(vₙ), summed over applicable criteria only, where vₙ is the weight of criterion n. Default weighting is equal among applicable criteria (each = 1.0 / count_applicable).
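
The BVS formula can be sketched in a few lines. The function name `bvs`, its parameters, and the example scores are assumptions for illustration; only the weighted-average form and the equal default weighting come from the text.

```python
from typing import Dict, Iterable, Optional


def bvs(
    scores: Dict[str, float],
    applicable: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average over the applicable BVS criteria only."""
    criteria = list(applicable)
    if not criteria:
        raise ValueError("at least one applicable criterion is required")
    if weights is None:
        # Equal weights: each criterion's effective share is 1.0 / count_applicable.
        weights = {c: 1.0 for c in criteria}
    weight_sum = sum(weights[c] for c in criteria)
    return sum(weights[c] * scores[c] for c in criteria) / weight_sum


# Made-up scores. Only IQ, EG, and RU are passed as applicable, so the
# other three values have no effect on the result.
all_scores = {"DA": 0.1, "IQ": 0.9, "AG": 0.2, "EG": 0.6, "OV": 0.0, "RU": 0.3}
print(round(bvs(all_scores, ["IQ", "EG", "RU"]), 2))  # 0.6
```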

BVS Data Sources — the four-tier hierarchy (¶0062)

The patent specifies four data sources for Business Value Score, in declared order of confidence (¶0062):

(a) Downstream Outcome Integration. Where available, BVS is anchored to system-of-record data — ERP purchase orders, CRM revenue events, scheduling systems, document management systems, HR information systems. Gold standard. Empirically confirmed, not estimated.
(b) Automated Assessment (calibrated proxy). An LLM-based evaluator scores against structured rubrics calibrated to human expert benchmarks. Confidence levels accompany every score; low-confidence scores route to human review.
(c) User Self-Report. At interaction time or shortly after, the user rates the output against simplified value dimensions. Used to calibrate automated scores and as a validation signal.
(d) Expert Review. For high-value or high-stakes interactions, domain experts assign BVS scores directly. Expert-scored interactions serve as calibration ground truth for automated scoring.

Where TVCRpro currently sits. The public Light Analyzer at /analyzer implements tier (b) Automated Assessment only. Tiers (a), (c), and (d) are reserved for licensed Full-tier deployments where the integrations and review workflows exist to support them. The optional Stakes, Deliverable type, and Estimated outcome fields on /analyzer are narrative-only context — they do not count as tier (c) self-report and do not affect the BVS calculation.
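
One way to picture the tier split is as a lookup from deployment type to usable data sources. This is a sketch under assumptions: the enum, the `"light"` and `"full"` keys, and the function name are hypothetical, the Full tier is assumed to include tier (b) alongside the other three, and picking the first usable source in declared order is one literal reading of "declared order of confidence", not a documented selection rule.

```python
from enum import IntEnum
from typing import Iterable


class BvsSource(IntEnum):
    # Values follow the declared order (a) through (d).
    DOWNSTREAM_OUTCOME = 1    # (a)
    AUTOMATED_ASSESSMENT = 2  # (b)
    USER_SELF_REPORT = 3      # (c)
    EXPERT_REVIEW = 4         # (d)


# Hypothetical deployment keys. Light implements tier (b) only.
SOURCES_BY_DEPLOYMENT = {
    "light": frozenset({BvsSource.AUTOMATED_ASSESSMENT}),
    "full": frozenset(BvsSource),
}


def first_declared_source(deployment: str, available: Iterable[BvsSource]) -> BvsSource:
    """Return the usable source that comes first in the declared order."""
    usable = SOURCES_BY_DEPLOYMENT[deployment] & frozenset(available)
    if not usable:
        raise LookupError(f"no usable BVS source for deployment {deployment!r}")
    return min(usable)


print(first_declared_source("light", list(BvsSource)).name)  # AUTOMATED_ASSESSMENT
```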

From inference to measurement: the Light Analyzer scores prompt structure and infers likely outcome quality. The shift from inference to measurement happens when an interaction can be linked to a tangible deliverable — an RFI, a decision memo, a contract, a schedule change, a code artifact — and to the system of record that captures that deliverable's outcome. That linkage is what tier (a) Downstream Outcome Integration provides. It is why the framework treats outcome attribution as a separate, higher-confidence tier rather than as a self-report field on the public surface.

Dynamic applicability — AI conversations

Not all six BVS criteria apply to every interaction. The Interaction Context Category determines applicability, per filed spec ¶0061.

Interaction Context Category	Applicable BVS criteria
Document drafting	IQ, AG, EG, RU
Data analysis	DA, IQ, EG, OV
Decision support	DA, IQ, AG, OV
Customer communication	IQ, AG, EG, OV
Code generation	IQ, EG, RU
Research synthesis	IQ, EG, OV, RU
Strategic planning	DA, IQ, AG, OV, RU
Risk assessment	DA, IQ, OV
General / uncategorized	All six (DA, IQ, AG, EG, OV, RU) — equal weights

Per ¶0061: “This mapping is configurable; organizations may customize criteria selection for their specific use case taxonomies.”
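
The applicability table and its configurability can be sketched as a plain mapping with organization-level overrides. The snake_case category keys, the function name, and the fallback of unknown categories to all six criteria are assumptions for illustration; the criteria lists themselves are copied from the table above.

```python
from typing import Dict, Optional, Tuple

ALL_BVS = ("DA", "IQ", "AG", "EG", "OV", "RU")

# Keys are hypothetical identifiers for the Interaction Context Categories.
DEFAULT_APPLICABILITY: Dict[str, Tuple[str, ...]] = {
    "document_drafting": ("IQ", "AG", "EG", "RU"),
    "data_analysis": ("DA", "IQ", "EG", "OV"),
    "decision_support": ("DA", "IQ", "AG", "OV"),
    "customer_communication": ("IQ", "AG", "EG", "OV"),
    "code_generation": ("IQ", "EG", "RU"),
    "research_synthesis": ("IQ", "EG", "OV", "RU"),
    "strategic_planning": ("DA", "IQ", "AG", "OV", "RU"),
    "risk_assessment": ("DA", "IQ", "OV"),
    "general": ALL_BVS,
}


def applicable_criteria(
    category: str,
    overrides: Optional[Dict[str, Tuple[str, ...]]] = None,
) -> Tuple[str, ...]:
    """Look up applicable BVS criteria, letting an organization override rows."""
    mapping = {**DEFAULT_APPLICABILITY, **(overrides or {})}
    # Assumption: categories not in the mapping are treated as uncategorized.
    return mapping.get(category, ALL_BVS)


print(applicable_criteria("code_generation"))  # ('IQ', 'EG', 'RU')
# A hypothetical organization that also wants Action Generation for code work:
print(applicable_criteria("code_generation", {"code_generation": ("IQ", "AG", "EG", "RU")}))
```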

Meeting Intelligence — applicability for human conversations

The same selection mechanism applies to human-only conversation analysis under filed spec ¶0098. When the Analyzer’s Interaction Type is set to a meeting transcript, PQS dimensions are reinterpreted as question quality in human conversation, and BVS dimensions are reinterpreted as conversation outcome quality.

Each Meeting Context Category applies a high / medium / low weighting profile across the eleven dimensions. The pattern is publishable; the production numerical multipliers used in the Full-tier engine are calibrated below the line and remain trade-secret.

Meeting Context	SP	CC	SF	CD	IE	DA	IQ	AG	EG	OV	RU
Decision-making meeting	M	M	H	M	M	H	H	M	M	M	L
Information sharing	M	H	L	M	M	L	H	L	M	M	M
Problem-solving session	H	H	M	M	H	M	H	H	M	L	L
Strategic planning	M	H	H	M	M	H	M	M	L	H	M
Stakeholder alignment	M	H	H	H	M	H	M	H	L	H	L
General / uncategorized	M	M	M	M	M	M	M	M	M	M	M

Legend: H = high weight (1.5×) · M = medium weight (1.0×, default) · L = low weight (0.5×)
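
The published pattern can be turned into per-dimension weights using the legend multipliers. This sketch assumes the multipliers act as weights in a normalized weighted average; how the Full-tier engine actually applies them is not published. The context keys and function names are hypothetical.

```python
from typing import Dict, Sequence

DIMENSIONS = ("SP", "CC", "SF", "CD", "IE", "DA", "IQ", "AG", "EG", "OV", "RU")
MULTIPLIER = {"H": 1.5, "M": 1.0, "L": 0.5}  # from the legend

# One letter per dimension, in DIMENSIONS order, copied from the table.
PROFILES = {
    "decision_making":       "M M H M M H H M M M L",
    "information_sharing":   "M H L M M L H L M M M",
    "problem_solving":       "H H M M H M H H M L L",
    "strategic_planning":    "M H H M M H M M L H M",
    "stakeholder_alignment": "M H H H M H M H L H L",
    "general":               "M M M M M M M M M M M",
}


def dimension_weights(context: str) -> Dict[str, float]:
    """Map a meeting context to a multiplier per dimension."""
    levels = PROFILES[context].split()
    return {dim: MULTIPLIER[level] for dim, level in zip(DIMENSIONS, levels)}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float], dims: Sequence[str]) -> float:
    """Assumed combination rule: weighted average, normalized by the weight sum."""
    return sum(weights[d] * scores[d] for d in dims) / sum(weights[d] for d in dims)


weights = dimension_weights("decision_making")
scores = {dim: 0.5 for dim in DIMENSIONS}  # made-up scores
scores["SF"] = 1.0  # strong Strategic Framing, which this context weights high
pqs_like = weighted_score(scores, weights, DIMENSIONS[:5])
print(pqs_like > 0.6)  # True: above the unweighted mean of 0.6
```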

The trajectory framework

A 90-day trajectory framework projects how a user’s TVCR score evolves over time when coaching feedback is applied. The framework is published as a concept. The specific math, projection coefficients, and habit-recommendation mappings are not published.

The coaching feedback loop

The product is designed to surface specific, actionable coaching feedback derived from the score. The feedback intent is publishable. The internal mapping from score patterns to specific recommendations is not.

How the analyzer works

The analyzer at /analyzer is a two-stage process. In the first stage, a rubric agent reads the operator-pasted prompt-and-response pair and applies the eleven dimensions exactly as published above. The agent is a general-purpose reasoning model run by an inference vendor — currently Anthropic, listed on the security page — and it returns a structured score for each of the eleven dimensions along with a one-sentence coaching anchor on the lowest-scoring dimension. Nothing in this stage is proprietary; the rubric is the same one any reader can study above.

In the second stage, a TVCRpro-controlled server combines the eleven scores into the composite Token Value Conversion Ratio. This is where the trade-secret weighting lives. The composite is not a simple average; it reflects the relative importance of each dimension for the operator's interaction context, the token-cost normalization that makes long and short interactions comparable, and the bucketing that translates a raw composite into a plain-language band. The weights, the normalization curve, and the bucket thresholds are covered by USPTO Application 64/045,951 and never leave the TVCRpro environment. They are never sent to the inference vendor; they are never shipped to the operator's browser.

The split is deliberate. The publishable rubric is what makes the analyzer auditable — anyone can read the eleven dimensions and judge whether a given score is fair. The protected weighting is what makes the composite meaningful — it is the calibration that turns eleven independent signals into a single ratio that tracks actual outcomes. Together they let the analyzer be transparent about what it measures while keeping protected how the measurements combine.
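
The two-stage split can be sketched as a data flow. All names here (`RubricResult`, `run_analyzer`, the callback parameters) are hypothetical, ties for the lowest score are broken by dimension order as an assumption, and the `combine` callback stands in for the server-side step whose weights are not published.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

DIMENSIONS = ("SP", "CC", "SF", "CD", "IE", "DA", "IQ", "AG", "EG", "OV", "RU")


@dataclass(frozen=True)
class RubricResult:
    """Stage-one output: per-dimension scores plus one coaching anchor."""
    scores: Dict[str, float]
    coaching_dimension: str
    coaching_anchor: str


def lowest_scoring_dimension(scores: Dict[str, float]) -> str:
    # Assumption: ties go to the dimension listed first.
    return min(DIMENSIONS, key=lambda dim: scores[dim])


def run_analyzer(
    scores: Dict[str, float],
    write_anchor: Callable[[str], str],
    combine: Callable[[Dict[str, float]], float],
) -> Tuple[RubricResult, float]:
    # Stage 1: rubric scores and a coaching anchor; no weights are involved.
    weakest = lowest_scoring_dimension(scores)
    stage_one = RubricResult(dict(scores), weakest, write_anchor(weakest))
    # Stage 2: the composite is computed by a separate, server-side function.
    composite = combine(stage_one.scores)
    return stage_one, composite


scores = {dim: 0.7 for dim in DIMENSIONS}  # made-up scores
scores["CD"] = 0.3
result, composite = run_analyzer(
    scores,
    write_anchor=lambda dim: f"Focus next on {dim}.",
    # Placeholder only: the real composite is explicitly not a simple average.
    combine=lambda s: sum(s.values()) / len(s),
)
print(result.coaching_dimension)  # CD
```

Keeping `combine` as an injected function mirrors the stated boundary: the stage-one payload carries scores and a coaching sentence, and nothing about weights or thresholds.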

Evaluator controls

A general-purpose reasoning model is used to apply the published rubric. That introduces real, well-known risks: evaluator drift across vendor model updates, prompt sensitivity in rubric phrasing, scoring instability between runs, and the possibility of rubric gaming. The architecture addresses each one procedurally rather than ignoring it.

Evaluator versions are locked per scoring epoch. Within an epoch, every score is produced by the same evaluator configuration so trajectories are comparable to themselves over time.
Recalibration on vendor model change. When the inference vendor releases a new model, the evaluator is re-anchored against an expert-judged seed set before scores in the new epoch are issued.
Confidence is logged. Each score carries an evaluator-confidence signal. Low-confidence scores in Full-tier deployments route to human review rather than being treated as final.
Rubric phrasing is versioned. The published rubric on this page is the same one the evaluator receives; any change to evaluator instructions is a versioned change to this page.
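
Two of these controls, epoch locking and confidence routing, can be sketched with immutable records. The class names, field names, example values, and the 0.6 review threshold are all hypothetical; the source does not publish a threshold or a schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluatorConfig:
    """Everything that is locked for one scoring epoch (hypothetical fields)."""
    epoch: str
    model_id: str
    rubric_version: str


@dataclass(frozen=True)
class Score:
    value: float
    confidence: float  # evaluator-confidence signal, assumed to be 0.0-1.0
    config: EvaluatorConfig


REVIEW_THRESHOLD = 0.6  # hypothetical cut-off for "low confidence"


def needs_human_review(score: Score, full_tier: bool, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Low-confidence scores route to human review in Full-tier deployments."""
    return full_tier and score.confidence < threshold


def comparable(a: Score, b: Score) -> bool:
    """Scores sit on one trajectory only if the same locked config produced both."""
    return a.config == b.config


epoch_1 = EvaluatorConfig(epoch="epoch-1", model_id="model-x", rubric_version="r1")
epoch_2 = EvaluatorConfig(epoch="epoch-2", model_id="model-y", rubric_version="r1")
first = Score(0.62, confidence=0.9, config=epoch_1)
second = Score(0.70, confidence=0.4, config=epoch_1)
later = Score(0.71, confidence=0.8, config=epoch_2)

print(comparable(first, second))                   # True
print(comparable(first, later))                    # False
print(needs_human_review(second, full_tier=True))  # True
```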

These are procedural controls, not statistical guarantees. Inter-rater reliability metrics, longitudinal outcome correlations, and benchmarking data are the gap pilots are designed to close — see What we have today / What we do not claim yet.

What is not on this page

The numerical weights applied to each dimension within each rubric.
The rubric anchors that define a 0.7 vs. a 0.8 in any specific dimension.
The token-normalization constants in the TVCR formula.
The math of the 90-day trajectory projection.
The score-to-recommendation mapping logic.
The synthetic-validation dataset itself.

These are deliberately withheld as trade-secret. The patent reference and the published architecture above are the full extent of what is publicly disclosed at v31 launch. Pre-pilot, design-intent.

Verify

The eleven dimensions and the applicability matrices on this page are filed in the patent application and are independently verifiable.

USPTO Patent Center — Application 64/045,951