Phantom Radiologists Part Two: The Bounding-Box, Pixel-Segmentation, and Pathology-Confirmation Math That Makes the Whole Veterinary AI Training-Set Claim Structurally Impossible
Part One of this investigation calculated the labor required to apply image-level categorical labels to the training corpora claimed by SignalPET, Vetology, and Antech RapidRead at the Stanford CheXNeXt rate of 34.3 seconds per image — the simplest possible AI training task. The math at that simplest step did not work for the larger claims. This article applies the published bounding-box and pixel-segmentation rates from the human medical imaging literature to the same vendor claims, and adds three structural infrastructure questions Part One did not address: the absence of subspecialty fellowship training in veterinary radiology, the scarcity of pathology-confirmed ground-truth datasets, and the breed-specific anatomic variation that prevents direct application of human chest x-ray training methodology to veterinary subjects. Part Three closes the series by examining the validation-statistics evidence base on commercial veterinary AI versus FDA-cleared human radiology AI, and the corporate-consolidation revenue model that explains why the validation gap exists. The conclusion of Part Two: the foundational claim is not just unlikely. It is structurally impossible at the scales the marketing presents.
Part 1: The Labeling Step — Image-Level Classification at Stanford CheXNeXt’s Documented 34.3 Seconds Per Image. The simplest annotation task, the most charitable possible math.
Part 2 (this article): The Annotation Steps That Actually Build the Product — Bounding-Box Localization, Pixel Segmentation, and Pathology Correlation. Plus three structural infrastructure questions Part One did not address: no veterinary subspecialty fellowship pathway, no pathology-confirmed dataset at scale, breed-specific anatomic variation.
Part 3: Validation Statistics and Revenue Model — What FDA-cleared human radiology AI is required to demonstrate, what commercial veterinary AI actually demonstrates (Joslyn 2025 commentary, Ma 2026 JAVMA pilot study), and the Mars-Antech-VCA-BluePearl corporate consolidation that explains why the validation gap exists. The vendor/provider separation that aligns incentives on the human side, and the conflict of interest and anticompetitive tying that replace it on the veterinary side.
What Part One Established, and What Part Two Adds
Part One of this investigation calculated, against published per-image annotation rates, the radiologist-years required to label the training corpora claimed by three U.S. veterinary AI radiology vendors at the simplest annotation step: image-level categorical classification, the application of pre-defined yes/no flags to pre-defined pathology categories. The benchmark rate, drawn from the Stanford CheXNeXt paper published in PLOS Medicine in 2018, was 34.3 seconds per image at the average. The math at this simplest step produced these conclusions: Vetology’s 300,000-case claim was plausible at the borderline of a small specialist team’s multi-year output; SignalPET’s 2-million-image claim strained the available specialist labor pool but was conceivable with sustained contribution from the entire active North American board-certified veterinary radiologist workforce; and Antech RapidRead’s 16-million-image claim could not be reconciled with board-certified specialist labeling within the documented 600 to 700 active diagnostic imaging Diplomate population, at any rate within Stanford’s published range of 25.7 to 42.9 seconds per image, even at single-pass labeling without independent annotators.
The math at that simplest annotation step was conservative by design. It used the most charitable possible labeling rate (Stanford’s 34.3 seconds per image), the most charitable possible productivity assumption (a full 2,080-hour year of dedicated full-time labeling per radiologist), the most charitable possible workforce figure (the upper bound of available DI Diplomates), and the most charitable possible labeling complexity (image-level categorical classification only, with no localization, no measurement, and no pathology correlation). And even at every charitable input, the larger vendor claim could not be reconciled with the documented specialty workforce.
Part Two demonstrates that the simplest annotation step is not what actually builds a commercial AI radiology product. The capabilities marketed by SignalPET, Vetology, and Antech RapidRead — finding localization on the radiograph, measurement output, multi-pathology classification, breed-specific interpretation — require additional annotation work that Part One’s math did not include. Each of those additional annotation steps takes substantially more radiologist time per image than the categorical labeling step. When the labor required for those additional steps is calculated against the same vendor training-corpus claims and the same workforce constraints, the figures expand by an order of magnitude or more. The Antech 16-million-image claim, which already could not be reconciled with the available workforce at the simplest step, becomes structurally impossible at the bounding-box step alone — and impossible by greater margins at the segmentation step.
The arithmetic alone is sufficient to establish the impossibility. But the structural infrastructure questions this article raises are, if anything, more damaging to the foundational claim than the labor math. Three of those questions are addressed in detail here. First, the absence of subspecialty fellowship training in veterinary radiology — there is no veterinary equivalent of a thoracic radiology fellow, no veterinary equivalent of a fellowship-trained abdominal radiologist, no veterinary equivalent of the subspecialty expertise that human radiology AI training relies on for high-quality annotation. Second, the absence of pathology-confirmed ground-truth datasets at the scale that credible AI validation requires; veterinary cases that proceed to necropsy or surgical confirmation are a small minority of the cumulative imaging volume the vendors claim to have used. Third, the breed-specific anatomic variation that human chest x-ray AI training does not have to address, but that veterinary AI training cannot ignore — and cannot solve at scale with the specialty workforce and pathology-confirmation infrastructure available.
The cumulative implication, walked through section by section in what follows, is that the foundational claim of large specialist-labeled veterinary AI training corpora is not merely difficult to reconcile with the available evidence. It is structurally impossible at the scales the marketing presents. The four reconciliation paths identified in Part One — NLP extraction from existing reports, non-specialist human labeling, AI-generated pseudo-labels, and inflated headline numbers — are the only logically available explanations for how the published vendor figures could exist. None of them is what the marketing implies. None of them has been disclosed by the vendors. The CLAIM checklist exists. The 2025 ACVR/ECVDI position statement called for exactly this kind of disclosure. The vendors have not produced it. This article quantifies why the disclosure cannot reasonably be deferred any longer. Part Three documents what happened in the absence of the disclosure: a category of commercial products that operates without the validation infrastructure human medicine has built, monetized through a corporate-consolidation revenue model that captures specialist labor cost as internal margin while displacing independent veterinary teleradiology providers.
The Bounding-Box Math: An Order-of-Magnitude Increase in Required Labor
Bounding-box annotation is the AI training task that produces the localization capability essentially every commercial veterinary AI radiology product is marketed as having. When SignalPET’s SignalSTAT product or Antech RapidRead’s report displays to the referring veterinarian a marked region on the radiograph where the AI has identified a finding — the cardiomegaly, the alveolar pattern, the foreign body, the suspected mass — the AI produced that marked region because it was trained on images where board-certified radiologists drew rectangles around the same kinds of findings during the training phase. Without bounding-box training data, the AI cannot produce localization outputs. With bounding-box training data, the AI’s localization performance is bounded by the quality and quantity of the bounding-box annotations it was trained against. There is no shortcut.
The most authoritative published per-image bounding-box annotation rate for radiology AI training comes from a peer-reviewed paper in Radiology: Artificial Intelligence (PMC8017380, published in 2021). The paper studied radiologist annotation of 294 coronary CT angiography studies, with 1,843 individual arteries and branches annotated by radiologists for atherosclerotic plaque. The rate documented: 15.2 studies per radiologist per day, with median speed of 6.08 minutes per study (interquartile range 2.8 to 10.6 minutes per study) and 73 seconds per individually annotated vessel. This is for radiologists working on a structured platform, with appropriate annotation tooling, on a task they had been trained for and were experienced with.
The 6.08-minute median is the per-study figure. For per-image bounding-box annotation in chest or abdominal radiograph training, where a single image may contain multiple findings to localize, the realistic per-image rate is at minimum the per-study rate, and potentially several multiples higher depending on how many findings are present and the complexity of localization required. The 73-second-per-vessel figure is closer to the per-finding rate for a single discrete annotation, but most diagnostic radiographs contain more than a single finding to annotate during training.
For the purposes of this article’s math, the 6.08-minute median per-study rate from the published literature is the conservative anchor. Applying it to the three vendor training-corpus claims produces the following sensitivity table:
| Vendor / Claim | Bounding-box rate per study | Total rad-hours required | FTE rad-years (1 rad) | 3-annotator standard rad-years |
|---|---|---|---|---|
| Vetology 300,000 cases | 6.08 min (median) | 30,400 hrs | 14.6 yrs | 43.8 yrs |
| Vetology 300,000 cases | 10.6 min (slow IQR) | 53,000 hrs | 25.5 yrs | 76.4 yrs |
| SignalPET 2,000,000 images | 6.08 min (median) | 202,667 hrs | 97.4 yrs | 292.3 yrs |
| SignalPET 2,000,000 images | 10.6 min (slow IQR) | 353,333 hrs | 169.9 yrs | 509.6 yrs |
| Antech RapidRead 16,000,000 images | 6.08 min (median) | 1,621,333 hrs | 779.5 yrs | 2,338.5 yrs |
| Antech RapidRead 16,000,000 images | 10.6 min (slow IQR) | 2,826,667 hrs | 1,358.9 yrs | 4,076.7 yrs |
The numbers warrant pausing on. At the published median bounding-box rate from peer-reviewed AI training literature, the labor required to produce Antech RapidRead’s claimed 16-million-image training corpus is approximately 779 radiologist-years of dedicated full-time bounding-box annotation work. At the slow end of the published interquartile range, the figure rises to 1,359 radiologist-years. At the human-side AI standard of three independent annotators per training image — the methodological minimum for credible AI training data — the figures triple to approximately 2,338 to 4,077 radiologist-years.
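For readers who want to re-derive the table rather than take it on faith, the arithmetic is short. A minimal sketch follows, assuming the same 2,080-hour full-time labeling year carried over from Part One; the dictionary contents are the vendor claims and published rates cited above, and the variable names are illustrative only. (Printed values at the slow rate differ from the table in the last decimal place because the table rounds the single-annotator figure before tripling it.)

```python
# Sanity check of the bounding-box sensitivity table above.
# Assumes Part One's charitable 2,080-hour full-time labeling year.
HOURS_PER_RAD_YEAR = 2080

claims = {
    "Vetology": 300_000,
    "SignalPET": 2_000_000,
    "Antech RapidRead": 16_000_000,
}
rates = {"median": 6.08, "slow IQR": 10.6}  # minutes per study (PMC8017380)

for vendor, n_images in claims.items():
    for label, minutes in rates.items():
        hours = n_images * minutes / 60
        years = hours / HOURS_PER_RAD_YEAR
        print(f"{vendor:<17} {label:<8}: {hours:>12,.0f} hrs | "
              f"{years:>8,.1f} rad-yrs | x3 annotators: {3 * years:>8,.1f}")
```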
A board-certified veterinary radiologist working a 30-year clinical career produces, by definition, 30 radiologist-years of professional output. The Antech 16-million-image bounding-box labor figure at the median rate, three-annotator standard, is approximately 78 full radiologist careers devoted to nothing but bounding-box annotation. The North American DI Diplomate workforce, which Part One documented at approximately 600 to 700 active members, produces at most 600 to 700 radiologist-years per calendar year even with every member labeling full-time — so the bounding-box step alone would have absorbed the entire specialty, working on nothing else, for more than three years. The math is not a close call. It is not a math problem with a defensible reconciliation path through specialist labeling at any scale.
At the published median bounding-box annotation rate of 6.08 minutes per study, three independent annotators per image as the human-side AI standard, the labor required to produce Antech RapidRead’s claimed 16-million-image training corpus is approximately 2,338 radiologist-years.
The North American board-certified veterinary radiologist workforce comprises approximately 600 to 700 active diagnostic imaging Diplomates. Their cumulative career output, with every active member working an entire 30-year career exclusively on bounding-box annotation, would be roughly 18,000 to 21,000 radiologist-years. The Antech bounding-box claim at the three-annotator standard would consume approximately 11% to 13% of that entire cumulative career output — for one product, for one annotation step, for one company. The figure is not merely implausible. It is structurally impossible.
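The career framing reduces to the same kind of arithmetic. A sketch, under the stated assumptions of a 30-year career and a 600-to-700-Diplomate active workforce:

```python
# Career-output framing for the Antech bounding-box figure.
CAREER_YEARS = 30
ANTECH_BBOX_3X = 2_338.5  # rad-years: median rate, three-annotator standard

print(f"Equivalent full careers: {ANTECH_BBOX_3X / CAREER_YEARS:.0f}")  # ~78
for workforce in (600, 700):
    cumulative = workforce * CAREER_YEARS  # 18,000 / 21,000 rad-years
    print(f"{workforce} Diplomates -> {cumulative:,} cumulative rad-years; "
          f"claim consumes {ANTECH_BBOX_3X / cumulative:.1%}")
```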
The Segmentation Math: Multiplying the Already-Impossible
Pixel-level segmentation is the AI training task that produces the measurement and shape-characterization capabilities marketed in commercial veterinary AI products. Vertebral heart score calculation, lung field volume estimation, mass-margin characterization, body-region-specific quantitative outputs — every one of these capabilities requires the AI to have been trained on images where the relevant anatomic structure or lesion was outlined at the pixel level. Bounding boxes, which mark a finding with a rectangle, are not sufficient for measurement or shape work. The training data must include actual pixel-level boundary annotations.
The peer-reviewed literature on segmentation annotation rates is unambiguous about the labor required. The widely cited Menze et al. BRATS dataset documentation (IEEE Transactions on Medical Imaging, 2014) reports that “a radiologist usually takes about 60 minutes to manually segment brain tumors per patient in its multi-sequence MRI volumes.” The 2024 comprehensive survey on deep active learning in medical image analysis, published in Medical Image Analysis, summarizes the broader literature on segmentation annotation effort: “depending on the complexity of the regions of interest to segment and the local anatomical structures, minutes to hours may be required to annotate one image.” The same survey documents that the median U.S. radiologist hourly rate is $219, providing a defensible per-hour cost anchor for the labor-economics analysis. The 2021 Nature Communications paper on annotation-efficient deep learning for medical image segmentation states: “More than three domain experts are typically needed to generate trustworthy annotations” for pixel-level segmentation work, multiplying the per-image time by the number of independent annotators required.
For the purposes of this article’s math, a conservative per-image segmentation rate of 30 minutes per image is used. This figure is well below the 60-minute per-patient brain tumor MRI segmentation rate documented by Menze, but reflects the realistic effort required for moderate-complexity radiograph segmentation work where anatomic structures and findings are outlined at the pixel level. For the simpler fraction of the corpus that requires segmentation work, a lower rate would apply; for the more complex fraction, a higher rate would apply. The 30-minute figure is a reasonable middle estimate for the kind of segmentation work commercial veterinary AI products require.
The next assumption matters: what fraction of a vendor’s training corpus actually requires segmentation-level annotation? The answer depends on what capabilities the product delivers. Image-level classification capability requires only categorical labeling. Localization capability requires bounding boxes. Measurement capability — vertebral heart score, lung field volume, organ size measurements — requires pixel-level segmentation. Shape characterization capability requires segmentation. Most commercial veterinary AI radiology products deliver some combination of these features, with measurement and shape outputs being marketed prominently. A reasonable conservative estimate is that 10% of a credible commercial AI training corpus would require segmentation-level annotation work, with the other 90% receiving classification and bounding-box annotation only.
Applying these inputs — 10% of the corpus segmented at 30 minutes per image — produces the following:
| Vendor / Claim | Segmented subset (10%) | Total segmentation rad-hours | FTE rad-years (1 rad) | 3-annotator standard rad-years |
|---|---|---|---|---|
| Vetology 300,000 cases | 30,000 segmented images | 15,000 hrs | 7.2 yrs | 21.6 yrs |
| SignalPET 2,000,000 images | 200,000 segmented images | 100,000 hrs | 48.1 yrs | 144.2 yrs |
| Antech RapidRead 16,000,000 images | 1,600,000 segmented images | 800,000 hrs | 384.6 yrs | 1,153.8 yrs |
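As with the bounding-box table, these figures can be re-derived directly from the stated assumptions (10% of each claimed corpus, 30 minutes per segmented image, a 2,080-hour labeling year); a minimal sketch:

```python
# Re-derive the segmentation table from the article's stated assumptions.
HOURS_PER_RAD_YEAR = 2080
SEG_FRACTION = 0.10  # fraction of the corpus assumed to need segmentation
SEG_MINUTES = 30     # conservative per-image rate used in the text

for vendor, n_images in {"Vetology": 300_000, "SignalPET": 2_000_000,
                         "Antech RapidRead": 16_000_000}.items():
    segmented = int(n_images * SEG_FRACTION)
    hours = segmented * SEG_MINUTES / 60
    years = hours / HOURS_PER_RAD_YEAR
    print(f"{vendor:<17}: {segmented:>9,} images | {hours:>9,.0f} hrs | "
          f"{years:>6,.1f} rad-yrs | x3 annotators: {3 * years:>8,.1f}")
```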
The segmentation labor figure adds to the bounding-box labor figure rather than replacing it. The same training corpus requires all three annotation types — categorical labeling for the entire corpus (Part One's calculation), bounding-box annotation for the localization-relevant fraction (this article's first calculation), and segmentation annotation for the measurement-relevant fraction (this article's second calculation). The total labor figure across all three annotation steps, applied to Antech RapidRead's 16-million-image claim at the three-annotator standard, sums to approximately 220 + 2,338 + 1,154 = 3,712 radiologist-years of dedicated specialist labeling work.
3,712 radiologist-years. Distributed across the entire active North American DI Diplomate workforce of 700 specialists, with every Diplomate working full-time exclusively on AI labeling and nothing else, the work would take approximately 5.3 years to complete. The North American DI Diplomate workforce is not, however, available to do that work; the same workforce has been documented by ACVR leadership as already inadequate to meet existing clinical demand. The cost of the labor, at the $219 median U.S. radiologist hourly rate cited by the 2024 Medical Image Analysis survey, would be approximately $1.69 billion. Mars Petcare's annual revenue is large, but no public filing or marketing material discloses, or even hints at, a single AI product within Antech absorbing 1.69 billion dollars of specialist labor.
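The compound total and the cost anchor check the same way. A sketch, with the three per-step totals carried in from the calculations above and the $219 hourly rate from the 2024 survey:

```python
# Compound Antech total across all three annotation steps at the
# three-annotator standard, plus the $219/hr cost anchor.
HOURS_PER_RAD_YEAR, WORKFORCE, HOURLY_USD = 2080, 700, 219

steps = {"classification": 220, "bounding box": 2_338, "segmentation": 1_154}
total_years = sum(steps.values())               # 3,712 rad-years
total_hours = total_years * HOURS_PER_RAD_YEAR  # 7,720,960 hrs

print(f"Total: {total_years:,} rad-years ({total_hours:,} hrs)")
print(f"Whole-workforce duration: {total_years / WORKFORCE:.1f} calendar years")
print(f"Labor cost: ${total_hours * HOURLY_USD / 1e9:.2f}B")
```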
Either the labor was performed and the expenditure occurred without public documentation that any reasonable due-diligence search would have surfaced, or the labor was not performed in this way and one or more of the four reconciliation paths from Part One — NLP extraction, non-specialist labeling, AI-generated pseudo-labels, or inflated headline numbers — accounts for the gap between the marketing claim and the documented specialty workforce capacity. Part Three documents how the same corporate-consolidation structure that owns Antech makes both the implausible-disclosure scenario and the unmet-disclosure scenario commercially viable: in either case, the marketing claim continues to drive AI subscription adoption that captures displaced specialist labor cost as internal margin within the Mars Petcare vertical-integration ecosystem.
The First Structural Question: There Is No Veterinary Equivalent of a Fellowship-Trained Radiologist
Even if the labor figures somehow reconciled, the human-side AI training infrastructure that the veterinary vendors implicitly invoke when they cite “specialist-reviewed” training data does not exist in veterinary medicine. The most consequential difference, which the math sections above only gestured at, is the absence of subspecialty fellowship training in veterinary radiology.
In human medicine, after a board-certified general radiologist completes a four- or five-year diagnostic radiology residency and passes the American Board of Radiology certification examination, they may pursue an additional one- to two-year subspecialty fellowship in a specific area of radiology. The principal subspecialty fellowship pathways in human radiology include cardiothoracic radiology, abdominal/body imaging, musculoskeletal radiology, neuroradiology, breast imaging, pediatric radiology, interventional radiology, nuclear medicine, and emergency radiology. A human radiologist who has completed a thoracic radiology fellowship has spent an additional twelve to twenty-four months focused exclusively on the chest — interpreting chest radiographs, chest CT scans, lung biopsies, pulmonary embolism studies, mediastinal mass workups, post-transplant imaging — under the supervision of fellowship-trained mentors at a high-volume thoracic imaging service.
This subspecialty training matters for AI development because it produces specialists who can reliably make subtle distinctions in their area of focus that even excellent generalist radiologists do not always make. The Stanford CheXNeXt validation set was labeled by cardiothoracic specialist radiologists with an average of twelve years of experience after fellowship completion. The 2020 RSNA-STR Pulmonary Embolism Challenge dataset was labeled by more than 80 expert thoracic radiologists, with documented adjudication procedures. The 2024 FDA-cleared chest x-ray AI training set was labeled by 17 board-certified radiologists with a median 14 years of experience, many of them fellowship-trained in thoracic imaging. When human radiology AI training reaches for the highest-quality categorical or bounding-box labels, it reaches for fellowship-trained subspecialists who have devoted their careers to a single body region.
Veterinary radiology has no such pathway. The American College of Veterinary Radiology’s residency training essentials, published on the ACVR’s own website at acvr.org, specify that a veterinary radiology resident is to receive training “in all of the subspecialty areas of veterinary radiology” across small and large animal diagnostic radiology, diagnostic ultrasound, nuclear medicine, computed tomography, and magnetic resonance imaging — across multiple species and modalities — in a single three-year program. The ACVR’s own description of the residency training requirements states that the resident is expected to be involved in the interpretation of approximately 4,000 radiographic studies, 1,000 abdominal ultrasound studies, and 500 CT and/or MRI studies during the entire program. The training is, by certification design, broad rather than deep. Every ACVR Diplomate emerges from residency as a generalist across all body regions, all imaging modalities, and the species commonly encountered in veterinary medicine.
There is no veterinary thoracic radiology fellowship. There is no veterinary abdominal radiology fellowship. There is no veterinary musculoskeletal radiology fellowship. There is no veterinary subspecialty pathway equivalent to what human radiology has built over the last several decades to support deep expertise development in specific body regions. Equine Diagnostic Imaging is a recognized ACVR subspecialty pathway, but it is a species-specific specialization, not a body-region subspecialty. There is no body-region fellowship infrastructure of the kind human radiology AI training relies upon.
The implication for AI training is direct. When SignalPET, Vetology, or Antech states that its training corpus was “reviewed by veterinary radiologists” or “specialist-reviewed,” the specialist label that the marketing invokes is, by ACVR design, a generalist label. The reviewer is a board-certified veterinary radiologist, but the reviewer is not a fellowship-trained thoracic specialist when reviewing thoracic radiographs, not a fellowship-trained abdominal specialist when reviewing abdominal radiographs, not a fellowship-trained musculoskeletal specialist when reviewing orthopedic radiographs. The veterinary specialty workforce does not produce that level of subspecialty expertise because the training pathway does not produce it.
This is not a deficiency of any individual veterinary radiologist; ACVR Diplomates are well-trained, highly capable specialists who provide excellent clinical interpretation across the breadth of veterinary radiology. It is, however, a structural difference from the human radiology AI training infrastructure. Human chest x-ray AI is trained on labels produced by radiologists who have spent careers in thoracic imaging. Veterinary chest x-ray AI, when it is trained on specialist labels, is trained on labels produced by generalists whose case volume across thoracic imaging is necessarily smaller than that of dedicated thoracic specialists, because they are also reading abdominal, orthopedic, ultrasound, CT, and MRI studies as a matter of routine practice. The expertise asymmetry between the human and veterinary cases is not subtle. It is a structural infrastructure gap that no amount of labor can close.
The Second Structural Question: Pathology-Confirmed Ground Truth Is Scarce in Veterinary Medicine
The second structural infrastructure gap is the limited availability of pathology-confirmed ground truth in veterinary medicine. In human radiology AI training, the gold-standard reference for whether a finding is actually present is independent confirmation by tissue pathology, surgical findings, or autopsy. Radiologist consensus is a fallback used when better confirmation is not available. The Stanford CheXNeXt validation set used radiologist consensus, a limitation the authors themselves noted; the 2020 RSNA-STR Pulmonary Embolism Challenge dataset used a combination of radiologist consensus and outcome confirmation; multiple published human-side training datasets explicitly disclose what fraction of their corpus has pathology-confirmed reference standards versus radiologist-consensus standards.
In veterinary medicine, pathology-confirmed datasets at the scale that AI training requires fundamentally do not exist. The reasons are practical: most companion-animal cases that go through general practice never receive necropsy. Owners who lose a pet to disease typically do not request a postmortem examination, and the cost and logistics of obtaining one make it a rare occurrence outside teaching institutions and referral hospitals. Tissue biopsy correlation is more available than necropsy but is performed only on a fraction of cases that proceed to advanced workup. Surgical confirmation is more available still but applies only to cases where the underlying pathology was surgical.
The 2023 paper by Cohen, Fischetti, and Daverio published in Veterinary Radiology and Ultrasound studied veterinary radiologist error rates against necropsy findings at the Animal Medical Center in New York. The paper is one of the most rigorously designed pathology-correlation studies in the veterinary radiology literature, and what it documents is informative for the question of pathology-confirmed dataset availability: even at the Animal Medical Center, one of the largest and most specialized veterinary referral hospitals in the United States, the available pathology-confirmed dataset that could be assembled for the study was a fraction of the cumulative imaging volume the institution had handled. Building a pathology-confirmed veterinary radiograph training set of 100,000 images is logistically harder than building a radiologist-consensus-labeled set of 2 million. The cumulative pool of pathology-confirmed veterinary radiograph cases available to AI vendors as of 2026 is, by every available estimate, smaller by orders of magnitude than what the larger commercial training-corpus claims would require if pathology-confirmed ground truth had been used for those claims.
The implication is direct. Any vendor whose accuracy claims rest on pathology-confirmed validation has either used a much smaller corpus than the headline figure suggests, or the validation does not rest on pathology confirmation at all. The 2025 ACVR/ECVDI position statement on AI explicitly identifies pathology correlation as one element of the standards that no current commercial veterinary AI radiology product meets. Commercial veterinary AI training data instead uses radiologist consensus as the reference standard, because pathology confirmation at the required scale is not available. This means the AI inherits the accuracy ceiling of the radiologists who labeled its training data. As discussed in this publication’s companion analysis of the engineering rigor gap, this is the inherent ceiling problem: an AI trained against radiologist consensus cannot exceed the accuracy of the radiologists whose labels constitute its training reference standard. Part Three documents what this ceiling has produced in practice: the Ma et al. 2026 pilot study, the first peer-reviewed external validation of multiple commercial veterinary AI services against pathology-confirmed canine abdominal cases, found sensitivity ranging from 71% to 90% with documented “deficiencies in interpretation” — figures that, on the human side, would not pass FDA review for the equivalent product category.
The Third Structural Question: Breed-Specific Anatomic Variation
The third structural infrastructure gap concerns the breed-specific anatomic variation that veterinary AI must handle but human chest x-ray AI does not face. Human chest radiographs vary across the human population — sex, age, body habitus, prior surgical history, and ethnic anatomic differences all matter — but the variation is contained within a single biological species with a relatively conserved anatomy. Canine and feline radiographs face a different and considerably broader variation problem. The diversity of breed phenotypes in companion animal practice is dramatic and consequential for radiograph interpretation.
A bulldog’s thorax, with its broad, shallow chest, characteristic mediastinal fat pads, and brachycephalic vertebral pattern, is genuinely anatomically different from a greyhound’s deep, narrow chest with the elongated skeletal proportions characteristic of sighthounds. A chondrodystrophic dachshund’s spine is anatomically different from a German shepherd’s. A bullmastiff’s abdominal silhouette is different from a Shetland sheepdog’s. A Persian cat’s skull, with its brachycephalic conformation, is different from a Maine Coon’s. These differences are not subtle, are not artifacts of imaging technique, and are not random noise that can be averaged across a large dataset. They reflect genuine breed-level anatomic variation that affects what a normal radiograph looks like and where the boundaries between normal and abnormal lie for each breed phenotype.
The implication for AI training is twofold. First, the training corpus must include adequate representation of each major breed phenotype, with normal and abnormal examples from each breed labeled by specialists who understand the breed-specific normal variants. A vertebral heart score model trained primarily on labrador retrievers will not perform reliably on bulldogs without breed-specific training data. A thoracic mass classifier trained primarily on Siamese cats will not perform reliably on Persians without breed-specific training data. The training set has to span the breed spectrum to support the AI’s marketed capability of working across the breed spectrum. Second, the labeling itself must be breed-aware. A specialist applying labels to a bulldog’s chest radiograph must distinguish breed-normal mediastinal fat pads from pathologic mediastinal mass, distinguish breed-normal cardiac silhouette from cardiomegaly, distinguish breed-normal vertebral patterns from spinal pathology. The same specialist applying labels to a greyhound’s chest radiograph must apply different normal-variant standards. Cross-breed mislabeling — applying labrador-normal standards to a bulldog or greyhound-normal standards to a chondrodystrophic dog — produces training data that systematically biases the resulting AI.
Combine the breed-variation requirement with the pathology-confirmation scarcity, and the cumulative pool of pathology-confirmed, breed-balanced, specialist-labeled veterinary radiographs that would be required to train an AI product to the standard human radiology AI is held to is, by every available estimate, far smaller than what the larger commercial vendor training-corpus claims would require. This is a structural infrastructure gap, not a labor gap. Even unlimited labor could not produce a breed-balanced pathology-confirmed dataset at the required scale, because the underlying clinical material — pathology-confirmed cases of every relevant pathology across every relevant breed — does not exist in the cumulative veterinary clinical record at the volumes claimed.
How Many Abnormal and Normal Cases Were Actually Used? The CheXNet Reference Point
One of the most useful comparisons that puts the veterinary AI claims in perspective is the size of the actual training datasets used to develop credible human chest x-ray AI products. The Stanford CheXNet model, the foundational benchmark in the field, was trained on the NIH ChestX-ray14 dataset of 112,120 frontal-view chest radiographs from 30,805 unique patients, labeled with 14 thoracic pathology categories using NLP extraction from existing radiology reports. The validation set of 420 radiographs was specifically curated to contain at least 50 cases of each of the 14 pathologies and was labeled by board-certified specialists.
The CheXNet training corpus is 112,120 images. The validation set is 420 specialist-labeled images. The published model produced AUC-ROC results that were competitive with practicing radiologists on multiple pathologies. This was sufficient training data for a credible human chest x-ray AI product, given that the labels were extracted from existing radiology reports rather than freshly applied by specialists.
Antech RapidRead’s claimed 16-million-image training corpus is approximately 143 times larger than ChestX-ray14. SignalPET’s 2-million-image claim is approximately 18 times larger. Vetology’s 300,000-case claim is approximately 2.7 times larger than ChestX-ray14. The question that the math, the workforce constraints, and the structural infrastructure gaps in this article all converge on is whether veterinary AI training actually required corpora at the scale claimed, or whether the scale claims are themselves the marketing artifact that the math cannot support. The answer the available evidence supports is that scale claims of 2 million and 16 million radiologist-labeled veterinary radiographs cannot have been generated by board-certified specialists in the way the marketing implies, and that the actual training corpora — either smaller, or generated by different methods than the marketing implies, or some combination of the two — are what produced the products that exist today.
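The scale ratios quoted above are simple divisions against the ChestX-ray14 corpus size; a one-line sketch for each vendor claim:

```python
# Vendor training-corpus claims as multiples of ChestX-ray14 (112,120 images).
CHESTXRAY14 = 112_120
for vendor, n in {"Antech RapidRead": 16_000_000, "SignalPET": 2_000_000,
                  "Vetology": 300_000}.items():
    print(f"{vendor:<17}: {n / CHESTXRAY14:,.1f}x ChestX-ray14")
```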
None of this is necessarily disqualifying for the products themselves. AI radiology products built on smaller, well-curated specialist-labeled datasets can perform well within their proper scope. AI radiology products built on larger NLP-extracted or auto-labeled datasets can also perform well, with appropriate methodological documentation. The problem is not the products’ existence or their underlying utility within properly defined clinical scope. The problem is the gap between what the marketing claims and what the math, the workforce, and the upstream data infrastructure could have produced — a gap the vendors have not addressed at the level the human-side AI publication standards would require. Part Three documents why the gap has not been addressed: the corporate-consolidation revenue model that has produced the marketing claims financially favors continuing the gap rather than closing it, because the AI subscription captures displaced specialist labor cost as internal margin within the same vertically integrated corporate parent that produces the AI.
The Questions a Sophisticated Reader Should Be Asking
The labor math, the workforce constraints, and the structural infrastructure gaps documented in this article and Part One together raise a series of questions that the vendors named in this investigation have not publicly answered. Each of these questions has a defensible answer if the vendor chose to provide it. Each, in the absence of an answer, contributes to the conclusion that the foundational training-corpus claim has been allowed to function as marketing rather than as a substantiated representation. The questions, listed without elaboration so a reader can take them directly to a vendor sales conversation:
How many radiologists, exactly, contributed to the labeling of your training corpus? List them by name or by role and credential. Were they board-certified veterinary radiologists, residents, general-practice veterinarians, or veterinary technicians? What fraction of the corpus did each category label?
What fraction of your training labels were generated by direct human review of each image, versus extracted by NLP from existing radiology reports, versus generated by AI models trained on smaller seed datasets and propagated to the larger corpus through self-supervised or semi-supervised techniques?
For the human-reviewed fraction, what was the per-image annotation rate, broken down by annotation type (image-level classification, bounding-box localization, pixel-level segmentation, measurement extraction)? What annotation platform was used, and what quality control protocols governed the work?
How many independent annotators reviewed each training image? What adjudication procedure was used to resolve annotator disagreement? What was the inter-annotator agreement coefficient?
What fraction of your training corpus has pathology-confirmed ground truth (necropsy, biopsy, or surgical confirmation)? What fraction has long-term clinical outcome confirmation? What fraction relies exclusively on radiologist consensus? Was that breakdown disclosed to the clinics that adopted the product?
How is your training corpus balanced across canine breed phenotypes? Across feline breed phenotypes? What proportion of images come from chondrodystrophic versus non-chondrodystrophic breeds? From brachycephalic versus mesocephalic versus dolichocephalic breeds? How does the AI’s performance vary across breed phenotypes that are underrepresented in the training data?
If your training corpus claim is large because it counts multiple views per study or multiple labels per image, please clarify the count of unique distinct studies, unique distinct images, and unique distinct annotations. Are these numbers different? By how much?
How many of your training images correspond to clinically abnormal cases, and how many correspond to normal cases? Within the abnormal subset, what is the distribution of pathology categories? Are rare pathologies represented at sufficient sample size for the AI’s reported sensitivity to those pathologies to be statistically meaningful?
What is the algorithm version that was used for any cited validation study? What is the algorithm version currently deployed in clinical practice? Are these the same? If not, what changed between them, and what are the published performance characteristics of the deployed version?
None of these questions is hostile in a methodologically credible AI training context. Every one of them is the standard documentation that the CLAIM checklist requires of a peer-reviewed manuscript reporting an AI medical imaging study. The same questions are answered, in detail and with documentation, in every credible human-side AI training publication. They are the questions a sophisticated clinic should ask before signing a contract for an AI radiology service. They are the questions the 2025 ACVR/ECVDI position statement implicitly invites every veterinary AI vendor to answer in the affirmative. The vendors, to date, have not answered them at the level that human-side AI publication standards require.
The Cumulative Picture
The cumulative picture across Part One and Part Two is severe. At the simplest annotation step (categorical labeling, Stanford CheXNeXt rate of 34.3 sec/image), the larger vendor training-corpus claims could not be reconciled with the documented North American veterinary diagnostic imaging Diplomate workforce. At the bounding-box step (peer-reviewed median rate of 6.08 minutes per study), the Antech 16-million-image claim alone requires, at the three-annotator standard, roughly 78 full specialist careers of dedicated annotation — 11% to 13% of the cumulative career output of every active North American DI Diplomate. At the pixel-segmentation step (conservative published rate, 30 minutes per image, 10% of corpus), the labor figures compound further. Across all three annotation steps, the Antech claim at the human-side three-annotator standard requires approximately 3,712 radiologist-years — a labor figure that does not have a defensible reconciliation path through specialist labeling at any scale or timeframe.
And the structural infrastructure gaps mean that even if the labor existed, the upstream resources required to support credible veterinary AI training at the claimed scale do not. There is no veterinary subspecialty fellowship pathway. There is no pathology-confirmed dataset at scale. There is no breed-balanced reference dataset of the size required to support reliable performance across the canine and feline breed spectrum. These are not deficiencies that can be solved with more labor. They are absences of upstream infrastructure that veterinary medicine, as a profession, has not yet built.
The four reconciliation paths from Part One — NLP extraction, non-specialist labeling, AI-generated pseudo-labels, or inflated headline numbers — are the only logically available explanations for how the published vendor figures could exist. Each is methodologically defensible when disclosed. None has been disclosed at the level human-side AI publication standards would require. The disclosure standard is documented in the CLAIM checklist. The disclosure call has been made by the 2025 ACVR/ECVDI position statement. The disclosure responsibility rests with the vendors. Until the disclosure is produced, the foundational claim of large specialist-labeled veterinary AI training corpora cannot be evaluated, cannot be relied upon by clinics making adoption decisions, and cannot be considered substantiated under the standards the field’s own professional society has documented. Part Three closes this investigation by examining what the validation evidence base on commercial veterinary AI actually shows when external researchers have attempted to evaluate it, and the corporate revenue model that has allowed the disclosure gap to persist commercially despite the structural impossibility documented in this article.
The Bottom Line — Part Two
Part One of this investigation calculated the labor required to apply image-level categorical labels to the training corpora claimed by SignalPET, Vetology, and Antech RapidRead at the simplest annotation step. The math at that simplest step did not work for the larger claims. Part Two has applied the bounding-box and pixel-segmentation rates from peer-reviewed AI training literature to the same claims, and added three structural infrastructure questions Part One did not address: the absence of subspecialty fellowship training in veterinary radiology, the scarcity of pathology-confirmed ground-truth datasets, and the breed-specific anatomic variation that prevents direct application of human chest x-ray training methodology to veterinary subjects. The cumulative finding: the foundational claim of large specialist-labeled veterinary AI training corpora is not merely difficult to reconcile with the available evidence. It is structurally impossible at the scales the marketing presents. The labor figures would consume the entire active specialty workforce, labeling full-time and doing nothing else, for more than five years — for one vendor’s claim alone. The infrastructure gaps mean even unlimited labor could not produce the claimed corpora at the required quality.
The four reconciliation paths — NLP extraction, non-specialist labeling, AI-generated pseudo-labels, or inflated headline numbers — each represent methodologically defensible practice when disclosed. None has been disclosed by the vendors at the level human-side AI publication standards require. The CLAIM checklist exists. The 2025 ACVR/ECVDI position statement has called for exactly this kind of disclosure. The vendors have not produced it. The clinic deciding whether to adopt a veterinary AI radiology product on the basis of “trained on millions of specialist-reviewed cases” should understand that the math, calculated against the only published radiologist labeling rates and the documented North American specialty workforce, does not support the claim as the marketing presents it — and that the structural infrastructure required to support it, even if the labor were available, does not exist in veterinary medicine as the profession is currently configured.
The claim is not unsubstantiated because the vendors have failed to produce documentation. The claim is unsubstantiated because the labor and the infrastructure required to substantiate it do not exist. Part Three of this investigation closes the series by documenting what the validation-statistics evidence base on commercial veterinary AI actually shows — Joslyn 2025 commentary, Ma 2026 JAVMA pilot study, the absence of any FDA-equivalent regulatory framework — and the Mars-Antech-VCA-BluePearl corporate-consolidation revenue model that explains why a category of commercial software that cannot demonstrate the validation human medicine requires has nonetheless become the dominant business model for AI radiology services within corporate veterinary medicine. The math, the infrastructure, and the corporate structure all converge on the same conclusion: the foundational claim is a marketing artifact that the field’s own professional society has documented to be structurally unmet.
Frequently Asked Questions
What is bounding-box annotation in AI training, and how long does it actually take?
Bounding-box annotation is the AI training task in which a radiologist draws a rectangle around each abnormality on an image and applies a categorical label to the rectangle. It is the annotation type required for any AI product that displays to the user where on a radiograph a finding is located, which describes essentially every commercial veterinary AI radiology product on the market. The radiologist must identify the lesion’s edges, click and drag the rectangle to enclose them, apply the appropriate categorical label, verify the label is correct, and proceed to the next finding on the same image. The most authoritative published per-image rate for bounding-box annotation in radiology AI training comes from a peer-reviewed paper published in Radiology: Artificial Intelligence (PMC8017380, 2021), which documented annotation of 294 coronary CT angiography studies at a median speed of 6.08 minutes per study (interquartile range 2.8 to 10.6 minutes per study) and 73 seconds per individual annotated vessel. For a multi-finding radiograph with several abnormalities to localize, the per-image bounding-box annotation rate is realistically 3 to 8 minutes per image, an order of magnitude longer than the image-level categorical labeling rate the Stanford CheXNeXt study measured at 34.3 seconds per image. Bounding-box annotation is the second of four annotation steps required to build a credible commercial AI radiology product. The first step, image-level categorical labeling, is the only one Part One of this investigation calculated. This article addresses bounding-box annotation, pixel-level segmentation, and pathology-confirmed ground truth correlation. Part Three addresses the validation-statistics evidence base on commercial veterinary AI and the corporate-consolidation revenue model that explains why the labor and infrastructure documented across Parts One and Two have not produced the validation evidence the human-side equivalent product category is required to produce.
How long does pixel-level segmentation take in medical AI annotation?
Pixel-level segmentation requires the radiologist to outline the precise boundary of each anatomic structure or lesion at the pixel level, producing a shape mask used for measurement, volumetric analysis, and disease characterization. Per-image segmentation rates documented in peer-reviewed literature range from several minutes to over an hour per image depending on complexity. A widely cited figure from the Menze et al. BRATS dataset documentation (IEEE Transactions on Medical Imaging, 2014) reports that “a radiologist usually takes about 60 minutes to manually segment brain tumors per patient in its multi-sequence MRI volumes.” The 2024 comprehensive survey on deep active learning in medical image analysis (published in Medical Image Analysis) cites this figure and confirms that “depending on the complexity of the regions of interest to segment and the local anatomical structures, minutes to hours may be required to annotate one image.” The 2021 Nature Communications paper on annotation-efficient deep learning is explicit that “more than three domain experts are typically needed to generate trustworthy annotations” for pixel-level segmentation, multiplying the per-image time by the number of independent annotators required. Pixel-level segmentation is required for any AI that produces measurements such as vertebral heart score, lung field volume, mass dimensions, or bone-density quantification, which are common features of commercial veterinary AI radiology products. The labor budget for segmentation work, applied to vendor training-corpus claims at scale, runs to hundreds of radiologist-years — labor the documented specialty workforce could not have supplied alongside its existing clinical obligations.
Why does the absence of subspecialty fellowship training in veterinary radiology matter for AI training?
Human radiology AI training relies heavily on subspecialty fellowship-trained radiologists. The Stanford CheXNeXt study’s validation set was labeled by cardiothoracic specialist radiologists — physicians who completed a thoracic radiology fellowship after their general radiology residency. The 2020 RSNA-STR Pulmonary Embolism Challenge dataset was labeled by 80-plus expert thoracic radiologists. The 2024 FDA-cleared chest x-ray AI training set was labeled by 17 board-certified radiologists with a median 14 years of experience, many of them subspecialty fellowship-trained. Subspecialty fellowships in human radiology include cardiothoracic, abdominal, musculoskeletal, neuroradiology, breast imaging, pediatric radiology, interventional radiology, and several others. Each pathway requires one to two years of additional training after a four- or five-year diagnostic radiology residency, focused exclusively on a single body region or clinical task. Veterinary radiology has no such pathway. The American College of Veterinary Radiology’s residency training essentials, published on the ACVR website, specify that a veterinary radiology resident is to receive training “in all of the subspecialty areas of veterinary radiology” across small and large animal diagnostic radiology, diagnostic ultrasound, nuclear medicine, CT, and MRI, in a single three-year program. Every ACVR Diplomate is, by certification design, a generalist across all body regions, all modalities, and all species commonly encountered in veterinary medicine. The implication for AI training is significant: when human radiology AI vendors require thoracic-fellowship-trained radiologists to label thoracic training data and abdominal-fellowship-trained radiologists to label abdominal training data, they are deploying a level of subspecialty expertise that does not exist in the veterinary specialty workforce because the training pathway that produces it does not exist.
How many pathology-confirmed cases are actually available in veterinary medicine for AI training?
Pathology-confirmed ground truth — necropsy, surgical biopsy, or surgical confirmation — is the gold standard for AI training validation in human radiology. Most commercial human chest x-ray AI training datasets use pathology-confirmed reference standards for some fraction of their data, with the percentage and methodology disclosed in peer-reviewed publications. In veterinary medicine, pathology-confirmed datasets at scale fundamentally do not exist. The reasons are practical: most companion animal cases that go through general practice never receive necropsy. Pathology confirmation typically requires a referral hospital or a teaching institution and is performed on a minority of cases that proceed to that level of workup. The 2023 paper by Cohen, Fischetti, and Daverio published in Veterinary Radiology and Ultrasound studied veterinary radiologist error rates against necropsy findings at the Animal Medical Center in New York — establishing both that necropsy correlation produces meaningful error rate measurement, and that the available pathology-confirmed dataset at one of the largest veterinary referral institutions in the United States is a fraction of what would be required to validate a commercial AI training corpus. Building a pathology-confirmed veterinary radiograph training set of 100,000 images is logistically harder than building a radiologist-consensus-labeled set of 2 million. The cumulative pool of pathology-confirmed veterinary radiograph cases available to AI vendors as of 2026 is, by every available estimate, smaller by orders of magnitude than what the larger commercial training-corpus claims would require if pathology-confirmed ground truth had been used. Part Three documents the implications: the Ma et al. 2026 JAVMA pilot study, the first peer-reviewed external validation of multiple commercial veterinary AI services against pathology-confirmed canine abdominal cases, found sensitivity of 71% to 90% with documented deficiencies in interpretation.
Does breed-specific anatomic variation affect veterinary AI training requirements?
Yes, in ways that human radiology AI training does not have to account for. Companion animal breeds present substantial anatomic variation across dogs and cats that does not have an equivalent in human radiology practice. A bulldog’s thoracic conformation — the wide, shallow chest with thick mediastinal fat pads and characteristic vertebral patterns — is anatomically different from a greyhound’s deep, narrow chest. A chondrodystrophic dachshund’s spine is different from a German shepherd’s. A Persian cat’s skull is different from a Maine Coon’s. A bullmastiff’s abdominal silhouette is different from a Shetland sheepdog’s. These differences are not subtle and are not artifacts of imaging — they reflect genuine breed-level anatomic variation that affects what a normal radiograph looks like and where the boundaries between normal and abnormal lie. Human radiology AI training datasets do not face this complication. The variation across human chest radiographs is meaningful but smaller in magnitude than the variation across canine breeds. For veterinary AI to perform reliably across the breed spectrum, the training corpus must include adequate representation of each major breed phenotype, with normal and abnormal examples from each breed labeled by specialists who understand the breed-specific normal variants. The cumulative pool of pathology-confirmed, breed-balanced, specialist-labeled veterinary radiographs that would be required to train an AI product to the standard human radiology AI is held to is, by every available estimate, far smaller than what the larger commercial vendor training-corpus claims would require. This is a structural infrastructure gap, not a labor gap.
What is the bounding-box math when applied to veterinary AI vendor claims?
Applying the peer-reviewed bounding-box annotation rate (median 6.08 minutes per study, interquartile range 2.8 to 10.6 minutes per study, from PMC8017380, Radiology: Artificial Intelligence, 2021) to the three vendor training-corpus claims produces labor figures of an entirely different order of magnitude than the image-level classification math in Part One. At the median rate of 6.08 minutes per study, Antech RapidRead’s 16-million-image claim requires approximately 1,621,333 radiologist-hours, or 779 radiologist-years of dedicated full-time bounding-box annotation work. At the slow end of the published interquartile range (10.6 minutes per study), the figure rises to 1,359 radiologist-years. SignalPET’s 2-million-image claim requires approximately 97 to 170 radiologist-years at the same rates, and Vetology’s 300,000-case claim requires approximately 14.6 to 25.5 radiologist-years. At the human-side AI standard of three independent annotators per training image, the figures triple. The Antech claim at three annotators and the median rate becomes approximately 2,338 radiologist-years of dedicated specialist work — roughly 78 full 30-year specialist careers, or 11% to 13% of the cumulative career output of every active North American diagnostic imaging Diplomate. And these calculations cover only Step Two of the four-step annotation pipeline. Pixel-level segmentation and pathology-confirmed ground truth correlation, calculated separately, add additional labor figures that compound rather than reduce the total.
What does the segmentation math look like for veterinary AI claims?
Pixel-level segmentation, at the conservative published rate of approximately 30 minutes per image for moderate-complexity radiograph segmentation work (well below the 60-minute MRI brain-tumor figure documented in the Menze et al. BRATS dataset paper), produces still-larger labor figures when applied to the vendor training-corpus claims. If even 10% of Antech RapidRead’s 16-million-image training corpus required pixel-level segmentation work — a low estimate, given that the product is marketed as producing measurements such as vertebral heart score and other quantitative outputs that require segmentation training data — the labor required is approximately 1.6 million images at 30 minutes per image, or 800,000 radiologist-hours, or 385 radiologist-years of dedicated full-time segmentation work. At the three-annotator standard for trustworthy segmentation cited in the peer-reviewed AI training literature, the figure becomes approximately 1,154 radiologist-years. For SignalPET, applied to 10% of 2 million images at the same rates, the segmentation labor is approximately 48 radiologist-years for single-pass annotation and approximately 144 radiologist-years at the three-annotator standard. For Vetology’s 300,000-case claim, applied to 10% at the same rates, the figure is approximately 7 radiologist-years for single-pass annotation and approximately 22 radiologist-years at the three-annotator standard, for segmentation alone. None of these figures includes the bounding-box annotation work that the localization features of the products require. The labor figures across the three annotation steps — image-level classification, bounding-box annotation, and pixel-level segmentation — sum to totals that, for the larger vendor claims, exceed the cumulative career output of the entire history of board-certified veterinary radiology as a specialty.
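The segmentation arithmetic is a short variation on the same sketch, again a reconstruction under this article's stated assumptions (10% of each claimed corpus segmented, 30 minutes per image, the same roughly 2,080-hour radiologist-year) rather than any vendor's actual workflow:

```python
# Reproduces the pixel-segmentation labor math under the article's
# stated assumptions: 10% of each claimed corpus segmented at
# 30 minutes per image, with a ~2,080-hour radiologist-year.
HOURS_PER_RADIOLOGIST_YEAR = 2_080
SEGMENTED_FRACTION = 0.10
MINUTES_PER_IMAGE = 30

CLAIMED_IMAGES = {
    "Antech RapidRead": 16_000_000,
    "SignalPET": 2_000_000,
    "Vetology": 300_000,
}

for vendor, n in CLAIMED_IMAGES.items():
    hours = n * SEGMENTED_FRACTION * MINUTES_PER_IMAGE / 60
    single = hours / HOURS_PER_RADIOLOGIST_YEAR
    print(f"{vendor}: {single:,.0f} radiologist-years single-pass; "
          f"{single * 3:,.0f} at the three-annotator standard")
```

This reproduces the in-text figures exactly: approximately 385 and 1,154 radiologist-years for the Antech claim, 48 and 144 for SignalPET, and 7 and 22 for Vetology.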
What does this all mean for the foundational claim that veterinary AI was trained on millions of specialist-labeled cases?
The foundational claim — that commercial veterinary AI radiology products were trained on millions of board-certified specialist-labeled cases — is, when tested against published per-image annotation rates and the documented specialty workforce, structurally impossible at the scales the marketing presents. The math at the simplest annotation step (Part One) produces labor figures that exceed the available specialty workforce. The math at the bounding-box step (this article) produces labor figures that exceed the cumulative career output of every diagnostic imaging Diplomate currently practicing. The math at the segmentation step compounds the figures further. The infrastructure gaps — no veterinary subspecialty fellowship training, no pathology-confirmed dataset at scale, no breed-balanced reference dataset — mean that even if the labor existed, the upstream data infrastructure required to support credible veterinary AI training at the scale claimed does not exist either. The four reconciliation paths identified in Part One — NLP extraction from existing reports, non-specialist human labeling, AI-generated pseudo-labels, and inflated headline numbers — are the only logically available explanations for the published vendor figures, and none has been disclosed by any of the three vendors named in this investigation at the level the human-side AI publication standards require. Part Three closes the investigation by examining the validation evidence behind the commercial products these training corpora supposedly produced, and the corporate-consolidation revenue model that explains why the validation gap has not been closed. For more on what credible AI radiology validation looks like, see our coverage of the engineering rigor gap; for the regulatory framework, see our coverage of the regulatory gap; for the labor math at the simplest annotation step, see Phantom Radiologists Part One.
- Phantom Radiologists Part One: The Time Math That Exposes Veterinary AI’s Training-Set Problem (The Labeling Step) — The companion Part One: image-level categorical labeling math at Stanford CheXNeXt’s documented 34.3 seconds per image. The simplest annotation task, the most charitable possible math.
- Phantom Radiologists Part Three: The Validation Statistics Veterinary AI Vendors Don’t Have to Publish, and the Revenue Model That Explains Why — The closing of The Math Problem series: FDA-cleared human radiology AI validation evidence vs. the commercial veterinary AI external validation literature (Joslyn 2025 commentary, Ma 2026 JAVMA pilot study), plus the Mars-Antech-VCA-BluePearl corporate consolidation revenue model and the conflict of interest the human-side structure prevents by design.
- How Human Radiology AI Actually Gets Built — and the Wild West of Veterinary AI Where None of That Exists — The companion engineering analysis: how human-side radiology AI is built, validated, regulated, and version-controlled, and why veterinary AI vendors operate in a category that has no equivalent in U.S. human medicine.
- The Safeguards That Don’t Apply Here: How Veterinary AI Radiology Vendors Operate Outside Every Rule That Governs the Human Side — The companion regulatory analysis: FDA clearance, state practice acts, and reimbursement gatekeeping, and why none of them reach veterinary AI radiology.
- A User Interface for Optimizing Radiologist Engagement in Image Data Curation for Artificial Intelligence. Radiology: Artificial Intelligence. 2021. PMC8017380. Source for the median 6.08-minute bounding-box annotation rate (interquartile range 2.8–10.6 min per study) used throughout the bounding-box math in this article.
- Menze BH, Jakab A, Bauer S, et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Transactions on Medical Imaging. 2014. Source for the widely cited 60-minute-per-patient brain tumor MRI segmentation rate.
- A comprehensive survey on deep active learning in medical image analysis. Medical Image Analysis. 2024. Cites the Menze segmentation rate, documents the median U.S. radiologist hourly rate at $219, and confirms that “more than three domain experts are typically needed to generate trustworthy annotations” for medical image segmentation.
- Annotation-efficient deep learning for automatic medical image segmentation. Nature Communications. 2021. PMC8501087. Source for the documented requirement of three domain experts for trustworthy segmentation annotations and the “minutes to hours per image” range for segmentation work.
- Rajpurkar P, Irvin J, Ball RL, et al. Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLOS Medicine. 2018;15(11):e1002686. Open access. Source for the 34.3-sec/image image-level labeling-time benchmark, the cardiothoracic specialist annotator standard, and the foundational human chest x-ray AI training methodology comparison.
- Cohen J, Fischetti AJ, Daverio H. Veterinary radiologic error rate as determined by necropsy. Vet Radiol Ultrasound. 2023;64(4):573–584. The pathology-confirmed error-rate study from the Animal Medical Center, foundational for the pathology-confirmed ground truth scarcity discussion in this article.
- Joslyn SK, Faulkner J, Ma D, Appleby R. Commentary: Comparison of radiological interpretation made by veterinary radiologists and state-of-the-art commercial AI software for canine and feline radiographic studies. Front Vet Sci. 2025;12:1615947. Open access. The peer-reviewed methodological critique identifying the “continuously updated and does not have version numbers” issue and the sensitivity-collapse-to-0.444 finding on difficult cases; analyzed in detail in Part Three.
- Ma D, Faulkner JE, Stander N, Raisis A, Joslyn SK. Pilot study: external validation of commercial veterinary radiology artificial intelligence services shows deficiencies in interpretation of general practice-sourced canine abdominal radiographs. JAVMA. 2026. doi.org/10.2460/javma.25.10.0691. The first peer-reviewed external validation study of multiple commercial veterinary AI radiology services against pathology-confirmed cases; analyzed in detail in Part Three.
- Appleby RB, Difazio M, Cassel N, Hennessey R, Basran PS. American College of Veterinary Radiology and European College of Veterinary Diagnostic Imaging position statement on artificial intelligence. JAVMA. 2025;263(6):773–776. Open access. Source for the categorical statement that no current commercial veterinary AI radiology product meets the required standards for transparency, validation, or safety.
- American College of Veterinary Radiology. Diagnostic Imaging Residency Programs and Training Essentials. acvr.org. Source for the documentation of the ACVR three-year residency training pathway covering all subspecialty areas of veterinary radiology, establishing the absence of body-region subspecialty fellowship pathways.
- Specialists in Short Supply. JAVMA News. October 15, 2018. Open access. Source for ACVR Executive Director Dr. Tod Drost’s documented analysis of the specialty workforce shortage.
- Mongan J, Moy L, Kahn CE. Checklist for artificial intelligence in medical imaging (CLAIM): a guide for authors and reviewers. Radiology: Artificial Intelligence. 2020;2(2):e200029. The 42-item documentation standard used by peer-reviewed human medical imaging journals to evaluate AI manuscripts.
- SignalPET: Training corpus claim of “over 2 million annotated veterinary radiographs” sourced from company materials at https://www.signalpet.com/.
- Vetology: Training corpus claim of “over 300,000 Board Certified veterinary radiologist-reviewed cases” and “38 different deep-learning architectures” sourced from https://vetology.net/ai/.
- Antech RapidRead (Mars): Training corpus claim of “16 million images sourced from an unprecedented library of more than 8 billion images” sourced from https://www.antechdiagnostics.com/imaging-services/rapidread/.
Editorial & Legal Disclaimer. VeterinaryTeleradiology.com is an independent industry publication. This article is Part Two of a three-part investigation into the relationship between commercial veterinary AI radiology vendor training-corpus claims and the documented capacity of the board-certified veterinary radiologist specialty workforce. Part Two addresses the bounding-box, pixel-segmentation, and pathology-correlation annotation tasks that produce the actual capabilities marketed in commercial products, applying primary-source per-image annotation rates from peer-reviewed human medical imaging literature. The article also documents three structural infrastructure gaps in veterinary medicine — the absence of subspecialty fellowship training, the scarcity of pathology-confirmed ground-truth datasets, and breed-specific anatomic variation — that compound the labor-math conclusions. Part Three closes the series by documenting the validation-statistics evidence base for commercial veterinary AI versus FDA-cleared human radiology AI, and the corporate-consolidation revenue model that has produced the marketing claims at issue.
This article is based entirely on publicly available and documented sources, each identified in the Primary Documents Referenced and Vendor Marketing Materials sections above. Sources include: peer-reviewed papers published in PLOS Medicine, Radiology: Artificial Intelligence, IEEE Transactions on Medical Imaging, Medical Image Analysis, Nature Communications, Veterinary Radiology & Ultrasound, Frontiers in Veterinary Science, and JAVMA; institutional sources including the American College of Veterinary Radiology and the American Veterinary Medical Association; trade-press reporting in JAVMA News; and publicly accessible vendor marketing materials and product descriptions from SignalPET, Vetology, and Antech Diagnostics. No confidential sources, non-public documents, or unverified information is relied upon in this article. Every factual claim, every input to the math, and every conclusion is attributable to one or more of the above primary or secondary sources.
This article presents a quantitative analysis applying published per-image radiologist annotation rates and a documented specialty workforce population to publicly stated training-corpus claims by three commercial veterinary AI radiology vendors. The mathematical calculations presented are reproducible from the inputs cited. The conclusions drawn — specifically, that the larger training-corpus claims cannot be reconciled with board-certified specialist annotation within the documented specialty workforce, even at the bounding-box and pixel-segmentation steps — are descriptive of the arithmetic, not assertions of legal wrongdoing or fraudulent representation. The four reconciliation possibilities identified in Part One (NLP extraction from existing reports, non-specialist human labeling, AI-generated pseudo-labels, or composition differences in the headline corpus number) are each methodologically defensible practices when disclosed; this article identifies the absence of disclosure as the central methodological gap, not the practices themselves.
The article does not assert that any vendor has engaged in fraud, misrepresentation, or unfair trade practices. It does not assert that any vendor’s training methodology fails on its merits. It asserts, descriptively, that the published training-corpus claims are not supported at the level of methodological detail the human-side AI literature considers standard, and that the gap between the marketing claims and the documented specialty workforce capacity, combined with the structural infrastructure gaps documented in this article, places the foundational claim in the category of “structurally impossible at the scale presented” rather than merely “undocumented.” Each vendor named in this article is invited to publish the training-data methodology disclosures the CLAIM checklist requires; any disclosure supported by documentary evidence will be published in full by this publication. This invitation is extended directly and without prejudice to SignalPET, Vetology, Antech Diagnostics, Mars Petcare, and any other vendor whose products are discussed.
This publication is not a law firm and does not provide legal advice. Veterinarians, state regulators, and other readers with specific factual or legal questions should consult qualified counsel. The mathematical analysis presented is intended to inform reader and regulatory consideration of how the marketing claims of large commercial veterinary AI vendors compare against documented specialty workforce capacity and the available upstream data infrastructure. It is not a substitute for vendor-specific due diligence by clinics evaluating these products for adoption.