For example, in examining multiple tissue regions on a slide, a pathologist must combine i.

Here, an instance can be thought of as a tissue region, and a bag can be thought of as a collection of tissue regions from an individual.

Common pooling methods include max pooling, mean pooling, log-sum-exp (LSE) pooling [50], and attention-based pooling [36].
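To make the fixed (non-learned) pooling operators concrete, the following minimal sketch applies max, mean, and LSE pooling to a toy bag of feature vectors. The function names, the toy bag, and the sharpness parameter `r` are illustrative assumptions and are not taken from the paper.

```python
import math

def max_pool(bag):
    """Element-wise maximum over the instances of a bag."""
    return [max(col) for col in zip(*bag)]

def mean_pool(bag):
    """Element-wise mean over the instances of a bag."""
    return [sum(col) / len(col) for col in zip(*bag)]

def lse_pool(bag, r=1.0):
    """Log-sum-exp pooling per feature: (1/r) * log(mean(exp(r * x))).

    r > 0 is an assumed sharpness parameter: large r approaches max
    pooling, small r approaches mean pooling.
    """
    pooled = []
    for col in zip(*bag):
        m = max(col)  # subtract the maximum for numerical stability
        s = sum(math.exp(r * (x - m)) for x in col) / len(col)
        pooled.append(m + math.log(s) / r)
    return pooled

# Toy bag: three instances, each a 2-dimensional feature vector.
bag = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.4]]
print(max_pool(bag), mean_pool(bag), lse_pool(bag, r=1.0))
```

Each of these operators is hard-coded, whereas attention-based pooling learns how much each instance should contribute.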

Then, the meta-instance is computed as the sum of the instances, each weighted by its respective attention weight.

Then, in Eq 2, the resulting attention weights a_k are used to aggregate the instances x_k. The is the parameter of the second layer of the ANN.

The outputs of the second layer are then normalized by the normalized exponential function i. The higher the DS, the more discriminative the patch. Each patch was labeled with its slide-level label. It is unique in that it automatically learns a function to combine multiple instances into a single meta-instance rather than hard-coding a function e.

For each fold, with the CNN-scorer pre-trained, we conducted intelligent sampling on patches from both training and validation WSIs. MIL is a machine learning paradigm in which labels are assigned to collections of data points ("bags") rather than to individual data points ("instances") in some datasets. We balanced the number of low- and high-risk data by sampling an equal number of patients from the low-risk cohort.

To avoid overfitting, an early stopping strategy was applied: training stopped when the validation accuracy (training accuracy for Ki67 experiments) did not improve for 15 epochs.
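The stopping rule can be sketched as a small patience counter. This is only an illustration of the general technique: the class name `EarlyStopping`, the toy accuracy history, and the shortened patience of 3 used in the demonstration are assumptions, while the default patience of 15 follows the text.

```python
class EarlyStopping:
    """Signal a stop once the monitored accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=15):
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def step(self, accuracy):
        """Record one epoch's accuracy; return True when training should stop."""
        if accuracy > self.best:
            self.best = accuracy
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience

# Toy run with a shortened patience so the effect is visible.
stopper = EarlyStopping(patience=3)
history = [0.60, 0.65, 0.64, 0.65, 0.63, 0.70]
stopped_at = None
for epoch, acc in enumerate(history):
    if stopper.step(acc):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

In this toy run, training halts before the final epoch is reached, because the accuracy fails to exceed its best value for three consecutive epochs.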

Binary cross-entropy was used as a loss function. Third, negative bags must only contain negative instances [ 38 ].
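Taken together with the assumption that positive bags contain at least one positive instance, this rule implies that a bag's label equals the maximum of its instance labels. The short sketch below illustrates that relationship; the function name `bag_label` is hypothetical, and in practice the instance labels are latent and never observed directly.

```python
def bag_label(instance_labels):
    """Return 1 if any instance is positive (1), otherwise 0.

    Illustration only: under the standard MIL assumptions the instance
    labels exist implicitly but are not available during training.
    """
    return int(any(instance_labels))

print(bag_label([0, 0, 0]))  # negative bag: only negative instances
print(bag_label([0, 1, 0]))  # positive bag: at least one positive instance
```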

A useful analogy for understanding the MIL paradigm is disease at the tissue level.

We applied weight normalization to V and U to stabilize optimization during training [51, 52]. Similarly, instances of each bag have a positive or negative label. Finally, the meta-instance is classified by a fully connected layer (FCN), and a probability score indicates the final prediction for the WSI.

CLAM also utilizes attention [36] to dynamically learn and fuse features predictive of the desired outcome (in our case, ODX recurrence risk). In this manner, each feature vector selected and extracted from a WSI is an instance, and the collection of those feature vectors is a bag that represents this WSI.

The classification in MIL is done at the bag level, which is slide-level in our problem. Each region of the tissue i. The best AUC for Ki67 stained slides is 0. The three main underlying assumptions of MIL relate to bags and their instances.

The patches are ranked in descending order of their DSs, as previously defined (see Eq). Fig 9 depicts the prediction AUC when different numbers K of top patches are used to construct the bag that serves as input to the subsequent MIL model.
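The ranking and top-K selection can be sketched as follows. The function name `select_top_k`, the patch identifiers, and the score values are hypothetical; only the idea of sorting patches by DS in descending order and keeping the first K comes from the text.

```python
def select_top_k(patches, scores, k):
    """Return the k patches with the highest discriminative scores (DSs)."""
    if len(patches) != len(scores):
        raise ValueError("patches and scores must have the same length")
    # Sort (score, patch) pairs by score, highest first.
    ranked = sorted(zip(scores, patches), key=lambda pair: pair[0], reverse=True)
    return [patch for _, patch in ranked[:k]]

# Toy example with made-up patch identifiers and scores in [0, 1].
patch_ids = ["p0", "p1", "p2", "p3"]
ds = [0.12, 0.93, 0.55, 0.78]
top2 = select_top_k(patch_ids, ds, k=2)
print(top2)
```

Varying `k` in such a routine corresponds to the experiment in which bags of different sizes are fed to the MIL model.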

For Ki67-stained slides, we conducted leave-two-out cross-validation (LTOCV), in which one slide from each class was taken for each validation set. The bright areas in the heatmaps correspond to WSI areas that receive high attention weights.

Both methods are highly robust and well-known for their ability to generalize to multiple WSI datasets.

Their outputs are activated by tanh and sigmoid activation functions, respectively, and the element-wise product of the two outputs is then computed. Here, models are evaluated on the hold-out testing set in the 5-fold cross-validation. In attention-based pooling, the attention weights a_1, a_2, …, a_K are produced by the ANN. Then, a weighted sum aggregates the feature vectors using their attention weights.
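The gated attention pooling described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `V` and `U` denote the two first-layer weight matrices named in the text, while the second-layer vector `w`, the absence of bias terms, and all numeric values are assumptions made for the example.

```python
import math

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Normalized exponential function, shifted for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_attention_pool(bag, V, U, w):
    """Return (meta_instance, attention_weights) for a bag of feature vectors."""
    scores = []
    for x in bag:
        t = [math.tanh(v) for v in matvec(V, x)]   # tanh branch
        g = [sigmoid(u) for u in matvec(U, x)]     # sigmoid gate
        gated = [a * b for a, b in zip(t, g)]      # element-wise product
        scores.append(sum(wi * gi for wi, gi in zip(w, gated)))  # second layer
    weights = softmax(scores)
    dim = len(bag[0])
    # Weighted sum of the instances gives the meta-instance.
    meta = [sum(a * x[d] for a, x in zip(weights, bag)) for d in range(dim)]
    return meta, weights

# Toy bag of K = 3 instances with 2-dimensional features (made-up values).
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.4, 0.4], [-0.3, 0.2]]
w = [1.0, -1.0]
meta, weights = gated_attention_pool(bag, V, U, w)
print(meta, weights)
```

Because the weights pass through a softmax, they are non-negative and sum to one, so the meta-instance is a convex combination of the instances. In a trained model, `V`, `U`, and `w` would be learned jointly with the classifier.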

We used the binary cross-entropy loss function for training these models.
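For reference, binary cross-entropy averages the negative log-likelihood of the true labels under the predicted probabilities. The sketch below is a plain-Python illustration; the function name, the clipping constant `eps`, and the example values are assumptions, and a deep learning framework's built-in loss would normally be used instead.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Slide-level labels (1 = high risk, 0 = low risk) and made-up predictions.
print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]))
```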

The remaining components of the experimental design were identical when comparing these two methods with the proposed BCR-Net. However, AUCs reach a steady-state value when the number of samples in the bag exceeds a certain number.

Second, positive bags must contain positive instances and may contain negative instances. We utilized ODX recurrence risk as slide-level labels high vs.

MIL is conventionally posed as a two-class problem, where bags are assigned either a "positive" or a "negative" label. We visualized the attention in the form of a heatmap, where each patch on the WSI was assigned the value of its attention weight (see Figs 10 and …). The bottom row shows a thumbnail and the corresponding heatmap of a Ki67 slide.

First, instance labels are not explicitly assigned or known; they implicitly exist. The CIs were computed using the bootstrapping method. This meta-instance is then further processed i. Fig 7 depicts some sample outputs for selected patches. Namely, in one training step, all instances of a bag will be fed into the ANN in parallel. This is determined by their tissue, which, when examined one region at a time (instance), will similarly present as diseased (positive) or healthy (negative).

The attention-based MIL clearly highlights specific tissue patterns. DSs range from 0 to 1. Tissue from a diseased individual (positive bag) will contain diseased tissue (positive instances) and may contain healthy tissue (negative instances). Fig 6 depicts our overall proposed methodology.

It is important to use attention weights to further highlight some discriminative instances, since the selected bag contains discriminative instances for both low- and high-risk categories Section 2.

The top K patches are sampled and embedded by the same feature extractor inherited from the CNN-scorer (Fig 4). In the MIL context, we can formulate the low-risk slides as negative data, which contain only patches with sparse PCs, and the high-risk slides as positive data, which contain both sparse and dense patches of PCs.

Instance pooling is the component of MIL models that specifies how instances are combined into a single, "bag-level" representation.

As the final representation of the WSI, the output meta-instance is classified by a fully connected layer (FCN), and a probability score indicates the final prediction for the WSI. The magnitude of an attention weight correlates with how important its respective instance is in the downstream prediction on the meta-instance.

The K output feature vectors are treated as a bag of instances and aggregated through attention-based pooling. From the figure, we find that the validation AUCs increase as the number of samples increases.

In our BCR-Net, we implement attention-based pooling [36]. Our implementation consists of a learnable two-layer artificial neural network (ANN) that maps instance, from an intelligently sampled bag in Section 2.

The resulting bags of feature vectors were used for training and validation of the attention-based MIL model. Typically, pooling fuses abstract representations of instances i. Heatmaps are contrast-enhanced for visualization purposes.