3. Statistical Analyses #
Now that you have quantitative data on your vessels, you can apply statistical analyses to investigate what those numbers mean.
-
3.1 Normalization #
Before you can apply statistical analyses, it is necessary to normalize your data. The script `Area_slide.py` (downloadable from the user `slrenne`'s `erivessel` repository) allows you to normalize the number of vessels by the tissue sample area. This code segments the entire sample area through the L channel, also known as the luminance channel. To begin, set up the environment.
```python
import cv2
import numpy as np
import matplotlib.pyplot as plt
import os
import glob
import pandas as pd
from tqdm import tqdm
```

The following function takes the path to an image as input and returns the area of tissue present, optionally saving visualizations of the results. The image is read using OpenCV, then converted from BGR to RGB and to LAB (L = luminance, A = green/red, B = blue/yellow). Here, only the L channel is extracted to help separate tissue from the background based on brightness.
```python
def calculate_tissue_area(image_path, save_visualizations=False, output_folder=None):
    img = cv2.imread(image_path)
    if img is None:
        print(f"Error: Could not read image {image_path}")
        return 0, None, None
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l_channel = lab[:,:,0]
```

You then apply a binary inverse threshold: pixels brighter than 220 (almost white) are set to 0 (black), while all others are set to 255 (white). This isolates the darker tissue from the bright background. Next, you perform morphological operations and find the contours of the white regions in the binary image, which should correspond to tissue.
```python
    _, binary = cv2.threshold(l_channel, 220, 255, cv2.THRESH_BINARY_INV)
    kernel = np.ones((5,5), np.uint8)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(l_channel)
    min_contour_area = 1000  # Adjust based on your image size
    for contour in contours:
        if cv2.contourArea(contour) > min_contour_area:
            cv2.drawContours(mask, [contour], 0, 255, -1)
```

Here, you count how many white pixels are present, knowing that each pixel corresponds to one unit of tissue area. Additionally, the mask is used to extract only the tissue from the RGB image by making pixels outside the tissue black.
```python
    area_pixels = cv2.countNonZero(mask)
    filled_tissue = cv2.bitwise_and(img_rgb, img_rgb, mask=mask)
```

If enabled, the `save_visualizations` option generates a 2x2 subplot with the original image, the binary threshold result, the tissue mask, and the overlaid image with the area shown in the title. The plots are then saved in the specified `output_folder` with an `_analysis.png` suffix.

```python
    if save_visualizations:
        if output_folder is None:
            output_folder = os.path.dirname(image_path) or '.'
        plt.figure(figsize=(15, 10))
        plt.subplot(2, 2, 1)
        plt.imshow(img_rgb)
        plt.title('Original Image')
        plt.axis('off')
        plt.subplot(2, 2, 2)
        plt.imshow(binary, cmap='gray')
        plt.title('Thresholded Image')
        plt.axis('off')
        plt.subplot(2, 2, 3)
        plt.imshow(mask, cmap='gray')
        plt.title('Tissue Mask')
        plt.axis('off')
        plt.subplot(2, 2, 4)
        plt.imshow(filled_tissue)
        plt.title(f'Filled Tissue (Area: {area_pixels} pixels)')
        plt.axis('off')
        plt.tight_layout()
        base_name = os.path.splitext(os.path.basename(image_path))[0]
        plt.savefig(f"{output_folder}/{base_name}_analysis.png", dpi=300)
        plt.close()
```

At the end, the `calculate_tissue_area` function returns the tissue area in pixels, the binary mask, and the RGB image with the tissue highlighted.

```python
    return area_pixels, mask, filled_tissue
```

Example output: a binary image showing the background in black and the total tissue area in white.
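Before moving on, you can sanity-check the thresholding logic on a synthetic image. The sketch below (pure NumPy, not part of `Area_slide.py`) builds a bright "slide" containing a dark square of known size and confirms that an inverse threshold at 220 recovers its area exactly:

```python
import numpy as np

# Synthetic luminance channel: bright background (240) with a dark
# 100x100 "tissue" square (value 100)
l_channel = np.full((200, 200), 240, dtype=np.uint8)
l_channel[50:150, 50:150] = 100

# Binary inverse threshold at 220, mirroring cv2.THRESH_BINARY_INV:
# pixels brighter than 220 become 0, everything else becomes 255
binary = np.where(l_channel > 220, 0, 255).astype(np.uint8)

# Count tissue pixels, as cv2.countNonZero would
area_pixels = int(np.count_nonzero(binary))
print(area_pixels)  # 10000, the area of the 100x100 square
```

On real slides the recovered area will also include staining artifacts, which is why the script additionally filters contours below `min_contour_area`.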
The next function, `process_folder`, processes all image files in a given folder and applies the `calculate_tissue_area` function to each of them. The results are saved to a CSV file.

```python
def process_folder(input_folder, output_folder, file_extensions=None):
    if file_extensions is None:
        file_extensions = ['.jpg', '.jpeg', '.png', '.tif', '.tiff']
    os.makedirs(output_folder, exist_ok=True)
    image_files = []
    for ext in file_extensions:
        image_files.extend(glob.glob(os.path.join(input_folder, f'*{ext}')))
        image_files.extend(glob.glob(os.path.join(input_folder, f'*{ext.upper()}')))
    if not image_files:
        print(f"No image files found in {input_folder} with extensions {file_extensions}")
        return None
    print(f"Found {len(image_files)} image files to process")
    results = []
    for img_path in tqdm(image_files, desc="Processing slides"):
        try:
            filename = os.path.basename(img_path)
            area, mask, filled_tissue = calculate_tissue_area(img_path)
            img = cv2.imread(img_path)
            base_name = os.path.splitext(filename)[0]
            cv2.imwrite(f"{output_folder}/{base_name}_original.png", img)
            if mask is not None:
                cv2.imwrite(f"{output_folder}/{base_name}_mask.png", mask)
            results.append({
                'Image': filename,
                'Tissue Area (pixels)': area
            })
        except Exception as e:
            print(f"Error processing {img_path}: {e}")
    if results:
        df = pd.DataFrame(results)
        csv_path = os.path.join(output_folder, "tissue_areas.csv")
        df.to_csv(csv_path, index=False)
        print(f"Results saved to {csv_path}")
        return df
    else:
        print("No results were generated")
        return None
```

The final section of this script is the main execution block. It defines the folders to use, calls the processing function, and checks whether results were produced, printing a confirmation or warning message accordingly.
```python
if __name__ == "__main__":
    input_folder = 'Input'
    output_folder = 'Output'
    print(f"Processing slides from {input_folder}")
    print(f"Results will be saved to {output_folder}")
    df = process_folder(input_folder, output_folder)
    if df is not None:
        print("Processing complete!")
        print(f"CSV file with area measurements saved to {output_folder}/tissue_areas.csv")
    else:
        print("No results were generated.")
```

Now that you have calculated the area of each tissue section in pixels, you can proceed to perform your desired statistical analyses, normalizing by tissue area.
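With the areas in hand, the normalization itself is a simple division. The sketch below is illustrative only: the `vessel_counts` mapping and its numbers are hypothetical stand-ins for your own vessel-detection output, joined against the columns that `process_folder` writes to `tissue_areas.csv`.

```python
import pandas as pd

# Hypothetical vessel counts per slide, from an earlier detection step
vessel_counts = {"slide_01.png": 120, "slide_02.png": 45}

# Tissue areas in the same format process_folder writes to tissue_areas.csv
areas = pd.DataFrame({
    "Image": ["slide_01.png", "slide_02.png"],
    "Tissue Area (pixels)": [600_000, 150_000],
})

# Normalize: vessels per megapixel of tissue
areas["Vessels"] = areas["Image"].map(vessel_counts)
areas["Vessels per Mpx"] = areas["Vessels"] / areas["Tissue Area (pixels)"] * 1e6

print(areas[["Image", "Vessels per Mpx"]])
```

Note how the slide with fewer vessels can still have the higher density once area is accounted for; this is exactly why normalization matters before comparing samples.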
-
3.2 Bayesian Modeling #
This section presents a complete example of how to apply a hierarchical Bayesian model. In our case, the primary focus of the analysis is the minor axis length of blood vessels, a morphological feature that can vary across treatments and experimental models. Specifically, the goal of this statistical approach is to determine whether different treatments have a significant effect on vessel morphology, while accounting for possible variability introduced by different experimental models. The analysis uses partial pooling, which allows for more robust estimation of effects even in the presence of small sample sizes or hierarchical data structures.
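To build intuition for what partial pooling does before turning to the R code, here is a minimal NumPy sketch (not part of `modeling.R`): each group mean is pulled toward the grand mean by a precision-based weight, so small, noisy groups are shrunk the most. The group sizes and variances here are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurements for three groups of very different sizes
groups = {
    "A": rng.normal(10.0, 2.0, size=200),
    "B": rng.normal(12.0, 2.0, size=50),
    "C": rng.normal(8.0, 2.0, size=5),   # tiny group: unreliable raw mean
}

grand_mean = float(np.mean(np.concatenate(list(groups.values()))))
sigma2 = 4.0  # assumed within-group variance
tau2 = 1.0    # assumed between-group variance

pooled = {}
for name, x in groups.items():
    n = len(x)
    # Precision weight: approaches 1 for large groups, 0 for tiny ones
    w = (n / sigma2) / (n / sigma2 + 1 / tau2)
    pooled[name] = w * float(x.mean()) + (1 - w) * grand_mean
    print(f"group {name}: raw mean {x.mean():.2f} -> pooled {pooled[name]:.2f}")
```

In the hierarchical model below, the same effect emerges automatically from the priors `a[mod] ~ dnorm(a_bar, sigma_a)` and `b[rx] ~ dnorm(b_bar, sigma_b)` rather than from a hand-picked weight.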
The full code `modeling.R` is available in the `erivessel` repository by the user `slrenne`. Start by loading the `rethinking` package, which is built on top of Stan and allows you to easily define and fit Bayesian models using the `ulam` function. The dataset is read from a CSV file.

```r
library(rethinking)
db <- read.csv('./input/db.csv', stringsAsFactors = FALSE)
```

To prepare the data for modeling, first convert the `model` and `treatment` columns to factors. This ensures that they are treated as categorical variables when passed to the model. Then apply a natural logarithmic transformation to the `axis_minor_length` variable, which is often necessary to normalize right-skewed biological measurements.

```r
db$model <- as.factor(db$model)
db$treatment <- as.factor(db$treatment)
db$min_log <- log(db$axis_minor_length)
```

For the next step, you need to structure the data for the model. The
`ulam` function expects the data as a list of named vectors. Here, you standardize the log-transformed minor axis lengths so that they have a mean of 0 and a standard deviation of 1. This standardization helps with model convergence and interpretation. This is also where you convert the factor variables into numeric indices.

```r
d <- list(
  axis_length = standardize(db$min_log),
  mod = as.numeric(db$model),
  rx = as.numeric(db$treatment))
```

Then, specify a hierarchical Bayesian model using `ulam`. This model assumes that the standardized log-minor-axis length is normally distributed around a linear predictor
`mu` that is the sum of effects from both the model type and the treatment group. Each group effect is given a group-specific prior with partial pooling. This setup allows you to estimate the overall average effects (`a_bar`, `b_bar`) as well as group-level deviations (`a[mod]`, `b[rx]`), while controlling for uncertainty.

```r
m <- ulam(
  alist(
    axis_length ~ dnorm(mu, sigma),
    mu <- a[mod] + b[rx],
    a[mod] ~ dnorm(a_bar, sigma_a),
    b[rx] ~ dnorm(b_bar, sigma_b),
    c(a_bar, b_bar) ~ dnorm(0, 0.7),
    c(sigma, sigma_a, sigma_b) ~ dexp(1)
  ), data = d, chains = 4, cores = 4, iter = 1000)
```

After fitting the model, we examine summary statistics and convergence diagnostics. The
`dashboard()` function opens an interactive window, while `precis()` provides tabular summaries of the posterior distributions. You then extract the full posterior samples.

```r
dashboard(m)
precis(m)
post <- extract.samples(m)
```

Define a helper function
`p_link()` to compute posterior predictions for any combination of model and treatment. The `dest()` function reverses the standardization, using the (hard-coded) standard deviation and mean of the log-transformed data, so that predictions are returned to the log scale.

```r
p_link <- function(mod = 1, rx = 1) {
  x <- with(post, a[, mod] + b[, rx])
  return(x)
}
dest <- function(x) {
  x <- x * 0.378  # sd of min_log
  x <- x + 4.44   # mean of min_log
  return(x)
}
```

Next, generate simulated values from the model for a specific combination of inputs, transform those predictions back to the original axis scale, and visualize the results with a posterior predictive check. This plot lets you visually assess whether the model produces values similar to those observed in the dataset.
```r
post_mu <- p_link(d$mod, d$rx)
sim_min <- rnorm(length(d$axis_length), post_mu[1,], 1)
sim_min <- dest(sim_min)
plot(NULL, xlim = c(0, 200), ylim = c(0, 0.03),
     main = 'Posterior Predictive Check',
     xlab = 'Minor Axis Size', ylab = 'Density')
dens(db$axis_minor_length, lwd = 3, add = TRUE)
dens(exp(sim_min), lwd = 3, add = TRUE, col = 2)
```

To then assess the direct effect of treatments without adjusting for model type, we simulate predictions for each treatment level independently.
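The underlying idea, drawing per-treatment posterior predictions and contrasting them against control, can be sketched in a few lines of NumPy. The draws below are synthetic stand-ins for the model's real posterior samples; the treatment labels match those used in the final plot.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic posterior draws of treatment effects on the standardized scale;
# columns stand in for CTRL, DTX, ERI1, ERI2
draws = rng.normal(loc=[0.0, 0.4, -0.2, 0.1], scale=0.15, size=(2000, 4))

# Contrast each treatment against control (column 0), draw by draw
contrasts = draws[:, 1:] - draws[:, [0]]

for name, c in zip(["DTX", "ERI1", "ERI2"], contrasts.T):
    lo, hi = np.percentile(c, [5.5, 94.5])  # 89% interval, rethinking's default
    print(f"{name} vs CTRL: mean {c.mean():+.2f}, 89% PI [{lo:+.2f}, {hi:+.2f}]")
```

Working with whole vectors of draws, rather than point estimates, is what lets the density plots below convey both the magnitude and the uncertainty of each effect.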
```r
p_link <- function(rx = 1) {
  x <- with(post, rnorm(length(post$a_bar), b[, rx], sigma_b))
  x <- dest(x)
  return(x)
}
p_raw_bsim <- sapply(1:4, function(i) p_link(i))
```

Lastly, compute contrasts between each treatment and the control group, and plot the posterior density of these differences to visualize the magnitude and uncertainty of the treatment effects. The resulting plot is saved as a high-resolution PNG file and highlights how much each treatment deviates from the control group in its predicted effect on vessel morphology.
```r
png('output/contrastRXnoModel.png', width = 3000, height = 2000, res = 300)
par(mai = c(1.1, 0.8, 0.8, 0.4) + 0.02)
col <- c("#332288", "#AA4499", "#117733")
s_factor <- 20
new.lab <- c("DTX", "ERI1", "ERI2")
plot(NULL, ylim = c(-0.8, 1), xlim = c(0.5, 3.5),
     xlab = '', ylab = 'Probability Density', xaxt = 'n',
     main = 'Treatment Contrast vs CTRL')
abline(h = 0)
for (i in 1:3) {
  obj <- p_raw_bsim[, i + 1] - p_raw_bsim[, 1]
  y <- density(obj)$x
  x <- density(obj)$y
  polygon(i + x / s_factor, y, col = scales::alpha(col[[i]], 0.6), border = FALSE)
  lines(i + x / s_factor, y, lwd = 1)
  polygon(i - x / s_factor, y, col = scales::alpha(col[[i]], 0.6), lwd = 2, border = FALSE)
  lines(i - x / s_factor, y, lwd = 1)
}
axis(1, at = 1:3, labels = FALSE)
text(x = 1:3 + 0.05, -0.92, labels = new.lab, srt = 40, adj = 1, xpd = TRUE)
dev.off()
```

In conclusion, this code shows you how to move from raw morphological measurements to robust statistical inferences using hierarchical Bayesian modeling. The structure is modular and can be adapted to other vessel features (e.g., area, eccentricity) or extended to include spatial variables as covariates.