A Novel Monoallelic ALG5 Variant Causing Late-Onset ADPKD and Tubulointerstitial Fibrosis

Introduction
Monoallelic variants in ALG5, the gene encoding asparagine-linked glycosylation protein 5 homolog (ALG5), have recently been shown to disrupt polycystin-1 (PC1) maturation and trafficking via underglycosylation, causing an autosomal dominant polycystic kidney disease-like (ADPKD-like) phenotype and interstitial fibrosis. In this report, we present clinical, genetic, histopathologic, and protein structural and functional correlates of a new ALG5 variant, p.R79W, that we identified in 2 distantly related Irish families displaying an atypical late-onset ADPKD phenotype combined with tubulointerstitial damage.

Methods
Whole exome and targeted sequencing were used for segregation analysis of available relatives. This was followed by immunohistochemical examination of kidney biopsies, and by targeted (UMOD, MUC1) and untargeted plasma proteome and N-glycomic studies.

Results
We identified a monoallelic ALG5 variant [GRCh37 (NM_013338.5): g.37569565G>A, c.235C>T; p.R79W] that cosegregates in 23 individuals, of whom 18 were clinically affected. We detected abnormal localization of ALG5 in the Golgi apparatus of renal tubular cells in patients' kidney specimens. Further, we detected pathological accumulation in the endoplasmic reticulum (ER) of uromodulin, an N-glycosylated glycosylphosphatidylinositol (GPI)-anchored protein, but not of mucin-1, an O- and N-glycosylated protein. Biochemical investigation revealed decreased plasma and urinary uromodulin levels in clinically affected individuals. Proteomic and glycoproteomic profiling revealed dysregulation of chronic kidney disease (CKD)-associated proteins.

Conclusion
ALG5 dysfunction adversely affects maturation and trafficking of the N-glycosylated, GPI-anchored protein uromodulin, leading to structural and functional changes in the kidney. Our findings confirm ALG5 as a cause of late-onset ADPKD and provide additional insight into the molecular mechanisms of ADPKD-ALG5.


Figure S3: Plasma mucin-1 (CA15-3) levels in family members with heterozygous R79W-ALG5 and WT-ALG5, and intracellular localization of MUC1 in patient and control kidney biopsies
Mean plasma mucin-1 (CA15-3) concentration did not differ among clinically affected and clinically unaffected family members with the heterozygous R79W-ALG5 variant (n=10) and family members with WT-ALG5 (Student's t-test, P ≥ 0.05) (S3A). In addition, mucin-1 levels did not differ between genetically affected family members with ALG5 R79W/- (clinically affected and clinically unaffected, as defined by the eGFR cutoff of 90 ml/min) and genetically unaffected family members (S3B). The intracellular localization of mucin-1 also did not differ between patient and control kidney specimens: in both, mucin-1 localized to the apical pole of the plasma membrane of tubular cells (S3C, S3D).
Western blot of plasma transferrin showed, in affected individuals with the heterozygous R79W-ALG5 variant, one band corresponding to fully processed glycosylated transferrin (79 kDa), whereas in an individual with a glycosylation disorder (mutation in PMM2, CDG type 1a) three bands were detected, corresponding to non-glycosylated (74 kDa), partially glycosylated (77 kDa), and fully glycosylated (79 kDa) forms of transferrin. The samples were loaded in two volumes normalized to total protein concentration (100% corresponds to 30 ng and 50% corresponds to 15 ng of total protein).
Table S1: PLINK identity-by-descent (IBD) analysis. The IBD-sharing coefficients indicated that families F200 and F350 are related to one another.
The table indicates the PI_HAT score (a measure of overall IBD) and Z0, Z1, and Z2 (the probabilities of sharing 0, 1, or 2 alleles IBD, respectively) between pairs of individuals. Parent-offspring pairs have expected (Z0, Z1, Z2) values of (0, 1, 0), indicating that the individuals share approximately 50% IBD across the loci tested. Some variance from the expected values for inferred relationships is common because of recombination events and technical errors. IBD-based coefficients confirmed sibling-level (first-degree) relatedness between the known siblings F200:II.9 and F200:II.1. Pairwise IBD estimates inferred second-degree and third-degree relationships between F350_II.6 / F200_II.1 and F350_II.6 / F200_II.9, respectively.
ID: identifier
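The relationship between the coefficients in Table S1 can be illustrated with a short sketch. PLINK defines PI_HAT as P(IBD=2) + 0.5 × P(IBD=1); the function names and the cutoffs used below to assign a nominal degree of relatedness (geometric midpoints between the expected values 0.5, 0.25, and 0.125) are illustrative assumptions and not the thresholds applied in this study.

```python
def pi_hat(z1, z2):
    """Proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def nominal_degree(p):
    """Map a PI_HAT value to a nominal degree of relatedness.

    The cutoffs are assumed for illustration only (geometric midpoints
    between the expected PI_HAT values of successive degrees).
    """
    if p >= 0.707:
        return "duplicate or monozygotic twin"
    if p >= 0.354:
        return "first-degree"
    if p >= 0.177:
        return "second-degree"
    if p >= 0.0884:
        return "third-degree"
    return "more distant or unrelated"


# Expected (Z1, Z2) for a parent-offspring pair is (1, 0).
parent_offspring = pi_hat(z1=1.0, z2=0.0)
# Expected (Z1, Z2) for full siblings is (0.5, 0.25).
full_siblings = pi_hat(z1=0.5, z2=0.25)
```

Both a parent-offspring pair and a full-sibling pair give an expected PI_HAT of 0.5, which is why Z0, Z1, and Z2 are reported alongside PI_HAT to tell such relationships apart.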

Table S5
Glycoproteomic analysis identified 12 dysregulated proteins in genetically affected individuals with advanced CKD (stages 4 and 5) compared with genetically affected individuals with early CKD (stages 1-3) and genetically unaffected family members.

Immunofluorescence Analysis of Human Kidney Biopsy
Formaldehyde-fixed, paraffin-embedded kidney biopsies were obtained from three patients and two controls. Control samples were examined by a renal pathologist to select healthy tissue for further processing. For ALG5 patient tissues, samples from the kidney cortex and medulla were collected from patients with advanced disease.
The paraffin sections were stained after deparaffinization, hydration, sodium citrate pretreatment (pH 6), and standard blocking procedures (blocking of endogenous peroxidase with 1% sodium azide and 0.3% H2O2 for 10 minutes, and blocking with 5% fetal bovine serum (FBS) in phosphate-buffered saline (PBS) for 30 minutes, both at room temperature).

Image Acquisition and Analysis
XYZ images were sampled according to the Nyquist criterion using a Leica SP8X laser scanning confocal microscope, an HC PL Apo objective (63×, N.A. 1.40), a 405 nm diode/50 mW DMOD Flexible, and the 488, 555, and 647 laser lines in the 470-670 nm 80 MHz pulse continuum WLL2. Images were restored using a classic maximum likelihood restoration algorithm in the Huygens Professional Software (SVI, Hilversum, The Netherlands) [S27]. Colocalization maps employing single-pixel overlap coefficient values ranging from 0 to 1 were created in the Huygens Professional Software [S28]. The resulting overlap coefficient values are presented in pseudocolor, the scale of which is shown in the corresponding lookup tables (LUT).

Western Blot Analysis of Urinary Uromodulin
Aliquots of spot urine were normalized to urinary creatinine concentration, denatured in 6x ...
Plasma CA15-3 (mucin 1) concentration was determined as previously described [S29].

Proteomic and Glycoproteomic Profiles of Plasma
Plasma protein concentration was determined using a BCA Protein Assay Kit (Thermo Scientific). From each sample, an aliquot equivalent to 100 μg of protein was taken and divided into two parts. One part was subjected directly to the proteomic analysis, while the other underwent an albumin depletion protocol, as detailed in [S31], to enrich glycoprotein content. Briefly, after sample dilution with water, the lipid fraction was removed. Then, 150 mM NaCl and ice-cold ethanol were added to the samples in a final ratio of 1:2:2.3, respectively. The samples were incubated for 1 hour at 4°C and then centrifuged at 4°C and 16,000 × g for 45 minutes. The resulting pellet was resuspended in 42% ice-cold ethanol and centrifuged at 4°C and 16,000 × g for an additional 15 minutes. The final pellet obtained after these steps was used for proteomic analysis.
Proteomic analysis was carried out using a sample aliquot containing 20 μg of protein. To each sample, 100 mM triethylammonium bicarbonate buffer (TEAB) was added to adjust the total volume to 40 μL. Next, the samples were reduced and alkylated with tris(2-carboxyethyl)phosphine (TCEP) at a concentration of 10 mM and 2-chloroacetamide (CAA) at a concentration of 50 mM. The samples were then incubated at 70°C for 5 minutes. For digestion, trypsin was added to the samples at a ratio of 1:20, and the mixture was kept at 37°C overnight. To terminate the digestion, trifluoroacetic acid (TFA) was added to a final concentration of 0.5%.
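The quantities in the digestion step can be worked through with a small calculation. The sketch below assumes that the 1:20 ratio refers to trypsin:protein by mass and ignores the volume contributed by TCEP, CAA, and trypsin stocks; the function name, parameter names, and the example concentration are hypothetical.

```python
def digestion_plan(protein_conc_ug_per_ul,
                   protein_ug=20.0,
                   final_volume_ul=40.0,
                   protein_per_trypsin=20.0):
    """Return sample volume, TEAB volume, and trypsin mass for one digest.

    Assumes a 1:20 (w/w) trypsin:protein ratio and neglects the volume of
    the other reagents; both are simplifications for illustration.
    """
    sample_ul = protein_ug / protein_conc_ug_per_ul
    if sample_ul > final_volume_ul:
        raise ValueError("sample too dilute to fit in the final volume")
    return {
        "sample_ul": sample_ul,                    # volume holding 20 ug protein
        "teab_ul": final_volume_ul - sample_ul,    # buffer to reach 40 uL
        "trypsin_ug": protein_ug / protein_per_trypsin,
    }


# Hypothetical sample at 2 ug/uL: 10 uL sample, 30 uL TEAB, 1 ug trypsin.
plan = digestion_plan(2.0)
```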
After protein digestion, the sample was subjected to offline desalting using a C18 StageTip (3M, USA) [S32]. Following desalting, the sample was dried using a SpeedVac concentrator. The dried peptides were resuspended in a solution containing 2% acetonitrile (ACN) and 0.1% trifluoroacetic acid (TFA). The peptides were analyzed using the Vanquish liquid chromatography system (Thermo Scientific) coupled to a timsTOF SCP mass spectrometer equipped with CaptiveSpray (Bruker Daltonics). The mass spectrometer operated in positive data-dependent mode. One microliter of the peptide mixture was injected by an autosampler onto a C18 trap column (PepMap Neo C18 5 µm, 300 µm × 5 mm, Thermo Scientific). After 3 minutes of trapping, the peptides were eluted from the trap column and separated on a C18 analytical column (DNV PepMap Neo 75 µm × 150 mm, 2 µm, Thermo Scientific) using a linear gradient of 5% to 35% ACN over 35 minutes at a flow rate of 350 nL/min. Both the trap and analytical columns were heated to 50°C. The timsTOF SCP settings were based on the standard proteomics PASEF method. The scan range was set from 0.6 to 1.6 V·s/cm² with a ramp time of 100 ms. A total of 10 PASEF MS/MS scans were performed. Precursor ions in the m/z range between 100 and 1700 with charge states ≥2+ and ≤6+ were selected for fragmentation. Active exclusion was enabled for 0.4 min to prevent repeated selection of the same precursor ions.
Proteins were identified using MaxQuant software (version 1.6.17) [S33], and the peak lists were searched against the Homo sapiens database from UniProt using the Andromeda search engine [S34]. The database search parameters were as follows: trypsin enzyme specificity, allowing up to two missed cleavages; carbamidomethylation of cysteine as a fixed modification; and N-terminal protein acetylation and methionine oxidation as variable modifications. The precursor ion tolerance was set at 20 ppm, and the mass tolerance for MS/MS fragment ions was set at 0.5 Da. To ensure reliable identifications, peptide-spectrum matches (PSMs) and protein identifications were filtered using a target-decoy approach at a false discovery rate (FDR) of 1%.
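The principle of target-decoy filtering can be sketched in a few lines. This is a generic illustration and not MaxQuant's implementation: the score, the function name, and the example PSMs are assumptions. The FDR at each rank is estimated as the number of decoy hits divided by the number of target hits scoring at least as well, and is then made monotonic (q-values) before the cutoff is applied.

```python
def target_decoy_filter(psms, fdr=0.01):
    """Return target scores whose q-value is at or below `fdr`.

    psms: iterable of (score, is_decoy) pairs; higher score = better match.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate at each rank: decoys / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower-scoring rank.
    qvals = []
    running_min = float("inf")
    for f in reversed(fdrs):
        running_min = min(running_min, f)
        qvals.append(running_min)
    qvals.reverse()

    return [score for (score, is_decoy), q in zip(ranked, qvals)
            if not is_decoy and q <= fdr]


# Hypothetical PSMs: the decoy at score 80 ends the 1% FDR list.
example = [(100, False), (90, False), (80, True), (70, False)]
accepted = target_decoy_filter(example, fdr=0.01)
```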
For label-free quantification (LFQ) of proteins, the MaxLFQ algorithm [S35] integrated into MaxQuant was used. A minimum ratio count of 2 was set to ensure robust quantification. Subsequent data analysis was performed using Perseus software (version 2.0.70) [S36]. The data were filtered to remove hits to the reverse database, contaminants, and proteins identified solely with modified peptides. LFQ intensity values were log2-transformed and normalized using Z-score. Statistical analyses were carried out using algorithms integrated into Perseus.
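The transformation applied to the LFQ intensities can be illustrated as follows. The sketch assumes that Z-scoring is done within one sample using the sample standard deviation and that zero intensities denote missing values; the exact options selected in Perseus are not stated in the text, and the function name is hypothetical.

```python
import math
import statistics


def log2_zscore(intensities):
    """Log2-transform LFQ intensities and Z-score the observed values.

    Zero intensities are treated as missing and returned as None
    (an assumption about how missing values are encoded).
    """
    logged = [math.log2(x) if x > 0 else None for x in intensities]
    observed = [v for v in logged if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample standard deviation (assumed)
    return [None if v is None else (v - mean) / sd for v in logged]


# Hypothetical intensities for one sample; the zero is a missing value.
scaled = log2_zscore([2.0, 0.0, 4.0, 8.0])
```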
The raw data were deposited to the ProteomeXchange Consortium via the PRIDE [S37] partner repository with a data set identifier.

Statistical analysis
Patient characteristics and genetic diagnosis were collected, and data were presented descriptively; continuous variables were expressed as mean ± standard deviation or median [interquartile range, IQR], whereas categorical variables were expressed as frequencies or percentages. The difference in means between subgroups was determined using a two-sample Student's t-test, with a two-tailed P value of less than 0.05 indicating statistical significance. Kaplan-Meier curves with log-rank testing were used to evaluate progression to end-stage kidney disease. Statistical analyses were performed using STATA SE 16 (StataCorp, College Station, TX, USA).
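For readers unfamiliar with the Kaplan-Meier method, a minimal sketch of the product-limit estimator is given below. It is not the STATA procedure used for the analysis; the function name and the example ages are hypothetical, and the log-rank test is omitted.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of kidney survival.

    times:  follow-up time for each individual (e.g. age in years)
    events: True if end-stage kidney disease was reached at that time,
            False if the individual was censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)
        n_events = sum(1 for tt, e in data if tt == t and e)
        if n_events:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_at_t  # events and censored both leave the risk set
        i += n_at_t
    return curve


# Hypothetical data: one individual censored at age 60.
curve = kaplan_meier([50, 60, 60, 70], [True, False, True, True])
```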

Figure S4: Western blot of plasma transferrin from affected individuals with heterozygous R79W-ALG5 from F350, compared with an individual with a glycosylation disorder (mutation in PMM2, CDG type 1a)

Describe all statistical methods, including those used to control for confounding: 10, Supplementary Methods 2.10
(b) Describe any methods used to examine subgroups and interactions: NA
(c) Explain how missing data were addressed: NA
(d) Cohort study - If applicable, explain how loss to follow-up was addressed; Case-control study - If applicable, explain how matching of cases and controls was addressed; Cross-sectional study - If applicable, describe analytical methods taking account of sampling strategy: NA
(e) Describe any sensitivity analyses: NA
Results
Participants 13*
(a) Report numbers of individuals at each stage of study (eg numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analysed): 11-12
(b) Give reasons for non-participation at each stage: NA
(c) Consider use of a flow diagram: Pedigrees, Figures 1 and 2
Give characteristics of study participants (eg demographic, clinical, social) and information on exposures and potential confounders: 12
(b) Indicate number of participants with missing data for each variable of interest: NA
(c) Cohort study - Summarise follow-up time (eg, average and total amount): NA
Outcome data 15*
Cohort study - Report numbers of outcome events or summary measures over time: NA
Case-control study - Report numbers in each exposure category, or summary measures of exposure: NA
Cross-sectional study - Report numbers of outcome events or summary measures: 11
... unadjusted estimates and, if applicable, confounder-adjusted estimates and their precision (eg, 95% confidence interval). Make clear which confounders were adjusted for and why they were included: NA
(b) Report category boundaries when continuous variables were categorized: NA
(c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period: NA

Table S2 Proteomic analysis identified 198 proteins
a Score: Protein score which is derived from peptide posterior error probabilities; b Intensities: the sums of all individual peptide intensities belonging to a particular protein group.

Table S4 Glycoproteomic analysis identified 154 proteins
a Score: Protein score which is derived from peptide posterior error probabilities.