Zephyrnet Logo

Selective binding of retrotransposons by ZFP352 facilitates the timely dissolution of totipotency network – Nature Communications

Date:

Cell culture and cell lines

HEK293T cells (Pricella, CL-0005) were cultured on 0.12% gelatin pre-coated dishes with DMEM high glucose (HyClone, SH30243.01) supplemented with 10% FBS (Gibco, 10270-106). E14 cells (Cell Search System, E14tg2a) were cultured in DMEM high glucose (Gibco, 11965-084) with 15% FBS (Gibco, 16000-044), 100X NEAA (Gibco, 11140050), 100X L-Glutamine (Gibco, 25030081), 1000X β-Mercaptoethanol (Gibco, 21985023), homemade LIF (10,000X dilution) on gelatin pre-coated plate. The medium was refreshed daily. E14 cells were passaged by incubation with 0.05% trypsin (Beyotime, C0203) for 3 min at 37 °C. ZFP352 inducible mESC line was made by lentiviral integration of the pCW57.1-Zfp352 expression plasmid. Zfp352 coding sequence was cloned under the TET inducible promoter in the pCW57.1 plasmid. Cells were routinely checked for mycoplasma contamination.

Transfection and transduction

Transfection was done using Lipofectamine 2000 (Life Technologies, 11668027) according to protocol. Transfection mix was added to the gelatin pre-coated plate, before cells were seeded. For lentivirus production, HEK293T cells were grown to 70% confluency. The transfer plasmid, pxPAX2, and VSVG were all transfected into HEK293T cells with PEI (Polysciences, 24765-1). Supernatant was harvested 48 h to 72 h later, and virus produced was precipitated using 5X virus concentration solution (Origene, TR30026), resuspended in PBS, and aliquoted before storing at −80 °C. To infect cells with lentivirus, virus was added into the medium with 4 μg/ml polybrene for 24 h. Medium was refreshed the second day to allow the subsequent assays.

RNA extraction and RT-qPCR

Total RNA was extracted by RNAios Plus (Takara, 9109). RNA was reverse-transcribed into cDNA by HiScript II Q RT SuperMix (Vazyme, R223-01). qPCR was performed using ChamQ SYBR qPCR Master Mix (Vazyme, Q311-03) with the Roche LightCycler® 480 (LC480) system.

Luciferase expression assay

The SINE_B1 sequences and MT2_Mm sequence were cloned into pGL4.23 luciferase reporter plasmid containing the SV40 mini-promoter, or the psi-CHECK-2 (Promega, C8021) luciferase reporter plasmid. Luciferase reporter plasmids were transfected into HEK293T cells together with Zfp352 or Dux over-expressing plasmids. The luciferase expression changes were detected by RT-qPCR. Luciferase activity was also measured with the dual luciferase assay system (Promega, E1910) and analyzed on a Synergy2 plate reader (BioTek).

Co-immunoprecipitation

Cells were lysed in Endo-IP buffer with protease inhibitor cocktail (Biotool, B14001) on ice for 30 min, and the lysate was centrifugated at 4 °C for 25 min to remove precipitates. 500 μg total protein was incubated with 5 μg of rabbit IgG or HA antibodies (Sigma, H3663) or Flag antibodies (Sigma, F1804) at 4 °C overnight with rotation. Lysate and antibody mix was incubated with 30 μl protein G Dynabeads (Thermo Fisher Scientific, 10003D) for 2 h at room temperature with rotation. Beads were washed with lysis buffer and eluted. The eluent was analyzed by western blotting. To detect protein ubiquitination, 500 μg total protein was incubated with 5 μg of rabbit IgG or ZSCAN4 antibodies (Millipore, AB3430) at 4 °C overnight with rotation. Lysate and antibody mix were incubated with 30 μl protein G Dynabeads (Thermo Fisher Scientific, 10003D) for 2 h at room temperature with rotation. The eluent was later blotted with ubiquitin antibody (CST, #3936).

Western blotting

Cell lysate in Endo-IP buffer with protease inhibitor cocktail or Co-IP eluted samples were denatured at 95 °C and kept on ice. Protein samples were resolved using SDS-PAGE and transferred onto a polyvinylidene difluoride (PVDF) membrane (Sigma-Aldrich, 3010040001) using a wet tank transfer system. The PVDF membrane was blocked in 10% non-fat milk in PBST for 1 h and incubated the primary antibody in blocking buffer at 4 °C overnight and subsequently secondary antibody conjugated with HRP (Genescrip, A00098) 1:5000 diluted. The membrane was visualized using the Azure C300 system. The primary antibody used included antibody against GAPDH (Abclonal, AC033), HA (Sigma, H3663), Flag (Sigma, F1804), ZSCAN4 (Millipore, AB3430), ubiquitin (CST, #3936) and ZFP352, and all primary antibodies were diluted at 1:2000.

Flow cytometry

mESCs were digested into single cells by 0.05% trypsin at 37 °C and fixed with 4% PFA for 30 min. Upon resuspended in pre-cold PBS, the suspensions were filtered through 40 µm strainer and analyzed on BD LSRFortessaTM Cell Analyser. The FACS data were analyzed using FlowJo X V10, with gating strategies illustrated in Supplementary Fig. 7.

CRISPRi and CRISPRa

For the CRISRi and CRISPRa system, sgRNAs designed from MT2_Mm consensus sequences in Dfam were used and cloned into dCAS9-VP160 and dCAS9-KRAB vectors modified from Addgene #48240. The sgRNA sequences used in this study were:

CAGCTGTGAATGGAAGTCCA

ATTTATTGATGACTTACAGT

CACCAGTGACCCTTATCTGG

dCAS9-VP160 or dCAS9-KRAB plasmid with sgRNAs were transduced into E14 cells and empty vectors were used as control.

Embryo retrieval and microinjection

The 8–10-week-old female ICR mice were purchased from Shanghai SLAC Laboratory Animal Co., Ltd. All the mice were housed under the SPF environment with a 12 h light-dark cycle, and had a temperature of 22–24 °C with 50–60% humidity. ICR female mice (8–12 weeks old) were super ovulated by i.p. injection of 10 IU pregnant mare’s serum gonadotropin (PMSG, CEN’S, Hangzhou, China) and 48 h later, 10 IU human chorionic gonadotropin (hCG, CEN’S, Hangzhou, China). For in vivo produced embryos, females were then mated with ICR male mice (10–18 weeks old). The zygotes were collected from dissected oviducts at 0.5 dpc. and put into KSOM medium (ARK Resource Co., Ltd.). Zhejiang University (China) provided the guidance for the animal research protocol with the ethical approval number as ZJU20230182.

For siRNA injections, the in vivo collected zygotes were randomly allocated into four groups: siZfp352, siMERVL, siNC (scrambled control), and uninjected control. The siRNA sequences were as shown below.

siZfp352: CCAUUUGAGAACACUUCUUTT; GCUCCAUAUGUGGGUGAAUTT; GGUUCUACGCUUGUCCCUUTT

siMERVL: GAAGAUAUGCCUUUCACCAGCUCUA

siNC: UUCUCCGAACGUGUCACGUTT

60 μM of siRNA and were injected into the cytoplasm of zygotes using microinjection glass capillary (BOROSILICATE GLASS, ITEM#: BF100-78-15). Injected embryos were cultured in KSOM medium under mineral oil in 35 mm petri dishes (Corning Life Sciences, 430165) in the humidified multi-gas incubator (5% O2, 6% CO2, and 89% N2) at 37 °C. Embryo development were recorded for a total of four days after microinjection.

Real-time qPCR and RNA-seq for mouse embryos

Total RNA was isolated and pooled from 10 embryos. cDNA conversion was performed as previously described57. Embryos were obtained and placed into a 0.2-ml thin-walled PCR tube containing 2 µl of cell lysis buffer (with 5% RNase inhibitor and 95% Triton X-100), 1 µl of 10 µM oligo-dT primer and 1 µl of dNTP mix (10 mM each; Fermentas, R0192). The mix was quickly vortexed and then span down at 700 × g for 10 s at room temperature) and immediately transferred on ice. The samples were incubated at 72 °C for 3 min and immediately put back on ice. 5.7 µl of the RT mix containing SuperScript II reverse transcriptase (Invitrogen, cat. no. 18064-014), RNAse inhibitor (10U), 5X Superscript II first-strand buffer, 100 mM DTT (Invitrogen, 18064-014), 5 M Betaine (BioUltra ≥99.0%; Sigma-Aldrich, 61962), 1 M MgCl2, 100 µM TSO were mixed by gently pipetting up and down. The sample was incubated at 42 °C for reverse transcription, and then amplified with First-strand reaction KAPA HiFi HotStart ReadyMix (KAPA Biosystems, KK2601). The relative expression levels of Zfp352 and MERVL were measured by RT-qPCR and normalized to Gapdh. The cDNA was used for library construction using Vazyme TruePrep DNA Library Prep Kit V2 for Illumina kit (TD503-01) and subsequently subjected to Illumina platform sequencing with a depth of 20 M reads per sample. More detail about library construction can be found in its user’s manual.

ELISA assay

SINE_B1 DNA probe was synthesized by PCR using biotinylated primer, diluted in 0.1% TBS-T, and coated onto streptavidin plates (Thermo Scientific, 15500). The probe-coated wells were washed and blocked before loaded with different amounts of protein extract from HEK293T cells over-expressing Zfp352 or Zfp352+Dux together. Primary antibody against HA at 1:2000 dilution ratio, and anti-mouse secondary antibody coupled with HRP at 1:5000 dilution ratio was used to detect HA-ZFP352 bound to SINE_B1 probe. The wells were further incubated with OPD-solution (Sangon Biotech A610348) in the dark for 30 min before adding the stop solution. The emission was measured at 492 nm with 650 nm as reference wavelength.

Repeat elements annotation

We downloaded RepeatMasker (rmsk) tracks of mm10 from UCSC genome browser (https://genome.ucsc.edu/). We further filtered TEs with large indels, which are those with length 20% longer or shorter than its consensus sequences (annotated in Repbase). We then generated a new gene annotation file in GTF format by combining filtered TEs with Ensembl gene annotation (version GRCm38.99). The combined GTF file was used in RNA-seq or Chip-seq analysis subsequently.

Single-cell RNA-seq analysis

Re-processing mouse early embryo development scRNA-seq data

In order to quantify the expression of TEs during early embryonic development, we re-processed scRNA-seq data (SMART-Seq) from46 as previously described58. Briefly, the raw files were downloaded from short reads achieve (SRA) database (accession: SRA072494) and mapped to mouse reference genome GRCm38.99 with STAR (v2.7.0e)59. Reads mapped to not more than 2000 loci were retained, and only the best hit was kept (–alignEndsType EndToEnd –winAnchorMultimapmax 2000 –outFilterMultimapNmax 2000 –outSAMprimaryFlag AllBestScore –outSAMmultNmax 1). With such setting, all uniquely mapped reads will be kept. For reads with multiple equivalent best hits of equivalent scores, only one of the hits will be retained. We then used featureCounts (v2.0.0)60 to quantify the number of reads mapped to both gene and TE with parameter (-s 0 –fraction -M -C).

The count matrix generated by featureCounts was loaded into Seurat (v3.0.0)61 for downstream processing, including basic QC, normalization, clustering. We kept cells that expressed over 2000 gene features and filtered out cells with 50% of their RNAs derived from mitochondrial. We then normalized raw counts to counts per 10k reads and identified the top 3000 most variable genes with Seurat function “NormalizeData”, “FindVariableFeatures” and “vst” selection method. The variable genes were scaled and used as input to compute the Principal Components (PCs). Top 50 PCs were selected for computing Uniform Manifold Approximation and Projection (UMAP) with Seurat function “RunUMAP” using default settings. The cell type information was adopted from ref. 46. Stage specific marker genes were identified with Seurat function “FindMarkers” and parameters: “min.pct = 0.2, logfc.threshold = 1, test.use = “wilcox”, max.cells.per.ident = 20”.

The cell type (embryonic stage of cell) marker gene list was defined according to the annotation file from ref. 46. Markers were identified using FindMarker function in Seurat. Average FC > 2 and P value < 0.05 were used as cut-off for defining the final markers, and the marker gene list was used for the following analysis.

Integrative analysis of 2CLC entry and exit process using scRNA-seq data

For the integrative analysis of 2CLC entry and exit process, we downloaded raw fastq files from a study of entry 2CLC induced by Dux overexpression15 (GEO: GSE121459) and from a study of exit from DUX-induced 2CLC45 (GEO: GSE133234). Since these datasets are generated from 10X Genomics platform, we first processed it by STARsolo with the following parameters: (–winAnchorMultimapNmax 2000 –outFilterMultimapNmax 2000 –outSAMprimaryFlag AllBestScore –outSAMmultNmax 1 –limitOutSJoneRead 10000 –limitOutSJcollapsed 3000000 –outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM –soloType CB_UMI_Simple –soloCBwhitelist 737K-april-2014_rc.txt –soloCBlen 14 –soloUMIstart 15 –soloUMIlen 8 –soloBarcodeReadLength 0 –soloStrand Forward –soloFeatures Gene GeneFull SJ). In addition, the combined GTF, in which TE annotation has been added, was used for quantifying the TEs abundance. Using such settings, we identified cells expressed MT2_Mm or other TEs.

Using the count matrix generated from STARSolo, downstream analysis was done in Seurat. Briefly, we excluded cells with (1) less than 400 UMI/cell and (2) less than 500 or more than 8000 genes detected, and with more than 30% of UMIs derived from mitochondrial RNAs. Raw UMI counts were normalized to counts per 10k UMIs and log-transformed. Top 3000 most variable genes were identified with Seurat function “FindVariableFeatures” and variable selection method “vst” for both datasets, respectively. We then applied “FindIntegrationAnchors” and “IntegrateData” functions to the top 20 CCAs to integrate the two datasets. UMAP of integrated datasets was computed by the “RunUMAP” function with setting “n.neighbors = 5”.

RNA-seq and data analysis

The library was constructed with Truseq RNA Sample Prep Kit v2 (Illumina, RS-122-2001) before being subjected to Illumina platform sequencing with a depth of 20 M reads per sample. Quality control for raw reads was performed by FastQC (v0.11.8)62. The first 10 bp of both paired-end reads were trimmed by cutadapt (v2.9)63. STAR (v2.7.0e)59 was used for aligning reads against mouse reference genome GRCm38.99 (https://ftp.ensembl.org/pub/release-99/fasta/mus_musculus/dna/). Reads with no more than 2000 mapped loci were retained, and only the best hit was kept (–alignEndsType EndToEnd –winAnchorMultimapmax 2000 –outFilterMultimapNmax 2000 –outSAMprimaryFlag AllBestScore –outSAMmultNmax). Quantification for both gene and TE were calculated by FeatureCounts v2.0.060 with parameter using GRCm38.99 GTF file(-s 0 –fraction -M –C, https://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/).

Differentially expressed genes (DEGs) and transposable elements (DE-TEs) were generated by the R package edgeR (v3.30.3)64. Only features with mean TPM greater than 1 in either control or treatment group were retained. DEGs were defined as fold change > 2 and Adjusted P value < 0.05. GO term enrichment for DEGs was conducted with the R package clusterProfiler (v3.12.0)65 (ont = ‘ALL’, pAdjustMethod = ‘BH’, pvalueCutoff = 0.05, qvalueCutoff = 0.05).

The clustering marker gene lists were defined with genes match the following criterion. Firstly the DEGs were filtered out at several different time point after Dux overexpression, and then DEGs with the highest TPM compared with other different time points and also higher than the average TPM of the other time points were defined as Dux overexpression time point clustering specific genes. Overlapping analysis was performed with the R package GeneOverlap (v1.30.0, GitHub – shenlab-sinai/GeneOverlap: R package for testing and visualizing gene list overlaps).

ChIP-seq and data analysis

Cells were harvested and fixed with 1% formaldehyde (Sigma, 47608-250ML-F) for 10 min at RT with rotation. Quenching was done with 0.14 M glycine at room temperature for 10 min. Cells were lysed by ChIP lysis buffer (10 mM Tris-HCl (pH 8.0), 0.25% Triton X-100, 10 mM EDTA, 100 mM NaCl, protease inhibitor cocktail). The genomic DNA was sonicated into short fragments with an average size of 500 bp. The fragmented DNA was incubated with 3 μg HA (CST, 61099) antibody overnight at 4 °C. The mix was subsequently incubated with 30 μl protein G Dynabeads for 2 h at room temperature with rotation. DNA was eluted in elution buffer (50 mM Tris-HCl (pH 8.0), 1 mM EDTA, 1% SDS), treated with proteinase K at 60 °C overnight. DNA was purified by FastPure DNA Extraction Mini Kit (Vazyme Biotech Co. Ltd, DC301) and subjected to qPCR or Illumina sequencing.

We used a community-curated bioinformatics pipelines “nf-core/chipseq” (v1.1.0)66 for the analysis of ChIP-seq data. We took the default setting of the chipseq pipeline but added the “–keep-multi-map” option to retain multiple hits reads so that peaks in TE region can be identified. Briefly, raw fastq files were first fitered by FastQC (v0.11.8)62. TrimGalore (v0.5.0, https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) was used for trimming adapters. The remaining high-quality reads were mapped to mouse reference genome downloaded from Ensembl (GRCm38.99) by BWA (v0.7.17)67. PCR duplicates were marked by PICARD (v2.19.0, http://broadinstitute.github.io/picard/) and removed by SAMtools (v1.9)68. Then, MACS2 (v2.1.2)69 was used for calling peaks. The narrow peaks with q-value (minimum FDR) < 0.05 were kept as final peaks.

Bigwig tracks were generated using Deeptools (v3.4.3)70 by normalizing to RPKM using binsize of 10 bp. ChIP-seq signals over genomic regions were plotted by Deeptools (v3.4.3)70. For TE annotation of peak region, annotation of different TE families/classes over peak regions (Observed) was obtained by annotatePeaks.pl function in Homer (v4.7)71 by using TE annotation file.

Dux overexpression ChIP-seq raw data were from GEO: GSE955178. ChIP-seq heatmap were plotted using computeMatrix and plotHeatmap commands from DeepTools (v3.4.3)70. K-means clustering method was used to divide the regions into three distinct categories. The resulting clusters are ZFP352 and DUX co-binding region (±500 bp), DUX only binding region without ZFP352 and ZFP352 only binding region without DUX. The Genome browser images of peak regions and read coverage were composed using the Integrative Genomics Viewer (IGV). Peaks were annotated against mm10 with annotation library from UCSC.

The expected TE locations surrounding ZFP352 ChIP peaks were identified by counting the number of the respective TE sub-family copies in binding regions of the ZFP352 ChIP peak region, and the expected TE numbers were computed assuming random distribution of TE sub-family in the respective genomic region. The average length of each TE sub-family was taken into account as well. SINE_B1 specific genes and B1_Mus2, B1_Mus1, B1_Mm specific genes were counted using the BEDtools (v2.29.2)72,73 window command in a window of ±5 kb from the summits of the peaks. MT2_Mm specific genes were counted using the BEDtools window command in a window of ±50 kb from the summits of the peaks because of the total copy numbers of SINE_B1 were much higher than the numbers of MT2_Mm, so different distance criterion was used.

ATAC-seq and data analysis

ATAC-seq was performed as previously described74 using MagicSeq Tn5 DNA Library Prep Kit for Illumina (Magic-Bio, M3141). mESC pellet was treated transposase at 37 °C for 30 min, purified using MinElute PCR Purification Kit (QIAGEN, 28006), and amplified using 1xNEBnext PCR master mix (NEB, M0541S) using custom Nextera PCR primers 1 and 2. Upon purification with MinElute PCR Purification Kit (QIAGEN, 28006). Libraries were subjected to Nova pair-end 150 bp sequencing.

For ATAC-seq data analysis, quality control was performed by FastQC v0.11.862. Tn5 adapter (AGATGTGTATAAGAGACAG) was trimmed by cutadapt v2.963. STAR v2.7.0e59 was used for alignment onto human reference genome GRCm38.99. Reads with maximal 1000 multiple mapped sites and no more than 3 mismatches were retained, and only the best hit was kept (–outFilterMultimapNmax 1000, –outFilterMismatchNmax 3, –outSAMmultNmax 1). By creating STAR index without general feature format file and not allowing intron length (–alignIntronMax 1), the splice junction was neglected. PCR duplicates were removed by Samtools v1.268 rmdup function.

ATAC-seq peaks were defined using MACS2 v2.2.7.175 callpeaks function with default parameters. Bigwig tracks were generated using deeptools v3.4.370 by normalizing to RPKM using binsize of 10 bp. ATAC-seq signals over genomic regions were plotted by deeptools v3.2.170.

Motif analysis

ZFP352 bound SINE_B1 sequences were SINE_B1/Alu peaks which have ZFP352 ChIP-seq peaks in ±5 kb region (n = 4141). ZFP352 bound B1_Mus2 (n = 2150), ZFP352 bound B1_Mus1 (n = 780) and ZFP352 bound B1_Mm (n = 1211) were also obtained with the same method. ZFP352 bound MT2_Mm sequences were MT2_Mm peaks which have ZFP352 ChIP-seq peaks in ±50 kb region (n = 1896). Motif analysis was done by Motif Discovery function from the MEME web site using ZFP352 bound SINE_B1 and ZFP352 bound MT2_Mm bed files. Hexamer analysis was conducted using ZFP352 bound SINE_B1 or ZFP352 bound MT2_Mm sequences. The sequences information of ZFP352 bound SINE_B1 or ZFP352 bound MT2_Mm were generated using the getfasta command from BEDtools (v2.29.2)72,73 with their respective bed file, and the hexamers were counted using Jellyfish (v2.3.0)76 count command, and Jellyfish dump command was used to obtain the statistics from the hexamer count results. The percentage distribution of every hexamer in each ZFP352_bound TE sub-family was calculated as the percentage of this hexamer among all random hexamers from the retrotransposons sub-family copies and plotted together in the heatmap.

Data visualization

Heatmaps of selected genes was plotted using R package pheatmap (v1.0.12) and ComplexHeatmap (v2.0.0)77.

RAD analysis

The RAD analysis was done using the web tool: https://labw.org/rad/docs47. The DEGs from Dux overexpression and Zfp352 overexpression RNA-seq were selected with the following criteria: up-regulated genes with fold change [FC] > 1.5 and Adjusted P value < 0.05; down-regulated genes with fold change [FC] < −1.5 and Adjusted P value < 0.05. The input regions include the DUX or ZFP352 bound peaks revealed from ChIP-seq, MT2_Mm, and SINE_B1 sub-family location coordinates in the GRCm38 (mm10) genome. For submit options, “GRCm38(mm10)” was selected as reference genome, “1000, 500, 200, 100, 50, 25, 20, 15, 10, 5, 0 kb” as the customized peak extend distance. The enrichment scores were calculated by observed over expected distribution frequencies. “Expected” represents the genomic average assuming random distribution. One-sided hypergeometric test was performed to evaluate the statistical significance, and the P value were presented as the following: *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

Statistics and reproducibility

For all the data presented in the figures except for data from Next Generation Sequencing, the experiments were performed at least twice, each time with replicates and statistical analysis were as described in the respective figure legends.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

spot_img

Latest Intelligence

spot_img