Processing sequencing data
This is meant to be a guideline for processing and analysis of most Illumina sequencing data in the Troutman lab. There are many alternative ways to consider processing, quality control, quantification, and differential expression analysis. This workflow is not a gold standard, will not always work for all data, and is open for interpretation/criticism.
General workflow
Verify data checksums and rename following lab convention: <filenameA>_R1.fastq.gz <filenameA>_R2.fastq.gz <filenameB>_R1.fastq.gz <filenameB>_R2.fastq.gz
Group data according to mapping needs, e.g. by genome, paired end and single read, etc.
Run fastqc on the raw fastq files.
module load fastqc/0.11.7
fastqc <data>R1_001.fastq.gz -o <preferred_name>
fastqc <data>R2_001.fastq.gz -o <preferred_name>
- Run Trim Galore on the fastq files. Rerun fastqc on the trimmed fastq files. Trim Galore can automatically do this for you.
module load trimgalore/0.6.6
trim_galore \
--cores 4 \
--paired <data_R1_001.fastq.gz> <data_R2_001.fastq.gz> \
--fastqc \
--output_dir <preferred_path>/
- Align the data with STAR. Have STAR format the alignment as a sorted BAM file.
module load STAR/2.7.9
module load samtools
STAR \
--runThreadN 16 \
--genomeDir /path/to/star/index/ \
--readFilesCommand zcat \
--readFilesIn <data_R1_001.fastq.gz> <data_R2_001.fastq.gz> \
--outFileNamePrefix <preferred_prefix> \
--outSAMtype BAM SortedByCoordinate
- Use HOMER to generate tag directories from the STAR alignment BAM file. Use the
-sspe -flip
options if the data is generated using our lab’s dUTP protocol, which is strand specific and reports on the opposite strand from the original molecule.
module load homer/5.1
makeTagDirectory <tagDirectoryName> <filename.sam> -genome <experiment_genome> \
-checkGC -format sam -sspe -flip
Omit `-sspe` if the sequencing data is not paired end, and omit `-flip` if the sequencing reads will not align to the opposite DNA strand as in "first-strand" library synthesis protocols.
Data quality control
- Use the fastqc results to check basic data quality.
- Check data assumptions and quality using a local install of the Integrative Genomics Viewer. An alternative would be to load a HOMER
makeUCSCfile
into the UCSC Genome Browser. An even better solution is to load HOMERmakeMultiWigHub.pl
tracks to the UCSC Genome Browser using a lab web server. We have a web server for this purpose, but the resource is not functional yet. - Check mapping efficiency using the STAR alignment log file.
- Check other basic quality parameters using the HOMER
tagInfo.txt
,tagLengthDistribution.txt
, andtagCountDistribution.txt
files. These files will be found in the HOMER tag directory. - Check sequence bias using the HOMER
tagFreq.txt
andtagFreqUniq.txt
. These files will also be in the HOMER tag directory. Fastqc also reports on sequencing bias.
Many of these QC parameters can quickly be summarized using MultiQC. This software is not currently available on the BMI cluster but can be run locally. Installation is easy via Docker.
In order to run MultiQc from your local machine, you will need to do one of the following: mount the server folder as a drive on your local machine (good idea), or copy the data from the server to your local machine (bad idea). Instructions for mounting the drives can be found here and here.
Once a drive is mounted and the Docker software is opened, the Multiqc docker image can be run through your terminal application on your local machine as in the example described here and provided below:
docker run -t -v `pwd`:`pwd` -w `pwd` multiqc/multiqc multiqc ./
Another useful tool is alignStats.R
, which generates a table containing selected information from the STAR log file and the HOMER tagInfo.txt
file. This tool is described in more detail below.
Automated data processing
Applying the steps outlined above to a set of new data is cumbersome and prone to human typing errors. The lab has a multi-purpose wrapper script (mapper_wrapper.sh
) for single-read or paired-end RNA-seq data, OR single-read or paired-end ATAC- or ChIP-seq data. However, it is a good idea to learn how to use the above tools in an independent workflow. And to learn how to apply various options as best suited to your data type/assumptions. Becoming over reliant on automated processing and analysis will limit understanding, independence, and creativity.
mapper_wrapper.sh
can be run from the lab’s cluster drive or added as executables to your path. The current script version is located at /data/hpc-troutman/software/scripts/mapper_wrapper.sh
. Let Ty know if you come across bugs or implement improvements.
mapper_wrapper.sh
takes an input list file or directory of *fastq.gz files and generates and submits a series of batch files to the LSF scheduler on the CCHMC BMI High Performance Cluster. Users may use default specified parameters for standard Troutman lab paired-end RNA-seq, or specify options specific to their data type. The submitted batch files automate execution of the following: trim_galore, fastqc, data alignment, sorting and indexing alignment bam files, and tag directory creation.
- This tool should work with any R1_001.fastq.gz file. Samples from the CCHMC core meet this convention.
[sample_name]_S#*L00#*[R/I][1/2]#001.fastq.gz
[sample_name]
is the name of the sample/library as it is submitted to the core in their online submission interface.S#
is the number of the sample in the sequencing run. Each unique sample_name will be assigned an individual sample number for that run. If the same sample is present on a subsequent flow cell run, this sample may or may not have the same sample number assigned to it.L00#
is the lane number of the flow cell.[R/I][1/2]#
is the read number. Libraries that are sequenced with in a ‘Single End’ specification will not contain a Read 2 file. If fastq files are created for the index reads, these will be indicated by ‘I1’ for the i7 index or ‘I2’ for the i5 index.001
The last segment is always 001.fastq.gz
Each file has been compressed with the gzip compression algorithm to conserve space. The CCHMC core meets this convention.
- Other samples may need to be renamed, as in the case-specific example below. It is good practice to triple-check your commands with an
echo
statement before running the final command.
for i in R1.fastq.gz; do echo mv $i ${i/R1_001.fastq.gz/R1.fastq.gz}; done
for i in R2.fastq.gz; do echo mv $i ${i/R2_001.fastq.gz/R2.fastq.gz}; done
- These tools require input of only one
fastq.gz
file per sample per read direction. If you have multiple files*R1_001.fastq.gz
or*R2_001.fastq.gz
file per sample, they can be concatenated usingbash /data/hpc-troutman/software/scripts/combine_fastq.sh
. - This tool generates tag directories specif to lab library chemistries (e.g. lab standard dUTP RNA-seq protocol).
Using mapper_wrapper.sh
Usage on the cluster should be done on the head node and does not require loading module(s).
First, list the data you wish to process into a file. For example:
ls /directory/one/*R1_001.fastq.gz > file.list
ls /directory/one/*R1_001.fastq.gz >> file.list
ls /directory/two/*R1_001.fastq.gz >> file.list
ls /directory/two/*R1_001.fastq.gz >> file.list
# Check the list
cat file.list
Then, for standard lab paired-end RNA-seq using mm39, run the following command:
bash /data/hpc-troutman/software/scripts/mapper_wrapper.sh file.list mm39
For standard lab ATAC-seq or ChIP-seq:
bash /data/hpc-troutman/software/scripts/mapper_wrapper.sh file.list mm39 -a bowtie2
If specifically requiring .sam alignment files:
bash /data/hpc-troutman/software/scripts/mapper_wrapper.sh file.list mm39 -o SAM -k samtools
Additional options:
Usage: mapper_wrapper.sh INPUT GENOME [options]
Required positional arguments:
INPUT Path to file list or directory of *_R1_*.fastq.gz
GENOME Genome key (e.g., mm39)
Options:
-o, --output-format SAM|BAM Output format (default BAM)
-a, --aligner star|bowtie2 Aligner choice (default star)
-r, --read-type auto|single|paired
Read layout detection (default auto)
-s, --subsample N|all Subsample reads (default all)
-k, --skip step1,step2 Skip steps: trim_galore,fastqc,align,samtools,tagdir
-e, --extra-aligner-opts "OPTS" Extra options passed to aligner
--extra-tagdir-opts "OPTS" Extra options passed to makeTagDirectory
--custom-genome DIR Override genome index directory
--load-modules "m1 m2 ..." Replace default list of modules to load
-w, --walltime H:MM Set walltime limit (default 4:00)
-c, --cores N Set number of cores (default 16)
-m, --memory MB Set memory per job (default 40000)
-n, --dry-run Dry run (show but do not submit jobs)
-f, --force Force overwrite of existing outputs -h, --help Show this help message and exit
The script generates and submits individual .bat files to the LSF scheduler. For example:
#!/bin/bash
set -euo pipefail
#BSUB -J Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R1_001
#BSUB -o logs/map_Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R1_001.out
#BSUB -e logs/map_Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R1_001.err
#BSUB -W 2:00
#BSUB -n 16
#BSUB -M 64000
#BSUB -R "rusage[mem=64000]"
#BSUB -R "span[hosts=1]"
module purge >/dev/null 2>&1 || true
module load trimgalore/0.6.6
module load fastqc/0.11.7
module load STAR/2.7.9
module load bowtie2/2.3.4.1
module load samtools
module load homer/5.1
SKIP_STEPS=""
skip(){ [[ " $SKIP_STEPS " == *" $1 "* ]]; }
# Subsample (if requested)
# Trim Galore
if ! skip trim_galore; then
trim_galore --paired --fastqc --cores 8 -o trim_galore "/data/hpc-troutman/archive/221102_CCHMC_SP1/Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R1_001.fastq.gz" "/data/hpc-troutman/archive/221102_CCHMC_SP1/Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R2_001.fastq.gz"
fi
# Alignment (STAR or Bowtie2)
if ! skip align; then
STAR --genomeDir "/data/hpc-troutman/genomes/indexes/star/mm10_starIndex" --readFilesCommand zcat --readFilesIn trim_galore/Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R1_001_val_1.fq.gz trim_galore/Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R2_001_val_2.fq.gz --runThreadN "16" --outFileNamePrefix "align_out/Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R1_001.mm10." --outSAMtype BAM SortedByCoordinate
fi
# SAMtools sort/index (if needed)
if ! skip samtools; then
samtools index "align_out/Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R1_001.mm10.Aligned.sortedByCoord.out.bam"
fi
# TagDir
if ! skip tagdir; then
makeTagDirectory "tagdirs/Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R1_001.mm10" "align_out/Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R1_001.mm10.Aligned.sortedByCoord.out.bam" -checkGC -format sam -sspe -flip
fi
Data processing validation
Runs errors/warnings/success can quickly be summarized using validate_mapping.py
. This tool makes heavy use of hard-coding and may not catch all possible errors/warnings.
usage: validate_mapping.py [-h] -d DIRECTORY -t {mapper,shift,allele}
[{mapper,shift,allele} ...] validate_mapping.py: error: the following arguments are required: -d/--directory, -t/--type
Example output:
tro3nr@bmiclusterp2:/data/hpc-troutman/member/tro3nr/250513_tmp$ validate_mapping.py -d logs/ -t mapper
Summary of errors and completions for mapper logs (sorted by sample name):
map_Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00314-GGACTCTG-CAAGGTAG_R1_001:
- ✅ Successfully completed.
map_Mouse-C57BL6J-BMDM-RNA-KLA-2hr-E00315-GGAGATTC-GTTCAGAC_R1_001:
- ✅ Successfully completed.
map_Mouse-C57BL6J-BMDM-RNA-PBS-E00306-TTATGTAT-TTCAAGAG_R1_001:
- ✅ Successfully completed.
map_Mouse-C57BL6J-BMDM-RNA-PBS-E00307-TTGGTCCG-TTAACACT_R1_001:
- ✅ Successfully completed.
Summary: 0 Errors, 0 Partial Successes (Chunks), 4 Full Successes
Guide for processing and anlaysis of mouse strains and F1 hybrids
Create strain specific STAR and/or Bowtie2 indexes
Sequencing data from mouse strains should be mapped to the strain specific genome. With MMARGE.pl (from MMARGE: Motif Mutation Analysis for Regulatory Genomic Elements), this requires generating a pseudo genome using strain-specific .vcf files. At present, these are still built on mm10 and awaiting formal release by The Mouse Genomes Project before moving to mm39. Currently available STAR indexes are: aj, akrj, balbcj, c57bl10j, c57bl6j, c57bl6nj, casteij, dba1j, dba2j, fvbnj, hg38, mm10, mm39, nodshiltj, pwkphj, spreteij, wsbeij
. Under development.
Data mapping, shifting, sorting, and tag directories
Instructions copied from: Bennett H, Troutman TD, Zhou E, Spann NJ, Link VM, Seidman JS, et al. Nat Immunol. 2023;24(11):1825-38. PubMed PMID: 37735593; PubMed Central PMCID: PMC10602851.
Sequencing data were assessed for quality using fastqc and unsupervised principal component analysis. ATAC-seq and ChIP–seq data were mapped using Bowtie2, and RNA-seq data were mapped using STAR. ATAC-seq data were trimmed to 30 bp to remove sequencing adapters, which improved mapping efficiency. Strain-specific pseudogenomes for BALB/cJ and A/J cells were generated by replacing invariant positions of mm10 sequence with alleles reported in the Mouse Genome Project strain-specific VCF files. Importantly, this strategy allows for mapping of SNPs and indels but does not consider larger structural variants present in BALB/cJ and A/J mice. While these structural variants may contain regulatory elements, the number of structural variants is two orders of magnitude less than the number of SNPs and indels captured by the pseudogenome alignment strategy. mm10 was used as the C57BL/6J strain-specific genome. Samples from parental strains of mice were mapped to the strain-specific genome. Mapped reads were shifted to the chromosome coordinates of the mm10 genome build using MARGE.pl shift with -ind set to balbcj or aj for reads mapped to the BALB/cJ or A/J genome, respectively.
For samples from CB6F1/J samples, reads were mapped to the mm10 and BALB/cJ genome builds. The BALB/cJ mapped reads were then shifted to the mm10 build with MMARGE, as described above. Perfectly mapped reads spanning genetic mutations between BALB/cJ and mm10 were identified using the MMARGE.pl allele_specific_reads command with -ind set to BALB/cJ and a second time with -ind set to mm10, resulting in two SAM files for each biological sample: one SAM file containing reads perfectly mapped to the mm10 genome that spanned known DNA sequence differences relative to the BALB/cJ genome and a second SAM file containing reads perfectly mapped to the BALB/cJ genome that spanned known DNA sequence differences relative to the reference mm10 genome.
Data should be left as .sam files for downstream processing by MMARGE.pl. For standard RNA-seq libraries, the following serves as an example:
mapper_wrapper.sh file.list pwkphj -o SAM -k samtools, tagdir
We have not generated bowtie2 indexes for these genomes. Further, mapper_wrapper.sh
requires an update before it will work with bowtie2 and mouse strains.
Running the MMARGE.pl apptainer
Get an interactive session on the cluster for testing and load apptainer.
bsub -W 8:00 -n 16 -M 32000 -Is /bin/bash
module load apptainer
apptainer exec --bind /database/mmarge:/database /database/apptainer/MMARGE/marge.sif MMARGE.pl shift
apptainer shell --bind /database/mmarge:/database /database/apptainer/MMARGE/marge.sif
Shifting to the reference with mmarge_shift_wrapper.sh
Data mapped to the strain specific genome is then shifted to the reference using MMARGE.pl shift
, which is run through an apptainer container. This processing step can take 1-2 days per file, depending on the file size. LSF jobs can be generated/submitted using mmarge_shift_wrapper.sh
. This script is still being developed/validated.
/data/hpc-troutman/software/scripts/mmarge_shift_wrapper.sh
Error: Must provide filelist or directory and MMARGE_GENOME.
Usage: /users/tro3nr/.local/bin/mmarge_shift_wrapper.sh <filelist.txt|directory> <MMARGE_GENOME> [-t TYPE] [-w WALLTIME] [-n NCORES] [-m MEMORY] [--dry-run] [--bamify] [--tagdir] [-h]
Positional arguments:
filelist.txt|directory Text file or directory containing SAM/BAM files
MMARGE_GENOME Genome key: one of A_J AKR_J BALB_CJ C57BL_10J C57BL_6J C57BL_6NJ CAST_EIJ DBA_1J DBA_2J FVB_NJ NOD_SHILTJ PWK_PHJ SPRET_EIJ WSB_EIJ
Optional flags:
-t, --type TYPE Data type (default: rna-paired; options: rna-single, atac-chip)
-w, --walltime WALLTIME Wall time (default: 36:00)
-n, --ncores NCORES Number of cores (default: 8)
-m, --memory MEMORY Memory in MB (default: 8000)
--dry-run Show job submission but do not execute jobs
--bamify Convert SAM to BAM and remove SAM files
--tagdir Create tag directories using makeTagDirectory -h, --help Display this help message
Sorting allele specific reads with mmarge_allele_specific_wrapper.sh
Shifted .sam files from F1-hybrid mice can then be sorted into reads spanning variants specific to a parental genome. This is done using MMARGE.pl allele_speicfic_reads
through the apptainer container. This also takes a long time. LSF jobs can be generated/submitted using mmarge_allele_specific_wrapper.sh
. This script is still being developed/validated.
/data/hpc-troutman/software/scripts/mmarge_allele_specific_wrapper.sh
Error: Must provide filelist or directory and MMARGE_GENOME.
Usage: mmarge_allele_specific_wrapper.sh <FILE.LIST|DIRECTORY> <MMARGE_GENOME> [options]
Required arguments:
<FILE.LIST|DIRECTORY> File list of shifted SAMs or directory containing them
MMARGE_GENOME Genome key: one of A_J AKR_J BALB_CJ C57BL_10J C57BL_6J C57BL_6NJ CAST_EIJ DBA_1J DBA_2J FVB_NJ NOD_SHILTJ PWK_PHJ SPRET_EIJ WSB_EIJ mm10-clean
Options:
--allele-2 Second non-reference genome for f1-hybrids not using C57BL/6 (mm10-clean),
one of A_J AKR_J BALB_CJ C57BL_10J C57BL_6J C57BL_6NJ CAST_EIJ DBA_1J DBA_2J FVB_NJ NOD_SHILTJ PWK_PHJ SPRET_EIJ WSB_EIJ mm10-clean
--aligner Aligner used for sam files: (default: star; options: star, bowtie2)
--walltime HH:MM Wall time for job (default: 24:00)
--ncores INT Number of CPU cores (default: 8)
--memory INT Memory in MB (default: 8000)
--type TYPE Data type (default: rna-paired; options: rna-single, atac-chip)
--bamify Convert allele-specific SAMs to sorted BAMs and remove SAMs
--tagdir Generate HOMER tag directories from allele-specific outputs
--dry-run Preview job script generation without submitting
--help Display this help message and exit
Example: mmarge_allele_specific_wrapper.sh align_out A_J -t rna-paired --bamify --tagdir --dry-run
For inbred strains
- Map each data set to the strain specific genome. Use the reference mm10 genome for samples from C57Bl/6J mice.
- Shift data not mapped to mm10 to the reference genome individually or with the wrapper.
apptainer exec --bind /database/mmarge:/database /database/apptainer/MMARGE/marge.sif \
-files <files> -ind <genome, e.g. BALB_CJ> MMARGE.pl shift
mmarge_shift_wrapper.sh <file.list> <genome, e.g. BALB_CJ> --bamify --tagdir --<additional options>
- Create tag directories from the shifted alignment files.
F1-hybrids with C57BL/6J
For F1-hybrids resulting from an inbred strain crossed to C57BL/6J reference (e.g. CB6F1/J):
- Map each data set twice, once to the strain genome and once to the reference genome. Output as SAM.
- Shift alignment files from the non reference genome to the reference
C57BL_6J
genome.
apptainer exec --bind /database/mmarge:/database /database/apptainer/MMARGE/marge.sif \
-files CB6F1J.balbcj.Aligned.out.sam -ind BALB_CJ MMARGE.pl shift
mmarge_shift_wrapper.sh <file.list> <genome, e.g. BALB_CJ>
- Sort out perfectly mapped reads spanning genetic mutations between the parental genomes using each genome.
apptainer exec --bind /database/mmarge:/database /database/apptainer/MMARGE/marge.sif \
\
MMARGE.pl allele_specific_reads -file CB6F1J.balbcj.Aligned.out_shifted_from_BALB_CJ.sam \
-ind BALB_CJ \
-method star
apptainer exec --bind /database/mmarge:/database /database/apptainer/MMARGE/marge.sif \
\
MMARGE.pl allele_specific_reads -file CB6F1J.mm10.Aligned.out.sam \
-ind mm10-clean \
-method star
mmarge_allele_specific_wrapper.sh <list.from.BALBCJ> BALB_CJ --bamify --tagdir --<options>
mmarge_allele_specific_wrapper.sh <list.from.mm10> mm10-clean --bamify --tagdir --<options>
- Create tag directories from the shifted alignment files.
F1-hybrids between two non-reference strains
For F1-hybrids resulting from an inbred strain crossed to another non-reference strain:
- Map each data set twice, once to each parental genome. Output as SAM.
- Shift alignment files from the non reference genome to the reference
C57BL_6J
genome.
apptainer exec --bind /database/mmarge:/database /database/apptainer/MMARGE/marge.sif \
-files AJPWKF1_PHJ.aj.Aligned.out.sam -ind A_J
MMARGE.pl shift apptainer exec --bind /database/mmarge:/database /database/apptainer/MMARGE/marge.sif \
-files AJPWKF1_PHJ.pwkphj.Aligned.out.sam -ind PWK_PHJ MMARGE.pl shift
mmarge_shift_wrapper.sh <list.parent.genome.1> <genome, e.g. BALB_CJ>
mmarge_shift_wrapper.sh <list.parent.genome.2> <genome, e.g. PWK_PHJ>
- Sort out perfectly mapped reads spanning genetic mutations between the parental genomes using each genome.
apptainer exec --bind /database/mmarge:/database /database/apptainer/MMARGE/marge.sif \
\
MMARGE.pl allele_specific_reads -file AJPWKF1_PHJ.aj.Aligned.out.shifted_from_A_J.sam \
-inds A_J, PWK_PHJ \
-method star
apptainer exec --bind /database/mmarge:/database /database/apptainer/MMARGE/marge.sif \
\
MMARGE.pl allele_specific_reads -file AJPWKF1_PHJ.pwkphj.Aligned.out.shifted_from_PWJ_PHJ.sam \
-inds PWK_PHJ, A_J \
-method star
- Create tag directories from the shifted alignment files.
Script locations
Scripts described above can be executed from our cluster drive. Current versions are located at:
/data/hpc-troutman/software/scripts/
You can also symlink these scripts to your bin to avoid typing out the fullname. Make sure they are executable. Or run the below for loop:
for i in combine_fastq.sh mapper_wrapper.sh mmarge_allele_specific_wrapper.sh mmarge_shift_wrapper.sh validate_mapping.py \
-s /data/hpc-troutman/software/scripts/$i ~/.local/bin/ \
do ln $i
chmod +x ~/.local/bin/done
Software list
References
Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38(4):576-89. doi: 10.1016/j.molcel.2010.05.004. PubMed PMID: 20513432; PubMed Central PMCID: PMC2898526.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357-9. Epub 20120304. doi: 10.1038/nmeth.1923. PubMed PMID: 22388286; PubMed Central PMCID: PMC3322381.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15-21. Epub 20121025. doi: 10.1093/bioinformatics/bts635. PubMed PMID: 23104886; PubMed Central PMCID: PMC3530905.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi: 10.1186/s13059-014-0550-8. PubMed PMID: 25516281; PubMed Central PMCID: PMC4302049.
Link VM, Duttke SH, Chun HB, Holtman IR, Westin E, Hoeksema MA, et al. Analysis of Genetically Diverse Macrophages Reveals Local and Domain-wide Mechanisms that Control Transcription Factor Binding and Function. Cell. 2018;173(7):1796-809 e17. Epub 20180517. doi: 10.1016/j.cell.2018.04.018. PubMed PMID: 29779944; PubMed Central PMCID: PMC6003872.
Link VM, Romanoski CE, Metzler D, Glass CK. MMARGE: Motif Mutation Analysis for Regulatory Genomic Elements. Nucleic Acids Res. 2018;46(14):7006-21. Epub 2018/06/13. doi: 10.1093/nar/gky491. PubMed PMID: 29893919; PubMed Central PMCID: PMC6101580.
Bennett H, Troutman TD, Zhou E, Spann NJ, Link VM, Seidman JS, et al. Discrimination of cell-intrinsic and environment-dependent effects of natural genetic variation on Kupffer cell epigenomes and transcriptomes. Nat Immunol. 2023;24(11):1825-38. Epub 20230921. doi: 10.1038/s41590-023-01631-w. PubMed PMID: 37735593; PubMed Central PMCID: PMC10602851.
Duttke SH, Guzman C, Chang M, Delos Santos NP, McDonald BR, Xie J, et al. Position-dependent function of human sequence-specific transcription factors. Nature. 2024;631(8022):891-8. Epub 20240717. doi: 10.1038/s41586-024-07662-z. PubMed PMID: 39020164; PubMed Central PMCID: PMC11269187.