Expert Bioinformatics Consultant

I bring 20 years of experience answering biological questions with bioinformatics in both corporate and academic settings. I apply my energy and experience to turn your research question into actionable tasks, publication-ready figures, and reproducible code.

I’m very knowledgeable about cancer biology and oncology, proficient in DNAseq, RNAseq, and metabolomics analysis, and passionate about biology in general.

I’m also available for occasional mentoring if your graduate student or post-graduate researcher needs to meet with someone to briefly discuss their project.

Services I Provide - Bioinformatics on multiple levels

Dataset analysis

Going from raw data (raw reads in FASTQ files) to identifying mutations (DNA-seq) or differentially expressed genes and pathways (RNA-seq/proteomics). Figures, code, and methodology are provided.

Interpretation of gene lists

What do the results mean functionally and biologically? Are they linked to a pathway or to transcription factors? I use machine learning to dive into your data, and I review the literature and compare your results to published ones, including survival data.

Software/pipeline development

If you’ve got a piece of software written by someone (in or out of your lab), I’d be happy to modernize and update it for you. I also build pipelines from scratch in Nextflow to fit your cloud or HPC environment.

Dataset Analysis 

My workflow is as follows:

  1. Discuss the dataset and biological question(s) with you.
  2. Pick the best method to answer your questions and achieve results.
  3. Apply this method/pipeline, including developing and expanding it as needed.
  4. QC all steps. Repeat modified analysis if necessary.
  5. Identify groups of genes relevant for your question, and their potential functional role.
  6. Perform a preliminary literature review and suggest additional steps.
  7. Provide a preliminary report, including code and raw data.

RNAseq analysis example

After discussing your project, we will identify a biological question or questions. For example: what are the genes differentially expressed between KO tissue and WT tissue? What genes change over the course of a time series? 

At this point, using best practices, I will:

  • Align the raw reads to the genome, filtering out irrelevant reads (ribosomal reads, mitochondrial reads).
  • Transform aligned reads to gene expression.
  • Compare differentially expressed genes using relevant statistical models, correcting for multiple testing.
  • Generate lists of differentially expressed genes.
  • Explore the lists for functional enrichment, including Gene Ontology, transcription factor binding sites, and metabolic pathways.
  • Perform a preliminary literature review.
  • Send you a preliminary report (see example).
  • Provide you with all code, raw data, and figures.
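
As an illustration of the multiple-testing correction step above, here is a minimal sketch of the Benjamini-Hochberg procedure in plain Python. It is one common choice for controlling the false discovery rate, not necessarily the method used in a given project. The function name `benjamini_hochberg`, the gene names, and the p-values are hypothetical and exist only for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values never
    # increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a differential expression test.
gene_p = {
    "GENE_A": 0.001,
    "GENE_B": 0.008,
    "GENE_C": 0.039,
    "GENE_D": 0.041,
    "GENE_E": 0.60,
}
genes = list(gene_p)
adj = benjamini_hochberg([gene_p[g] for g in genes])

# Keep genes whose adjusted p-value is below an assumed 5% FDR threshold.
significant = [g for g, q in zip(genes, adj) if q < 0.05]
print(significant)
```

In this toy input, GENE_C and GENE_D have raw p-values below 0.05 but do not pass after correction, which is why the correction matters when thousands of genes are tested at once.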

Functional Analysis

When identifying a group of genes, such as differentially expressed genes, an important question is “what function do these genes serve?” Functional enrichment attempts to identify functions that are over-represented among the changed genes.
There are two main approaches (group enrichment and GSEA), but both produce long lists of enriched functions, which may include hundreds of terms (see example results). I summarize these results in a statement such as “an increase in invasion, mesenchymal transition and ECM activity.” See my publication in Cancer Research.
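
To make the idea of over-representation concrete, here is a minimal sketch of a one-sided hypergeometric test, a common way to implement group enrichment. It is an illustration under stated assumptions, not the exact procedure used in any particular analysis: the function name `enrichment_p_value`, the gene identifiers, and the set sizes are all hypothetical, and GSEA works differently (it ranks all genes instead of testing a fixed list).

```python
from math import comb


def enrichment_p_value(hits, list_size, set_size, background_size):
    """Probability of seeing at least `hits` gene-set members in a gene list
    of `list_size` drawn at random from `background_size` genes, of which
    `set_size` belong to the gene set (hypergeometric upper tail)."""
    upper = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical toy data: 20 background genes, a 5-gene pathway,
# and a list of 4 differentially expressed genes.
background = {f"G{i}" for i in range(1, 21)}
pathway = {"G1", "G2", "G3", "G4", "G5"}
de_genes = {"G1", "G2", "G3", "G10"}

hits = len(de_genes & pathway)
p = enrichment_p_value(hits, len(de_genes), len(pathway), len(background))
print(f"{hits} of {len(de_genes)} genes are in the pathway, p = {p:.4f}")
```

In a real analysis this test is repeated for every functional category, so the resulting p-values also need multiple-testing correction.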

In addition to running on established functional categories, I look for published gene lists that are relevant to your functional question (immune cell types, for example) and test whether your group of genes is enriched in them.

Software Development

At Neogenomics, I was one of the lead bioinformaticians developing the computational pipeline for the PanTracer series of assays: broad, next‑generation sequencing panels for pan‑solid tumor indications. As part of this development, I coordinated with biologists and clinicians to define requirements and developed algorithms from scratch, including test cases. The pipeline comprises a series of tasks running on cloud computing, and development included improving both the algorithms and the overall pipeline.

I am experienced in Nextflow, a framework for creating multi-step pipelines. Nextflow runs on cloud computing (AWS/Azure/Google) and HPC (High Performance Computing) clusters, and I am proficient with both. I can develop a pipeline and review or update existing code to suit your needs.

About Me

I love using computational tools to solve biological problems and questions.

Demonstrated success in research, software development, and collaborations to identify clinically meaningful insights.

I have experience in both academic and corporate positions as a bioinformatician, and an academic publication history. I’m an expert in creating pipelines and processing data using multiple programming languages (R, Python, MATLAB, Perl, UNIX shell) while complying with regulations for sensitive patient data and creating documented, reproducible code.

I’m an expert in cloud computing (AWS) and in using High-Performance Computing (HPC) clusters. I’m proficient in creating complex pipelines using workflow managers such as WDL, Nextflow, and Luigi.

In my time at McGill University, I discovered that I love helping solve interesting biological questions in a research environment.

In my corporate role, I was responsible for developing genomic pipelines using Python/R, AWS cloud, and Docker. This development was done coordinating with biologists, medical scientists and bioinformaticians on multiple levels, so I am well versed with presenting my work to people from different fields.

In addition to the previously mentioned skills, I mentored junior bioinformaticians in both corporate and academic environments, something I greatly enjoyed.

  • Biological Data Types: DNAseq, copy number estimation, RNA expression (RNAseq, transcriptomics), single cell RNAseq analysis (scRNA), metabolomics
  • Programming languages: Python, R, Perl, MATLAB, Linux/Bash
  • Tools: Amazon Web Services (AWS), workflow languages (WDL, Nextflow), Bitbucket/git version control, containerization (Docker), HPC, Design Control in software development (Codebeamer)
  • Regulatory: IEC 62304 (Medical device software: Software life cycle processes)
  • Data Analytics: Multi-omics exploratory data analysis and integration, machine learning, graphical models, statistics
  • Communication: Cross-disciplinary collaboration with biologists, bioinformaticians, and clinicians