Welcome to the Ficklin Lab
The Ficklin Lab in the Dept. of Horticulture at Washington State University is a computational dry lab dedicated to the creation of software tools, computational approaches, and systems-level models that address basic and applied hypotheses at the molecular level of agricultural systems.
Areas of Focus
- Development of computational methods for discovery of molecular biosignatures of complex traits in agricultural plants. Methods include machine learning, deep learning and systems-level multiplex networks of multiomics data sets.
- Cyberinfrastructure development supporting transfer, storage, visualization and analysis of large genomics and other "omics" data sets.
Stephen P. Ficklin, Ph.D.
Office: Johnson Hall 153
Phone: (509) 335-4295
Department of Horticulture
Washington State University
PO Box 646414
Pullman, WA 99164-6414
- Ph.D. Plant and Environmental Sciences, Clemson University (2013)
- M.S. Computer Science, Clemson University (2003)
- B.S. Computer Science, Brigham Young University (2000)
Assessment of smoke taint risk in vineyards exposed to smoke from wildfires
Funded by the Washington State Department of Agriculture Specialty Crop Block Grant Program, this project addresses the grape and wine industry's need for methods that assess the risk to grape and wine quality associated with vineyard exposure to smoke from wildfires.
Apple genomes for postharvest fruit quality biomarkers
This project, funded by the Washington Tree Fruit Research Commission, seeks to develop tools for identifying postharvest biomarkers in apple fruit that assess response to storage conditions and predict the risk of disorders or loss of quality.
"Big Data" Tree Crop Cyberinfrastructure
Standards and Cyberinfrastructure that Enable "Big-Data" Driven Discovery for Tree Crop Research is a project funded by the US National Science Foundation (award #1444573) to develop standards and infrastructure for integrating high-quality, curated phenotypic and genotypic data with geo-location and environmental data. The project will both leverage and coordinate funded efforts to enhance tree crop databases (Genome Database for Rosaceae, Citrus Genome Database, TreeGenes and Hardwood Genomics Web) or update them to Tripal, which will support cross-site communication, adoption of existing standards, and "big data" integration and analysis.
Precision Dairying: Transcriptomics/Phenomics Pilot Project
Funded by a Livestock Health and Food Security internal grant from the College of Veterinary Medicine (CVM) and the College of Agricultural, Human, and Natural Resource Sciences (CAHNRS) at WSU, the Precision Dairying project seeks to apply high-throughput data collection technologies in transcriptomics and phenomics to identify biomarkers predictive of animal health.
Scientific Data at Scale (SciDAS)
SciDAS is a multi-institutional project funded by the National Science Foundation (award #1659300). The goal of SciDAS is to provide advanced cyberinfrastructure supporting the creation of a national-level distributed compute infrastructure for the efficient injection of data and workflows into compute environments. The Ficklin Lab is responsible for working with the project team to develop a systems biology use case: large-scale construction of gene co-expression networks across the tree of life. The project also contains a Tripal component to integrate Tripal sites with the SciDAS infrastructure. See the official SciDAS home page for more information.
Lentil Transcriptomics/Phenomics Pilot Project
Funded by the College of Agricultural, Human, and Natural Resource Sciences (CAHNRS) at WSU, the Lentil Emerging Issues Pilot project explores the integration of transcriptomic, phenomic, and phenotypic data using a systems genetics approach to develop new protocols for identifying disease resistance biomarkers in lentil.
Tripal Gateway Project
The Tripal Gateway Project is a US National Science Foundation (NSF) funded project (award #1443040) designed to create infrastructure supporting two important needs within the Tripal community: data exchange and big data analysis. Modern sequencing technologies have expanded the need for workflow-based analytics to meet community expectations, and the ability to move data between a community database and a high-performance computing cluster is critical for meeting performance expectations. The Tripal Gateway Project addresses these needs through, first, the addition of RESTful web services to Tripal; second, integration of Tripal with Galaxy so that Tripal sites can provide analytical workflows to their users; and third, development and exploration of methods to improve data transfer between Tripal sites and the computing centers where Galaxy jobs are executed.
People
The Ficklin Lab comprises full-time technical and research staff, postdocs, graduate students and undergraduate students. Current members of the Ficklin Lab are listed below in alphabetical order.
Python development for Pynome and workflow development for massive-scale gene co-expression network construction in the SciDAS project. High Performance Computing. Ph.D. in Organic Chemistry.
C++, CUDA, OpenCL Developer.
Development of improved computational methods towards network annotation of condition-specific molecular function
Noise reduction strategies, Network & GWAS integration, condition-specific subnetworks. Works in both the Ficklin and Zhang labs.
Sai Oruganti Sai Prakash
Top-down metabolic network construction for identification of condition-specific interactions and integration with gene expression data.
The Ficklin Lab in the Department of Horticulture at WSU began in July of 2015. The following is a list of peer-reviewed publications with lab members as primary author or co-author since 2015.
- Sherman BT, Burns J, Feltus FA, Smith M, Ficklin SP. GPU Implementation of Pairwise Gaussian Mixture Models for Multi-Modal Gene Co-Expression Networks (2019) IEEE Access. 7, 160845-160857
- Humann JL, Lee T, Ficklin S, Main D. Structural and Functional Annotation of Eukaryotic Genomes with GenSAS (2019) In: Kollmar M. (ed.) Gene Prediction. Methods in Molecular Biology, vol 1962. Humana, New York, NY
- Spoor S, Cheng CH, Sanderson LA, Condon B, Almasaeed A, Chen M, Bretaudeau A, Rache H, Jung S, Main D, Bett K, Staton M, Wegrzyn JL, Feltus FA, Ficklin SP. Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases (2019) Database (Oxford). baz077
- Wegrzyn JL, Staton M, Street N, Main D, Grau E, Herndon N, Buehler S, Falk T, Zaman S, Ramnath R, Richter P, Sun L, Condon B, Almasaeed A, Chen M, Mannapperuma C, Jung S, Ficklin S. Cyberinfrastructure to improve forest health and productivity: the role of tree databases in connecting genomes, phenomes, and the environment (2019) Frontiers in Plant Science, section Plant Biotechnology. 10:813
- Honaas LA, Hargarten HL, Ficklin SP, Hadish JA, Wafula E, Mattheis JP, Rudell DR. Co-expression networks provide insights into molecular mechanisms of postharvest temperature modulation of apple fruit to reduce superficial scald (2019) Postharvest Biology and Technology. 149, 27-41
- Zhang X, Khanal U, Zhao X, Ficklin S. Making sense of performance in in-memory computing frameworks for scientific data analysis: A case study of the spark system (2018) Journal of Parallel and Distributed Computing. 120, 369-382
- Linge CS, Antanaviciute L, Abdelghafar A, Arús P, Bassi D, Rossini L, Ficklin S, Gasic K. High-density multi-population consensus genetic linkage map for peach (2018) PLoS One. 13, e0207724
- Jung S, Lee T, Cheng CH, Buble K, Zheng P, Yu J, Humann J, Ficklin SP, Gasic K, Scott K, et al. 15 years of GDR: New data and functionality in the Genome Database for Rosaceae (2018) Nucleic Acids Research. 47, D1137-D1145
- Harper L, Campbell J, Cannon EKS, Jung S, Poelchau M, Walls R, Andorf C, Arnaud E, Berardini TZ, Birkett C, Cannon S, Carson J, Condon B, Cooper L, Dunn N, Elsik CG, Farmer A, Ficklin SP, et al. AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture (2018) Database. bay088
- Falk T, Herndon N, Grau E, Buehler S, Richter P, Zaman S, Baker EM, Ramnath R, Ficklin S, Staton M, Feltus FA, Jung S, Main D, Wegrzyn JL. Growing and cultivating the forest genomics database, TreeGenes (2018) Database. 2018, 1-11
- Dunwoodie LJ, Poehlman WL, Ficklin SP, Feltus FA. Discovery and validation of a glioblastoma co-expressed gene module (2018) Oncotarget. 9, 10995-11008
- Jung S, Lee T, Cheng CH, Ficklin SP, Yu J, Humann J, Main D. Extension modules for storage, visualization and querying of genomic, genetic and breeding data in Tripal databases (2017) Database. 2017:bax092
- Wytko C, Soto B, Ficklin SP. blend4php: a PHP API for galaxy (2017) Database. baw154
- Chen M, Henry N, Almsaeed A, Zhou X, Wegrzyn J, Ficklin S, Staton M. New extension software modules to enhance searching and display of transcriptome data in Tripal databases (2017) Database. bax052
- Ficklin SP, Dunwoodie LJ, Poehlman WL, Watson C, Roche KE, Feltus FA. Discovering Condition-Specific Gene Co-Expression Patterns Using Gaussian Mixture Models: A Cancer Case Study (2017) Scientific Reports. 7:8617
- Zhang X, Khanal U, Zhao X, Ficklin SP. Understanding software platforms for in-memory scientific data analysis: a case study of the spark system (2016) Proceedings of the IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS). Wuhan, China. Dec 13-16 2016
- Jung S, Lee T, Ficklin S, Yu J, Cheng CH, Main D. Chado Use Case: Storing Genomic, Genetic and Breeding Data of Rosaceae and Gossypium Crops in Chado (2016) Database. 2016:baw010
- Wang Y, Ficklin SP, Wang X, Feltus FA, Paterson AH. Large-scale gene relocations following an ancient genome triplication associated with the diversification of core eudicots (2016) PLoS One. 11(5): e0155637
- Bassil NV, Davis TM, Zhang H, Ficklin S, Mittmann M, Webster T, et al. Development and preliminary evaluation of a 90 K Axiom® SNP array for the allo-octoploid cultivated strawberry Fragaria × ananassa (2015) BMC Genomics. 16:155
Software
The Ficklin lab actively develops software that implements new approaches for Systems Genetics and the Tripal database platform. A list of these software packages is provided below.
The Accelerated Computational Engine (ACE) is a C++ library that provides a generic interface for construction of analytical tools. It provides a common interface for GPU utilization, visualization using the Qt package, and multi-node execution using OpenMPI. ACE provides an open file format for all output files that supports meta-data and provenance. ACE was created as the base for KINC, but can be used for any scientific application.
blend4php is a PHP library that interacts directly with the Galaxy Project API. This tool was developed for use by the Tripal Galaxy Module, but was designed to be independent so that anyone with a PHP-based site can interact directly with workflows housed in Galaxy. The blend4php package allows a site to add, modify and launch workflows, view and download histories, create datasets and more.
GEMmaker is a Nextflow workflow for large-scale gene expression sample processing, expression-level quantification and Gene Expression Matrix (GEM) construction. Results from GEMmaker are useful for differential gene expression (DGE) and gene co-expression network (GCN) analyses. The GEMmaker workflow currently supports Illumina RNA-seq datasets.
The Knowledge Independent Network Construction (KINC) package generates gene co-expression networks using Pearson correlation, Spearman correlation and mutual information, employs Random Matrix Theory (RMT) for automated network thresholding, and optionally employs Gaussian Mixture Models (GMMs) to identify potential condition-specific gene expression. KINC v3.0 is built on the Accelerated Computational Engine (ACE), another Ficklin Lab software product.
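To make the basic idea of a correlation-based gene co-expression network concrete, the sketch below computes pairwise Pearson correlations between genes in a small gene expression matrix (GEM) and keeps each gene pair whose absolute correlation meets a cutoff. This is not KINC's implementation: KINC is built on ACE, uses RMT to choose the threshold automatically, and can apply GMMs, whereas this sketch assumes a fixed, hand-picked threshold. The function names `pearson` and `coexpression_edges`, the `threshold` parameter, and the toy expression values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A gene with constant expression has no defined correlation; treat it as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_edges(gem, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold.

    gem maps gene name -> expression values across the same ordered samples.
    The fixed threshold is a hypothetical stand-in for an RMT-derived cutoff.
    """
    edges = []
    for g1, g2 in combinations(sorted(gem), 2):
        r = pearson(gem[g1], gem[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


if __name__ == "__main__":
    # Toy GEM: 4 genes measured in the same 4 samples (made-up values).
    gem = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.0],
        "geneC": [4.0, 3.0, 2.0, 1.0],
        "geneD": [1.0, 3.0, 2.0, 5.0],
    }
    for g1, g2, r in coexpression_edges(gem, threshold=0.9):
        print(f"{g1}\t{g2}\t{r:.3f}")
```

With this toy matrix, geneD correlates with the other genes at |r| ≈ 0.83, so it falls below the 0.9 cutoff and only the edges among geneA, geneB and geneC are kept. Picking that cutoff by hand is the step that RMT thresholding is intended to automate.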
Tripal is a toolkit for the construction of online biological community databases (genetics, genomics, breeding, etc.) and is a member of the GMOD family of tools. By default, Tripal v3 integrates with the GMOD Chado database. Tripal is used by species and clade genome databases all over the world and boasts an active, distributed community of open-source developers.
Tripal Galaxy Module
The Tripal Galaxy Module is an extension module for Tripal that integrates a Tripal-based site with the Galaxy workflow tool. It allows a site to provide workflows to end users and allows site developers to use Galaxy workflows to power the computation behind complex analytical tools.
Tripal Network Module
The Tripal Network Module serves as an extension to Tripal and provides data management and visualization for biological networks stored in Tripal.
AFS 505: Topics in Computational and Analytical Methods for Scientists
Formerly a Horticulture 503 (Special Topics) course, this course offers:
- Applied computational methods for researchers processing, managing, and analyzing data in scientific and engineering fields.
- Variable-credit (1-6) course with 5 weeks and 1 credit per module.
- Select from non-sequential modules to meet program needs.
- General prerequisite is graduate standing in an agricultural, life, environmental, or economic science, or in engineering. Other recommended preparation is specific to individual modules.
Modules offered in the Fall
- Data Structures in R
- Data Visualization in R
- Data Wrangling in R
Modules offered in the Spring
- Programming in Python
- Data Analysis with Python
- Computing for Big Data
- Hort 503 (Advanced Topics), Section 1 Spring 2019
- Hort 503 (Advanced Topics), Section 1 Spring 2018
Data Analysis in Systems Biology
This course offers an introduction to approaches for modeling and analysis in systems biology. Topics include:
- Review of gene, protein, metabolic, and signaling systems
- Methods for modeling biological systems
- UNIX Basics
- High Performance Computing (HPC) introduction
- Graph theory for network modeling
- Network visualization
Throughout the course students work towards the generation of gene co-expression networks from RNA-seq data they select for organisms and biological functions of their own interest. These networks are constructed using HPC and existing bioinformatics tools.
- Hort 503 (Advanced Topics), Section 2 Fall 2019
- Hort 503 (Advanced Topics), Section 2 Fall 2017
- Hort 503 (Advanced Topics), Section 2 Fall 2016
Join the Lab
Graduate degrees with an emphasis on Systems Genetics and Computational Biology are available with the Ficklin lab through the Department of Horticulture and the Molecular Plant Sciences (MPS) program. Both programs offer world-class graduate-level education. Dr. Ficklin is currently looking for students interested in graduate research both at the M.S. (Horticulture) and Ph.D. levels (Horticulture and MPS). Please contact Dr. Ficklin directly to express interest.
Undergraduate research opportunities are available for motivated students with background in computer programming. If interested, please contact Dr. Ficklin directly.
Research Staff / Postdoctoral Researchers
The Ficklin lab offers full-time employment for data scientists and software developers as needed by funded projects. At times, positions are available for Research Associates (with a B.S. or M.S. degree and relevant experience) and Postdoctoral Researchers. When available, these positions are posted online at WSU's career website. If you are looking to apply for an existing opening, please use that site to apply. If you would like to inquire about potential employment, please contact Dr. Ficklin directly.