Data Sources and their Data Sets

CATHGENE3D
CDD
GO
GO
The Gene Ontology (GO) knowledgebase is the world's largest source of information on the functions of genes. This knowledge is both human-readable and machine-readable, and is a foundation for computational analysis of large-scale molecular biology and genetics experiments in biomedical research.
HAMAP
InterMine post-processor
InterMine gene-flanking regions
Gene-flanking regions created by the core InterMine post-processor
InterMine intergenic regions
Intergenic regions created by the InterMine core post-processor
InterPro
InterPro data set
InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites.
InterPro domain GO annotations
Mapping of GO terms to InterPro entries.
LIS Datastore
/data/v2/LEGUMES/Fabaceae/genefamilies/legume.genefam.fam1.M65K/legume.genefam.fam1.M65K.trees_ML_rooted
LIS gene family phylogenetic tree files
F_IGA1003.gnm1.V9RB
Files in this directory are genome assembly files for Glycine soja F, Chu et al. (2020): https://www.ncbi.nlm.nih.gov/bioproject/?term=prjna561626
F_IGA1003.gnm1.ann1.G61B
The files in this directory originated from figshare. The figshare repository is considered the primary and authoritative repository; files in this present directory are derived, and may have changes, as noted below. The files here are held as part of the LegumeInfo, SoyBase, and LegumeFederation projects, and are made available here for the purpose of reproducibility of analyses at these sites (e.g. gene family alignments and phylogenies, genome browsers, etc.) and for further use by researchers, as that research extends other analyses at the projects listed above. If you are conducting research on large-scale data sets for this species, please consider retrieving the data from the primary repository. If you use the data in the present directory, please 1) cite the data appropriately, generally referring to the original publications for this data; and 2) if you make use of any significant modifications in the files (noted below under Transformations where applicable), also cite the respective database project(s) related to this directory.
FiskebyIII.gnm1.F177
Files in this directory are genome assembly files for genome type Fiskeby III, Stupar (2020)
FiskebyIII.gnm1.ann1.SS25
Files in this directory are genome annotation files for Glycine max Fiskeby III, Stupar et al. (2020)
Hefeng25_IGA1002.gnm1.L69T
Files in this directory are genome assembly files for cultivar Hefeng 25 (Hefeng25_IGA1002 in publication; WHFS_GmHF25_1.0 in the GenBank assembly record), Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Hefeng25_IGA1002.gnm1.ann1.320V
Files in this directory are genome annotation files for cultivar Hefeng 25, Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Huaxia3_IGA1007.gnm1.RGGN
Files in this directory are genome assembly files for cultivar Huaxia 3 (Huaxia3_IGA1007; WHFS_GmHX3_1.0 in the GenBank assembly record), Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Huaxia3_IGA1007.gnm1.ann1.LKC7
Files in this directory are genome annotation files for cultivar Huaxia3, Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Jinyuan_IGA1006.gnm1.LXM0
Files in this directory are genome assembly files for cultivar Jinyuan (Jinyuan_IGA1006 in the publication; WHFS_GmJY_1.0 in the GenBank assembly record), Chu et al. (2020); https://www.ncbi.nlm.nih.gov/bioproject/?term=prjna561626
Jinyuan_IGA1006.gnm1.ann1.2NNX
Files in this directory are genome annotation files for cultivar Jinyuan, Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Lee.gnm1.BXNC
Initial genome assembly for Glycine max cultivar Lee. The assembly incorporates Illumina sequence and optical mapping from NRGene. Pseudomolecule anchoring of scaffolds was accomplished using two dense genetic maps as well as synteny comparisons with the Glycine max reference assembly for cultivar Williams 82 and Glycine soja PI 483463. This assembly corresponds to quality control round 12 (QC12), Oct 26, 2017.
Lee.gnm1.ann1.6NZV
Genome annotations for the Glycine max Lee genome assembly
PI483463.gnm1.YJWS
The aims of this project were to generate a high-quality reference genome assembly for Glycine soja accession PI 483463, which shows high genotypic diversity with respect to elite cultivars from Glycine max (soybean). The assembly incorporates Illumina sequence and optical mapping from NRGene. Pseudomolecule anchoring of scaffolds was accomplished using two dense genetic maps as well as synteny comparisons with the Glycine max reference assembly for cultivar Williams 82 (JGI Glyma.Wm82.a2) and Glycine max cultivar Lee. This assembly corresponds to quality control round 13 (QC13), Nov 7, 2017
PI483463.gnm1.ann1.3Q3Q
Genome annotations for the Glycine soja PI483463 genome assembly
Torkamaneh_Laroche_2017
Original VCF file received from Dr. Davoud Torkamaneh containing 4900192 variants (Single Nucleotide Polymorphisms and Structural Variants) in 102 Canadian soybean accessions.
W05.gnm1.SVL1
Genome assembly files for cultivar W05 from Xie, Chung, et al. (2019)
W05.gnm1.ann1.T47J
Genome annotations for the Glycine soja W05 genome assembly
Wenfeng7_IGA1001.gnm1.L0QH
Files in this directory are genome assembly files for cultivar Wenfeng 7 (Wenfeng7_IGA1001 in publication; WHFS_GmWF7_1.0 in the GenBank assembly record), Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Wenfeng7_IGA1001.gnm1.ann1.ZK5W
Files in this directory are genome annotation files for cultivar Wenfeng 7, Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Wm82.gnm2.DTC4
Genome assembly
Wm82.gnm2.ann1.RVB6
Genome annotations for the Glycine max Williams82 v02 genome assembly
Wm82.gnm2.ann1.expr.G7ZY
Soybean (Glycine max L.) is a major crop providing an important source of protein and oil, which can also be converted into biodiesel. A major milestone in soybean research was the recent sequencing of its genome. The sequence predicts 69,145 putative soybean genes, with 46,430 predicted with high confidence. In order to examine the expression of these genes, we utilized the Illumina Solexa platform to sequence cDNA derived from 14 conditions (tissues). The result is a searchable soybean gene expression atlas accessible through a browser (http://digbio.missouri.edu/soybean_atlas). The data provide experimental support for the transcription of 55,616 annotated genes and also demonstrate that 13,529 annotated soybean genes are putative pseudogenes, and 1736 currently unannotated sequences are transcribed. An analysis of this atlas reveals strong differences in gene expression patterns between different tissues, especially between root and aerial organs, but also reveals similarities between gene expression in other tissues, such as flower and leaf organs. In order to demonstrate the full utility of the atlas, we investigated the expression patterns of genes implicated in nodulation, and also transcription factors, using both the Solexa sequence data and large-scale qRT-PCR. The availability of the soybean gene expression atlas allowed a comparison with gene expression documented in the two model legume species, Medicago truncatula and Lotus japonicus, as well as data available for Arabidopsis thaliana, facilitating both basic and applied aspects of soybean research.
Wm82.gnm2.ann1.syn.HXNY
Synteny between glyma.Wm82.gnm2 and other species.
Wm82.gnm2.div.Jeong_Moon_2019
VCF file containing genotype information for 4234 Korean soybean accessions.
Wm82.gnm2.div.Lam_Xu_2010
VCF file from resequencing 31 wild and cultivated Chinese soybean accessions, called with respect to version a2 of the soybean reference genome, received from Meng Ni, Tin Hang, and Hon-Ming Lam.
Wm82.gnm2.div.dosSantos_Valliyodan_2016
Original VCF file received from Dr. Francismar Corrêa Marcelino-Guimaraes containing genotype information for 28 Brazilian soybean cultivars. HapMap file transformed from the VCF file using TASSEL 5.2.
Wm82.gnm2.mrk.1536_USLP1
Genetic markers mapped to glyma.Wm82.gnm1
Wm82.gnm2.mrk.Li_Zhao_2019
Genetic markers mapped to glyma.Wm82.gnm1
Wm82.gnm2.mrk.NJAU355K
Genetic markers mapped to glyma.Wm82.gnm1
Wm82.gnm2.mrk.SLAFseq
Genetic markers mapped to glyma.Wm82.gnm1
Wm82.gnm2.mrk.SoySNP50K
Genetic markers mapped to glyma.Wm82.gnm1
Wm82.gnm2.mrk.SoyaSNP180K
Genetic markers mapped to glyma.Wm82.gnm1
Wm82.gnm2.mrk.various
Genetic markers mapped to glyma.Wm82.gnm1
Wm82.gnm4.4PTR
Genome assembly for Williams 82. The Williams 82 version 4 assembly (Wm82v4) builds on the widely-used assembly version 2, as well as an incremental version 3 that involved incorporation of BAC sequence to fill contig gaps in 2016. The Wm82v2 assembly was primarily Sanger-based, and new gap-filling in v3 and v4 utilized PacBio-based BAC assemblies targeted to gap regions. The Wm82v4 assembly closed 3,626 gaps and added 5,138,978 bp of sequence relative to Wm82v2, increasing the contig N50 from 233.1 kbp to 419.3 kbp.
Wm82.gnm4.ann1.T8TQ
Genome annotations for the Glycine max Williams 82 v4 genome assembly
Wm82_IGA1008.gnm1.5CQQ
Files in this directory are genome assembly files for cultivar Williams 82 (Wm82_IGA1008 in publication; WHFS_GmW82_1.0 in the GenBank assembly record), Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Wm82_IGA1008.gnm1.ann1.FGN6
Files in this directory are genome annotation files for cultivar Williams 82, Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Zh13.gnm1.N6C8
Files in this directory are genome assembly files for cultivar Zhonghuang 13, Shen et al. (2018): De novo assembly of a Chinese soybean genome
Zh13.gnm1.ann1.8VV3
Files in this directory are genome annotation files for cultivar Zhonghuang 13, Shen et al. (2018): De novo assembly of a Chinese soybean genome
Zh13.gnm2.LV9P
Genome assembly files for the Glycine max Zhonghuang 13 v02 genome assembly
Zh13.gnm2.ann1.FJ3G
Files in this directory are genome annotation files for cultivar Zhonghuang 13, Shen et al. (2019): Update soybean Zhonghuang 13 genome to a golden reference
Zh13_IGA1005.gnm1.FRXQ
Files in this directory are genome assembly files for cultivar Zhonghuang 13 (Zh13_IGA1005 in publication; WHFS_GmZH13_1.0 in the GenBank assembly record), Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Zh13_IGA1005.gnm1.ann1.87Z5
Files in this directory are genome annotation files for cultivar Zhonghuang 13, Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Zh35_IGA1004.gnm1.DBYJ
Files in this directory are genome assembly files for cultivar Zhonghuang 35 (Zh35_IGA1004 in publication; WHFS_GmZH35_1.0 in the GenBank assembly record), Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
Zh35_IGA1004.gnm1.ann1.RGN6
Files in this directory are genome annotation files for cultivar Zhonghuang 35, Chu et al. (2021): Eight soybean reference genome resources from varying latitudes and agronomic traits.
legume.genefam.fam1.M65K
LIS gene families
PANTHER
PFAM
PIRSF
PRINTS
PROFILE
PROSITE
Plant Ontology
Plant Ontology
The Plant Ontology (PO) is a community resource consisting of standardized terms, definitions, and logical relations describing plant structures and development stages, augmented by a large database of annotations from genomic and phenomic studies.
Plant Trait Ontology
Plant Trait Ontology
A controlled vocabulary to describe phenotypic traits in plants.
SFLD
SMART
SSF
Sequence Ontology
Sequence Ontology
The Sequence Ontology is a set of terms and relationships used to describe the features and attributes of biological sequence.
Soybean Crop Ontology
Soybean Crop Ontology
A controlled vocabulary to describe crop traits in soybean.
Soybean Growth and Trait Ontology V3.0 revision 1.0
Soybean Growth and Trait Ontology V3.0 revision 1.0
Currently, there are 4 divisions of SOY terms: soybean structural terms (Soybean Structure Ontology), developmental stages (Soybean Developmental Ontology), whole plant development terms (Soybean Whole Plant Growth Stages), and trait terms (Soybean Trait Ontology).
TIGRFAMs
USDA
Legume Federation
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2020 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom