Gene Alignment Help
What are the alignment data?
Map positions are assigned to identifiers using the human genome assembly NCBI Build 35 (UCSC hg17, May 2004 freeze), accessed through the UCSC Genome Browser (GoldenPath). Each position is associated with a GenBank accession number. An accession may have one or many genomic positions in GoldenPath, but there are generally two mappings: one on the positive strand and one on the negative strand.
The data that are returned to the user for unconsolidated positions include the following:
- Identifier: The input value provided by the user to query the genome
- Accession: The accession number to which the identifier has been mapped. If there are multiple accessions for an identifier, one row for each accession will be returned.
- Target Start: Alignment start position in target chromosome
- Target End: Alignment end position in target chromosome
- Strand: + or - for chromosome strand
- Chromosome: Target sequence name
- Q Start*: Alignment start position in query
- Q Size*: Query sequence size
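* Q Start and Q Size refer to the query sequence (the aligned GenBank accession), not the chromosome. Q Start is given from the point of view of the forward strand, even when the match is on the negative strand. For more information, see the UCSC site.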
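As an illustration, one unconsolidated row could be modeled as a small record type. This is a minimal sketch, not the tool's export format: the class name `AlignmentRow`, its field names, the sanity checks, and the sample values (`GENE_X`, `ACC_1`) are all hypothetical. It assumes half-open coordinates, as used in UCSC tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentRow:
    """One unconsolidated result row (hypothetical field names)."""

    identifier: str    # value the user queried with
    accession: str     # GenBank accession the identifier mapped to
    target_start: int  # alignment start on the target chromosome
    target_end: int    # alignment end on the target chromosome
    strand: str        # "+" or "-"
    chromosome: str    # target sequence name, e.g. "chr1"
    q_start: int       # alignment start in the query (forward-strand view)
    q_size: int        # length of the query sequence

    def __post_init__(self) -> None:
        # Basic sanity checks; assumed for illustration, not taken from the tool.
        if self.strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {self.strand!r}")
        if self.target_end < self.target_start:
            raise ValueError("target_end must not be before target_start")
        if not 0 <= self.q_start <= self.q_size:
            raise ValueError("q_start must lie within the query")

    @property
    def target_span(self) -> int:
        """Target bases covered, assuming half-open [start, end) coordinates."""
        return self.target_end - self.target_start


# Hypothetical rows: one identifier, one accession, one mapping per strand.
rows = [
    AlignmentRow("GENE_X", "ACC_1", 1_000, 1_500, "+", "chr1", 0, 500),
    AlignmentRow("GENE_X", "ACC_1", 9_000, 9_480, "-", "chr1", 20, 500),
]
```

An accession with the usual two mappings therefore yields two rows that share the identifier and accession but differ in strand and position.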
How are clones aligned?
A clone is aligned by first mapping it to its associated GenBank accessions via dbEST. These accessions are then mapped to the genome using the UCSC data. Each clone can map to one or more GenBank accessions.
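The two-step clone lookup can be sketched as a join over two tables. This is a minimal sketch assuming both lookups are available as in-memory mappings; the function `align_clone`, the `Position` tuple layout, and the sample identifiers are hypothetical, and the real tool queries dbEST and the UCSC data rather than Python dictionaries.

```python
from collections.abc import Mapping, Sequence

# (chromosome, start, end, strand); a hypothetical position layout.
Position = tuple[str, int, int, str]


def align_clone(
    clone_id: str,
    clone_to_accessions: Mapping[str, Sequence[str]],
    accession_to_positions: Mapping[str, Sequence[Position]],
) -> list[tuple[str, Position]]:
    """Resolve a clone to (accession, genomic position) pairs.

    Step 1: clone -> GenBank accessions (the dbEST lookup).
    Step 2: each accession -> its genome positions (the UCSC lookup).
    Unknown clones or unmapped accessions contribute no hits.
    """
    hits: list[tuple[str, Position]] = []
    for accession in clone_to_accessions.get(clone_id, ()):
        for position in accession_to_positions.get(accession, ()):
            hits.append((accession, position))
    return hits


# Hypothetical tables: one clone with two accessions, one of which is unmapped.
clone_to_accessions = {"CLONE_1": ["ACC_1", "ACC_2"]}
accession_to_positions = {
    "ACC_1": [("chr3", 100, 600, "+"), ("chr3", 4_000, 4_500, "-")],
}
clone_hits = align_clone("CLONE_1", clone_to_accessions, accession_to_positions)
```

Because every accession can carry several positions, the number of returned rows grows with both the accession count and the mappings per accession.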
How are genes aligned?
A gene is first mapped to a UniGene cluster ID, and the accessions that belong to that cluster are returned. A cluster generally contains many accessions.
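A minimal sketch of the gene lookup, assuming cluster membership is available as (accession, cluster ID) pairs that must first be grouped by cluster. That input format, the function names `build_cluster_index` and `accessions_for_gene`, and the sample IDs are hypothetical; real UniGene IDs and file layouts differ.

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping


def build_cluster_index(memberships: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (accession, cluster_id) pairs into cluster_id -> sorted accessions."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for accession, cluster_id in memberships:
        index[cluster_id].append(accession)
    return {cluster: sorted(accs) for cluster, accs in index.items()}


def accessions_for_gene(
    gene: str,
    gene_to_cluster: Mapping[str, str],
    cluster_index: Mapping[str, list[str]],
) -> list[str]:
    """Map a gene to its cluster, then return every accession in that cluster."""
    cluster_id = gene_to_cluster.get(gene)
    if cluster_id is None:
        return []
    return list(cluster_index.get(cluster_id, []))


# Hypothetical data: CLUSTER_A has three member accessions, CLUSTER_B has one.
memberships = [
    ("ACC_3", "CLUSTER_A"),
    ("ACC_1", "CLUSTER_A"),
    ("ACC_2", "CLUSTER_A"),
    ("ACC_9", "CLUSTER_B"),
]
index = build_cluster_index(memberships)
gene_accessions = accessions_for_gene("GENE_X", {"GENE_X": "CLUSTER_A"}, index)
```

Each returned accession would then be mapped to the genome with the UCSC data, so a large cluster fans out into many result rows.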
How are the data consolidated?
Consolidated positions are produced in the following steps:
- The identifier is mapped to GenBank accessions.
- If the accessions associated with the identifier map to more than one chromosome, the data are discarded. Use the unconsolidated mapping positions to see all positions for this identifier.
- Next, the smallest start position among all accessions aligned to the identifier is subtracted from the largest end position. If this distance is greater than the maximum distance between queries that you have selected (1,000,000 bases by default), the data are discarded.
- If the data are not discarded, the size that is returned is the distance between the smallest accession start position and the largest accession end position.
- Note: because so many accessions map to each UniGene cluster, this option is not recommended for most genes. It will usually return no results, because at least one accession in the cluster may be mapped to a different (possibly erroneous) chromosome.
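The consolidation steps above can be sketched as a single function. This is a minimal sketch under stated assumptions: `Hit`, `ConsolidatedPosition`, and `consolidate` are hypothetical names, strand is ignored because the steps do not mention it, and "discarded" is represented by returning `None`.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MAX_DISTANCE = 1_000_000  # bases; the default described above


@dataclass(frozen=True)
class Hit:
    """One accession alignment for the identifier (hypothetical shape)."""
    accession: str
    chromosome: str
    start: int
    end: int


@dataclass(frozen=True)
class ConsolidatedPosition:
    chromosome: str
    start: int
    end: int
    size: int


def consolidate(
    hits: list[Hit], max_distance: int = DEFAULT_MAX_DISTANCE
) -> Optional[ConsolidatedPosition]:
    """Collapse all accession hits for one identifier into a single span, or None."""
    if not hits:
        return None
    chromosomes = {h.chromosome for h in hits}
    if len(chromosomes) != 1:
        return None  # accessions map to more than one chromosome: discard
    start = min(h.start for h in hits)  # smallest accession start
    end = max(h.end for h in hits)      # largest accession end
    size = end - start
    if size > max_distance:
        return None  # only spans strictly greater than the limit are discarded
    return ConsolidatedPosition(chromosomes.pop(), start, end, size)


# Hypothetical hits: two accessions on chr1, 5,000 bases apart end to start.
hits = [Hit("ACC_A", "chr1", 1_000, 2_000), Hit("ACC_B", "chr1", 5_000, 6_000)]
result = consolidate(hits)
```

Adding a single hit on another chromosome makes `consolidate` return `None`, which mirrors why consolidation rarely succeeds for genes with large clusters.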
