compagen-logo compagen-logo

 

The structure of the AURELIA PROJECT Web Site

The web site consists of the following sections:

Home Front page with a general information about the Aurelia Project.
News Update history.
DataSets Sequence assemblies in FASTA format (files for downloading).
Tools Instruments for working with the DataSets.
   
  Search Retrieve information for a specific gene (search by cluster ID, BLAST hit description, etc.).
  Atlas 454 Compare Gene Expression values in Polyp, Strobila and Ephyra stages.
  Atlas MicroArray Compare gene expression values during the induction of metamorphosis and strobilation.
  Image Library Library of in situ images and gene information.
  SQL Queries Retrieve information from the database using standard SQL language (introduction with examples).
   
BLAST Standard BLAST search.
Help This section.
Contact Whom to contact in case of questions or suggestions.

 

How to use Tools?

  Search Tool
  454 expression Atlas Tool
  MicroArray expression Atlas Tool
  Image Library Tool
  SQL Queries Tool


How to use the Search Tool?

All genes (clusters) are numbered from 1 to 29608. These unique identifiers are referred to as "cluster ID". To get information about the gene just type its cluster ID, for example, "390" and press Search button. In this example we also want to see the DNA sequence of the cluster, predicted peptide sequence and five best BLASTP hits against Acropora digitifera, so we checked the correspondent options.

1

As a result you will get:

 

2

 

From the output you can see that this gene is not expressed in polyp (0 reads), is strongly up-regulated in strobila (2002 reads) and is weakly expressed in ephyra (9 reads). You can also see the corresponding DNA sequence, predicted peptide sequence and BLASTP hits against Acropora digitifera (in this case there are no hits - this gene is Aurelia-specific).

In order to export the result to a file, the button img-doc3 can be used. This will save the results in a CSV file that one can download to the computer and open in Ms Excel, for example.

It is also possible to find genes based on the results of pre-calculated BLAST searches. For example, lets get all the clusters with similarity to beta-catenin. Type "beta-catenin" (with quotes) in the search line and select the checkbox "Show 5 best hits against: Non-redundant NCBI (BlastX)".

3

As a result three clusters with similarity to beta-catenin from Hydractinia and Hydra will be displayed:

4

It is also possible to retrieve information for several genes simultaneously - just separate gene IDs or search terms with spaces.

5

 

How to use the 454 expression Atlas?

In this section the information about gene expression is based on the results of 454 sequencing. Each cluster sequence has been assembled from reads which originate from polyp, strobila and ephyra stages. Contribution of each stage to the final cluster sequence reflects the expression of a gene in a given stage.

Using this tool we can identify stage-specific genes and compare expression in polyp, strobila and ephyra.

Now lets find genes with a more than 500x stronger expression in a strobila than in a polyp. As shown in the example below, you have to select correct radio buttons and to type "500" in the field "for ... folds":

6

As a result you will get 9 genes which fulfil our searching criteria. All of them are strongly expressed in strobila and not expressed or expressed very weakly in a polyp. Some of these genes are also strongly expressed in ephyra.

 

7

You can sort the results in the table according to their values in descending or ascending order. Just click on the column's header. Here the genes (rows) were sorted in the descending order according to their expression in the strobila stage.

 

8

 

You can also filter the results by typing operators (>, <, =, >=, <=), a range (N1..N2) or values and, of course, combine all of them.


For example, now we can only see the genes with >2000 reads in strobila sorted in ascending order:

 

9

 

Now we combine it with an additional filter (Ephyra > 1500):

 

10

 

How to use MicroArray expression Atlas?

The functionality is similar to that in the Expression Atlas. Using this tool we can compare gene expression during temperature induced strobilation. The data are based on microarray hybridization.

 

11

 

How to use Image Library?

Just type the cluster-id number (or the gene name) to see gene related information including in situ images. If the cluster has no images the associated field will be empty.

12

 

Click on an image to see it in a higher resolution.

 

How to run SQL Queries against the database?

As most of the databases in the world, our database uses a special language named SQL (Structured Query Language) to execute queries.

This language allows different operations, but the most common is to retrieve the data. For this purpose SELECT statement is used.

How SELECT works?

Well, it is quite intuitive once you know the basic things.

First it is good to know:

  • The name of the table that you want to query.
  • The structure of the table (fields, data types…)

Let’s start with an example:

  • We want to retrieve all the data included in the table: all_info. The list of the tables available is shown in the drop-down menu Table Info:

 

13

  • Select the option "all_info", and the description of the table will appear:

14


… and scrolling:

15

This basically means that:

The table “all_info” has 13 fields (columns) with the names:

cluster_id   unique cluster identifier (ranges from 1 .. 29608)
cluster_name    ong name of the cluster (rA_c02_1 - re-assembly number 02 cluster 1)
cluster_seq DNA sequence of the cluster
cluster_length    length of the sequence (in nucleotides)
P_reads number of reads in POLYP stage
S_reads  number of reads in STROBILA stage
E_reads number of reads in EPHYRA stage
pep_seq predicted peptide sequence (or No Prediction tag)
pep_length length of the peptide (in amino acids)
score peptide prediction score (given by EstScan)
start_nuc position of the first nucleotide of the ORF
stop_nuc position of the last nucleotide of the ORF

 

The table also indicates the type of the data: numeric (int or smallint) or text (varchar, mediumtext).

  • Now we are going to construct and run our first query (retrieve all the data from a given table).

Just type the following on the query field:

SELECT *
FROM all_info

Which means: Retrieve (SELECT) all the fields (*) FROM table named: all_info.

16

NOTE: The syntax is case insensitive, so “SELECT” is the same as “select” or “SElecT”.
The same happens with the column names and symbols.

 

  • Press the button 17 and a table with the results should appear (If not, check that you have written the query correctly):

18

IMPORTANT: The results are limited to 1000 rows by default. Depending of the query, the results can be very large and memory consuming. Type "0" in the field for using no limit.

 

Saving results as CSV

In order to export the result to a file, the button 19 can be used. This will save the results and the filters applied to the table in a CSV file that one can download to the computer and open in Ms Excel, for example.

Hiding Columns

Question: How can I retrieve only data from columns cluster_id, P_reads, S_reads, E_reads and total_reads, for example?
Answer: Just change the * for the field names separated by commas (,) except the last.

SELECT cluster_id, P_reads, S_reads, E_reads, total_reads
FROM all_info

20

 

Ordering Data

Question: How do I order the data ascendant or descendant?
Answer: There are two possibilities, but with a slightly difference:

  • Click on the column’s header. This will order the values of this column ascendant or descendant, but only affects to the values displayed. Here we sorted all the rows according to the expression level in strobila (see small arrowhead near "S_reads").

21

 

  • Use the clause ORDER BY in your query and use ASC or DESC to indicate the order. This affects to all the values in the table.

SELECT cluster_id, P_reads, S_reads, E_reads, total_reads
FROM all_info
ORDER BY S_reads DESC

22

 

Filtering Data

Question: How do I filter the data?
Answer: There are two possibilities:

  • Use the filter fields on the column header (text or numeric data). But, once time more, it only affects to the values displayed. If your result is greater than the limit specified (1000 by default) it could be some value on the results table that is not showed.

For numeric values the following operators are available:

equal to = N
greater than > N
lesser than < N
lesser or equal <= N
greater or equal >= N
range of values N1 .. N2

23

24

 

  •  Use the clause WHERE and then the condition of your filter (it allows more complex searches). This is the recommended way:

SELECT cluster_id, P_reads, S_reads, E_reads, total_reads
FROM all_info
WHERE S_reads>1000 and E_reads<1000

25

Example with arithmetical and logical operations:

SELECT cluster_id, P_reads, S_reads, E_reads, total_reads
FROM all_info
WHERE (S_reads+1) / (P_reads+1) >= 500

26

SELECT cluster_id, P_reads, S_reads, E_reads, total_reads
FROM all_info
WHERE (S_reads+1) / (P_reads+1) >= 500 and total_reads>2000

27

Adding an ORDER BY:

SELECT cluster_id, P_reads, S_reads, E_reads, total_reads
FROM all_info
WHERE (S_reads+1) / (P_reads+1) >= 500 and total_reads>2000
ORDER BY E_reads DESC

28

There is more info and examples at: http://www.w3schools.com/sql/sql_select.asp. In the same page one can also find help and examples for other statements like WHERE or ORDER BY.

 

Selecting sub-sets of genes

Question: How can I retrieve the list of strobila-specific genes? For example, we want to see only the clusters where more than 80% of all reads originate from the strobila stage. Moreover, we want to retrieve only the clusters where the total number of reads is >=10.

Answer: We have to add additional conditions to our previous query. 80% threshold means that dividing the number of reads in strobila (S_reads) by the total number of reads (total_reads) we should get values >= 0.8. Total read number (total_reads value) should be >=10. So, we need to use a clause WHERE with two conditions:

SELECT cluster_id, P_reads, S_reads, E_reads, total_reads
FROM all_info
WHERE S_reads / total_reads >= 0.8 and total_reads >=10

As a result we will get a list of 345 strobila-specific clusters (see Fuchs et al. Fig. 2B):

29

And now let us get the list of the polyp-specific genes. We need to change just one parameter (P_reads) and our new query will be:

SELECT cluster_id, P_reads, S_reads, E_reads, total_reads
FROM all_info
WHERE P_reads / total_reads >= 0.8 and total_reads >=10

As a result we will get a list of 336 polyp-specific clusters (see Fuchs et al. Fig2B):

30

Getting the list of the ephyra-specific genes is easy now. Here is the corresponding query:

SELECT cluster_id, P_reads, S_reads, E_reads, total_reads
FROM all_info
WHERE E_reads / total_reads >= 0.8 and total_reads >=10

31

 

Working with the microarray data

The advantage of a relational database (like MySQL) is that it allows to work with large data sets and gives absolute flexibility in "asking" question of any level of complexity. It is easy to link different data types together, for example, sequence data with the corresponding expression values, peptide prediction, images and so on. For analysing the data one needs to describe the "question" as a set of mathematical and logical operators (in a similar way like in R, MatLab and similar programs).

In the following examples we will use the table "array_normalized".

This table contains mean signal values from independent experiments ((replicate_1+replicate_2+replicate_3)/3). Mean signal values across the stages (polyp, 14 days 5-Aza-Cytidine, 14 days control, ..., Ephyra) have been normalized based on the expression level of elongation factor-1-alpha (EF1α). Values in the table has not been subjected to logarithmic transformation. Log2 or Log10 transformation is important for presenting data in a form of a heat map, but for comparing expression that operation is not necessary.

The table "array_normalized" contains 12 field (columns):

id_entry unique entry identifier (primary key)
id_oligo unique oligonucleotide name
P_signal expression in POLYP (24h at 10°C)
AZA14_signal expression in POLYP (14 days at 10°C, incubated in 5-Aza-cytidine)
CON14_signal expression in POLYP (14 days at 10°C, DMSO control)
AZA16_signal expression in POLYP (16 days at 10°C, incubated in 5-Aza-cytidine)
CON16_signal expression in POLYP (16 days at 10°C, DMSO control)
ES_signal expression in STROBILA with 1 segment
LS_signal expression in STROBILA with 5 segments
E_signal expression in EPHYRA (freshly detached)
cl_name long name of the cluster (1-RA_1 - cluster 1, rc_8-RA_8 - cluster 8)
cl_id unique cluster identifier (ranges from 1 .. 29608)

 

1) To view all the values from the table type:

SELECT * FROM array_normalized

32

The table with results should appear (if not, please check that the query has been correctly written):

33

IMPORTANT: The results are limited to 1000 rows by default. Depending of the query, the results can be very large and memory consuming. Type "0" in the field for using no limit.


2) To find all the genes where expression in early strobila is 100 times stronger than in a polyp type:

SELECT * FROM array_normalized
WHERE ES_signal / P_signal >= 100
ORDER by cl_id ASC

34

IMPORTANT: Results will be sorted according to the cluster identifiers in ascending order (ORDER by cl_id ASC).
You can also sort the results by clicking on the column's headers.

3) To identify genes which are up-regulated during the temperature induction and might function as a strobilation inducer one will need a bit more complex query with many conditions (now we will describe the hypothetical model in Fig.4A):

SELECT * FROM array_normalized
where (P_signal+AZA14_signal+CON14_signal+AZA16_signal+CON16_signal+ES_signal+LS_signal+E_signal>=100)
and P_signal<50
and CON14_signal>AZA14_signal
and CON16_signal>AZA16_signal
and ES_signal / P_signal>=5
and LS_signal / P_signal>=10
and LS_signal >= 1000
order by cl_id ASC

Here is the short explanation of the query:

1) We want to select genes which are expressed not extremely weak - cumulative expression must be >= 100:

(P_signal+AZA14_signal+CON14_signal+AZA16_signal+CON16_signal+ES_signal+LS_signal+E_signal>=100)

2) and the expression in the polyp stage must be weak:

and P_signal<50

3) now we check that the genes are 5-AZA-cytidin sensitive and the expression increases at cold temperature:

and CON14_signal>AZA14_signal
and CON16_signal>AZA16_signal
and ES_signal / P_signal>=5
and LS_signal / P_signal>=10

4) expression in late strobila must be relatively high:

and LS_signal >= 1000

5) ordering according to the cluster IDs (gene idenifiers):

order by cl_id ASC

As a result we get a list of potential strobilation inducers (27 clusters) represented by the heat map in Fig.4B in Fuchs et al.
Expression dynamics of these genes follows the model represented in the Fig.4A.

35

 

Thus, by using a simple set of commands one can extract a lot of information with nearly unlimited flexibility. It is also possible to combine data from several tables (JOIN statement).

There is more information about databases and SQL at, for example: http://www.w3schools.com/sql/

One can also consult MySQL Reference Manual at: http://dev.mysql.com/doc/refman/5.5/en/index.html

 

Valid XHTML 1.0 Transitional