Learning Objectives
Following this assignment students should be able to:
- understand the basic query structure of SQL
- execute SQL commands to select, sort, group, and aggregate data
- use joins to combine tables in SQL
Reading
- Databases Intro
- Basic Queries - Selecting, Filtering, Sorting, Nulls
- Aggregation - Video, Reading
- Joins - Video, Reading
Lecture Notes
Exercises
SELECT (5 pts)
For this and many of the following problems you will create queries that retrieve the relevant information from the Portal small mammal survey database. Download the data. As you begin to familiarize yourself with the database you will need to know some details regarding what is in this database in order to answer the questions. For example, you may need to know what species is associated with the two character species ID or you may need to know the units for the individual’s weight. This type of information associated with data is called metadata and the metadata for this dataset is available online at Ecological Archives.
- Write a query that displays all of the records for all of the fields (
*) in the main table. Save it as a view namedall_survey_data. - We want to generate data for an analysis of body size differences (using
both weight and hind foot length) between males and females of each
species. We have decided that we can ignore the information related to when
and where the individuals were trapped. Create a query that returns all of
the necessary information, but nothing else. Save this as
size_differences_among_sexes_data.
- Write a query that displays all of the records for all of the fields (
WHERE (5 pts)
A population biologist (Dr. Undomiel) who studies the population dynamics of Dipodomys spectabilis would like to use some data from the Portal Project, but she doesn’t know how to work with large datasets. Being the kind and benevolent person that you are, write a query to extract the data that she needs. She wants only the data for her species of interest (
DSin thespecies_idcolumn), when each individual was trapped, and what sex it was. She doesn’t care about the plot the individual was trapped on or the size of the individuals. She also doesn’t need the species codes because you’re only providing her with the data for one species, and since she isn’t looking at the database itself the two character abbreviation would probably be confusing. Save this query as a view with the namespectabilis_population_data.Scrolling through the results of your query you notice that the data on sex is missing for some species. You send Dr. Undomiel a short e-mail* asking what she would like you to do regarding this complexity. Dr. Undomiel asks that you create two additional queries so that she can decided what to do about this issue later. Add a query that retrieves the same data as above, but only for cases where the sex is known to be male, and an additional query with the same data, but only where the sex is known to be female. Save these as views with the names
spectabilis_population_data_malesandspectabilis_population_data_females.*Short for elven-mail
[click here for output] [click here for output] [click here for output]ORDER BY (5 pts)
The graduate students that work at the Portal site are hanging out late one evening drinking… soda pop… and they decide it would be an epically awesome idea to put together a list of the 100 largest rodents ever sampled at the site. Since you’re the resident computer genius they text you, and since you’re up late working and this sounds like a lot more fun than the homework you’re working on (which isn’t really saying much, if you know what I’m saying) you decide you’ll make the list for them.
The rules that the Portal students have come up with (and they did spend a sort of disturbingly long time coming up with these rules; I guess you just had to be there) are:
- The data should include the
species_id,year, and theweight. These columns should be presented in this order. - Individuals should be sorted in descending order with respect to mass.
- Since individuals often have the same mass, ties should be settled by
sorting next by
hindfoot_lengthand finally by theyear.
Since you need to limit this list to the top 100 largest rodents, you’ll need to add the SQL command
[click here for output]LIMIT 100to the end of the query. Save the final query as100_largest_individuals.- The data should include the
DISTINCT (5 pts)
Write a query that returns a list of the dates that mammal surveys took place at Portal with no duplicates. Save it as
[click here for output]dates_sampled.Missing Data (5 pts)
Write a query that returns the
[click here for output]year,month,day,species_id, andweightfor every record were there is no missing data in any of these fields. Save it asno_missing_data.GROUP BY (5 pts)
Using GROUP BY, write a query that returns a list of dates on which individuals of the species Dipodomys spectabilis (indicated by the
[click here for output]DSspecies code) were trapped (with no duplicates). Sort the list in chronological order (from oldest to newest). Save it asdates_with_dipodomys_spectabilis.COUNT (10 pts)
Write a query that returns the number of individuals trapped in each year. Count the
[click here for output]species_idcolumn so that you only include cases where an individual was identified to species. Name the count columntotal_abundanceand sort it chronologically. Include the year in the output. Save it astotal_abundance_by_year. There should only be one value for each year since this is a count of the individuals across all species in that year.SUM (10 pts)
Write a query that returns the number of individuals of each species captured in each year (
[click here for output]total_abundance) and thetotal_biomassof those individuals (the sum of theweight). The units for biomass should be in kilograms. Include theyearandspecies_idin the output. Sort the result chronologically by year and then alphabetically by species. Save asmass_abundance_data.Basic Join (10 pts)
Write a query that returns the
[click here for output]year,month, anddayfor each individual captured as well as it’sgenusandspeciesnames. This can be accomplished by joining thespeciestable to thesurveystable using thespecies_idcolumn in both tables. Save this query asspecies_captures_by_date.Multi-table Join (10 pts)
The
plotstable in the Portal database can be joined to thesurveystable by joiningplot_idtoplot_idand thespeciestable can be joined to thesurveystable by joiningspecies_idtospecies_id.The Portal mammal data include data from a number of different experimental manipulations. You want to do a time-series analysis of the population dynamics of all of the species at the site, taking into account the different experimental manipulations. Write a query that returns the
[click here for output]year,month,day,genusandspeciesof every individual as well as theplot_idandplot_typeof the plot they are captured on. Save this query asspecies_plot_data.Filtered Join (10 pts)
You are curious about what other kinds of animals get caught in the Sherman traps used to census the rodents. Write a query that returns a list of the
[click here for output]genus,species, andtaxa(from thespeciestable) for non-rodent individuals that are caught on theControlplots. Non-rodents are indicated in thetaxacolumn of thespeciestable. You are only interested in which species are captured, so make this list unique (only one line for each species). Save this query asnon_rodents_on_controls.Detailed Join (10 pts)
We want to do an analysis comparing the size of individuals on the
[click here for output]Controlplots to theLong-term Krat Exclosures. Write a query that returns theyear,genus,species,weightand theplot_typefor all cases where the plot type is eitherControlorLong-term Krat Exclosure. Be sure to choose only rodents and exclude individuals that have not been identified to species (i.e., exclude species withsp.in the species column). Remove any records where theweightis missing. Save this query assize_comparison_controls_vs_krat_exclosures.Aggregated Join (10 pts)
Write a query that displays the total number of rodent individuals sampled on each
[click here for output]plot_type. Save this query asindividuals_per_plot_type.
