Posted on June 21, 2011 by Colin Gillespie in Reviews

Review of “Analysing microarray data in BioConductor”

This article describes a statistical analysis of the GSE20986 data set, which consists of four treatments with three replicates each. The post takes the reader through downloading, checking, and then analysing the data. All of the R commands are given.

My comments are very minor:

  • The GSE20986 download is a large file (~53 MB). Perhaps warn the reader?
  • Formatting of the R code: when assigning variables, it's useful to put a space on either side of the assignment operator, e.g. x <- 5 rather than x<-5.
  • Don't alternate between “<-” and “=” for assignment; pick one and use it throughout.
  • When the CEL files are untarred, I would prefer them to be extracted into a separate directory, say “cel_files/”, rather than the working directory. The “exdir” argument of the untar function does this.
  • Replace the for loop that calls gunzip with sapply.
  • I found the section on “describing the experiment” a bit unclear. Could phenodata.txt be generated in R, or even downloaded from the website?
  • One of the hist commands uses the variable celfiles.exprs, which I don't think is defined anywhere in the post.
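
The untar and gunzip suggestions above might look something like the sketch below, which also follows the spacing and single-assignment-operator points. It is only a sketch under stated assumptions: it uses gunzip() from the R.utils package (the post's own gunzip may come from elsewhere), and the archive name demo_RAW.tar and the two dummy "CEL" files are hypothetical stand-ins so the example runs without the real download.

```r
# Sketch only: assumes the R.utils package (for gunzip()) is installed.
# demo_RAW.tar and the dummy sample files are hypothetical stand-ins for
# the real GSE20986 archive, so the sketch runs offline.
library(R.utils)

work_dir <- tempfile("gse_demo")
raw_dir <- file.path(work_dir, "raw")
dir.create(raw_dir, recursive = TRUE)

# Write one small gzipped dummy file and return its path
make_dummy_gz <- function(sample_id) {
  path <- file.path(raw_dir, paste0(sample_id, ".CEL.gz"))
  con <- gzfile(path, open = "w")
  writeLines("dummy CEL contents", con)
  close(con)
  path
}
invisible(sapply(c("sample1", "sample2"), make_dummy_gz))

# Archive the dummy files from inside raw_dir so the tar holds flat paths
tar_file <- file.path(work_dir, "demo_RAW.tar")
old_wd <- setwd(raw_dir)
tar(tar_file)
setwd(old_wd)

# Extract into a separate directory instead of the working directory
cel_dir <- file.path(work_dir, "cel_files")
untar(tar_file, exdir = cel_dir)

# Decompress every .gz file with one sapply() call instead of a for loop;
# by default gunzip() deletes each .gz file after decompressing it
gz_files <- list.files(cel_dir, pattern = "\\.gz$", full.names = TRUE)
cel_files <- sapply(gz_files, gunzip)
```

With the real data, only the tar file path would change; the untar and sapply calls stay the same.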

Overall, a useful and comprehensive post.
