Group meeting [Sep.28.2007]
Today we discussed the dbGaP data sets.
1. [Tian] Even though we have been granted access to the data, I don't think the data files are available yet. We submitted archive requests online, but the status of those requests has remained "preparing". Another clue is that file sizes were listed for only one of the three studies, and those files were huge.
2. [Yuejing] Two papers were listed as references for the Major Depression data, but only one was available. [Yuejing] found a presentation file from UNC that gave an updated status of the study. The study screened families that participated in a Netherlands study on MDD over a span of 10 years and selected those with concordant twins with extremely high or low trait values, and discordant twins with sharp contrasts (high <-> low). Linkage results from previous studies show little overlap. The sample size described on dbGaP differs slightly from the one in the presentation file.
3. [Chien Hsun Huang] searched for "Nephropathy" and Type I diabetes and found that there are not many papers on this topic. We can only assume that little is known about its genetic component.
A little background: Nephrotic syndrome is a disorder in which the kidneys have been damaged, causing them to leak protein from the blood into the urine. It is characterised by proteinuria (>3.5 g/day), hypoalbuminemia, hyperlipidemia and edema. Diabetic nephropathy (nephropatia diabetica) is a progressive kidney disease caused by angiopathy of capillaries in the kidney glomeruli. It is characterized by nephrotic syndrome and nodular glomerulosclerosis. It is due to longstanding diabetes mellitus, and is a prime cause for dialysis in many Western countries.
The reference paper describes the study data and design without any analysis results. The most important point is that this study has three designs (case/control, case-trio and control-trio), which could be an advantage for validating results but can also make interpretation difficult, as the authors discussed.
There is one recent paper from Nature that identifies a Type I diabetes gene.
4. [Jun and Tian] The ADHD data might be the first data we can get our hands on. The 2006 Molecular Psychiatry paper outlines the recruitment of subjects. The data came from a multi-center study involving eight countries. This might create some population structure issues. However, since this data set consists only of case-parent trios, population stratification is not a concern for family-based tests such as the TDT, which compare transmitted with untransmitted parental alleles within each family. Despite the high heritability of this disorder, no genes have been identified for it. Previous genome scans had few overlapping significant results. In the 2006 paper, candidate genes were studied and only nominally significant results were obtained for 7 out of 51 genes. The 2006 Behavioral and Brain Functions paper described the genome scan efforts using the same sample of subjects. It outlines the plan of this project. 600,000 tagging SNPs are to be used. Gene-gene interactions should be considered. Gene-environment interaction may play an important role for this disorder but is hard to account for in analysis. One thing we might try first is to construct a candidate gene study.
5. Next week, we will check the data status. We will also go over some review papers in genetic epidemiology (see the last post) for Jun, Bo, Julia, Chien-Hsun.