Loading New Genome

From ZENBU documentation wiki
Loading genomes and data

Most data loading can be performed through the web interface “upload” system, but there are also several command-line tools for loading data.

Loading new genomes

In the current version of ZENBU (2.8.2), new genomes must be loaded into MySQL databases using a command-line Perl script. In future versions, we will be adding genome creation/loading to the upload system.

First create a new MySQL database to hold the new genome sequence. Genome sequences are often very large (the human genome is about 3 billion bases and thus requires about 3 GB for the MySQL database). Please keep this in mind when allocating storage.

From inside the mysql server:

 CREATE DATABASE zenbu_susScr3_pig;
 GRANT SELECT, CREATE TEMPORARY TABLES, LOCK TABLES on zenbu_susScr3_pig.* to 'read'@"%";
 GRANT SELECT, CREATE TEMPORARY TABLES, LOCK TABLES, INSERT, UPDATE, CREATE, ALTER, DELETE, INDEX on zenbu_susScr3_pig.* to 'zenbu_admin'@"%";

From the command line:

 cmdline> mysql -hmysql_hostname -uzenbu_admin -pzenbu_admin -P3306 zenbu_susScr3_pig < /zenbu/src/ZENBU_2.8.2/sql/schema.sql
 cmdline> /zenbu/bin/zenbu_register_peer -url "mysql://zenbu_admin:zenbu_admin@mysql_hostname:3306/zenbu_susScr3_pig" -newpeer

From inside the mysql server:

 INSERT INTO `assembly` (`assembly_id`, `taxon_id`, `ncbi_version`, `ucsc_name`, `osc_name`, `release_date`, `taxon_name`, `sequence_loaded`) VALUES (1,9823,'Sscrofa10.2','susScr3','susScr3','2011-09-07','Sus scrofa','y');
 INSERT INTO `taxon` (`taxon_id`, `genus`, `species`, `sub_species`, `common_name`, `classification`) VALUES (9823,'Sus','scrofa',NULL,'pig','cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Dipnotetrapodomorpha; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Boreoeutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus');

The taxon information can be found at the NCBI Taxonomy Browser:

 http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9825

After the database has been created, you can load the genome sequence.
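When several genomes need to be set up, the database-creation statements can be generated rather than typed by hand. The following is a minimal sketch, not part of ZENBU: the function names `genome_db_setup_sql` and `peer_url` are hypothetical, and it assumes the same `read` and `zenbu_admin` accounts and privileges used in the pig example. It only prints text to paste into the mysql client; it does not connect to any server.

```python
import re

# Privileges copied from the GRANT statements in the pig example.
READ_PRIVS = "SELECT, CREATE TEMPORARY TABLES, LOCK TABLES"
ADMIN_PRIVS = READ_PRIVS + ", INSERT, UPDATE, CREATE, ALTER, DELETE, INDEX"


def genome_db_setup_sql(db_name, read_user="read", admin_user="zenbu_admin"):
    """Return the CREATE DATABASE and GRANT statements for one genome database."""
    # Allow only simple identifiers so the name can be embedded safely in SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", db_name):
        raise ValueError(f"unsafe database name: {db_name!r}")
    return [
        f"CREATE DATABASE {db_name};",
        f"GRANT {READ_PRIVS} on {db_name}.* to '{read_user}'@\"%\";",
        f"GRANT {ADMIN_PRIVS} on {db_name}.* to '{admin_user}'@\"%\";",
    ]


def peer_url(db_name, host, user="zenbu_admin", password="zenbu_admin", port=3306):
    """Build a mysql:// URL in the form passed to zenbu_register_peer -url."""
    return f"mysql://{user}:{password}@{host}:{port}/{db_name}"


if __name__ == "__main__":
    for statement in genome_db_setup_sql("zenbu_susScr3_pig"):
        print(statement)
    print(peer_url("zenbu_susScr3_pig", "mysql_hostname"))
```

Restricting the database name to letters, digits and underscores is a design choice of this sketch, made so the name can be placed directly into the statements without quoting.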
For the above example, the pig genome Sscrofa10.2 can be found at NCBI at this location:

 http://www.ncbi.nlm.nih.gov/assembly/GCF_000003025.5

and the actual sequence FASTA files are here (by clicking the “GenBank FTP site” link):

 ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Sus_scrofa/Sscrofa10.2/Primary_Assembly/assembled_chromosomes/FASTA/

Download the chr--.fa.gz files (one per chromosome) into a local directory on your server, for example /zenbu/genomes/susScr3_pig.

From the command line:

 cmdline> /zenbu/bin/eedb_chromChunkTool.pl -url "mysql://zenbu_admin:zenbu_admin@mysql_hostname.gsc.riken.jp:3306/zenbu_susScr3_pig" -assembly susScr3 -seqdir /zenbu/genomes/susScr3_pig/ -withseq -create -store

And finally, modify the ZENBU web-service configuration XML file. Add the new genome to the federation seeds list:

 . . . . . snip . . .
 <seed> mysql://read:read@mysql_hostname.gsc.riken.jp:3306/zenbu_susScr3_pig/</seed>
 . . . . . snip . . .

Make sure that the remote connection to the RIKEN ZENBU server is at the bottom of the federation seeds list. This will ensure that the local databases are used before doing remote searches back to RIKEN.
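The ordering rule for the seeds list can be checked mechanically. The sketch below is an illustration only and makes several assumptions: the seed URLs have already been extracted from the configuration file into a plain list, `order_seeds` is a hypothetical helper rather than a ZENBU function, and the remote entry `http://remote.example.org/zenbu` is a placeholder, not the real RIKEN address. A seed counts as local when its hostname appears in a list you supply.

```python
from urllib.parse import urlparse


def order_seeds(seed_urls, local_hosts):
    """Return seed URLs with local databases first and remote ones last.

    The relative order inside each group is preserved, because sorted()
    is stable and the key is only False (local) or True (remote).
    """
    local = {host.lower() for host in local_hosts}

    def is_remote(url):
        # urlparse lowercases the hostname; strip() drops stray whitespace.
        host = urlparse(url.strip()).hostname or ""
        return host not in local

    return sorted(seed_urls, key=is_remote)


if __name__ == "__main__":
    seeds = [
        "http://remote.example.org/zenbu",  # placeholder remote seed
        "mysql://read:read@mysql_hostname.gsc.riken.jp:3306/zenbu_susScr3_pig/",
    ]
    for seed in order_seeds(seeds, ["mysql_hostname.gsc.riken.jp"]):
        print(seed)
```

Running the example prints the local `zenbu_susScr3_pig` seed before the placeholder remote seed, which is the order the configuration file should follow.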