Petabyte Explained
A petabyte (derived from the SI prefix peta-) is a unit of information or computer storage equal to one quadrillion bytes, or 1000 terabytes. It is commonly abbreviated PB. When used with byte multiples, the prefix may indicate a power of either 1000 or 1024, so the exact number of bytes may be either:
- 1,000,000,000,000,000 bytes: $1000^5$, or $10^{15}$, or
- 1,125,899,906,842,624 bytes: $1024^5$, or $2^{50}$.
The term "pebibyte" (PiB), which uses a binary prefix, has been proposed as an unambiguous name for the latter value.
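The two definitions differ by about 12.6%, which is easy to confirm numerically. The following Python sketch computes both values and converts between them; the constant and function names (`PB_DECIMAL`, `PB_BINARY`, `to_petabytes`, `to_pebibytes`) are illustrative only and do not come from any standard library.

```python
# Two common readings of "petabyte" (names are illustrative, not a standard API).
PB_DECIMAL = 1000 ** 5  # SI petabyte (PB): 10**15 bytes
PB_BINARY = 1024 ** 5   # pebibyte (PiB): 2**50 bytes


def to_petabytes(n_bytes: int) -> float:
    """Express a byte count in decimal petabytes (PB)."""
    return n_bytes / PB_DECIMAL


def to_pebibytes(n_bytes: int) -> float:
    """Express a byte count in binary pebibytes (PiB)."""
    return n_bytes / PB_BINARY


if __name__ == "__main__":
    print(f"1 PB  = {PB_DECIMAL:,} bytes")
    print(f"1 PiB = {PB_BINARY:,} bytes")
    # One pebibyte is roughly 1.126 decimal petabytes (about 12.6% larger).
    print(f"1 PiB = {to_petabytes(PB_BINARY):.3f} PB")
    print(f"1 PB  = {to_pebibytes(PB_DECIMAL):.3f} PiB")
```

Keeping the two constants separate makes explicit which reading a given figure assumes.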
Trivia
- IBM Kittyhawk, a supercomputer designed to run the global-scale Internet, has 32 petabytes (32 PiB) of memory (February 2008).
- Google processes over 20 petabytes of data per day.[1]
- In Finland, all health care information will be stored in a database of approximately 500 petabytes. The system is scheduled to be completed in 2011.[2]
- Greenplum has installed an open-source-based data warehouse with more than 1 petabyte of disk space, residing in 48 rack-mount Sun Thumper servers, to analyze web data for a popular Internet company.[3]
- Microsoft stores approximately 14 petabytes on 900 servers, mostly imagery for Virtual Earth, Microsoft's digital model of the planet and part of its web-based geobrowser Live Search Maps. Microsoft has spent at the “couple of hundreds of millions of dollars level” on acquiring high-resolution commercial satellite images for Virtual Earth.[4]
- Approximately fifteen petabytes of data will be generated each year in particle physics experiments using CERN’s Large Hadron Collider, due to be launched in May 2008.[5]
- In October 2004, Lawrence Livermore National Laboratory (LLNL) installed over 1 PB of high-performance DataDirect Networks storage on BlueGene/L.
- In November 2001, the BaBar project at the Stanford Linear Accelerator Center had stored over 1 petabyte of objects using Objectivity/DB.
- The San Diego Supercomputer Center (SDSC) in the USA has a 1-petabyte hard disk store and a 6-petabyte robotic tape store, both attached to the National Science Foundation's TeraGrid network.[6]
- As of October 2007, the CERN computer center (CASTOR) had a 2-petabyte hard-disk store and 10 petabytes of data in a robotic tape store.
- The Internet Archive Wayback Machine contains almost 2 petabytes of data and is currently growing at a rate of approximately 20 terabytes per month. (as of May 2006)[7]
- The first commercially available petabyte storage array was launched by EMC Corporation in January 2006, at an approximate cost of USD 4 million.[8]
- In March 2005, Teradata announced the world's first single server with roughly 500 gigabytes of storage that can scale to a multiserver system of up to approximately 4 petabytes for commercial decision support.[9]
- Technicolor Netherlands, formerly known as NOB Cross Media Facilities, employs a 7.7-petabyte storage network to store all old and new public television and radio content in digital format. Within the next year, most Dutch public television content will be pulled directly from this database during broadcast.
- RapidShare has 4.5 petabytes of hard-disk storage for its users.[10]
- As of November 2006, eBay had 2 petabytes[11] of data.
- As of January 2006, the Climate Prediction Network distributed computing experiment, which aims to run thousands of cycles of modelled climate change to predict future patterns, is producing 2-3 petabytes of data. The project relies on the computing power of thousands of home users whose computers crunch numbers in their spare, idle time.
- On February 24, 2005, DataDirect Networks announced that it would provide 1 petabyte of networked storage for Europe's fastest supercomputer, Tera10, at the Commissariat à l'Énergie Atomique.[12]
- The Managed Storage Services offering of IBM Global Services manages more than two petabytes for IBM customers around the world.[13]
- GridKa, the European Tier-1 center in Karlsruhe, Germany, plans to extend its disk capacity to 4.2 petabytes for the LHC data stream.
- Indiana University announced on April 5, 2006, that it is acquiring the nation's fastest university-owned supercomputer and largest disk-based research storage facility. The new supercomputer will be connected to more than 1 petabyte of high-speed disk storage, including DataDirect Networks high-performance storage, and will be by far the largest university-owned storage facility of its type in the United States.[14]
- As of 2007, NOAA maintained approximately 1 petabyte of climate data. NOAA expects its Comprehensive Large Array-data Stewardship System (CLASS) library to hold 20 petabytes of data by 2011 and 140 petabytes by 2020.
- Some modern commercial tape libraries, robotically accessed collections of tapes primarily used by large organizations for archiving, store several petabytes of data.[15]
- As of January 16, 2008, Dattebayo Fansubs had received approximately 17 petabytes of total traffic on its BitTorrent network from downloads of its fansubs.[16] The actual amount is larger, as the group did not track traffic until July 2004.
- Iron Mountain uses 3 petabytes of storage to back up files from office computers.
- In May 2007, the Viewpointe check image database reached 100 billion check images, occupying more than 15 petabytes of storage.
- Jefferson National Accelerator Facility, located in Newport News, Virginia, has a 2-petabyte storage farm used to collect data from experiments on its particle accelerator.
- According to Arnaud de Borchgrave, writing in the Washington Times (July 29, 2007), the amount of information loaded onto the Internet doubles every six months, and about 627 petabytes move across the Internet every day. By his account, this daily volume is several thousand times the entire contents of the Library of Congress.
- The first petabyte-size relational database: as of August 2007, BMMsoft DataFusion is the first application to store a petabyte of mixed relational and unstructured data (emails, documents, multimedia and transactions) in a unified relational database using a single server and a single, non-partitioned database image. Data compression was 85%, compressing over a pebibyte (1,024 TB) of data (6 trillion records) to less than 160 TB on disk. According to Sun Microsystems, which provided the hardware platform for the test, and Sybase, which provided the analytic engine, this represents a 90% data reduction compared with conventional solutions that would need at least 1.5 PB of storage. The vendors state that this verified 90% storage reduction translates directly into a 90% reduction in electricity consumption and a corresponding 90% reduction in CO2 emissions, eliminating about 5,000 tons of CO2 per year, or over 15,000 tons over the typical 3-year life of such a large system. The DataFusion application operates in real time, with less than 1 second of delay between an email's arrival and its visibility in the relational database. Loading speed was over 3 million records per second, or over 1 TB per hour, which corresponds to the combined transaction throughput of all the world's stock exchanges plus all email and IM traffic among approximately 500,000 financial traders, as described in an audit document.
- One World Data Backup uses a petabyte of storage for client backups.
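The percentages quoted in the BMMsoft DataFusion entry can be reproduced from that entry's own figures. This Python sketch treats the bounds given in the text ("over" 1,024 TB, "less than" 160 TB, "at least" 1.5 PB, read here as 1,500 TB) as exact values, which is an assumption; the helper `space_saving` is hypothetical.

```python
def space_saving(original_tb: float, stored_tb: float) -> float:
    """Fraction of space saved when original_tb of data occupies stored_tb."""
    return 1.0 - stored_tb / original_tb


# Figures from the DataFusion entry, taken as exact (the text gives bounds).
RAW_TB = 1024.0           # "over a pebibyte (1,024 TB)" of input data
ON_DISK_TB = 160.0        # "less than 160 TB" on disk
CONVENTIONAL_TB = 1500.0  # "at least 1.5 PB" for conventional solutions

compression = space_saving(RAW_TB, ON_DISK_TB)               # exactly 0.84375
vs_conventional = space_saving(CONVENTIONAL_TB, ON_DISK_TB)  # about 0.893

print(f"saving vs raw data:           {compression:.1%}")
print(f"saving vs conventional setup: {vs_conventional:.1%}")
```

With these inputs the savings come out near 84% and 89%. Because the raw and conventional volumes are lower bounds and the on-disk size is an upper bound, the true savings would be at least these values, which is consistent with the quoted 85% and 90%.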
Notes and References
- "Google processes over 20 petabytes of data per day". Niall Kennedy's Blog. http://www.niallkennedy.com/blog/2008/01/google-mapreduce-stats.html
- Tietokone. http://www.tietokone.fi/uutta/uutinen.asp?news_id=31556
- Sun Data Warehouse Appliance. http://www.sun.com/solutions/landing/infrastructure/dwa/index.jsp
- The Economist. http://www.economist.com/research/articlesBySubject/displayStory.cfm?story_id=9719045&subjectID=348909&fsrc=nwl&emailauth=%2527%252A%2520%25226KN%255BXR%2540%2522%253C%250A
- New start-up schedule for world's most powerful particle accelerator. http://www.physorg.com/news101730821.html
- Electronics Weekly, December 11, 2002.
- Internet Archive FAQ. http://www.archive.org/about/faqs.php#9
- EMC rolls out $4 million petabyte array. http://www.engadget.com/2006/01/30/emc-rolls-out-4-million-petabyte-array/
- Teradata Achieves Strong Growth, Outpacing the Global Market for Relational Database Management Systems. June 21, 2005. http://www.teradata.com/t/page/137685/
- Rapidshare.com. http://rapidshare.com
- eBay Internals. http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf
- DataDirect Will Provide 1 Petabyte of Networked Storage for Europe's Fastest Supercomputer at CEA, Awarded by Bull. http://www.prnewswire.com/cgi-bin/stories.pl?acct=109&story=/www/story/02-24-2005/0003072814&edate=
- Managed Storage Services. http://www-1.ibm.com/services/us/index.wss/offering/so/a1000380
- IU to acquire nation’s fastest university-owned supercomputer, largest disk-based storage facility. http://newsinfo.iu.edu/news/page/normal/3245.html
- ADIC Scalar 10K Tape Library Adds Support For IBM Enterprise Tape Drive Technology. http://www.wwpi.com/index.php?option=com_content&task=view&id=506&Itemid=39
- Dattebayo Fansubs' BitTorrent Tracker. http://dattebayo.com/t/