Storage systems are widely used and play a crucial role in both consumer and industrial products, for example, personal computers, data centers, and embedded systems. However, with the emergence of new systems and devices such as distributed storage and flash memory, these systems suffer from issues of cost, restricted lifetime, and reliability. Information theory, on the other hand, provides fundamental bounds and solutions to fully utilize resources such as data density, information I/O, and network bandwidth. This thesis bridges these two topics and proposes to solve challenges in data storage using a variety of coding techniques, so that storage becomes faster, more affordable, and more reliable.

\r\n\r\nWe consider the system level and study the integration of RAID schemes and distributed storage. Erasure-correcting codes are the basis of the ubiquitous RAID schemes for storage systems, where disks correspond to symbols in the code and are located in a (distributed) network. Specifically, RAID schemes are based on MDS (maximum distance separable) array codes that enable optimal storage and efficient encoding and decoding algorithms. With r redundancy symbols, an MDS code can sustain r erasures. For example, consider an MDS code that can correct two erasures. It is clear that when two symbols are erased, one needs to access and transmit all the remaining information to rebuild the erasures. However, an interesting and practical question is: What is the smallest fraction of information that one needs to access and transmit in order to correct a single erasure? In Part I we will show that the lower bound of 1/2 is achievable and that the result can be generalized to codes with an arbitrary number of parities and optimal rebuilding.

\r\n\r\nWe consider the device level and study coding and modulation techniques for emerging non-volatile memories such as flash memory. In particular, rank modulation is a novel data representation scheme proposed by Jiang et al. for multi-level flash memory cells, in which a set of n cells stores information in the permutation induced by the different charge levels of the individual cells. It eliminates the need for discrete cell levels, as well as overshoot errors, when programming cells. In order to decrease the decoding complexity, we propose two variations of this scheme in Part II: bounded rank modulation, where only small sliding windows of cells are sorted to generate permutations, and partial rank modulation, where only part of the n cells is used to represent data. We study limits on the capacity of bounded rank modulation and propose encoding and decoding algorithms. We show that overlaps between windows increase capacity. We present Gray codes that span all possible partial-rank states using only ``push-to-the-top'' operations. These Gray codes turn out to solve an open combinatorial problem called the universal cycle, which is a sequence of integers generating all possible partial permutations.

\r\n", "date": "2013", "date_type": "degree", "id_number": "CaltechTHESIS:05312013-123819501", "refereed": "FALSE", "official_url": "https://resolver.caltech.edu/CaltechTHESIS:05312013-123819501", "referencetext": { "items": [ "N. Alon. Combinatorial Nullstellensatz. Combinatorics Probability and Computing, 8(1--2):7--29, 1999.\r\n\r\nM. Blaum, J. Brady, J. Bruck, and J. Menon. An efficient scheme for tolerating double disk failures in RAID architectures. Computers, IEEE Transactions on, 44(2):192--202, 1995.\r\n\r\nM. Blaum, J. Bruck, and A. Vardy. MDS array codes with independent parity symbols. Information Theory, IEEE Transactions on, 42(2):529--542, 1996.\r\n\r\nJ. E. Brewer and M. Gill. Nonvolatile memory technologies with emphasis on flash. Wiley-IEEE, 2007.\r\n\r\nV. Bohossian, A. Jiang, and J. Bruck. Buffer codes for asymmetric multi-level memory. In Information Theory Proceedings (ISIT), 2007 IEEE International Symposium on, 2007.\r\n\r\nA. Barg and A. Mazumdar. Codes in permutations and error correction for rank modulation. Information Theory, IEEE Transactions on, 56(7):3158--3165, 2010.\r\n\r\nA. Bandyopadhyay, G. Serrano, and P. Hasler. Programming analog computational memory elements to 0.2% accuracy over 3.5 decades using a predictive method. In Circuits and Systems (ISCAS), 2005 IEEE International Symposium on, volume 3, pages 2148--2151, May 2005.\r\n\r\nF. Chung, P. Diaconis, and R. Graham. Universal cycles for combinatorial structures. Discrete Mathematics, 110(1-3):43--59, 1992.\r\n\r\nD. Cullina, A. G. Dimakis, and T. Ho. Searching for minimum storage regenerating codes. In Allerton Conference on Control, Computing, and Communication, 2009.\r\n\r\nP. Corbett, B. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, and S. Sankar. Row-diagonal parity for double disk failure correction. Proc. of the 3rd USENIX Symposium on File and Storage Technologies (FAST '04), pages 1--14, 2004.\r\n\r\nV. R. Cadambe, C. Huang, and J. Li. 
Permutation code: optimal exact-repair of a single failed node in MDS code based distributed storage systems. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, 2011.\r\n\r\nV. R. Cadambe, C. Huang, J. Li, and S. Mehrotra. Polynomial length MDS codes with optimal repair in distributed storage systems. In Proceedings of 45th Asilomar Conference on Signals Systems and Computing, 2011.\r\n\r\nCisco. The zettabyte era. Visual Networking Index (VNI), May 2012.\r\n\r\nV. Cadambe and S. Jafar. Tensor product based subspace interference alignment for network coding applications. In Signals, Systems and Computers (ASILOMAR), 2011 Conference Record of the Forty Fifth Asilomar Conference on, 2011.\r\n\r\nV. R. Cadambe, S. A. Jafar, and H. Maleki. Minimum repair bandwidth for exact regeneration in distributed storage. In Wireless Network Coding Conference (WiNC), 2010 IEEE, 2010.\r\n\r\nY. Cassuto, M. Schwartz, V. Bohossian, and J. Bruck. Codes for asymmetric limited-magnitude errors with application to multilevel flash memories. Information Theory, IEEE Transactions on, 56(4):1582--1595, April 2010.\r\n\r\nN. G. de Bruijn and P. Erdos. A combinatorial problem. Koninklijke Netherlands: Academe Van Wetenschappen, 49:758--764, 1946.\r\n\r\nA. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. Information Theory, IEEE Transactions on, 56(9):4539--4551, 2010.\r\n\r\nA. G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh. A survey on network codes for distributed storage. Proceedings of the IEEE, 99(3):476--489, 2011.\r\n\r\nE. En Gad, A. Jiang, and J. Bruck. Trade-offs between instantaneous and total capacity in multi-cell flash memories. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, July 2012.\r\n\r\nE. En Gad, M. Langberg, M. Schwartz, and J. Bruck. Constant-weight gray codes for local rank modulation. 
Information Theory, IEEE Transactions on, 57(11):7431--7442, November 2011.\r\n\r\nH. Finucane, Z. Liu, and M. Mitzenmacher. Designing floating codes for expected performance. In Allerton Conference on Control, Computing, and Communication, 2008.\r\n\r\nF. Farnoud, V. Skachek, and O. Milenkovic. Rank modulation for translocation error correction. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, July 2012.\r\n\r\nP. Gopalan, C. Huang, H. Simitci, and S. Yekhanin. On the locality of codeword symbols. Information Theory, IEEE Transactions on, 58(11):6925--6934, November 2012.\r\n\r\nP. Hall. On representatives of subset. Journal of the London Mathematical Society, 10(1):26--30, 1935.\r\n\r\nC. Huang and L. Xu. STAR: An efficient coding scheme for correcting triple storage node failures. Computers, IEEE Transactions on, 57(7):889--901, 2008.\r\n\r\nG. Isaak. Hamiltonicity of digraphs for universal cycles of permutations. European Journal of Combinatorics, 27:801--805, 2006.\r\n\r\nB. W. Jackson. Universal cycles of k-subsets and k-permutations. Discrete mathematics, 117(1--3):141--150, 1993.\r\n\r\nA. Jiang and J. Bruck. Joint coding for flash memory storage. Information Theory Proceedings (ISIT), 2008 IEEE International Symposium on, 2008.\r\n\r\nA. Jiang, V. Bohossian, and J. Bruck. Floating codes for joint information storage in write asymmetric memories. In Information Theory Proceedings (ISIT), 2007 IEEE International Symposium on, 2007.\r\n\r\nA. Jiang, R. Mateescu, M. Schwartz, and J. Bruck. Rank modulation for flash memories. Information Theory, IEEE Transactions on, 55(6):2659--2673, 2009.\r\n\r\nJ. Johnson. Universal cycles for permutations. Discrete Mathematics, 309(17):5264--5270, 2009.\r\n\r\nA. Jiang, M. Schwartz, and J. Bruck. Error-correcting codes for rank modulation. Information Theory Proceedings (ISIT), 2008 IEEE International Symposium on, 2008.\r\n\r\nA. Jiang, M. Schwartz, and J. Bruck. 
Correcting charge-constrained errors in the rank-modulation scheme. Information Theory, IEEE Transactions on, 56(5):2112--2120, 2010.\r\n\r\nM. Kendall and J. Gibbons. Rank correlation methods (Fifth Edition). Oxford University Press, NY, 1990.\r\n\r\nD. E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition). Addison-Wesley, 1998.\r\n\r\nD. E. Knuth. The Art of Computer Programming, Volume 4, Fascicle 3. Addison-Wesley, 2005.\r\n\r\nF. R. Kschischang and S. Pasupathy. Some ternary and quaternary codes and associated sphere-packings. Information Theory, IEEE Transactions on, 38(2), 1992.\r\n\r\nM. Kim, J. K. Park, and C. Twigg. Rank modulation hardware for flash memories. In Circuits and Systems (MWSCAS), 2012 IEEE 55th International Midwest Symposium on, August 2012.\r\n\r\nB. H. Marcus, R. M. Roth, and P. H. Siegel. An introduction to coding for constrained systems, 5th Edition. 2001. http://www.math.ubc.ca/ marcus/Handbook/index.html.\r\n\r\nF. MacWilliams and N. Sloane. The theory of error-correcting codes. North Holland Publishing Co., 1977.\r\n\r\nF. Oggier and A. Datta. Self-repairing homomorphic codes for distributed storage systems. In INFOCOM, 2011 Proceedings IEEE, April 2011.\r\n\r\nD. Oren. Solid-state drives reshape the mobile-computing paradigm. Mobile Dev and Design, January 2011. SanDisk.\r\n\r\nD. S. Papailiopoulos, A. Dimakis, and V. R. Cadambe. Repair optimal erasure codes through hadamard designs. In Allerton Conference on Control, Computing, and Communication, 2011.\r\n\r\nD. S. Papailiopoulos, A. G. Dimakis, and V. R. Cadambe. Repair optimal erasure codes through Hadamard designs. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, 2011.\r\n\r\nN. Papandreou, H. Pozidis, T. Mittelholzer, G. Close, M. Breitwisch, C. Lam, and E. Eleftheriou. Drift-tolerant multilevel phase-change memory. In Memory Workshop (IMW), 2011 3rd IEEE International, May 2011.\r\n\r\nI. Reed and G. Solomon. 
Polynomial codes over certain finite fields. Journal of the Society for Industrial \\& Applied Mathematics, 8(2):300--304, 1960.\r\n\r\nR. Rivest and A. Shamir. How to reuse a write-once memory. Information and control, 55(1):1--19, 1982.\r\n\r\nK. V. Rashmi, N. B. Shah, and P. V. Kumar. Enabling node repair in any erasure code for distributed storage. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, 2011.\r\n\r\nK. V. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran. Explicit construction of optimal exact regenerating codes for distributed storage. In Allerton Conference on Control, Computing, and Communication, 2009.\r\n\r\nF. Ruskey and A. Williams. An explicit universal cycle for the (n-1)-permutations of an n-set. ACM Transactions on Algorithms (TALG), 6(3):45, 2010.\r\n\r\nM. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur. Xoring elephants: Novel erasure codes for big data. Proceedings of the VLDB Endowment, 2013. To appear.\r\n\r\nA. G. Starling, J. B. Klerlein, J. Kier, and E. C. Carr. Cycles in the digraph P(n; k): an algorithm,. Congressus Numerantium, 162:129--137, 2003.\r\n\r\nC. Suh and K. Ramchandran. Exact-repair MDS codes for distributed storage using interference alignment. In Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, 2010.\r\n\r\nC. Suh and K. Ramchandran. On the existence of optimal exact-repair MDS codes for distributed storage. arXiv:1004.4663, 2010.\r\n\r\nN. B. Shah, K. V. Rashmi, P. V. Kumar, and K. Ramchandran. Explicit codes minimizing repair bandwidth for distributed storage. In IEEE Information Theory Workshop (ITW), 2010.\r\n\r\nN. Silberstein, A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath. Optimal locally repairable codes via rank-metric codes. CoRR, 2013. http://arxiv.org/abs/1301.6331.\r\n\r\nN. Shah, K. Rashmi, P. Vijay Kumar, and K. Ramchandran. 
Distributed storage codes with repair-by-transfer and nonachievability of interior points on the storage-bandwidth tradeoff. Information Theory, IEEE Transactions on, 58(3):1837--1852, 2012.\r\n\r\nI. Tamo, D. S. Papailiopoulos, and A. G. Dimakis. Optimal locally repairable codes and connections to matroid theory. CoRR, 2013. http://arxiv.org/abs/1301.7693.\r\n\r\nI. Tamo and M. Schwartz. Correcting limited-magnitude errors in the rank-modulation scheme. Information Theory, IEEE Transactions on, 56(6):2551--2560, June 2010.\r\n\r\nI. Tamo, Z. Wang, and J. Bruck. MDS array codes with optimal rebuilding. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, 2011.\r\n\r\nI. Tamo, Z. Wang, and J. Bruck. Access vs. bandwidth in codes for storage. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, 2012.\r\n\r\nI. Tamo, Z. Wang, and J. Bruck. Zigzag codes: MDS array codes with optimal rebuilding. Information Theory, IEEE Transactions on, 59(3):1597--1616, 2013.\r\n\r\nY. Wu and A. G. Dimakis. Reducing repair traffic for erasure coding-based storage via interference alignment. In Information Theory Proceedings (ISIT), 2009 IEEE International Symposium on, 2009.\r\n\r\nY. Wu, A. G. Dimakis, and K. Ramchandran. Deterministic regenerating codes for distributed storage. In Allerton Conference on Control, Computing, and Communication, 2007.\r\n\r\nM. Wojtasiak. There's 1500 free petabytes of cloud storage out there. October 2012. (Seagate) http://storageeffect.media.seagate.com/2012/10/storage-effect/theres-1500-free-petabytes-of-cloud-storage-out-there/.\r\n\r\nZ. Wang, I. Tamo, and J. Bruck. On codes for optimal rebuilding access. In Allerton Conference on Control, Computing, and Communication, 2011.\r\n\r\nY. Wu. Existence and construction of capacity-achieving network codes for distributed storage. In Information Theory Proceedings (ISIT), 2009 IEEE International Symposium on, 2009.\r\n\r\nL. Xu and J. Bruck. 
X-Code: MDS array codes with optimal encoding. Information Theory, IEEE Transactions on, 45(1):272--275, 1999.\r\n\r\nL. Xu, V. Bohossian, J. Bruck, and D. G. Wagner. Low-density MDS codes and factors of complete graphs. Information Theory, IEEE Transactions on, 45(6):1817--1826, 1999.\r\n\r\nL. Xiang, Y. Xu, J. C. Lui, and Q. Chang. Optimal recovery of single disk failure in RDP code storage systems. ACM SIGMETRICS Performance Evaluation Review, 38(1):119--130, 2010.\r\n\r\nE. Yaakobi, P. Siegel, A. Vardy, and J. Wolf. Multiple error-correcting WOM-Codes. Information Theory, IEEE Transactions on, 58(4):2220--2230, April 2012.\r\n\r\nE. Yaakobi, A. Vardy, P. H. Siegel, and J. Wolf. Multidimensional flash codes. Allerton Conference on Control, Computing, and Communication, 2008.\r\n\r\nH. Zhou, A. Jiang, and J. Bruck. Systematic error-correcting codes for rank modulation. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, July 2012." ] }, "rights": "No commercial reproduction, distribution, display or performance rights in this work are provided.", "collection": "CaltechTHESIS", "reviewer": "Kathy Johnson", "deposited_by": "Zhiying Wang", "deposited_on": "2013-06-03 21:06:07", "doi": "10.7907/TFHZ-RW88", "divisions": { "items": [ "div_eng" ] }, "institution": "California Institute of Technology", "thesis_type": "phd", "thesis_advisor": { "items": [ { "email": "bruck@caltech.edu", "id": "Bruck-J", "name": { "family": "Bruck", "given": "Jehoshua" }, "orcid": "0000-0001-8474-0812", "role": "advisor" } ] }, "thesis_committee": { "items": [ { "email": "bruck@caltech.edu", "id": "Bruck-J", "name": { "family": "Bruck", "given": "Jehoshua" }, "orcid": "0000-0001-8474-0812", "role": "chair" }, { "email": "hassibi@caltech.edu", "id": "Hassibi-B", "name": { "family": "Hassibi", "given": "Babak" }, "orcid": "0000-0002-1375-5838", "role": "member" }, { "email": "effros@caltech.edu", "id": "Effros-M", "name": { "family": "Effros", "given": "Michelle" 
}, "orcid": "0000-0003-3757-0675", "role": "member" }, { "email": "tho@caltech.edu", "id": "Ho-Tracey", "name": { "family": "Ho", "given": "Tracey C." }, "role": "member" }, { "email": "winfree@caltech.edu", "id": "Winfree-E", "name": { "family": "Winfree", "given": "Erik" }, "orcid": "0000-0002-5899-7523", "role": "member" } ] }, "thesis_degree": "PHD", "thesis_degree_grantor": "California Institute of Technology", "thesis_defense_date": "2013-03-14", "gradofc_approval_date": "2013-06-03", "review_status": "approved", "option_major": { "items": [ "eleceng" ] }, "copyright_statement": "Author's Rights Authorization: I hereby certify that, if appropriate, I have obtained a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted here is the same as that approved by my advisory committee.\n\nI hereby grant to California Institute of Technology or its agents the non-exclusive license to archive and make accessible, under the conditions specified under \"Thesis Availability\" in this submission, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation, or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.", "resource_type": "thesis", "pub_year": "2013", "author_list": "Wang, Zhiying" } ]