Talk:Data Structures

General

This textbook aims to cover what CS majors learn in an introductory data structures and algorithms programming course.


Do we want to do everything for this subject on one page, or make this just a page of links? I like the second idea better.

I prefer dropping new information here until a section gets "too big". Then split off that section onto its own page. See http://CommunityWiki.org/BigBucketsFirst --DavidCary 05:41, 12 Jul 2004 (UTC)

I agree with that article's logic. I'll copy the material from my linked list page back to this page, and work on it here. Later on, maybe we'll span off to other pages aka buckets. --Waxmop

I think the buckets right now are too small. I think it would be easier (because this is a textbook) to divide the book into chapters, and have each chapter be a subpage (if necessary). Anything finer grained than chapters would mean a lot of clicking and it would be easy to miss having a coherent voice in the book (both for contributing authors and for readers). But that's just my own feelings on the matter: I want these to seem like real books instead of just indexes to entries in the wikipedia. Mshonle 09:31, 1 Aug 2004 (UTC)

Format

I'd like to join the project if I may. I'm ready to tackle a couple of these topics. Is there a particular format to use?--Wcrowe 21:34, 1 Jul 2004 (UTC)

I would like to suggest a change in the order of the chapters. I would like to have Graphs moved before Trees, as a tree is simply a special case of a graph. Pesto 04:11, 28 August 2005 (UTC)

I see what you are getting at, but the question is What's the best didactic ordering? After all, linked-lists are just another form of graph, further, they are another form of tree, so why not discuss linked-lists last? But to be serious, I think tree traversal algorithms are easier to understand than graph traversal algorithms. And by also being more limited, trees provide a good introduction to the concept of graphs. MShonle 05:37, 28 August 2005 (UTC)

jump in!

If you write an article, you get to choose the format. Later on, people might improve the format, but at the beginning, we should just get out something and let people work on refining later. --Waxmop 17:15, 4 Jul 2004 (UTC)

We should use header markup (with the equal signs) rather than bullet markup. The header markup builds the table of contents. Also, we should put articles into the Programming:Data_Structures namespace for now. I'm starting a linked list article. --Waxmop 17:22, 4 Jul 2004 (UTC)

I agree: looking at a table of contents gives a better idea of what the book is like, and encourages more modifications and additions. I'm going to change the format to that. Mshonle 09:48, 1 Aug 2004 (UTC)

I dropped the variables and types section.

It links to a javascript page right now that I don't think is relevant. Everyone: remember to use the namespace Programming:Data Structures:your topic for your articles.--Waxmop 22:55, 5 Jul 2004 (UTC)

This is a textbook, therefore, IMO, sections should use a how-to approach. You will immediately see that this differs quite a bit from an encyclopedic approach. I hope it is everyone's intent to teach these subjects, step by step, rather than just create a collection of Wikipedia links.

Page Structure

I'd rather not make a large change without some discussion, but it seems to me that the sorting portion should be moved to the bottom. I kind of visualize this page as three sections:

I wasn't planning on laying out the whole menu. I just meant to put in some examples, but I went further than I intended. I think that this breakdown gives us better control over presenting pros & cons of collection selections (Since the list selections are together and the stack selections are together), and is more consistant.

Primary Storage Types
   Single object allocation (used to implements linked lists, most trees, ...)
   Array (used to implement array list, most stacks, circular queues, ...)
Secondary Storage Types
   Lists
      Common list operations & example
          Set(pos, val), Get(pos), remove(pos), append(val)
      Implemented as a Linked List
          Single linked list
          Double linked list
      Implemented as an Array List
   Stacks
      Common stack operations & example
          Push(val), Pop (, peek?)
      Implemented on a Linked List
      Implemented on an array
   Queues
      Common queue operations & example
           enqueue, dequeue
      Implemented as LinkedList
      Implemented as an array (circular, fixed size queue)   
      Special cases
          Dequeues (What is this used for, by the way?)
   Trees & Graphs (isn't a tree just a special case of a graph?)
      Common tree operations & example
           add, get, remove
      Implemented as links (like linked list)
      Implemented in an array (Heap)
      Special cases
          Binary tree
             Balanced
          Quad tree
          Graphs
   Hash Tables
       ...
How do I choose a data structure for a given situation?
   Sorts
   Big O notation
   Combinations
      ArrayList and Hashtable
   ...

--BillKress 23:06, 5 Nov 2004 (UTC)

Big O Notation should be discussed in front because it is used throughout the rest of the material. You should also include the discussion of Abstract Data Types and information hiding in fron too. -- Gdarwin 06:42, 7 Nov 2004 (UTC)
Hi Bill, sorry for the late reply. That structure looks much better, though I would put trees in a section before graphs, because the traversal concepts like inorder, preorder, and post-order (for binary trees) is really something separate enough. After queues should be heaps, aka priority queues, and then the heap sort. For graphs, depth-first and breadth-first should be covered. Below is what I suggest: MShonle 01:00, 21 Dec 2004 (UTC)
Introduction
  What is Data Abstraction?
The Node
  The most elementary data structure
Arrays
  Matricies: two dimensional arrays
  Insertion sort on an array
  Binary search on a sorted array
List Structures
  Common list operations & example
    set(pos, val), get(pos), remove(pos), append(val)
      [not sure about these operations: what about next()? or iterators?]
      [Perhaps discuss operations on Nodes, versus operations on the List
      itself: example, Nodes have a next operation, but the List itself has
      a pointer to the head and tail node, and an integer for the number of
      elements. This view would work well in both the Lisp and OO worlds.]
  Doubley linked list
  Vectors (resizeable arrays)
Iterators
  Operating on sequences without knowledge of the sequence itself
Stacks and Queues
  Common stack operations & example
    push(val), pop, and peek
  Implemented on a Linked List
  Implemented on an array
  Common queue operations & example
    enqueue, dequeue
  Implemented as LinkedList
  Implemented as an array (circular, fixed size queue)
  Dequeues
Trees
  Common tree operations & example
    add, get, remove
  Binary trees
    in order, pre order, post order traversals
    Balanced binary trees
  Iterators for trees
Min and Max Heaps
  A tree implemented as an array! (not to be confused with the heap in memory)
  Percolation
  Heap sort
  Treaps: heaps where elements can be deleted by name
Graphs
  Directed and Undirected graphs
    matrix and list representations
  Depth-First Search
  Breadth-First Search
  Topological sort
Hash Tables
  Iteration order for hash tables by augmenting the structure
   - iterating over items in the order in which they were inserted
   - iterating over the items based on most-recently-used
Sets
  Different implementations
   - lists, binary search trees, bit arrays, hashtables
  Disjoint Sets (operations: union, find)
Tradeoffs between Data Structures
  Use asymptotic behaviour to decide, most importantly seeing how the
  structure will be used: an infrequent operation does not need to be
  fast if it means everything else will be much faster

The content is designed to compliment the Algorithms text book (the idea is that the reader would cover the DS book first, then the Algo book, and then the Advanced Data Structures and Algorithms book that would pick up the more random and advanced ideas). Most importantly, I'd like to see each book be small enough that it can be read completely in one or two semesters. The more reference-based or encyclopedic content could be saved for either the Advanced book or the wikipedia. MShonle 01:00, 21 Dec 2004 (UTC)

Incoherent Structure

I agree. I find it odd that a lot of list based structures are currently placed under linked lists, when these structures can obviously be impleneted as an array.

Your layout looks good, but I was wondering if we can add a disccusion about big O notation. I see it used here, but where is it formally defined?

Gdarwin 16:13, 29 Oct 2004 (UTC)

Proposed structure has been modified. I'd still like to hear one or two more people say that they prefer this structure before I refactor.

--BillKress 23:04, 5 Nov 2004 (UTC)

Duplication for Array

Currently we have some 2 articles on arrays one here and one in Programming:Arrays. We should merge realy leavin only a short into here and move the rest off.

  • I think the DS book should only talk about the most basic form of arrays, that is, integer-indexed arrays. Many of the other topics currently mentioned, like associative arrays, won't make sense to the reader because they haven't read about hash tables (which use simple arrays and linked-lists) or binary search trees yet. The purpose of our story should be abstraction, abstraction, abstraction, and show how the complex gets built up from the very simple; not how the complex gets built up from things just as complex. MShonle 14:50, 22 August 2005 (UTC)
  • Speaking of integer based array: even that leaved the question of "which integer"? (See: Ada Programming/Types/array). Before the almost assembler like C came out (and predended to be a higher language) it was almost part of the definition of a (real) higher language that you did not need to choose a register sized mashine word as array index. As such I belive that restricting an array index to "integer" is actualy a loss of abstraction. Abstraction for me means: away from computer implementation - closer to the live of humans. Example: Humans use arrays of (Jan, Feb, ... Dec) or (Mon, Thu, ... Sun) while computers would internaly use (0, 1 ... 12) or (0, 1, .. 6). Even if a human would choose a number to represent a month or day he would choose a 1 for Jan or Mon and not 0 as a computer would do. A higher programming language with better abstraction allow a more human like representation (i.E. using an enumeration type as as index). The Psydo language used in Computer Science should allow for the highest abstraction - and an array [Jan ... Dez, 1990 ... 2005] of .... should not be out of the question. --Krischik 06:47, 23 August 2005 (UTC)
    I completely agree with you that allowing different index types allows for better abstraction. But we need to be careful about which abstractions we're talking about. The purpose of this book is to describe how one would get from embarrassingly simple, low-level abstractions into very precise, semantic-rich, high-level abstractions. To me, it would kind of be like arguing that the Hashtables chapter should use Perl, because Perl already has hash tables built-in. But the point is not to write the most elegant code examples; the point is to demonstrate computer science concepts.

    I find the idea very beautiful that in a language like Lisp, all you need is cons, car and cdr, and from there you can make lists, trees and graphs. Add in a very simple array, and you can make hashtables and matrices, et cetera... This is a very compelling story, and it can't be told as well if unnecessary elements are constantly added as side-notes. MShonle 14:24, 23 August 2005 (UTC)

    (BTW, a deeper discussion of rich programming languages might be better suited for the Programming Languages book, or also possibly the Advanced Data Structures and Algorithms book. MShonle 21:09, 23 August 2005 (UTC))

Quicksort misconceptions

Since superceeded by the Introsort algorithm, the Quicksort algorithm can exact excellent performance on purely random data. It's worst performance, however, occurs on nearly-sorted or on sorted data, at which point Quicksort becomes completely impractical.

The above quote is incorrect. The original quicksort introduced by Hoare used random, not deterministic, methods to select the pivot. Thus, the kind of data input (random or ordered) wasn't important. It would only perform poorly on the very slight chance that the random selections were suboptimal. Some bad textbooks present some pretty poor ways of picking the pivots, but that is not the true quicksort.

I think it would probably be best not to present quicksort in the data-structures book at all because it's already half-way covered in the Algorithms book (currently the partition portion of quicksort is covered). I think Bubble sort probably shouldn't even be mentioned, as selection sort performs better and is easier to understand. For a data-structures book, Heap sort makes the most sense. Part of my hope is that we can keep both the data-structures and algorithms books simple (by covering only the most fundamental tools), and get into the more advanced material (e.g. introsort) in an Advanced Data-Structures and Algorithms book. MShonle 19:34, 20 Dec 2004 (UTC)

Moving forward

It looks like we've resolved many of the issues on this page. How would one go about archiving this talk page? MShonle 19:18, 30 Dec 2004 (UTC)

Create an /Archive 1 page and move the stuff there. Now that sub-pages are active /Archive 1 allways works ;-). --Krischik 15:31, 23 August 2005 (UTC)

Template Tricks

Here's the page structure I have for the Introduction, and I'd like to repeat it for the other chapters:

Computer Science:Data Structures:Introduction
  -- simple page that includes:
     a header {{Computer Science:Data StructuresChapNav}},
       - i.e. a template
     {{Computer Science:Data Structures:Introduction}},
       - another template
     a horizontal line
     a footer {{Computer Science:Data StructuresChapNav}},
       - the same template as the header

where the template {{Computer Science:Data Structures:Introduction}} is:
Template:Computer Science:Data Structures:Introduction
  -- a simple page that REDIRECTS to
     the page Computer Science:Data Structures:Introduction_content

Computer Science:Data Structures:Introduction_content
  -- the article that includes the chapter's *real* content.

The idea of these template tricks is two fold:

  1. The navigation templates simply allow you to browse the book easier; and
  2. The _content templates allow the book to be viewed in two different ways: by-chapter, or all-chapters

The by-chapter view is the one shown above. The all-chapters view would be a page that would look like this:

 {{Computer Science:Data Structures:Introduction}}
 {{Computer Science:Data Structures:Chapter 2}}
 {{Computer Science:Data Structures:Chapter 3}}
 ... and so on, including all of the _content pages (indirectly, because the
 templates will redirect)

The indirection is neccessary in order to avoid the chapter navigation getting shown in the all-chapters view.

The idea behind the all chapters view is that people could print out the book easily. Anyway, if you see some weird things, like the above, that's the reason why. MShonle 19:47, 30 Dec 2004 (UTC)

Pseudocode

I've implemented the Stack ADT in a pseudocode that I'm proposing for the rest of the book. I used a literate programming approach in which the implementation is interspersed with prose describing it. The pseudocode for a somewhat more complicated ADT should probably be copied here and used for an example. The details of the pseudocode follow.

All blocks begin with a keyword (so far type and method) and end with a line containing end and the keyword. The pseudocode is sensitive to line endings, although a python-like use of ; to put two commands on the same line would not be unreasonable. Make sure you indent the contents of all blocks.

I use type TypeName[<type-arguments>] to introduce a new type. I use the word "type" rather than an alternative like "class" or "structure" because this is a book on abstract data types, rather than object oriented classes, and the word "structure" is too long. I drew the angle brackets from C++ template or Java generic syntax. I think it is important to point out that abstract types are parametrized on other types. The type ends with end type on its own line.

The data keyword declares a private data member. It is of the form data member-name : member-type. I am using the Pascal style of data typing (: to introduce the type of a variable) because there's already a keyword in front of the variable and this allows us to use type names with spaces in them. Data members (so far) can be accessed by their name alone in any member function. Possibly, we should follow python's lead and require a "self." in front of an access to clearly distinguish them from local variables.

method introduces a public member function/method. These are of the form method method-name (arg-1:arg-1-type,) [:return-type]. A method ends with end method on its own line. If a return type is present, a return statement with the right type must terminate all calls of the function. Probably we should find a way to indicate that a method does not modify its object or an argument. You call a member function using the standard object.member(args) syntax.

I'd like a for loop that allows iteration over either a numeric range with a possible stride, or over all members of a given set or sequence. Perhaps the following syntax:

for i in 1..11 [step 2] # Does this loop for i=11 or not?
for i in some_set

Pattern matching should probably be allowed too:

for (first,second) in pair_sequence

Discussion

A defense of explicit typing in our pseudocode
In a couple places, MShonle has suggested that this book's pseudocode should not use generics, instead leaving certain variables untyped. I respectfully disagree. While certain languages (Perl, Python, Lisp) allow programmers to omit the types, this makes the programmer have a firmer understanding of what type a given variable has at any time since the compiler won't catch mistakes. Since we're trying to teach new programmers, who probably don't understand types as well, the type of any variable in our pseudo code should be specified. Without generics, if a variable declaration has no type, the reader has to guess the type. A mapping ADT will have two types that can vary, and a Graph may have even more. Using templates/generics adds a little syntax but allows us to type every variable we use. MShonle, if I'm wrong about your position, feel free to delete this paragraph. --jyasskin 07:01, 24 Jan 2005 (UTC)

Old discussion from above:

We should include code snippets in as many different languages as possible along with pseudocode.
I'm not sure "as many" languages as possible is desirable. Certainly the current fad language could be used, or a language like it. I'm a fan of pseudocode that absorbs the benefits from several languages. But data-structures in Scheme versus Python versus Java/C# are going to be very different. At some point decisions will have to be made. I believe for the topic of data structures that something closer to C should be used, but organized such that it could "obviously" be turned into an object-oriented program.

At heart, Data Structures is about making linked-list and hash-tables and ballanced binary search trees: it would seem silly to implement a linked list in a language that already includes full, built-in support of them. (However, these high level-languages could offer a lot: as they can help in the design of the interface of the structures; for example by using iterators.) The Algorithms book uses a more pure pseudocode language that offers excellent looping constructs and array slices: the reason being that showing low-level details detracts from the "hard part" of what the algorithm is (i.e., introducing implementation complexity on top of the inherent complex makes the story telling harder)Mshonle 09:45, 1 Aug 2004 (UTC)

From a teaching perspective i think that pseudocode for data structures is much much clearer than any implementation. I would suggest that code snippets be first done in pseudocode, and then possibly in actual language code (is there a way perhaps this could be on a different page, like an appendix or something?) The pseudocode is the clearest way to learn the concept, which is pretty indepedent of language. I think both functional and procedural data structures should be shown, as functional programming is becoming more mainstream, and this book is not simply a programming text, but a CS text, and they are theoretically very important. JustinWick 15:19, 12 Nov 2004 (UTC)

On the history page, Krischik said: "(Discussion - pseudocode vs real language - why not both?)". Answer: We are taking that approach. If people want to supply implementations in C, Scheme, Python or Java they may do so, as an appendix (the sister Algorithms book has one appendix already set up, but with little work done on it). The pseudo-language is closest to Python, so it would be a good idea to test all of the samples by actually running them against a full test-suite (which should also be part of the appendix; see the Algorithms book). MShonle 14:35, 23 August 2005 (UTC)


Jyasskin said: "In a couple places, MShonle has suggested that this book's pseudocode should not use generics, instead leaving certain variables untyped. I respectfully disagree."

Thinking about this more, I think we probably should go with a light-weight kind of generics notation. Basically in the yellow "Some-structure Operations" boxes we could put the conventional <Key,Value> or <Element> notations, just to say "when you see Key here, it could actually be anything". That probably would make things easier to read. This notation would only be used for the ADT descriptions. The methods themselves will just say "Key k" or "Element value", and when converting to a language, the reader will either: (1) delete the types, because it's a dynamically typed language; (2) enter in the generic notation, because it's a language that allows type parameterization; or (3) substitute for a concrete type, like string or integer, because the language doesn't allow for generics, but is also statically typed. MShonle 20:24, 29 August 2005 (UTC)

hi there! i just started reading this book, like it so far, but one thing conufses me: every chapter uses a diferent pseudocode; this realy makes it hard to follow...

TODO note

The references section should have its own _content page, so that it could be seen on the All Chapters view version of the book too. Possibily we'll want to do this for the preamble too. MShonle 20:59, 7 September 2005 (UTC)

reviews, quizzes?

Will this wikibook have reviews for each chapter, or perhaps quizzes, or even a test/project? I didn't really see any mention of this in the intro. Just wondering. I may be able to help with these. Gflores 02:24, 31 October 2005 (UTC)

Err... nevermind. I guess that's up to Data Structures on Wikiversity . It sure does seem like a lot of duplicate work... Gflores 00:41, 1 November 2005 (UTC)

still active?

I was just looking at the history of some of the modules and some date back to February. Is this book still active?


adding implementations

I noticed the part saying that implementations in various languages would be welcome as an Appendix. I agree that this is a good idea, and would be willing to add some of them. I'm not sure what a good structure would be. Have a page as the appendix with everything on it? A page as the appendix with links to pages for individual structures, featuring an implementation in each language, and/or links to pages for each language, featuring an implementation of each structure mentioned in the book. Any thoughts? Cat Parade 06:39, 8 March 2006 (UTC)

I agree completely. It would be great to have an appendix for code and pseudocode. I've noticed several sections that contain mostly code. The implementations are great but too much in the text will make a section unreadable. Christopher.Reilly (discusscontribs) 12:05, 16 June 2015 (UTC)

Pages to merge

  • [[Data Structures/merge Sort]]
  • [[Data Structures/bubble sort]]

-- Adrignola talk contribs 13:39, 17 May 2010 (UTC)

I've merged the above striked out orphan to the relevant Algorithm Implementation page. The orphans don't fit on the scope of this book as they don't relate to data-structures per se. --Panic (talk) 02:12, 20 May 2010 (UTC)

Hash Trees

I would like to see the book cover Hash Trees regarding structures, traversal and uses. --Panic (talk) 02:31, 20 May 2010 (UTC)

Category template

Just a note for editors (and anyone doing administrative work) on the project's namespace. There is no need to add {{BookCat}} at all on subpages that already use the {{Data Structures/ChapNav}} template (it is already included there). This also means you don't want the direct category addition below the template, which will upset the sorting that BookCat provides (would sort on D rather than H).
This should probably be listed in a specific book convention page that explains what and how to use the navegation templates. --Panic (discusscontribs) 09:42, 28 October 2011 (UTC)