Talk:Data Structures
General
This textbook aims to cover what CS majors learn in an introductory data structures and algorithms programming course.
Do we want to do everything for this subject on one page, or make this just a page of links? I like the second idea better.
I prefer dropping new information here until a section gets "too big". Then split off that section onto its own page. See http://CommunityWiki.org/BigBucketsFirst --DavidCary 05:41, 12 Jul 2004 (UTC)
I agree with that article's logic. I'll copy the material from my linked list page back to this page, and work on it here. Later on, maybe we'll span off to other pages aka buckets. --Waxmop
- I think the buckets right now are too small. I think it would be easier (because this is a textbook) to divide the book into chapters, and have each chapter be a subpage (if necessary). Anything finer grained than chapters would mean a lot of clicking and it would be easy to miss having a coherent voice in the book (both for contributing authors and for readers). But that's just my own feelings on the matter: I want these to seem like real books instead of just indexes to entries in the wikipedia. Mshonle 09:31, 1 Aug 2004 (UTC)
Format
I'd like to join the project if I may. I'm ready to tackle a couple of these topics. Is there a particular format to use?--Wcrowe 21:34, 1 Jul 2004 (UTC)
I would like to suggest a change in the order of the chapters. I would like to have Graphs moved before Trees, as a tree is simply a special case of a graph. Pesto 04:11, 28 August 2005 (UTC)
- I see what you are getting at, but the question is What's the best didactic ordering? After all, linked-lists are just another form of graph, further, they are another form of tree, so why not discuss linked-lists last? But to be serious, I think tree traversal algorithms are easier to understand than graph traversal algorithms. And by also being more limited, trees provide a good introduction to the concept of graphs. MShonle 05:37, 28 August 2005 (UTC)
jump in!
If you write an article, you get to choose the format. Later on, people might improve the format, but at the beginning, we should just get out something and let people work on refining later. --Waxmop 17:15, 4 Jul 2004 (UTC)
We should use header markup (with the equal signs) rather than bullet markup. The header markup builds the table of contents. Also, we should put articles into the Programming:Data_Structures namespace for now. I'm starting a linked list article. --Waxmop 17:22, 4 Jul 2004 (UTC)
- I agree: looking at a table of contents gives a better idea of what the book is like, and encourages more modifications and additions. I'm going to change the format to that. Mshonle 09:48, 1 Aug 2004 (UTC)
I dropped the variables and types section.
It links to a javascript page right now that I don't think is relevant. Everyone: remember to use the namespace Programming:Data Structures:your topic for your articles.--Waxmop 22:55, 5 Jul 2004 (UTC)
Wikipedia links are good, but...
This is a textbook, therefore, IMO, sections should use a how-to approach. You will immediately see that this differs quite a bit from an encyclopedic approach. I hope it is everyone's intent to teach these subjects, step by step, rather than just create a collection of Wikipedia links.
Page Structure
I'd rather not make a large change without some discussion, but it seems to me that the sorting portion should be moved to the bottom. I kind of visualize this page as three sections:
I wasn't planning on laying out the whole menu. I just meant to put in some examples, but I went further than I intended. I think that this breakdown gives us better control over presenting pros & cons of collection selections (Since the list selections are together and the stack selections are together), and is more consistant.
Primary Storage Types Single object allocation (used to implements linked lists, most trees, ...) Array (used to implement array list, most stacks, circular queues, ...) Secondary Storage Types Lists Common list operations & example Set(pos, val), Get(pos), remove(pos), append(val) Implemented as a Linked List Single linked list Double linked list Implemented as an Array List Stacks Common stack operations & example Push(val), Pop (, peek?) Implemented on a Linked List Implemented on an array Queues Common queue operations & example enqueue, dequeue Implemented as LinkedList Implemented as an array (circular, fixed size queue) Special cases Dequeues (What is this used for, by the way?) Trees & Graphs (isn't a tree just a special case of a graph?) Common tree operations & example add, get, remove Implemented as links (like linked list) Implemented in an array (Heap) Special cases Binary tree Balanced Quad tree Graphs Hash Tables ... How do I choose a data structure for a given situation? Sorts Big O notation Combinations ArrayList and Hashtable ...
--BillKress 23:06, 5 Nov 2004 (UTC)
- Big O Notation should be discussed in front because it is used throughout the rest of the material. You should also include the discussion of Abstract Data Types and information hiding in fron too. -- Gdarwin 06:42, 7 Nov 2004 (UTC)
- Hi Bill, sorry for the late reply. That structure looks much better, though I would put trees in a section before graphs, because the traversal concepts like inorder, preorder, and post-order (for binary trees) is really something separate enough. After queues should be heaps, aka priority queues, and then the heap sort. For graphs, depth-first and breadth-first should be covered. Below is what I suggest: MShonle 01:00, 21 Dec 2004 (UTC)
Introduction What is Data Abstraction? The Node The most elementary data structure Arrays Matricies: two dimensional arrays Insertion sort on an array Binary search on a sorted array List Structures Common list operations & example set(pos, val), get(pos), remove(pos), append(val) [not sure about these operations: what about next()? or iterators?] [Perhaps discuss operations on Nodes, versus operations on the List itself: example, Nodes have a next operation, but the List itself has a pointer to the head and tail node, and an integer for the number of elements. This view would work well in both the Lisp and OO worlds.] Doubley linked list Vectors (resizeable arrays) Iterators Operating on sequences without knowledge of the sequence itself Stacks and Queues Common stack operations & example push(val), pop, and peek Implemented on a Linked List Implemented on an array Common queue operations & example enqueue, dequeue Implemented as LinkedList Implemented as an array (circular, fixed size queue) Dequeues Trees Common tree operations & example add, get, remove Binary trees in order, pre order, post order traversals Balanced binary trees Iterators for trees Min and Max Heaps A tree implemented as an array! (not to be confused with the heap in memory) Percolation Heap sort Treaps: heaps where elements can be deleted by name Graphs Directed and Undirected graphs matrix and list representations Depth-First Search Breadth-First Search Topological sort Hash Tables Iteration order for hash tables by augmenting the structure - iterating over items in the order in which they were inserted - iterating over the items based on most-recently-used Sets Different implementations - lists, binary search trees, bit arrays, hashtables Disjoint Sets (operations: union, find) Tradeoffs between Data Structures Use asymptotic behaviour to decide, most importantly seeing how the structure will be used: an infrequent operation does not need to be fast if it means everything else will be much faster
The content is designed to compliment the Algorithms text book (the idea is that the reader would cover the DS book first, then the Algo book, and then the Advanced Data Structures and Algorithms book that would pick up the more random and advanced ideas). Most importantly, I'd like to see each book be small enough that it can be read completely in one or two semesters. The more reference-based or encyclopedic content could be saved for either the Advanced book or the wikipedia. MShonle 01:00, 21 Dec 2004 (UTC)
Incoherent Structure
I agree. I find it odd that a lot of list based structures are currently placed under linked lists, when these structures can obviously be impleneted as an array.
Your layout looks good, but I was wondering if we can add a disccusion about big O notation. I see it used here, but where is it formally defined?
Gdarwin 16:13, 29 Oct 2004 (UTC)
Proposed structure has been modified. I'd still like to hear one or two more people say that they prefer this structure before I refactor.
--BillKress 23:04, 5 Nov 2004 (UTC)
Duplication for Array
Currently we have some 2 articles on arrays one here and one in Programming:Arrays. We should merge realy leavin only a short into here and move the rest off.
- Programming:Arrays is only a redirect to Data Structures/Arrays. --Krischik 06:47, 23 August 2005 (UTC)
- I think the DS book should only talk about the most basic form of arrays, that is, integer-indexed arrays. Many of the other topics currently mentioned, like associative arrays, won't make sense to the reader because they haven't read about hash tables (which use simple arrays and linked-lists) or binary search trees yet. The purpose of our story should be abstraction, abstraction, abstraction, and show how the complex gets built up from the very simple; not how the complex gets built up from things just as complex. MShonle 14:50, 22 August 2005 (UTC)
- Speaking of integer based array: even that leaved the question of "which integer"? (See: Ada Programming/Types/array). Before the almost assembler like C came out (and predended to be a higher language) it was almost part of the definition of a (real) higher language that you did not need to choose a register sized mashine word as array index. As such I belive that restricting an array index to "integer" is actualy a loss of abstraction. Abstraction for me means: away from computer implementation - closer to the live of humans. Example: Humans use arrays of (Jan, Feb, ... Dec) or (Mon, Thu, ... Sun) while computers would internaly use (0, 1 ... 12) or (0, 1, .. 6). Even if a human would choose a number to represent a month or day he would choose a 1 for Jan or Mon and not 0 as a computer would do. A higher programming language with better abstraction allow a more human like representation (i.E. using an enumeration type as as index). The Psydo language used in Computer Science should allow for the highest abstraction - and an
array [Jan ... Dez, 1990 ... 2005] of ....
should not be out of the question. --Krischik 06:47, 23 August 2005 (UTC)- I completely agree with you that allowing different index types allows for better abstraction. But we need to be careful about which abstractions we're talking about. The purpose of this book is to describe how one would get from embarrassingly simple, low-level abstractions into very precise, semantic-rich, high-level abstractions. To me, it would kind of be like arguing that the Hashtables chapter should use Perl, because Perl already has hash tables built-in. But the point is not to write the most elegant code examples; the point is to demonstrate computer science concepts.
I find the idea very beautiful that in a language like Lisp, all you need is cons, car and cdr, and from there you can make lists, trees and graphs. Add in a very simple array, and you can make hashtables and matrices, et cetera... This is a very compelling story, and it can't be told as well if unnecessary elements are constantly added as side-notes. MShonle 14:24, 23 August 2005 (UTC)
- (BTW, a deeper discussion of rich programming languages might be better suited for the Programming Languages book, or also possibly the Advanced Data Structures and Algorithms book. MShonle 21:09, 23 August 2005 (UTC))
- I completely agree with you that allowing different index types allows for better abstraction. But we need to be careful about which abstractions we're talking about. The purpose of this book is to describe how one would get from embarrassingly simple, low-level abstractions into very precise, semantic-rich, high-level abstractions. To me, it would kind of be like arguing that the Hashtables chapter should use Perl, because Perl already has hash tables built-in. But the point is not to write the most elegant code examples; the point is to demonstrate computer science concepts.
- Speaking of integer based array: even that leaved the question of "which integer"? (See: Ada Programming/Types/array). Before the almost assembler like C came out (and predended to be a higher language) it was almost part of the definition of a (real) higher language that you did not need to choose a register sized mashine word as array index. As such I belive that restricting an array index to "integer" is actualy a loss of abstraction. Abstraction for me means: away from computer implementation - closer to the live of humans. Example: Humans use arrays of (Jan, Feb, ... Dec) or (Mon, Thu, ... Sun) while computers would internaly use (0, 1 ... 12) or (0, 1, .. 6). Even if a human would choose a number to represent a month or day he would choose a 1 for Jan or Mon and not 0 as a computer would do. A higher programming language with better abstraction allow a more human like representation (i.E. using an enumeration type as as index). The Psydo language used in Computer Science should allow for the highest abstraction - and an
Quicksort misconceptions
- Since superceeded by the Introsort algorithm, the Quicksort algorithm can exact excellent performance on purely random data. It's worst performance, however, occurs on nearly-sorted or on sorted data, at which point Quicksort becomes completely impractical.
The above quote is incorrect. The original quicksort introduced by Hoare used random, not deterministic, methods to select the pivot. Thus, the kind of data input (random or ordered) wasn't important. It would only perform poorly on the very slight chance that the random selections were suboptimal. Some bad textbooks present some pretty poor ways of picking the pivots, but that is not the true quicksort.
I think it would probably be best not to present quicksort in the data-structures book at all because it's already half-way covered in the Algorithms book (currently the partition portion of quicksort is covered). I think Bubble sort probably shouldn't even be mentioned, as selection sort performs better and is easier to understand. For a data-structures book, Heap sort makes the most sense. Part of my hope is that we can keep both the data-structures and algorithms books simple (by covering only the most fundamental tools), and get into the more advanced material (e.g. introsort) in an Advanced Data-Structures and Algorithms book. MShonle 19:34, 20 Dec 2004 (UTC)
Moving forward
It looks like we've resolved many of the issues on this page. How would one go about archiving this talk page? MShonle 19:18, 30 Dec 2004 (UTC)
- Create an /Archive 1 page and move the stuff there. Now that sub-pages are active /Archive 1 allways works ;-). --Krischik 15:31, 23 August 2005 (UTC)
Template Tricks
Here's the page structure I have for the Introduction, and I'd like to repeat it for the other chapters:
Computer Science:Data Structures:Introduction -- simple page that includes: a header {{Computer Science:Data StructuresChapNav}}, - i.e. a template {{Computer Science:Data Structures:Introduction}}, - another template a horizontal line a footer {{Computer Science:Data StructuresChapNav}}, - the same template as the header where the template {{Computer Science:Data Structures:Introduction}} is: Template:Computer Science:Data Structures:Introduction -- a simple page that REDIRECTS to the page Computer Science:Data Structures:Introduction_content Computer Science:Data Structures:Introduction_content -- the article that includes the chapter's *real* content.
The idea of these template tricks is two fold:
- The navigation templates simply allow you to browse the book easier; and
- The _content templates allow the book to be viewed in two different ways: by-chapter, or all-chapters
The by-chapter view is the one shown above. The all-chapters view would be a page that would look like this:
{{Computer Science:Data Structures:Introduction}} {{Computer Science:Data Structures:Chapter 2}} {{Computer Science:Data Structures:Chapter 3}} ... and so on, including all of the _content pages (indirectly, because the templates will redirect)
The indirection is neccessary in order to avoid the chapter navigation getting shown in the all-chapters view.
The idea behind the all chapters view is that people could print out the book easily. Anyway, if you see some weird things, like the above, that's the reason why. MShonle 19:47, 30 Dec 2004 (UTC)
Pseudocode
I've implemented the Stack ADT in a pseudocode that I'm proposing for the rest of the book. I used a literate programming approach in which the implementation is interspersed with prose describing it. The pseudocode for a somewhat more complicated ADT should probably be copied here and used for an example. The details of the pseudocode follow.
All blocks begin with a keyword (so far type and method) and end with a line containing end and the keyword. The pseudocode is sensitive to line endings, although a python-like use of ; to put two commands on the same line would not be unreasonable. Make sure you indent the contents of all blocks.
I use type TypeName[<type-arguments>]
to introduce a new type. I use the word "type" rather than an alternative like "class" or "structure" because this is a book on abstract data types, rather than object oriented classes, and the word "structure" is too long. I drew the angle brackets from C++ template or Java generic syntax. I think it is important to point out that abstract types are parametrized on other types. The type ends with end type
on its own line.
The data keyword declares a private data member. It is of the form data member-name : member-type
. I am using the Pascal style of data typing (: to introduce the type of a variable) because there's already a keyword in front of the variable and this allows us to use type names with spaces in them. Data members (so far) can be accessed by their name alone in any member function. Possibly, we should follow python's lead and require a "self." in front of an access to clearly distinguish them from local variables.
method introduces a public member function/method. These are of the form method method-name (arg-1:arg-1-type,…) [:return-type]
. A method ends with end method
on its own line. If a return type is present, a return statement with the right type must terminate all calls of the function. Probably we should find a way to indicate that a method does not modify its object or an argument. You call a member function using the standard object.member(args)
syntax.
I'd like a for loop that allows iteration over either a numeric range with a possible stride, or over all members of a given set or sequence. Perhaps the following syntax:
for i in 1..11 [step 2] # Does this loop for i=11 or not? for i in some_set
Pattern matching should probably be allowed too:
for (first,second) in pair_sequence
Discussion
- A defense of explicit typing in our pseudocode
- In a couple places, MShonle has suggested that this book's pseudocode should not use generics, instead leaving certain variables untyped. I respectfully disagree. While certain languages (Perl, Python, Lisp) allow programmers to omit the types, this makes the programmer have a firmer understanding of what type a given variable has at any time since the compiler won't catch mistakes. Since we're trying to teach new programmers, who probably don't understand types as well, the type of any variable in our pseudo code should be specified. Without generics, if a variable declaration has no type, the reader has to guess the type. A mapping ADT will have two types that can vary, and a Graph may have even more. Using templates/generics adds a little syntax but allows us to type every variable we use. MShonle, if I'm wrong about your position, feel free to delete this paragraph. --jyasskin 07:01, 24 Jan 2005 (UTC)
Old discussion from above:
- We should include code snippets in as many different languages as possible along with pseudocode.
- I'm not sure "as many" languages as possible is desirable. Certainly the current fad language could be used, or a language like it. I'm a fan of pseudocode that absorbs the benefits from several languages. But data-structures in Scheme versus Python versus Java/C# are going to be very different. At some point decisions will have to be made. I believe for the topic of data structures that something closer to C should be used, but organized such that it could "obviously" be turned into an object-oriented program.
At heart, Data Structures is about making linked-list and hash-tables and ballanced binary search trees: it would seem silly to implement a linked list in a language that already includes full, built-in support of them. (However, these high level-languages could offer a lot: as they can help in the design of the interface of the structures; for example by using iterators.) The Algorithms book uses a more pure pseudocode language that offers excellent looping constructs and array slices: the reason being that showing low-level details detracts from the "hard part" of what the algorithm is (i.e., introducing implementation complexity on top of the inherent complex makes the story telling harder)Mshonle 09:45, 1 Aug 2004 (UTC)
- I'm not sure "as many" languages as possible is desirable. Certainly the current fad language could be used, or a language like it. I'm a fan of pseudocode that absorbs the benefits from several languages. But data-structures in Scheme versus Python versus Java/C# are going to be very different. At some point decisions will have to be made. I believe for the topic of data structures that something closer to C should be used, but organized such that it could "obviously" be turned into an object-oriented program.
- From a teaching perspective i think that pseudocode for data structures is much much clearer than any implementation. I would suggest that code snippets be first done in pseudocode, and then possibly in actual language code (is there a way perhaps this could be on a different page, like an appendix or something?) The pseudocode is the clearest way to learn the concept, which is pretty indepedent of language. I think both functional and procedural data structures should be shown, as functional programming is becoming more mainstream, and this book is not simply a programming text, but a CS text, and they are theoretically very important. JustinWick 15:19, 12 Nov 2004 (UTC)
- I have a suggestion from the Computer Programming Book: Just read the following 3 pages: Computer Programming#Key_concepts, Computer Programming/Control, Ada Programming/Control. It's almost what JustinWick suggested.
On the history page, Krischik said: "(Discussion - pseudocode vs real language - why not both?)". Answer: We are taking that approach. If people want to supply implementations in C, Scheme, Python or Java they may do so, as an appendix (the sister Algorithms book has one appendix already set up, but with little work done on it). The pseudo-language is closest to Python, so it would be a good idea to test all of the samples by actually running them against a full test-suite (which should also be part of the appendix; see the Algorithms book). MShonle 14:35, 23 August 2005 (UTC)
Jyasskin said: "In a couple places, MShonle has suggested that this book's pseudocode should not use generics, instead leaving certain variables untyped. I respectfully disagree."
- Thinking about this more, I think we probably should go with a light-weight kind of generics notation. Basically in the yellow "Some-structure Operations" boxes we could put the conventional <Key,Value> or <Element> notations, just to say "when you see Key here, it could actually be anything". That probably would make things easier to read. This notation would only be used for the ADT descriptions. The methods themselves will just say "Key k" or "Element value", and when converting to a language, the reader will either: (1) delete the types, because it's a dynamically typed language; (2) enter in the generic notation, because it's a language that allows type parameterization; or (3) substitute for a concrete type, like string or integer, because the language doesn't allow for generics, but is also statically typed. MShonle 20:24, 29 August 2005 (UTC)
hi there! i just started reading this book, like it so far, but one thing conufses me: every chapter uses a diferent pseudocode; this realy makes it hard to follow...
TODO note
The references section should have its own _content page, so that it could be seen on the All Chapters view version of the book too. Possibily we'll want to do this for the preamble too. MShonle 20:59, 7 September 2005 (UTC)
reviews, quizzes?
Will this wikibook have reviews for each chapter, or perhaps quizzes, or even a test/project? I didn't really see any mention of this in the intro. Just wondering. I may be able to help with these. Gflores 02:24, 31 October 2005 (UTC)
- Err... nevermind. I guess that's up to Data Structures on Wikiversity . It sure does seem like a lot of duplicate work... Gflores 00:41, 1 November 2005 (UTC)
still active?
I was just looking at the history of some of the modules and some date back to February. Is this book still active?
adding implementations
I noticed the part saying that implementations in various languages would be welcome as an Appendix. I agree that this is a good idea, and would be willing to add some of them. I'm not sure what a good structure would be. Have a page as the appendix with everything on it? A page as the appendix with links to pages for individual structures, featuring an implementation in each language, and/or links to pages for each language, featuring an implementation of each structure mentioned in the book. Any thoughts? Cat Parade 06:39, 8 March 2006 (UTC)
- I agree completely. It would be great to have an appendix for code and pseudocode. I've noticed several sections that contain mostly code. The implementations are great but too much in the text will make a section unreadable. Christopher.Reilly (discuss • contribs) 12:05, 16 June 2015 (UTC)
Pages to merge
[[Data Structures/merge Sort]][[Data Structures/bubble sort]]
Hash Trees
I would like to see the book cover Hash Trees regarding structures, traversal and uses. --Panic (talk) 02:31, 20 May 2010 (UTC)
Category template
Just a note for editors (and anyone doing administrative work) on the project's namespace. There is no need to add {{BookCat}} at all on subpages that already use the {{Data Structures/ChapNav}} template (it is already included there). This also means you don't want the direct category addition below the template, which will upset the sorting that BookCat provides (would sort on D rather than H).
This should probably be listed in a specific book convention page that explains what and how to use the navegation templates. --Panic (discuss • contribs) 09:42, 28 October 2011 (UTC)