File:BareBonesSearch.webm
Summary
Description | English: Bare-Bones Basics of Full-Text Search. This is a version of a presentation I gave, re-recorded to share more widely. From the introduction: "Most of what we are going to cover is actually more basic than the stuff we do with CirrusSearch, Elasticsearch, and Lucene (which power the search on Wikipedia and other wikis), but it makes for a good mental model of the basic parts of an information retrieval system, and it provides a place to build on to discuss the more complex processing we actually do today. My goal is to start with no prerequisites and go over inverted indexes, tokenization and stemming, basic boolean and proximity retrieval operations, TF/IDF and the vector space model of similarity, field-level indexing, using multiple indexes, and then touch on some of the elements of scoring."
Date |
Source | Own work
Author | Trey Jones (WMF)
Licensing
I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
Attribution: Trey Jones
You are free:
- to share – to copy, distribute and transmit the work
- to remix – to adapt the work
Under the following conditions:
- attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.