Discovery/Status updates/2018-04-09

This is the weekly update for the week starting 2018-04-09.


Discussions

Events and News

  • Erik and Trey attended the OpenSource Connections Haystack Search Relevance Conference and the Tom Tom Founders Festival Machine Learning Conference, which ran back-to-back in Charlottesville, VA. At Haystack, Erik presented on how we use clickstream information to create training data for our learning-to-rank models. Trey wrote up trip notes, with lots of links, on MediaWiki.

Other Noteworthy Stuff

  • A fix for CirrusSearchCheckerJob errors was rolled out.
  • Stas implemented indexing of Lexemes and Forms for the WikibaseLexeme extension.

Did you know?

  • The English verb "to be" is kind of weird: the infinitive "be" and the participles "being" and "been" start with "b-", the preterite forms "was" and "were" start with "w-", and the present forms "am", "is", and "are" start with vowels. The conjugations originally come from three or four different verbs! Why "three or four"? Wiktionary disagrees with itself a bit, listing four in its etymology of "is" and three in its etymology of "be". The conflation goes back at least to Proto-Germanic, so German is similarly weird. Dutch has a greatly simplified paradigm, but it still shows some trace of the multiple sources. Other languages, including ASL, Arabic, Bengali, Hawaiian, Hebrew, Indonesian, Japanese, Russian, Turkish, and Ukrainian, at least partly avoid this mess by having a zero copula. For on-wiki search, we deal with this problem partly through stemming and stop words.

--