Tools of the Trade
Alcock, Lara. How to Think About Analysis. Oxford, UK: Oxford University Press, 2014. This book was written to help undergraduates taking their first course in analysis. It is intended as a precursor to, or a companion for, such a course. That said, the book provides good coverage of elementary concepts such as limits and continuity. It may be a good alternative if one would rather not slog through more detailed treatments such as Rudin [61].
Anaconda. Anaconda is a downloadable platform for R and Python that includes additional tools such as Jupyter Notebook and other utilities.
Bertsekas, Dimitri P. and John N. Tsitsiklis. Introduction to Probability. 2nd ed. Belmont, MA: Athena Scientific, 2008. An introduction to probability that requires some familiarity with single-variable and multivariable calculus. This is a companion text to an introductory course on probability at MIT, available on edX [68] or on the MIT OpenCourseWare site.
Buuren, Stef van. Flexible Imputation of Missing Data. Boca Raton, FL: CRC Press, 2012. Missing data are the norm rather than the exception in the real world. Van Buuren provides an up-to-date guide on the latest data imputation methods. Contains numerous examples in R.
CodeProject. Contains practical examples of coding in a number of different programming languages. There is a section on artificial intelligence.
Cormen, Thomas H. et al. Introduction to Algorithms. 3rd ed. Cambridge, MA: MIT Press, 2009. This book has evolved into the quintessential introductory text on algorithms. Provides encyclopedic coverage of most algorithms used in computer science, with extensive coverage of graph and multithreaded algorithms.
Cover, Thomas M. and Joy A. Thomas. Elements of Information Theory. 2nd ed. New York: Wiley, 2006. Information theory is an important foundation for many areas of statistics and machine learning. This book by Cover and Thomas provides an excellent introduction to information theory and goes into considerable detail in some areas. Several topics in the book that are relevant to machine learning include information theory and statistics, Kolmogorov complexity, and data compression (relevant to minimum description length).
Garrity, Thomas A. All the Mathematics You Missed: But Need to Know for Graduate School. Cambridge, UK: Cambridge University Press, 2002. An excellent survey of various branches of mathematics, written mainly for beginning graduate students in mathematics but also useful for graduate students in engineering and statistics. Covers the basic concepts in a number of areas useful for machine learning, including linear algebra, analysis, algorithms, and probability. The book provides an extensive guide – alas, now somewhat dated – to references for more in-depth follow-up reading on specific topics.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. Bayesian Data Analysis. 3rd ed. Boca Raton, FL: CRC Press, 2014. A comprehensive text that covers all aspects of Bayesian data analysis, including modern approaches to Bayesian computation. The latest edition adds three new chapters on nonparametric models. The authors also provide access to Stan, a Hamiltonian Monte Carlo package for carrying out numerical Bayesian analysis.
GNU compiler collection (GCC). This collection includes front ends for C, C++, Objective-C, Fortran, Ada, and Go, as well as libraries for these languages. Note that the source code of some open-source programs, such as R, still contains routines written in Fortran.
GNU Octave. GNU Octave is a freeware matrix processing environment that is largely compatible with Matlab. Includes both a GUI and a command-line interface.
GNU web page on random number generator algorithms. Random number generation is crucial to machine learning, statistical analysis, and simulation. Yet it is probably safe to say that many random number generators in use today are not very “random” at all. (For example: linear congruential random number generators are the most frequently used, yet are not capable of producing high-quality streams of random numbers.) This page, part of the GNU Scientific Library, provides a link to two high-quality random number generators: the Mersenne Twister, and L’Ecuyer’s WELL random number generator.
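For perspective, Python's standard random module already uses the Mersenne Twister, so a contrast with a linear congruential generator takes only a few lines. The sketch below is illustrative only; the LCG constants are the classic "minstd" parameters, shown for demonstration rather than as a recommendation.

```python
import random

# A textbook linear congruential generator (LCG). The constants are the
# classic "minstd" parameters, used here purely for illustration.
def lcg(seed, a=16807, c=0, m=2**31 - 1):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m  # scale to [0, 1)

gen = lcg(seed=42)
print("LCG:             ", [round(next(gen), 4) for _ in range(5)])

# Python's built-in generator is the Mersenne Twister mentioned above.
random.seed(42)
print("Mersenne Twister:", [round(random.random(), 4) for _ in range(5)])
```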
Goldberg, David. What Every Computer Scientist Should Know About Floating-Point Arithmetic. (Available on the Internet in several locations.) Not knowing what goes on inside the computer can, and often does, lead to problems for the user. This article goes into great detail on floating-point representation, particularly the IEEE 754 standard, which is in common use today.
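The pitfalls Goldberg describes are easy to demonstrate. A minimal Python illustration of IEEE 754 double-precision rounding:

```python
import math
from decimal import Decimal

# 0.1 has no exact binary representation, so rounding error surfaces
# in even the simplest arithmetic.
print(0.1 + 0.2)                      # 0.30000000000000004
print(0.1 + 0.2 == 0.3)               # False

# The usual remedy is a tolerance-based comparison.
print(math.isclose(0.1 + 0.2, 0.3))   # True

# Decimal exposes the value that the literal 0.1 actually stores.
print(Decimal(0.1))
```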
Golub, Gene H. and Charles F. Van Loan. Matrix Computations. 4th ed. Baltimore: Johns Hopkins University Press, 2013. A classic on implementing matrix computations. The latest edition covers parallel algorithms.
Hunt, Andrew and David Thomas. The Pragmatic Programmer – From Journeyman to Master. Boston: Addison-Wesley, 2000. Exactly what it says. Covers many of the hands-on issues in code development that most books on the topic ignore.
JAGS – Just Another Gibbs Sampler. JAGS is an open-source cross-platform implementation of the BUGS (Bayesian Analysis Using Gibbs Sampling) language. Versions are available for Windows, Linux, and Mac OS X. Both R and Python have packages that can interface with JAGS.
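JAGS automates the construction of samplers like the following. As a hand-rolled illustration of the Gibbs idea only (not the JAGS interface; the target distribution and correlation are arbitrary choices), this Python sketch alternates draws from the full conditionals of a bivariate normal:

```python
import numpy as np

# Gibbs sampling for a standard bivariate normal with correlation rho:
# each coordinate is drawn from its conditional given the other,
# x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
rng = np.random.default_rng(0)
rho, n_samples = 0.8, 10_000
x = y = 0.0
samples = np.empty((n_samples, 2))
for i in range(n_samples):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    samples[i] = (x, y)

print("empirical correlation:", np.corrcoef(samples.T)[0, 1])  # close to 0.8
```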
Java programming language. Downloads for various versions of Java and programming environments.
Julia Language. Julia is a high-level, high-performance dynamic programming language for technical computing that was created at MIT. The language is still in early development, but early testing shows its performance to be superior to alternatives such as R and Matlab. It is designed to work with IJulia, a Jupyter-based interactive interface to Julia created in collaboration with the Jupyter project.
Keras deep learning library. Keras is an open-source Python-based library of deep learning tools that can interface with TensorFlow [66].
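To give a sense of the library's style, here is a minimal sketch of a Keras feed-forward network. The data are random placeholders and the layer sizes are arbitrary; this assumes Keras is installed with TensorFlow configured as its backend.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Placeholder data: 100 samples, 20 features, binary labels.
X = np.random.rand(100, 20)
y = np.random.randint(0, 2, size=(100, 1))

# A small feed-forward network; TensorFlow performs the numerical
# work underneath when it is the configured backend.
model = Sequential([
    Dense(16, activation="relu", input_shape=(20,)),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]
```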
Knuth, Donald E. The Art of Computer Programming. Reading, MA: Addison-Wesley. 4 vols. Volume 1: Fundamental Algorithms. Volume 2: Seminumerical Algorithms. Volume 3: Sorting and Searching. Volume 4A: Combinatorial Algorithms, Part 1. This comprehensive work has been regarded in the computer science world as the bible of algorithms. Volume 2 is particularly useful for its coverage of random number generation. One problem with this series is that the coding examples in volumes 1 – 3 are written in a 1960s-style assembly language created by the author, so they might be difficult to understand; Ruckert [60] rewrites these examples in a modern RISC-based assembly language, and is recommended as a supplement.
Kruschke, John K. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd ed. Boston: Academic Press, 2015. As the title says, this is a tutorial on how to do Bayesian analysis, with an emphasis on coding Bayesian models in JAGS and Stan. Nevertheless, the author does not stint on the necessary background and rationale behind Bayesian analysis. The discussion of Bayesian vs. frequentist analysis is particularly useful because it points out a number of areas where frequentists mistakenly assign a Bayesian interpretation to frequentist measures such as p-values and confidence intervals. The book is strongly oriented toward R, but it is still useful for those doing Bayesian analysis in other environments such as Python.
Luenberger, David G. and Yinyu Ye. Linear and Nonlinear Programming. 4th ed. New York: Springer, 2008. This latest update of Luenberger’s classic now contains an introduction to interior point methods. Luenberger provides extensive discussion on unconstrained optimization methods, carefully pointing out the advantages of each.
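To illustrate the simplest of the unconstrained methods Luenberger analyzes, the sketch below runs steepest descent with a fixed step size on a toy quadratic (the matrix, step size, and iteration count are arbitrary illustrative choices):

```python
import numpy as np

# Minimize f(x) = 0.5 x^T A x - b^T x, whose gradient is A x - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, 1.0])

x = np.zeros(2)
step = 0.1  # fixed step size, small enough for this A
for _ in range(200):
    x -= step * (A @ x - b)  # move against the gradient

print("descent solution:", x)
print("exact solution:  ", np.linalg.solve(A, b))  # [0.2, 0.4]
```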
Mahajan, Sanjoy. Street-Fighting Mathematics: The Art of Educated Guessing and Opportunistic Problem Solving. Cambridge, MA: MIT Press, 2010. A delightful book, written to encourage thinking about mathematical approaches rather than engaging in rigorous proofs. Ideal for engineers. An online version of this book is available on the MIT OpenCourseWare site for a course by the same name (MIT course 18.098/6.099).
Mattuck, Arthur P. Introduction to Analysis. San Francisco: Pearson, 1998. Most analysis texts are written for mathematics majors. This text was written with non-mathematicians – e.g., physicists and engineers – in mind. The pace is slow and careful, but the book covers much of the material that can be found in more advanced texts like Rudin [61].
McConnell, Steve. Code Complete. 2nd ed. Redmond, WA: Microsoft Press, 2004. This is by far the best book on code construction, including variable naming, statement organization, program organization, design, and debugging. If you follow its precepts you will find that you develop programs much more quickly and with much less debugging. And when you return to something you wrote several years before, you will be able to recognize what you wrote. No matter what language you write in, this book will help you. This is the one book that you should keep by your side when programming.
McElreath, Richard. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton, FL: CRC Press, 2014. This book is oriented toward readers who are new to Bayesian statistics, building up models step by step with worked examples in R and Stan.
McGrayne, Sharon Bertsch. The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy. New Haven, CT: Yale University Press, 2011. This is a history book, not a text. What makes this book interesting is its narrative on how Bayesian statistics finally emerged from behind the shadows of frequentist statistics to take its place as the mainstream approach to statistics in the 21st century.
Microsoft Cognitive Toolkit. An open-source machine learning toolkit with interfaces in a number of languages including Python, C++, and .NET. Includes a variety of methods such as neural networks and time series analysis.
Microsoft Visual Studio Community Edition. A fully functional free version that is available for use by academics, individual users, and small businesses only. Includes all the tools and languages in Microsoft Visual Studio Professional Edition, including C#, C++, and F#.
Motwani, Rajeev and Prabhakar Raghavan. Randomized Algorithms. Cambridge, UK: Cambridge University Press, 1995. For many applications, a randomized algorithm can be the simplest and fastest way to reach a near-optimal solution. This book provides a solid introduction to the topic, with example algorithms.
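Randomized quickselect is one of the simplest examples of the phenomenon the book studies: a uniformly random pivot yields expected linear running time on any input. The sketch below is illustrative and not drawn from the book:

```python
import random

def quickselect(items, k):
    # Return the k-th smallest element (0-indexed). Picking the pivot
    # uniformly at random makes the expected running time linear
    # regardless of the input ordering.
    pivot = random.choice(items)
    lows = [x for x in items if x < pivot]
    pivots = [x for x in items if x == pivot]
    highs = [x for x in items if x > pivot]
    if k < len(lows):
        return quickselect(lows, k)
    if k < len(lows) + len(pivots):
        return pivot
    return quickselect(highs, k - len(lows) - len(pivots))

data = [7, 2, 9, 4, 1, 8, 3]
print(quickselect(data, len(data) // 2))  # the median: 4
```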
Nocedal, Jorge and Stephen J. Wright. Numerical Optimization. 2nd ed. New York: Springer, 2006. Nonlinear optimization algorithms, including non-derivative methods and interior-point methods.
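For a quick taste of the derivative-free methods the book covers, SciPy ships an implementation of the classic Nelder-Mead simplex method (this assumes SciPy is available; the Rosenbrock test function and starting point are the usual textbook choices):

```python
from scipy.optimize import minimize

# The Rosenbrock "banana" function, a standard optimization test
# problem; its global minimum is at (1, 1).
def rosenbrock(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

# Nelder-Mead uses only function values -- no derivatives required.
result = minimize(rosenbrock, x0=[-1.2, 1.0], method="Nelder-Mead")
print(result.x)  # close to [1.0, 1.0]
```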
Press, William H., Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge, UK: Cambridge University Press, 2007. (Earlier versions of the book can be found online.) This massive book is exactly what it says: a set of recipes for doing numerical computations. Reflecting the authors' backgrounds, the book is aimed mainly at an audience with a physics background. That said, the book covers a number of useful topics for developing machine learning programs, including matrix and vector computations, equation solving, optimization, classification algorithms and hidden Markov models (although this book should not be the primary reference for learning about these topics), Markov chain Monte Carlo, random number generation, and wavelets. The discussions in the book include coverage of how well the various algorithms work under different circumstances and how to avoid getting into computational trouble. The coding examples in the book are its biggest strength and its biggest weakness – they show how one might implement an algorithm, but they are poorly written.
Python programming language. The main site for downloading Python. Contains links to references for learning about Python.
R Project. This is the home page for the R project. If you aren't using R for machine learning, you should consider doing so. R has over 10,000 user-contributed add-on packages, many of which deal with machine learning, e.g.: rattle, a user interface for developing machine learning models; e1071, a package of tools for machine learning; and kernlab, which contains functions for SVMs and RVMs. Support for R is available on the R list server (but be prepared for some snarky comments if you ask a question that shows you didn't bother to use the help facility in R); other support can be found through several R groups on LinkedIn and on Stack Overflow.
Ruckert, Martin. The MMIX Supplement: Supplement to The Art of Computer Programming, Volumes 1, 2, 3 by Donald E. Knuth. Upper Saddle River, NJ: Addison-Wesley, 2015. Volumes 1–3 of Knuth [45] contain examples written in a 1960s-style assembly language, which is difficult to understand by modern standards. This book contains a rewrite of all those coding examples in MMIX, a RISC-based assembly language that is similar to modern assembly languages.
Rudin, Walter. Principles of Mathematical Analysis. 3rd ed. New York: McGraw-Hill, 1976. This is a venerable reference that has stood the test of time and is a standard text in many upper-division courses on real analysis. The book covers standard topics such as limits, compactness, implicit function theorem, Lebesgue measure and integration, and Stokes’s theorem.
Scala programming language. Scala is a programming language that runs on the Java Virtual Machine. It has a number of libraries specifically for machine learning.
Sedgewick, Robert. Algorithms in Java. Boston: Addison-Wesley, 2003. There are similar books by this author written for C and for C++. Covers basic algorithms, including graphs.
Stack Overflow. Stack Overflow is a question-and-answer web site that covers a wide range of computing topics, including programming languages, software development, and machine learning.
Strang, Gilbert. Introduction to Linear Algebra. 5th ed. Wellesley, MA: Wellesley-Cambridge Press, 2016. An excellent introductory text that provides a strong background in linear algebra for machine learning. Important concepts are carefully introduced and clearly explained. Includes examples of applications of linear algebra, including optimization and regression analysis (see the sketch below). This is a companion text to Strang's course 18.06 on the MIT OpenCourseWare site.
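As one example of the applications Strang highlights, ordinary regression is a least-squares problem in linear algebra. The sketch below (synthetic data, illustrative only) fits a line by minimizing ||Ax - b|| with NumPy:

```python
import numpy as np

# Synthetic data: b is roughly 2.0 + 0.5 * t plus noise.
rng = np.random.default_rng(1)
t = rng.uniform(0, 10, size=50)
b = 2.0 + 0.5 * t + rng.normal(0, 0.3, size=50)

# Design matrix: a column of ones (intercept) and the predictor t.
A = np.column_stack([np.ones_like(t), t])
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
print("intercept, slope:", coeffs)  # close to (2.0, 0.5)
```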
TensorFlow. TensorFlow is an open-source machine learning library created by Google. Numerical computations are represented by data flow graphs, in which nodes represent mathematical operations and edges represent the data arrays (tensors) communicated between nodes. Keras, described above, is a high-level interface to TensorFlow. The Python API is the most developed and tested of the available APIs, but other languages such as C, C++, and Java can also be used to work with TensorFlow.
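The data-flow picture is easy to see in a few lines. A minimal sketch, assuming the TensorFlow 2 Python API (the shapes and values are arbitrary):

```python
import tensorflow as tf

# Two constant tensors (the graph's edges) feeding a matmul operation
# (a node), echoing the data-flow-graph description above.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [1.0]])

@tf.function  # traces the Python function into a TensorFlow graph
def product(x, y):
    return tf.matmul(x, y)

print(product(a, b))  # tf.Tensor([[3.], [7.]], ...)
```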