AI and Libraries: A Bibliography

As part of the work I’ve been doing in this space I’ve collected a reasonable, working bibliography that might be helpful to others. I’m very interested in the early days of this area (hence the resources from the 1980s and 1990s) as well as contemporary work. Happy reading.


Alberico, R., & Micco, M. (1990). Expert systems for reference and information retrieval. Meckler.

Aluri, R., & Riggs, D. E. (1988). Application of expert systems to libraries. Advances in Library Automation and Networking, 2, 1–43.

Aluri, R., & Riggs, D. E. (1990). Expert systems. In M. Gorman (Ed.), Convergence: Proceedings of the Second National Conference of the Library and Information Technology Association, October 2-6, 1988, Boston, Massachusetts (pp. 169–178). American Library Association.

Arlitsch, K., & Newell, B. (2017). Thriving in the age of accelerations: A brief look at the societal effects of artificial intelligence and the opportunities for libraries. Journal of Library Administration, 57(7), 789–798.

Arny, P. (1990). A prototype expert system for library reference. In M. Gorman (Ed.), Convergence: Proceedings of the Second National Conference of the Library and Information Technology Association, October 2-6, 1988, Boston, Massachusetts (pp. 179–182). American Library Association.

Bailey, C. W. (1991). Intelligent library systems: Artificial intelligence technology and library automation systems. Advances in Library Automation and Networking, 4, 1–23.

Bailey, C. W. (1993). The intelligent reference information system project: A merger of CD-ROM LAN and expert system technologies. Information Technology and Libraries, 11(3), 237–244.

Bailey, C. W. (1990). Building knowledge-based systems for public use: The intelligent reference systems project at the University of Houston Libraries. In M. Gorman (Ed.), Convergence: Proceedings of the Second National Conference of the Library and Information Technology Association, October 2-6, 1988, Boston, Massachusetts (pp. 190–194). American Library Association.

Bailey, C. W., & Downes, R. N. (1993). Intelligent reference information system (IRIS). In J. V. Boettcher (Ed.), 101 success stories of information technology in higher education: The Joe Wyatt Challenge (pp. 402–407). McGraw-Hill.

Bailey, C. W., Fadell, J., Myers, J. E., & Wilson, T. C. (1989). The Index Expert system: A knowledge-based system to assist users in index selection. Reference Services Review, 17(4), 19–28.

Bailey, C. W., & Gunning, K. (1990). The intelligent reference information system. CD-ROM Librarian, 5(8), 10.

Bell, S. (2016). Promise and peril of AI for academic librarians. Library Journal.

Beta Writer. (2019). Lithium-Ion Batteries: A Machine-Generated Summary of Current Research. Springer Nature.

Boman, C. (2019). An exploration of machine learning in libraries. Library Technology Reports, 55(1), 21–25.

Bourg, C. (2017, March 17). What happens to libraries and librarians when machines can read all the books? Feral Librarian.

Bridy, A. (2012). Coding creativity: Copyright and the artificially intelligent author. Stanford Technology Law Review, 5.

Buckland, M. K., & Florian, D. (1992). Expertise, task complexity, and artificial intelligence: A conceptual framework. Journal of the American Society for Information Science, 42(9), 635–643.

Bush, V. (1945). As we may think. Atlantic Monthly, 176(July), 101–108.

Butkovich, N. J., Taylor, K. L., Dent, S. H., & Moore, A. S. (1989). An expert system at the reference desk: Impressions from users. The Reference Librarian, 23, 61–74.

Calhoun, K. (2014). Exploring digital libraries: Foundations, practice, prospects. Neal-Schuman.

Calvert, S. (2020). Emerging technologies for research and learning: Interviews with experts. Association of Research Libraries.

Canadian Association of Research Libraries. (2017). Submission to the House of Commons Committee on Access to Information, Privacy and Ethics (ETHI Committee) hearings on the Personal Information Protection and Electronic Documents Act (PIPEDA). CARL.

Canadian Federation of Library Associations. (2018). Artificial intelligence and intellectual freedom: Key policy concerns for Canadian libraries. CFLA.

Cavanagh, J. M. (1989). Library applications of knowledge-based systems. The Reference Librarian, 23, 1–19.

Cerf, V. G. (2019). Libraries considered hazardous. Communications of the ACM, 62(2), 5.

Chen, J. (2010). Artificial intelligence. In M. J. Bates & M. N. Maack (Eds.), Encyclopedia of library and information sciences (3rd ed., pp. 289–298). CRC Press.

Chu, H.-C., & Yang, S.-W. (2012). Innovative semantic web services for next generation academic electronic library via web 3.0 via distributed artificial intelligence. In J. S. Pan, S. M. Chen, & N. T. Nguyen (Eds.), Intelligent Information and Database Systems (pp. 118–124). Springer.

Coleman, C. (2017, November 3). Artificial intelligence and the library of the future, revisited. Digital Library Blog.

Conrad, L. Y. (2019, June 25). The robots are writing: Will machine-generated books accelerate our consumption of scholarly literature? The Scholarly Kitchen.

Cox, A. M., Pinfield, S., & Rutter, S. (2018). The intelligent library: Thought leaders’ views on the likely impact of artificial intelligence on academic libraries. Library Hi Tech.

Elosua, J., Brede, A. S., Ritola, M., & Botev, V. (2018).’s project Aiur: An open, community-governed AI engine for knowledge validation.

Enis, M. (2018). Technology: University opens AI lab in library. Library Journal, 143(17), 12–14.

Fernandez, P. (2016). “Through the looking glass: Envisioning new library technologies” how artificial intelligence will impact libraries. Library Hi Tech News, 33(5), 5–8.

Fister, B. (2020, March 9). Libraries and the practice of freedom in the age of algorithms.

Geist, M. (2017, June 2). Toward a Canadian knowledge transfer strategy: My appearance before the Standing Committee on Industry, Science and Technology. Michael Geist.

Gorman, M. (Ed.). (1990). Convergence: Proceedings of the Second National Conference of the Library and Information Technology Association, October 2-6, 1988, Boston. American Library Association.

Gramatica, R. (2018). How AI will change libraries.

Griffey, J. (Ed.). (2019). Artificial intelligence and machine learning in libraries. Library Technology Reports, 55(1).

Griffey, J., & Webster, K. (2019). Artificial intelligence: Impacts and roles for libraries.

Harper, C. (2018). Machine learning and the library or: How I learned to stop worrying and love my robot overlords. Code4Lib, 41.

Head, A. J., Fister, B., & MacMillan, M. (2020). Information literacy in the age of algorithms: Student experiences with news and information, and the need for change. Project Information Literacy.

Henry, G. (2019). Research librarians as guides and navigators for AI policies at universities. Research Library Issues, 299, 47–64.

Herron, J. (2017). Intelligent agents for the library. Journal of Electronic Resources in Medical Libraries, 14(3–4), 139–144.

Hilt, K. (2017). What does the future hold for the law librarian in the advent of artificial intelligence? Canadian Journal of Information and Library Science, 41(3), 211–227.

Hristov, K. (2017). Artificial intelligence and the copyright dilemma. Idea, 57(3), 454.

Hsieh, C., & Hall, W. (1989). Survey of artificial intelligence and expert systems in library and information science literature. Information Technology and Libraries, 8(2), 209.

Johnson, B. (2018). Libraries in the age of artificial intelligence. Computers in Libraries, 38(1).

Johnson, S. A. (2019). Technology innovation and AI ethics. Research Library Issues, 299, 14–27.

Kennedy, C. A. (2019). You and AI. Against the Grain.

Kennedy, M. L. (2019). What do artificial intelligence (AI) and ethics of AI mean in the context of research libraries? Research Library Issues, 299, 3–13.

Lancaster, F. W. (1993). Artificial intelligence and expert systems: How will they contribute? In F. W. Lancaster (Ed.), Libraries and the future: Essays on the library in the twenty-first century (pp. 147–156). Haworth Press.

Lancaster, F. W., & Smith, L. C. (Eds.). (1992). Artificial intelligence and expert systems: Will they change the library? Graduate School of Library Science, University of Illinois at Urbana-Champaign.

Lankes, R. D. (2019, July 3). Decoding AI and libraries. R. David Lankes.

Leung, S., Baildon, M., & Albaugh, N. (2019). Applying concepts of algorithmic justice to reference instruction, and collections work. MIT Libraries.

Licklider, J. C. R. (1960). Man-computer symbiosis. Human Factors in Electronics, IRE Transactions On, HFE-1(1), 4–11.

Licklider, J. C. R. (1965). Libraries of the Future. MIT Press.

Lippincott, S. (2020). Mapping the current landscape of research library engagement with emerging technologies in research and learning. Association of Research Libraries.

Liu, G. (2011). The application of intelligent agents in libraries: A survey. Program: Electronic Library and Information Systems, 45(1), 78–97.

Liu, J., Liu, C., & Belkin, N. J. (2020). Personalization in text information retrieval: A survey. Journal of the Association for Information Science and Technology, 71(3), 349–369.

Liu, X., Guo, C., & Zhang, L. (2014). Scholar metadata and knowledge generation with human and artificial intelligence. Journal of the Association for Information Science and Technology, 65(6), 1187–1201.

Lynch, C. (2017). Stewardship in the “age of algorithms.” First Monday, 22(12).

Massis, B. (2018). Artificial intelligence arrives in the library. Information and Learning Science, 119(7/8), 456–459.

McDonald, C., & Weckert, J. (Eds.). (1991). Libraries and expert systems: Proceedings of a conference and workshop held at Charles Sturt University – Riverina, Australia, July 1990. Taylor Graham.

Miller, R. B., & Wolf, M. T. (Eds.). (1992). Thinking robots, an aware internet, and cyberpunk librarians. Library and Information Technology Association.

Morris, A. (Ed.). (1992). The Application of expert systems in libraries and information centres. Bowker-Saur.

Mostafa, J. (2018). Documents and (as) machines. Journal of the Association for Information Science and Technology, 69(1), 3–5.

Nardi, B. A., & O’Day, V. (1996). Intelligent agents: What we learned at the library. Libri, 46, 59–88.

Neary, M. A., & Chen, S. X. (2017). Artificial intelligence: Legal research and law librarians. AALL Spectrum, 21(5).

Neill, S. D. (1980). Canadian libraries in 2001. Parabola Systems.

Noble, S. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press.

Nyce, J. M., & Kahn, P. (Eds.). (1991). From Memex to hypertext: Vannevar Bush and the mind’s machine. Academic Press.

Padilla, T. (2019). Responsible operations: Data science, machine learning, and AI in libraries. OCLC Research.

Padilla, T., Allen, L., Frost, H., Potvin, S., Roke, E. R., & Varner, S. (2019). Always already computational: Collections as data.

Peters, S. E., Zhang, C., Livny, M., & Ré, C. (2014). A machine reading system for assembling synthetic paleontological databases. PLOS ONE, 9(12), e113523.

Pinfield, S., Cox, A. M., & Rutter, S. (2017). Mapping the future of academic libraries. SCONUL.

Pulla, P. (2019). The plan to mine the world’s research papers. Nature, 571(7765), 316–318.

Reidsma, M. (2016, March 11). Algorithmic bias in library discovery systems.

Reidsma, M. (2019). Masked by trust: Bias in library discovery. Litwin Books.

Ridley, M. (2019). Explainable artificial intelligence. Research Library Issues, 299, 28–46.

Robertson, C. A. (1990). Designated searchers with expert systems support: A model for the delivery of online information to scientists. In M. Gorman (Ed.), Convergence: Proceedings of the Second National Conference of the Library and Information Technology Association, October 2-6, 1988, Boston, Massachusetts (pp. 183–189). American Library Association.

Rolan, G., Humphries, G., Jeffrey, L., & Samaras, E. (2018). More human than human? Artificial intelligence in the archive. Archives and Manuscripts, 1–25.

Ronzano, F., & Saggion, H. (2015). Dr. Inventor framework: Extracting structured information from scientific publications. In N. Japkowicz & S. Matwin (Eds.), Discovery Science (pp. 209–220). Springer.

Rubin, V. L., Chen, Y., & Thorimbert, L. M. (2010). Artificially intelligent conversational agents in libraries. Library Hi Tech, 28(4), 496–522.

Schafer, B., Komuves, D., Zatarain, J., & Diver, L. (2015). A fourth law of robotics? Copyright and the law and ethics of machine co-production. Artificial Intelligence and Law, 23(3), 217–240.

Schoenenberger, H., Chiarcos, C., & Schenk, N. (2019). Preface. In Beta Writer, Lithium-Ion Batteries: A Machine-Generated Summary of Current Research (pp. v–xxiii). Springer.

Schonfeld, R. C. (2017, July 18). Defining a new content type: The exploratory resource. The Scholarly Kitchen.

Seeber, K. (2018). Teaching CRAAP to robots: Artificial intelligence, false binaries, and implications for information literacy. Critical Librarianship & Pedagogy Symposium, Tucson, AZ.

Smith, D. E. (1989). Reference expert systems: Humanizing depersonalized service. The Reference Librarian, 23, 177–190.

Smith, L. C. (1976). Artificial intelligence in information retrieval systems. Information Processing and Management, 12(3), 189–222.

Smith, L. C. (1981a). Citation analysis. Library Trends, 30(1), 83–106.

Smith, L. C. (1981b). Representation issues in information retrieval system design. ACM SIGIR Forum, 16(1), 100–105.

Smith, L. C. (1986). Knowledge-based systems, artificial intelligence and human factors. In P. Ingwersen, L. Kajberg, & A. M. Pejtersen (Eds.), Information technology and information use (pp. 98–110). Taylor Graham.

Smith, L. C. (1987). Artificial intelligence and information retrieval. In M. E. Williams (Ed.), Annual Review of Information Science and Technology (pp. 41–77). Elsevier.

Smith, L. C. (1989). Artificial intelligence: Relationships to research in library and information science. Journal of Education for Library and Information Science, 30(1), 55–56.

Sparck Jones, K. (1991). The role of artificial intelligence in information retrieval. Journal of the American Society for Information Science, 42(8), 558–565.

Special Libraries Association. (1991). Expert systems and library applications: An SLA information kit. SLA.

Swanson, D. R. (1990). Medical literature as a potential source of new knowledge. Bulletin of the Medical Library Association, 78(1), 29–37.

Taskin, Z., & Al, U. (2019). Natural language processing applications in library and information science. Online Information Review, 43(4), 676–690.

Tavosanis, M. L. A. (2017). Libraries, linguistics and artificial intelligence: J. C. R. Licklider and the libraries of the future. JLIS.It, Italian Journal of Library and Information Science, 8(3), 137.

Tay, A. (2017, April 9). How libraries might change when AI, machine learning, open data, block chain & other technologies are the norm. Musings about Librarianship.

Tshitoyan, V., Dagdelen, J., Weston, L., Dunn, A., Rong, Z., Kononova, O., Persson, K. A., Ceder, G., & Jain, A. (2019). Unsupervised word embeddings capture latent knowledge from materials science literature. Nature, 571(7763), 95–100.

Walters, E. (2017). Artificial intelligence libraries. AALL Spectrum, 22(1), 21–23.

Watson, S. M. (2016). Towards a constructive technology criticism. Tow Center for Digital Journalism, Columbia University.

Wetherington, C., & Wagner, S. (2019). A comprehensive approach to algorithmic machine sorting of Library of Congress call numbers. Information Technology and Libraries, 38(4), 62–75.

Whitehair, K. (2016). Libraries in an artificially intelligent world. Public Libraries Online.

Wittmann, R., Neatrour, A., Cummings, R., & Myntti, J. (2019). From digital library to open datasets: Embracing a “collections as data” framework. Information Technology and Libraries, 38(4), 49–61.

Yelton, A. (2019). HAMLET: Neural-net-powered prototypes for library discovery. Library Technology Reports, 55(1), 10–15.

Nature vs Nurture: AI Style

There’s an interesting debate going on in the AI community. It has actually been happening for a while; it’s just that recently it has become more public and more personal.

Not exactly the old nature vs. nurture discussion, but something similar. The essential question is: What is intelligence, and how do AI agents become intelligent (or more intelligent)? Let’s simplify the debate and make it about two people: Rich Sutton and Gary Marcus.

Rich Sutton is the leading proponent of reinforcement learning (the trial-and-error, reward-based learning often associated with how humans learn).
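The trial-and-error loop at the heart of reinforcement learning fits in a dozen lines. Here is a minimal sketch on a two-armed bandit problem; the setup and all numbers are invented for illustration and are not taken from Sutton’s work:

```python
import random

# Minimal trial-and-error learning on a two-armed bandit.
# Arm 1 pays off more often; the agent starts with no priors
# and learns entirely from reward feedback.
random.seed(0)

payout = [0.3, 0.8]        # hidden reward probabilities
q = [0.0, 0.0]             # the agent's learned value estimates
alpha, epsilon = 0.1, 0.1  # learning rate, exploration rate

for _ in range(5000):
    # Explore occasionally; otherwise exploit the best-known arm.
    arm = random.randrange(2) if random.random() < epsilon else q.index(max(q))
    reward = 1.0 if random.random() < payout[arm] else 0.0
    q[arm] += alpha * (reward - q[arm])  # incremental value update

print(q)  # estimates drift toward the true payout probabilities
```

No human knowledge about which arm is better is built in; the value estimates emerge from experience alone, which is the spirit of Sutton’s position.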

Gary Marcus is a cognitive scientist who has often called for the return of symbolic AI (e.g. GOFAI) and is currently advancing what he calls “robust AI”.

In a nutshell:

Sutton, in The Bitter Lesson (2019), believes that intelligence is computation: all we need to do is leverage computational scale, and intelligent agents and systems (and their actions) will emerge. All the human knowledge, building in what AI folks call “priors”, is a “distraction” – not worth the effort. General systems always win.

Score one for nurture.

Marcus, in The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence (2020), does not dispute that computation has a role in intelligence; he just doesn’t think it’s sufficient or even efficient. People learn, in part, because they have “priors” – innate understanding and knowledge. We use those as building blocks to create more knowledge as we experience new situations. Why not, says Marcus, use those priors in AI to speed understanding and knowledge creation?

Score one for nature and nurture.

From my perspective, both approaches are interesting and, frankly, valid in their own ways. The difference for me is the outcome. Given my bias toward and interest in machine information behaviour, an agent built with Sutton’s strategy and one built with Marcus’s will behave differently. And that’s OK. I think. Grin.

Join the debate.


Green AI?

Is machine learning an environmental hazard or an environmental solution?

The answer is, apparently, “yes”.

Recently a number of papers have focused attention on machine learning and climate change. Interesting findings.

“Tackling Climate Change with Machine Learning” (Rolnick et al., 2019) is a manifesto published in advance of the NeurIPS conference. This extensive and detailed report outlines many ways in which applying ML can have a positive impact on addressing significant aspects of climate change. In summary:

“ML can enable automatic monitoring through remote sensing (e.g. by pinpointing deforestation, gathering data on buildings, and assessing damage after disasters). It can accelerate the process of scientific discovery (e.g. by suggesting new materials for batteries, construction, and carbon capture). ML can optimize systems to improve efficiency (e.g. by consolidating freight, designing carbon markets, and reducing food waste). And it can accelerate computationally expensive physical simulations through hybrid modeling (e.g. climate models and energy scheduling models).”

A report from the AI Now Institute, “AI and Climate Change: How they’re connected, and what we can do about it” (Dobbe & Whittaker, 2019), is not so optimistic:

“The estimated 2020 global footprint [of the tech industry] is comparable to that of the aviation industry, and larger than that of Japan, which is the fifth biggest polluter in the world. Data centers will make up 45% of this footprint (up from 33% in 2010) and network infrastructure 24%.”

They conclude that overall, “we see little action to curb emissions, with the tech industry playing a significant role in the problem.”

While the Rolnick et al. report illustrates that applying ML to environmental challenges has been and will continue to be productive, the story is a bit different when looking at the environmental cost of training the ML models to do this very work.

Strubell et al., “Energy and Policy Considerations for Deep Learning in NLP” (2019), estimate that “training BERT [a widely used NLP model] on GPU is roughly equivalent to a trans-American flight.” The authors of “Green AI” (Schwartz et al., 2019) note that the amount of compute required to train a model has increased 600,000 times (!) since 2013. More and more data, millions of parameters, and hundreds of GPUs. And it’s getting worse. They advocate “making efficiency an evaluation criterion for research alongside accuracy and related measures. In addition, we propose reporting the financial cost or “price tag” of developing, training, and running models to provide baselines for the investigation of increasingly efficient methods.”
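The accounting behind estimates like these is straightforward arithmetic: energy is hardware power draw times training time, scaled up by datacenter overhead (PUE), and emissions are energy times the grid’s carbon intensity. A back-of-the-envelope sketch – the GPU wattage, count, and hours below are placeholders, and the PUE and carbon-intensity values are typical published averages, not figures from any specific training run:

```python
# Back-of-the-envelope CO2 estimate for a training run, following the
# general shape of Strubell et al.'s calculation. Hardware figures are
# placeholders chosen for illustration.
gpu_watts = 250        # average draw per GPU (placeholder)
num_gpus = 8           # (placeholder)
hours = 80             # wall-clock training time (placeholder)
pue = 1.58             # datacenter power usage effectiveness (industry average)
co2_per_kwh = 0.954    # lbs CO2 per kWh (US grid average)

energy_kwh = gpu_watts * num_gpus * hours * pue / 1000
co2_lbs = energy_kwh * co2_per_kwh
print(f"{energy_kwh:.0f} kWh, roughly {co2_lbs:.0f} lbs CO2")
```

The multiplier structure explains why the trend is alarming: every factor – model size, GPU count, training time, and number of hyperparameter trials – compounds the total.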

Whatever directions are taken, the ML community – and the tech industry more generally – is going to have to take its environmental impact much more seriously. Being an environmental solution is possible, but not at the increased expense of being an environmental hazard.


Contesting the State of AI

Periodically the AI field has entered an “AI Winter” where the dominant paradigm seems to have run its course and researchers look for new options.

Are we entering another AI Winter?

Three recent books suggest not so much renewed stormy weather as a need to broaden perspectives … some looking backward, some merely looking around.

The basic questions raised are simple: Is Deep Learning (the state of the art in machine learning) sufficient? Is it the path toward more intelligent machines (even AGI – artificial general intelligence)?

Stuart Russell. Human Compatible (2019).

Russell is widely known as the co-author of Artificial Intelligence: A Modern Approach (3rd ed. 2009), the definitive textbook in the field. In the past few years he has been exploring the concept of “beneficial AI” and this book further articulates that concept.

“The history of AI has been driven by a single mantra: ‘The more intelligent the better.’ I am convinced that this is a mistake.”

Russell. Human Compatible (2019)

Russell’s concern is that the current path of increasing AI autonomy, fueled by more data, opaque algorithms, and enhanced computing, will lead to a loss of human control. His outlook is not as bleak as Bostrom’s in Superintelligence (2014): Russell’s solution is a design concept – make intelligent systems defer to human preferences.

Russell has three guiding principles:

  1. The machine’s only objective is to maximize the realization of human preferences.
  2. The machine is initially uncertain about what those preferences are.
  3. The ultimate source of information about human preferences is human behavior.
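Why does uncertainty (the second principle) produce deference? Because an agent that might be wrong about human preferences can expect more value from checking than from acting. A toy expected-value sketch – the scenario, payoffs, and function are invented for illustration and do not come from Russell’s book:

```python
# A machine choosing between acting now and deferring to a human.
# It believes the action matches the human's preference with
# probability p; a mistaken action is costly, asking costs a little.
def best_choice(p, gain=10.0, loss=-50.0, ask_cost=-1.0):
    act = p * gain + (1 - p) * loss   # expected value of acting now
    defer = p * gain + ask_cost       # ask first: the mistake is avoided
    return "act" if act > defer else "defer"

print(best_choice(0.99))  # near-certain of the preference: act
print(best_choice(0.70))  # meaningfully uncertain: defer and ask
```

The point of the sketch is that deference is not hard-coded; it falls out of the arithmetic whenever the agent’s uncertainty about what the human wants is large relative to the cost of asking.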

Putting humans at the center of intelligent machines seems reasonable and certainly desirable. But will it be effective and advance AI?

Gary Marcus & Ernest Davis. Re-Booting AI (2019).

The concern of Marcus (a long-standing and vocal critic of Deep Learning) and Davis is related to Russell’s, but the focus is different: not a control problem but a myopia problem – AI “doesn’t know what it’s talking about”; it doesn’t actually “understand” anything.

“The cure for risky AI is better AI, and the royal road to better AI is through AI that genuinely understands the world.” p. 199

Marcus & Davis. Re-Booting AI (2019)

And the way to understand the world is through “common sense”. In part this looks back to the symbolic (logic) representations of GOFAI (“Good Old Fashioned AI”), and in part it is about training AI on “time, space, causality, basic knowledge of physical objects and their interactions, basic knowledge of humans and their interactions.” Getting there requires us to train AI the way children learn (an observation Turing made in 1950).

Brian Cantwell Smith. The Promise of AI (2019)

Smith picks up the issue of “understanding the world” and argues that AI must be “in the world” in a more visceral way – “deferring” to the world (reality) as we do. Two key concepts stand out: judgment and ontology.

Judgment: Smith makes the distinction between “reckoning” (which most machine learning systems accomplish – calculation and prediction) and “judgment” which he views as the essence of intelligence and the missing component in AI.

Ontology: Smith contends that machine learning has “broken ontology.” It has given us a view of the world as more “ineffably dense” than we have ever perceived. The complexity and richness of the world require us to conceptualize the world differently.

The arguments about judgment and ontology converge in a discussion about knowledge representation and point the way for machine learning to transcend its current limitations:

“If we are going to build a system that is itself genuinely intelligent, that knows what it is talking about, we have to build one that is itself deferential – that itself submits to the world it inhabits, and does not merely behave in ways that accord with our human deference.”

Smith. The Promise of AI (2019)

This book celebrates the power of machine learning while lamenting its shortcomings. However:

“I see no principled reason why systems capable of genuine judgment might not someday be synthesized – or anyway may not develop out of synthesized origins.”

Smith. The Promise of AI (2019)

Good books. All worth your time IMHO.


An AI Journal Club (for everyone)

The Collaboratory at the Ryerson University Library is starting a journal club to explore AI/ML. With this club, Jae Duk Seo, a graduate student at Ryerson, and I want to encourage a multidisciplinary perspective on AI/ML.

Too often the various disciplinary communities interested in AI/ML (computer science, engineering, mathematics, social sciences, humanities, etc.) talk among themselves … to the detriment of an inclusive discussion.

This journal club will select topics and associated reports or papers that implicate a wide variety of disciplines (e.g. deepfakes, facial recognition, green AI, diversity, ethics, disinformation). Participants will bring their own readings of the issues and allow the group to explore the topic in ways that will open up new insights.

The hope is that exploring both common ground and different perspectives among the disciplines will create a challenging interchange for all. We want to encourage an informed discussion that incorporates all the aspects of AI/ML – technical, social, economic, and political.

Initially this group will meet in person; hopefully in the future we can accommodate both in-person and online.

Details on the first meeting to follow soon.


Training Datasets, Classification, and the LIS Field

At the core of machine learning are training datasets. These collections, most commonly images, carry labels (metadata) describing their contents, which an algorithm uses to learn how to classify them. A portion of the dataset is reserved for validation – testing the learned model with new, previously unseen data. If all goes well, the model is then ready to classify entirely new data from the real world.
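That label–train–validate workflow can be sketched with a deliberately tiny stand-in for a real model: a nearest-centroid rule over made-up 2-D points. Everything here – the data, the split, the classifier – is illustrative:

```python
import random

# Tiny labeled "dataset": 2-D points, class 0 clustered near (0, 0),
# class 1 near (5, 5). Real training sets are images with text labels,
# but the split / train / validate shape is the same.
random.seed(1)
data = [((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(50)] + \
       [((random.gauss(5, 1), random.gauss(5, 1)), 1) for _ in range(50)]
random.shuffle(data)

train, valid = data[:80], data[80:]  # hold out 20% for validation

def centroid(points):
    """'Training': the mean point of a class."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

c0 = centroid([p for p, label in train if label == 0])
c1 = centroid([p for p, label in train if label == 1])

def predict(p):
    """Classify by whichever class centroid is closer."""
    d0 = (p[0] - c0[0]) ** 2 + (p[1] - c0[1]) ** 2
    d1 = (p[0] - c1[0]) ** 2 + (p[1] - c1[1]) ** 2
    return 0 if d0 < d1 else 1

# Validation: accuracy on the held-out, previously unseen points.
accuracy = sum(predict(p) == label for p, label in valid) / len(valid)
print(f"validation accuracy: {accuracy:.2f}")
```

The model only ever sees the training labels, which is exactly why problems in those labels – bias, miscoding, poor quality control – propagate straight into its predictions.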

There are many such datasets and they are used repeatedly by AI researchers and developers to build their models.

And therein lies the problem.

Issues with datasets (e.g. lack of organizing principles, biased coding, poor metadata, and little or no quality control) produce models that inherit those problems and reflect them in their operation.

While over-reliance on common datasets has long been a concern (see Holte, “Very simple classification rules perform well on most commonly used datasets”, Machine Learning, 1993), the issue has received widespread attention because of the work of Kate Crawford and Trevor Paglen. Their report, Excavating AI: The Politics of Images in Machine Learning Training Sets, and their online demonstration tool, ImageNet Roulette (no longer available as of September 27th), identified extraordinary bias, misidentification, racism, and homophobia. Frankly, it will shock you.

Kate Crawford and Trevor Paglen (with their misidentified classifications from ImageNet Roulette)

Calling their work the “archeology of datasets”, Crawford and Paglen uncovered what is well known to the LIS field: all classification is political, social, and contextual. In essence, any classification system is wrong and biased even if it is useful (see Bowker & Star, Sorting Things Out, 1999).

From an LIS perspective, how is ImageNet constructed? What is the epistemological basis, the controlled taxonomy, and the subclasses? Who added the metadata, under what conditions, and with what training and oversight?

ImageNet was crowdsourced using Amazon’s Mechanical Turk. Once again, therein lies the problem.

While ImageNet did use the WordNet taxonomy to control classifications, it is not clear how effectively this was managed. The results uncovered by Crawford and Paglen suggest not very much. This year many training datasets were taken offline or made unavailable, and many were severely culled (ImageNet will remove 600,000 images). However, these datasets are important; ML relies on them.

Bottom line: the LIS field has extensive expertise and practical experience in creating and managing classification systems and the requisite metadata. We are good at this and we know the pitfalls; this is a clear and compelling opportunity for LIS researchers and practitioners to be centrally involved in the creation of ML training datasets.


“Hey Google, what do you know about virtual assistants and libraries?”

Who’s staffing the reference desk or the library chatlines? These days, or in the near future, it might be Google Assistant, Alexa, Cortana, or Siri. Library users may increasingly turn to virtual or personal assistants before they interact with specific library services. And why not? They appear to be getting quite good.

In 2018 Perficient Digital tested Alexa, Cortana, Google Assistant, and Siri with nearly 5,000 questions:

Comparing Digital Personal Assistants

BTW, they also tested which assistant was the funniest by tracking the jokes they made in response to some questions. “What is the meaning of life?” Siri: “All evidence to date suggests it’s chocolate.”

Results like this intrigued Amanda Wheatley and Sandy Hervieux of the McGill University Library. As a result, they initiated a multi-phase research project to explore the awareness of AI among libraries and librarians, their use of this technology, and what their expectations are for the future.

Amanda Wheatley and Sandy Hervieux

They believe AI will “change the nature of our work but won’t take our jobs.” AI will not displace librarians and library staff but operate as “an immersive environment where we coexist.” From their perspective “AI is not one thing” but an array of options and opportunities to be used in thoughtful ways. However, it is time to be proactive not reactive; we should lead in the use of this technology not be used by it.

Phase 1 (completed): an environmental scan of libraries and their use of AI as indicated in strategy plans or other documentation. The result? Not too much happening. This could be a lack of funds for technology innovation or it might be a concern about the nature of the technology.

Phase 2 (in process): a broad survey of libraries and librarians to assess their awareness and expectations of AI. That survey is currently live. The deadline for responses is September 6, 2019. You are encouraged to participate!

Phase 3 (in process): testing various devices with sample reference questions. The first test pitted Google against Siri, with Google a clear winner. It responded by summarizing information, presenting relevant graphs and charts, and providing credible research materials … “it was terrifying!” They are now starting to work with the Alexa Skills Kit to teach Alexa new library skills.

Phase 4 (planned): an AI experience in the McGill libraries to give the community a hands-on opportunity to explore the technology. 

If you want more information about their work, visit their guide to the project or contact Amanda Wheatley or Sandy Hervieux via email.

Amanda and Sandy are editing a book for ACRL on the use of AI in libraries. A call for chapters will go out in the fall.

[UPDATE: the call for chapters for this book is out. Deadline for proposals is November 17.]

Lots of interesting research to follow. Looking forward to hearing about their progress.


Unsupervised Text Mining

While AI text mining is not new, this article presents a new development that has important implications for research libraries:

Tshitoyan, V., Dagdelen, J., Weston, L., Dunn, A., Rong, Z., Kononova, O., … Jain, A. (2019). Unsupervised word embeddings capture latent knowledge from materials science literature. Nature, 571(7763), 95–100.

Of course, it’s from Nature; it’s behind a paywall. Sigh. Hopefully you are able to obtain a copy.

Using unsupervised methods of text mining in the area of materials science, the authors have demonstrated “that latent knowledge regarding future discoveries is to a large extent embedded in past publications.” The discoveries of the future were evident in the literature of the past.

Using current and past literature, these approaches “have the potential to unlock latent knowledge not directly accessible to human scientists.”

“Such language-based inference methods can become an entirely new field of research at the intersection between natural language processing and science, going beyond simply extracting entities and numerical values from text and leveraging the collective associations present in the research literature.”
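The core mechanic behind this kind of inference is easy to illustrate: words and materials are represented as vectors, and candidate materials are ranked by their cosine similarity to a concept word such as “thermoelectric.” Here is a minimal pure-Python sketch with tiny, hand-picked toy vectors; the real model learns embeddings of hundreds of dimensions from millions of abstracts, so the names and numbers below are purely illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" standing in for vectors learned
# from the materials-science literature.
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "Bi2Te3":         [0.8, 0.2, 0.1],  # a known thermoelectric material
    "NaCl":           [0.1, 0.9, 0.3],  # table salt: not a candidate
}

# Rank candidate materials by similarity to the concept word.
query = embeddings["thermoelectric"]
ranked = sorted(
    (name for name in embeddings if name != "thermoelectric"),
    key=lambda name: cosine(embeddings[name], query),
    reverse=True,
)
print(ranked)  # Bi2Te3 ranks above NaCl
```

In the paper, materials that ranked highly against functional concept words, years before they were ever studied for that application, turned out to be genuine discoveries-in-waiting.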

Interestingly, this possibility was explored much earlier, during the formative years of MEDLINE, albeit with less sophisticated tools:

Swanson, D. R. (1990). Medical literature as a potential source of new knowledge. Bulletin of the Medical Library Association, 78(1), 29–37.

The Tshitoyan et al. research is an exciting development, using ML approaches that should become standard tools for research libraries. It is well worth your consideration. It is also a concern, therefore, that this work goes on without any involvement from libraries or those with LIS expertise.


An AI-Authored Scholarly Book

Earlier this year Springer Nature published an open access book written by AI: Lithium-Ion Batteries: A Machine-Generated Summary of Current Research. The author is identified as “Beta Writer”.

Beta Writer algorithmically categorized and summarized more than 150 key research publications, selected from over 1,000 published between 2016 and 2018. I’m no expert on lithium-ion batteries, so others will have to weigh in on whether this is a credible book. However, a book that synthesizes and summarizes a large and complex corpus of current research literature is a valuable contribution.

The book’s production pipeline, a combination of various “off the shelf” natural language processing (NLP) tools, preprocesses the documents to address linguistic and semantic normalization, clusters them by content similarity (yielding the chapters and sections of the book), generates abstracts, summaries, introductions, and conclusions, and outputs XML as the final manuscript. It does so in a manner sensitive to copyright infringement. The details are outlined in a human-written Preface (by Henning Schoenenberger, Christian Chiarcos, and Niko Schenk) and provide an interesting comparison to current cataloguing and metadata processes and theories.
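The clustering step, grouping papers by content similarity into chapters and sections, can be sketched with plain TF-IDF weighting and cosine similarity. This is a minimal stdlib-only illustration, not Springer’s actual pipeline, and the paper titles are invented for the example:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}
    return [[Counter(doc)[t] * idf[t] for t in vocab] for doc in docs]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical paper titles standing in for full abstracts.
papers = [
    "anode materials for lithium ion cells".split(),
    "novel anode materials improve lithium cells".split(),
    "cathode coatings for lithium ion batteries".split(),
]
vecs = tfidf_vectors(papers)

# The two anode papers should be more similar to each other than
# either is to the cathode paper -- the basis for grouping documents
# into chapters and sections.
sim_anode = cosine(vecs[0], vecs[1])
sim_cross = cosine(vecs[0], vecs[2])
print(sim_anode > sim_cross)  # True
```

Replace the toy similarity threshold with a proper clustering algorithm over these vectors and you have, in miniature, the document-grouping stage the Preface describes.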

Book Production Workflow

In an interview published in The Scholarly Kitchen, Schoenenberger was clear that the intent is “to initiate a public debate on the opportunities, implications and potential risks of machine-generated content in scholarly publishing.” This book is far from perfect and Springer acknowledges that. Commendably, Springer has gone to great lengths to document their process, discuss alternative strategies, identify weaknesses and outright failures, and to encourage critical commentary.

We foresee that in future there will be a wide range of options to create content – from entirely human-created content, a variety of blended man-machine text generation to entirely machine-generated text.

Henning Schoenenberger, Director Product Data & Metadata Management at Springer Nature

Future projects will have an “emphasis on an interdisciplinary approach, acknowledging how difficult it often is to keep an overview across the disciplines.” This is intriguing given the importance of interdisciplinarity and the challenges of tracking concepts in new, unfamiliar fields.

Reviewers of the book argue that it’s not actually a book because it lacks a narrative, an integrating storyline. Agreed. But frankly, our definition of “a book” has always been, and remains, fairly elastic. So, it’s a book; just a different kind of book. And a very interesting book at that.


Welcome to Library AI

Algorithmic decision-making arising from machine learning is ubiquitous, powerful, often opaque, sometimes invisible, and (most importantly) consequential in our everyday lives.

Machine learning (ML) is critically important for libraries because it offers new tools for knowledge organization and knowledge discovery. It also, however, presents significant challenges with respect to fairness, accountability, and transparency.

I believe that artificial intelligence will become a major human rights issue in the twenty-first century.

Safiya Noble (2018). Algorithms of Oppression.

This blog will attempt to chart ML developments and issues in libraries and to identify trends in the wider AI community that impact libraries.

“The danger is not so much in delegating cognitive tasks, but in distancing ourselves from – or in not knowing about – the nature and precise mechanisms of that delegation.”

de Mul & van den Berg (2011). Remote control: Human autonomy in the age of computer-mediated agency.

Libraries have often been instrumental in championing new technologies and making them more accessible. As we adopt and develop ML tools and services, something I think is an imperative if we are to advance our mission, we also need to be aware of the emerging “new digital divide”:

A class of people who can use algorithms and a class used by algorithms.

David Lankes (Director, SLIS, University of South Carolina).

Looking forward to this journey. Let me know what you think.