Monday, September 30, 2019

[Santa Cruz County] California Grand Jury: Data Analytics Threaten Patron Privacy

Blog note: this opinion piece about the Santa Cruz County Grand Jury report raises national concerns.
Following an investigation into Santa Cruz Public Libraries’ (SCPL) use of Gale Analytics on Demand, a California grand jury reported on June 24 that the use of data analytics tools by libraries “is a potential threat to patron privacy and trust.” The report’s broadly negative view regarding the use of big data and analytics software raises several questions about library privacy policies and how they should apply to the use of any data collected about patrons by third parties, when patrons have not explicitly given libraries permission to use that data.
This finding wasn’t the result of a lawsuit. California’s Superior Court convenes 58 separate civil grand juries each year—one for each of the state’s counties. These carry out several functions, including “investigating and reporting on the operations of local government.” In this watchdog role, a grand jury acts as a representative for county residents, generating recommendations for improving operations and enhancing local government accountability. Any local government entity subject to an investigation is required to respond to the recommendations within 90 days. In this case, the investigation was launched in early 2019, in response to concerns raised by SCPL staff.
These recommendations are not legally binding, and the report explains that SCPL’s use of Analytics on Demand does not appear to have violated any state laws. In addition, SCPL Director Susan Nemitz told Library Journal (LJ) that two factors had already led SCPL leadership to discontinue use of the tool prior to the investigation: staff concerns about using commercial big data software to analyze patron habits, and the sense that integrating Analytics on Demand into the library’s marketing efforts would require a major initiative.
“Even though it’s a relatively simple product” to use, she explained, library management ultimately decided that “it really would take a major staff effort to make it part of our institutional research processes. So I don’t think our experiments [with Analytics on Demand] really went very far.”
Analytics on Demand is built on Experian Mosaic, a demographic analysis and classification tool used by many businesses for neighborhood-level analysis of customers and potential customers. Mosaic classifies households into 19 groups and 71 unique types such as “middle-class melting pot” or “young, city solos.” Since it is driven by the vast trove of consumer data collected and aggregated by multinational credit-reporting agency Experian, the tool can generate a lot of information, reporting demographic composition and predicting consumer habits, product preferences, and the prevailing attitudes of neighborhoods—or even individual households.
SCPL officials had used an Analytics on Demand license provided by the Pacific Library Partnership (PLP) consortium for a handful of projects beginning in 2017, Nemitz said.
“We aren’t a large library system—we don’t have a huge marketing team—so we had a couple of staff…go to a [PLP] training at Oakland Public,” she explained. “For us, the interest was, we collect no demographic data on our users. Could we [use Analytics on Demand to] provide our funding bodies with some reports about demographic use? Proving that we are serving low-income patrons? Another thing that we looked at when temporarily closing a branch, was…where to put temporary services. We did do one marketing thing to try to figure out where history programs geared toward older adults might be best presented.”
These uses are typical for Analytics on Demand, and indicative of pressures common throughout the library field, including limited outreach budgets and demands from government and other funding bodies for specific information about a library’s usage and local impact. Yet SCPL staff concerns also reflect the tension between the implicit promise of privacy for library users and the competition that library services face from commercial entities, such as Amazon, that have expansive data collection and analysis policies built into their terms of service agreements.
According to the report, a key sticking point for concerned SCPL staff was that by inputting address information into Analytics on Demand, the library was downloading significant household-level data that patrons had never consented to give the library.
“This gets into the question of combining data sets,” explained Becky Yoose, Library Data Privacy Consultant for LDH Consulting Services. “You have patron data in your integrated library system. You have patron data collected by individual electronic systems, like your catalog, your web analytics software, your electronic resources, [and] authentication systems like EZproxy. The issue comes when you start combining this information in one central place—especially when you’re combining this information with other external datasets that might have other sensitive or ‘high-risk’ data,” including information that could personally identify a user.
In addition, SCPL staff expressed concern about how Gale might be using patron data generated by the platform. Noting that the grand jury report did not include any specific recommendations for Gale, company representatives declined to comment for this article. However, the report cited prior SCPL communication with Gale, in which the company stated that “Gale does not personally handle the library data. There is no need for someone outside the library to manually review, handle, or receive files, like there is with other services. All data is submitted to [Analytics on Demand] directly by the library. In other words, there is no data being ‘exchanged with third parties’…. When the tool generates reports, the library can delete the report at their discretion. There is nothing maintained by us or [any additional third] party. The only information [Analytics on Demand] requires to function is an address. We do not require a name or any other identifiable information that is not public record.”
These statements imply that libraries using Analytics on Demand are pulling data directly from Experian Mosaic via patron address ranges, and that Gale is not storing the resulting reports or exchanging them with other third parties. Still, the grand jury report found that the library’s use of Analytics on Demand was inconsistent with its policy on Confidentiality of Library Records and its companion document, “Information We Keep About You,” which was most recently revised in 2010. Among its many recommendations, the report states that the use of any data analytics tools should be clearly addressed in privacy policies. Patrons should be informed about their use, and all vendor contracts should be thoroughly vetted to ensure that vendors protect the interests of patrons and libraries.
Carol Frost, CEO of PLP and executive director, Peninsula Library System, noted that the grand jury process is not yet complete (SCPL’s reply to the report is due September 23), and PLP wished to honor that process in comments to LJ. But she added that “the section of the report which applies to PLP has some points which all libraries should consider when signing contracts. PLP has an NDA (Non-Disclosure Agreement) which covers patron privacy as well as the non-sharing of data, and addresses most of the items listed in the recommendations. We think it is a best practice for all libraries to use an NDA as a supplement to an agreement when patron privacy is involved, as well as having patron privacy policies. Gale Cengage also has several documents which were not referenced in the Grand Jury report which outline the protection of data when using Analytics on Demand.”
PLP member libraries are located in communities throughout Silicon Valley, and the consortium is “acutely aware of data privacy,” Frost added. “The Facebook sharing of data last year, along with the California Consumer Protection Act (which goes into effect in January 2020) made our libraries start to think about their own data privacy policies. In January we decided to apply for a [Library Services and Technology Act] grant to explore that nexus between library policies and the Consumer Protection Act.”
The grant was awarded, and PLP has used the funding to develop California-specific training workshops, as well as “a resource toolkit for libraries on privacy-related topics surrounding library data privacy and digital safety, including privacy policy and procedure best practices, tips for library staff for working with vendors in sharing patron data, and an overview of the data privacy lifecycle in libraries,” according to an announcement regarding the funding.
SCPL will be one of the library systems taking advantage of these new classes and other resources this fall, Nemitz said. SCPL has also established a page on its website, santacruzpl.org/data_privacy, that lists every third-party vendor the library uses, along with links to those vendors’ privacy policies, login methods, the data retained by each vendor, and other information.
“I want to own that, clearly, we did not address staff concerns well enough” with the library’s use of Analytics on Demand, Nemitz said. Going forward, SCPL is facing a challenge that is becoming increasingly common within the field—meeting the expectations of patrons who have become accustomed to the seamless conveniences enabled by big data, while adhering to policies that promise privacy.
The grand jury report “keeps us talking about really important issues in our field,” Nemitz said. “And I don’t think there are perfect answers right now…. But we as professionals need to care, and we need to help our patrons understand a lot more about data privacy.”
August 12, 2019
Library Journal (an American trade publication for librarians)
By Matt Enis

