Data Sharing Risks And Rewards

By: Joseph Cazier, Walter Haefeker, Edgar Hassler

For Hobbyist Beekeepers.

Introduction

In our September Bee Culture article, “BeeXML Part I: The Power of Big Data and Analytics,” we discussed how data science can use standardized data to help bees and beekeepers everywhere. In October we followed up with another article titled, “BeeXML Part II – Achieving the Goal of Standardized Data,” which focused on the technology for collecting and aggregating the data necessary to achieve the potential benefits that big data has to offer. In this month’s article, we focus on privacy and information sharing risks and benefits. 

We begin with a discussion of Privacy Risk Theory to explain the risks in sharing data, then move through the various types of data sharing: keeping one’s own records, sharing summary data, sharing detailed information with a small group or club, and full data sharing with a non-profit or other trusted, non-governmental third party.

Privacy Risk Theory

In the June 2018 issue of Bee Culture, we introduced the concept of the Technology Acceptance Model (TAM) in an article titled, Nudging Beekeepers Into the Future With the Technology Acceptance Model1. The model posits that there are three main factors influencing a consumer’s decision to use consumer software similar to www.HiveTracks.com. These are:

Ease of Use: How hard or easy the software is to use

Usefulness: How useful the software is to the user

Enjoyment: How enjoyable the software is to use, sometimes referred to as hedonism

What we did not have time to address in that article, but are returning to now, is an extension to the Technology Acceptance Model that looks specifically at privacy risks and how they can influence the use of software. A 2007 scientific article by Cazier et al.2 framed the issue by breaking privacy risks into two parts and adding them to the other primary constructs in the TAM Model3. These parts are:

Privacy Risk Likelihood (RL): This is the probability or likelihood that someone’s privacy will be violated.

Privacy Risk Harm (RH): This is the level of damage that could occur in the event of a privacy breach.

It is useful to break the risks into two parts. If a privacy disclosure is very unlikely but could do great harm if it occurred (e.g. the discovery that a beekeeper is using stolen hives to sell pollination services to almond growers to increase his/her revenue, as shown in Figure 1), a beekeeper might choose one set of behaviors. As the likelihood of discovery (Risk Likelihood) increases, the (bad) behavior is expected to be reduced.

With Risk Harm, we look at how damaging something could be to someone if it were exposed or acted on in an inappropriate or harmful way. For example, one of the authors of this article loves the TV series Star Trek. Disclosing this is potentially a privacy breach, but one that has very little risk of harm attached to it. Figure 2 contains an image from the original article (cited above) introducing this concept.

The idea is that by separating these concepts into how likely an event is to occur and how much harm could result if it did, we can better predict someone’s behavior, particularly their willingness to use an information system to track their bee data. Then, by taking the value or estimate for each component (RL and RH), you can combine them into a total estimated risk score. You can then compare different levels of risk for each dimension and use that information to make better decisions.

For example, if, on a scale of one to 10, the harm of American Foulbrood (AFB) is rated near 10 (see Figure 3), but the likelihood of an infestation is near one or two, you can combine these numbers for a total risk score of 10 × 2 = 20 out of 100 possible5. Numbers and risk scores will vary based on geography, genetics, and a number of other conditions.

However, rather than just saying the risk is high, if we break them into these components, the nebulous risk can be assessed, categorized, and weighted to help us make better management decisions on what actions (or inactions) to prioritize.

This approach also gives us the ability to compare various risk scores. For example, if the RH is a seven for a Varroa infestation (perhaps due to treatment options), but its likelihood is a four given its prevalence in a given area, then the total risk score would be 7 × 4 = 28, ranking it higher than AFB for those hives. This in turn would influence where rational beekeepers should spend their limited time, attention, and resources in taking care of their bees as they weigh the various threats.
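The two examples above can be worked through in a few lines of Python. This is only a sketch of the multiplication described in the text. The function name risk_score and the 1-to-10 range check are our own assumptions for illustration, and the ratings are the illustrative numbers used above, not measured values.

```python
def risk_score(harm: int, likelihood: int) -> int:
    """Combine Risk Harm (RH) and Risk Likelihood (RL), each rated 1 to 10."""
    for name, value in (("harm", harm), ("likelihood", likelihood)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return harm * likelihood


# Illustrative ratings taken from the examples in the text.
threats = {
    "American Foulbrood": risk_score(harm=10, likelihood=2),
    "Varroa": risk_score(harm=7, likelihood=4),
}

# Rank the threats from the highest combined score to the lowest.
ranked = sorted(threats.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score} out of 100")
```

Sorting by the combined score is what lets a beekeeper see at a glance which threat deserves attention first under these assumed ratings.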

This same concept applies to privacy risks. Suppose the risk of some data being exposed is real (a hobbyist beekeeper forgets to log out of the system on a public computer, such as at the library, and becomes worried that someone will see his/her bee records), but the harm is low (someone might learn how much honey the beekeeper produced last year). The total risk can then be calculated as 2 (someone looks at the records instead of simply logging the beekeeper off) × 1 (someone knows the beekeeper’s honey number) = 2 out of 100. With a score that low, the beekeeper might not be overly concerned upon realizing he/she forgot to log off the computer.

Please note, however, that when it comes to information sharing, just as there are risks and benefits in sharing information, there are also risks and benefits in not sharing information or in sharing fake or misleading data. 

Hobby vs. Commercial Beekeepers

The discussion of privacy mainly applies to hobbyists who perceive sharing data about their hobby as something related to their personal privacy. In this context, the issue is part of a larger civil liberties discussion. For hobbyists, the Risk Harm is generally very low, except when they are in a highly regulated environment where society (government) might impose costs or restrictions on them that they do not see as valuable.

On the other end of the spectrum, business and hive data are classic trade secrets for commercial beekeepers, who tend to guard the data very closely, as any privately held enterprise would, for very good reasons. For commercial beekeepers and large sideliners, the Risk Harm is generally much greater than for hobbyist beekeepers.

To give each group proper attention, we will focus on hobby beekeepers in this article and commercial beekeepers in a future article. For the purpose of this discussion, we define a hobbyist as someone who keeps bees mostly for enjoyment and who does not earn any meaningful income from the bee operation.

Information Sharing for Hobby Beekeepers

There are several incentives and a couple of disincentives for hobbyists to share information, and they differ depending on whether the beekeeper is in a heavily regulated environment or a largely unregulated one. Additionally, many beekeepers perceive some regulations as helpful and others as not very well thought out.

For example, in the U.S. there is the recent requirement for beekeepers to consult with a veterinarian before using antibiotics.6 While some might argue that this regulation is good for society in that it can reduce antibiotic use and slow the development of resistance, others have criticized it as ineffective and expensive, noting that many rural veterinarians are focused on large animals and have limited experience with bees, often relying on the beekeeper for added expertise.

The potential for Risk Harm, at least from the beekeeper’s perspective, seems to center around who the data is being shared with and in what form. Sharing with a group of friends or a non-profit seems to bear little risk harm or risk likelihood, especially if proper societal and management controls are in place to protect the individual.

However, when it comes to sharing information with the government, many may have a different view. Some may perceive elevated risk harm because they are doing something the government does not want them to do (e.g. avoiding hive taxes in Europe or using off-label treatment methods in the U.S.). Other concerns for hobbyists may stem from a lack of trust driven by their perception of the government’s motivations or ability to help (or interfere).

Thus, the risk harm seems to stem from these three possibilities:

1. Reputational Risk: A loss of reputation among friends, a small group, a private company, or a non-profit organization.

2. Compliance Risk: The risk of fines or punishment for doing something not permitted by the government (avoiding fees, taxes, or registration, or using off-label treatments).

3. Regulatory Risk: The risk of government interceding in the beekeeping operation in a way that the beekeeper would perceive as unwelcome, unnecessary, without cause, or for dubious reasons, while also not interceding when needed.

There are many benefits to recording and sharing information about your bees.  Let’s review some of them by looking at recording and sharing separately. 

Keeping Records for Yourself

There are several benefits for hobby beekeepers in recording their data. Many of these were discussed in our May 2018 Bee Culture article titled, Electronic Records: A Path to Better Beekeeping. Below is a summary of a few key ones.

Best Management Practices: Recording management actions and treatments to avoid redundancy or missing a necessary action.

Personalized Hive Management: Remembering how the colony was doing at a given time to see changes in state or to identify the best queen from which to split a colony and to learn what works with your bees in your area.

Business Management: Understanding revenue and expenses for your operations and other factors to optimize profit and productivity.

Research: Keeping quality and consistent records. A hallmark of good science for generations, good records help us learn valuable information about bees and beekeeping, especially when combined with other data and when available at scale.

Documentation: Keeping records for legal or regulatory concerns for government or other reporting requirements. Other times, we might need good records to settle insurance or legal claims.

The benefits listed above will most directly help individual beekeepers manage their hives in a traditional manner, with some benefits going to society as beekeepers and researchers share general knowledge in the form of best practices. More detailed records of better quality, such as those kept in a scientific research study, can yield additional benefits, but these have to be balanced against the cost of collecting that information and the usefulness of the knowledge gained.

Reputational Risk: The Privacy Risk Harm and Risk Likelihood for this type of record keeping are very low, given that the records are kept and controlled by the individual, especially for hobby beekeepers who keep bees for reasons other than their personal livelihood.

Compliance Risk: The risk of sanctions is low for keeping your own records. It is possible that, if there is a problem, a court may subpoena a beekeeper’s records and use them as evidence in a case. It is also likely that the records could be used to clear the beekeeper, so it may be a wash.

Regulatory Risk: Since the beekeeper is keeping records for his/her own use, the risk is primarily that government entities will make decisions with incomplete or overly generalized information.

Sharing Summary Information

The next level of value comes from sharing information from the records that are collected. For some people this is from sharing information about their experiences or collecting observational data from a wide variety of sources. For others it is the scientific reports that analyze the results of experiments and observations and then share general principles.

Figure 4. Colony Loss Map from BIP.

In addition to the scientific reports, a good example of this type of information sharing would be the annual survey for colony losses done by the Bee Informed Partnership (BIP) as shown in Figure 4. You may also see similar benefits from governments or other groups collecting and sharing summary information.

There are critical benefits to this. This type of summary sharing can inform legislative policies, guide grant funding to address critical issues, and bring attention to the problems beekeepers are facing. It also gives us data to see more clearly what is working, or could work, to help address at least some of the problems.

Reputational Risk: The risk of sharing summary information is low. As information is generally anonymized and aggregated by researchers, the harm of having it shared is low and the likelihood of a breach is mitigated by trust in the researchers.

Compliance Risk: This risk is also low as records are generally anonymized. If sharing with the government, as required in parts of the world, there could be some risk if the data is not anonymized. Otherwise, both harm and likelihood are minimal.

Regulatory Risk: Here we may be looking at trust in the organization receiving the data. If hobbyists believe in the ability of the organization to use their data to help bees, such as a group like BIP, then they may make the effort to share. Both harm and likelihood appear to be minimal.
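To make the idea of anonymized, aggregated summary data concrete, here is a minimal sketch. It does not describe how BIP or any other group actually processes survey data. The record layout, the region names, and the minimum group size of three beekeepers are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical records: (beekeeper_id, region, colonies_in_fall, colonies_lost).
records = [
    ("bk-01", "North", 4, 1),
    ("bk-02", "North", 6, 3),
    ("bk-03", "North", 10, 2),
    ("bk-04", "South", 5, 2),
]

MIN_BEEKEEPERS = 3  # assumed threshold: smaller groups are left out


def summarize_losses(rows, min_beekeepers=MIN_BEEKEEPERS):
    """Return the colony loss rate per region without any beekeeper identifiers."""
    totals = defaultdict(lambda: {"beekeepers": set(), "colonies": 0, "lost": 0})
    for beekeeper_id, region, colonies, lost in rows:
        group = totals[region]
        group["beekeepers"].add(beekeeper_id)
        group["colonies"] += colonies
        group["lost"] += lost

    summary = {}
    for region, group in totals.items():
        if len(group["beekeepers"]) < min_beekeepers:
            continue  # too few contributors to hide any one beekeeper
        summary[region] = round(group["lost"] / group["colonies"], 2)
    return summary


summary = summarize_losses(records)
print(summary)
```

Only the regional loss rate leaves the function. Regions with too few contributors are dropped so that a single beekeeper’s numbers cannot be read back out of the summary.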

Sharing Detailed Information with a Small Group

The next level of data collection begins to move beyond the individual summary information to sharing or pooling detailed information into larger groups. This move is very important as each hive is unique in its history, climate, genetics, disease profile, and bee behaviors. By going beyond simply sharing and analyzing summary information to sharing details and key information, we can move from general knowledge to specific knowledge.

Figure 5. HiveTracks.com Community Feature for Small Groups and Clubs.

For example, you could pool data to address a common problem that a group of beekeepers in a region faces. Because you have larger numbers and are perhaps in a similar environment with similar challenges, pooling your data gives you greater statistical power to detect an effect, since larger numbers help control for the variation between hives. This can be very helpful to a club, county, or region faced with a geographic, genetic, or environmental problem where additional data could help quickly identify a solution.
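One way to see why larger numbers help is the standard error of an average, which shrinks with the square root of the number of hives measured. The sketch below assumes the hives are independent of one another, which hives in the same region may not fully be. The spread of 3.0 mites per 100 bees is a hypothetical value chosen only for illustration.

```python
import math


def standard_error(std_dev: float, n_hives: int) -> float:
    """Standard error of a mean estimated from n_hives independent hives."""
    if n_hives < 1:
        raise ValueError("n_hives must be at least 1")
    return std_dev / math.sqrt(n_hives)


ASSUMED_STD_DEV = 3.0  # hypothetical hive-to-hive spread, mites per 100 bees

# Compare one beekeeper's four hives with a club's pooled 100 hives.
for n in (4, 100):
    print(n, "hives: standard error", standard_error(ASSUMED_STD_DEV, n))
```

Under these assumptions, going from four hives to 100 cuts the uncertainty in the average to one fifth, which makes a real difference between groups of hives much easier to detect.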

Another advantage to sharing detailed information within a group can come in the form of a warning system, similar to those of old set up to monitor and fight an invading army or common enemy. This is generally done through a vector analysis and alert system. For example, a friend of ours, Michael Rubbingg, the Chief Science Officer for the Austrian Beekeepers Association, has successfully set up a Varroa alert system across Austria. By sharing specific, detailed data among beekeepers in different locations across the country, they can track the movement of Varroa and warn beekeepers when their risk of infestation is high. This gives beekeepers time to prepare and take preemptive action to prevent or reduce harm.
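A very simple version of such a warning system can be sketched as follows. This is not the Austrian system, whose details we do not describe here. The district names, the mite loads, the neighbor map, and the alert threshold are all hypothetical, and a real system would likely use far richer models.

```python
ALERT_THRESHOLD = 3.0  # hypothetical threshold, mites per 100 bees

# Hypothetical average mite loads reported by beekeepers in each district.
reported_loads = {"A": 4.2, "B": 1.1, "C": 0.8, "D": 0.5}

# Hypothetical map of which districts border each other.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}


def districts_to_warn(loads, adjacency, threshold=ALERT_THRESHOLD):
    """Return districts already over the threshold and their at-risk neighbors."""
    hot = {district for district, load in loads.items() if load >= threshold}
    at_risk = {
        neighbor
        for district in hot
        for neighbor in adjacency.get(district, [])
        if neighbor not in hot
    }
    return sorted(hot), sorted(at_risk)


hot, at_risk = districts_to_warn(reported_loads, neighbors)
print("Over threshold:", hot)
print("Warn neighbors:", at_risk)
```

The point of the sketch is that beekeepers in a neighboring district can be warned before their own counts rise, which is only possible when detailed, location-specific data is shared.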

Figure 6. Mite Load Graphics from HiveTracks.com

Clubs, counties, and regions could take a similar approach, with help from apiary management software such as HiveTracks, choosing the most pressing issue their members face and collecting detailed data on it across the region. Queen breeders could also team up with beekeepers to move beyond selling queens to better tracking how their queens are doing in different geographies with different threats. This could enable queen breeders and their clients to share very detailed information within their group, yielding specific (as opposed to general) knowledge of what is best for a client in a given location under given circumstances at a given time. In doing so, queen breeders and beekeepers alike can create and share in the value derived from increased efficiency, effectiveness, and profit.

Reputational Risk: For the hobbyist, sharing this type of information still bears little risk, especially when compared to the potential reward. There may be some embarrassment among peers, or pressure to treat hives in a different manner if the group thinks it is better for all. However, the risk harm is still low, though the likelihood may increase due to the detail.

Compliance Risk: Unless sharing with the government, the likelihood of sanctions is low. Indeed, one may learn of inadvertent non-compliance or of less intrusive means of compliance. If the data is shared with government entities, however, the harm can be significant.

Regulatory Risk: If there is little trust in the ability or goodwill of the group to help, there is little benefit to sharing data. However, the risk is generally small unless beekeepers believe a group may mandate participation in an activity they do not support.

Building a Global Repository of the World’s Standardized Bee Data

If we go a step further, we get to the real sharing of data from multiple vendors and sources into a common data platform where we can apply advanced data analytics and machine learning techniques to build what we have been calling the Genius Hive. In last month’s article (November 2018) we reviewed some of the benefits of sharing detailed information worldwide. Here are a few benefits from that article by way of review.

Hive Placement Optimization: Determine the best location to place your bees, optimized for proper forage and environmental conditions for bees, honey production, and crops.

Status Alerts: Provide updates on the current state of the hive, such as problems with the queen, pests, or pathogens.

Predictive Alerts: Use predictive analytics to anticipate problems before they start and send alerts. 

Treatment Optimization: Use data from thousands of outcomes of similar hives to guide which treatment options would be most likely to succeed for a given hive under given conditions.

Trend Analysis: Monitor regional and national trends in real time for better policy and response to incoming threats.

The risks associated with using such a platform are:

Reputational Risk: Reputational risk is minimal with proper safeguards as the data would go to a distant group who may not know the beekeeper.

Compliance Risk: If the purpose of the group is to learn from the data, the risk likelihood of sanctions is minimal, though the harm might be significant.

Regulatory Risk: If there is a lack of trust in the group doing the collection and analysis, there is little incentive to share data, but also little harm or likelihood. There is also the potential added complexity of complying with international privacy laws, which may ultimately increase user privacy and security, though perhaps at an added burden to the collecting organization.

Since we have covered this at length in other articles, suffice it to say that we believe building a common data platform, with appropriate privacy and security controls, is a necessity for the short- and long-term survival and viability of bees and beekeepers everywhere, especially since these benefits are additive. Any benefit from the earlier types of sharing can also apply to sharing at this level, but with much more detail and possible benefit.

This type of information sharing offers the greatest potential benefit to beekeepers. The benefits apply both to the individual beekeeper, who will be able to use the features of a Genius Hive to make better decisions, and to society as a whole as more bees survive. Yet the benefits are less concrete than those of simply keeping your own records, are harder to visualize, and have a greater time delay. For hobbyists, who keep bees because they love them, the risks are still low, especially with privacy protections in place to anonymize and protect their data. This minimizes both the likelihood of a privacy risk and the harm of a potential event, while maximizing the benefits of data sharing.

Conclusion

The risk for hobbyists in sharing data is generally low, but the benefits for them and for beekeepers everywhere are very high. We hope that hobbyist beekeepers will participate in efforts to “save the bees” by becoming citizen scientists and sharing their data with groups that can credibly use the data to help us all.

We do note that there is at least a potentially significant risk when it comes to trust and government actions that warrants further investigation. There is also a very different risk profile for commercial beekeepers with large operations and trade secrets. We plan to write about trust, commercial beekeepers and privacy in the next few articles along with possible technical and policy solutions to address likely concerns.

Finally, special thanks to Project Apis m. for supporting a portion of this work with a Healthy Hives 2020 grant, to leaders at HiveTracks.com (Figure 7) for sharing their thoughts on this topic, and to the editors of Bee Culture for publishing this work. These efforts would not have been possible without visionary groups like these providing support and resources.

Please stay tuned next month for our planned follow up on building trust in the government sector to address some of these issues and the following month for a review of emerging technologies that can help address privacy concerns.


1Cazier, Joseph A., Wilkes, James T. and Hassler, Ed E. (2018) “Nudging Beekeepers Into the Future With the Technology Acceptance Model”, Bee Culture, June 2018 Issue. Pages 35-40.

2Cazier, J. A., Wilson, E. V., & Medlin, B. D. (2007). “The Role of Privacy Risk in IT Acceptance: An Empirical Study”. International Journal of Information Security and Privacy, 1(2), 61-73.

3Note that this was before the iPhone was launched and enjoyment became a primary component of acceptance. Models from that time generally did not treat that factor as being as important as models do today, which is why it was not included in the original model focused on privacy risks.

4https://abc7.com/news/stolen-beehives-worth-nearly-$1-million-recovered-near-fresno/2004572/

5Of course, these numbers only illustrate the concept. Real risk scores vary greatly based on a number of factors, including geography, genetics, hive history, infestations in a region, migration patterns, time of year, etc.

6https://www.beeculture.com/do-i-need-a-vet-for-my-bees/


Joseph Cazier is the Chief Analytics Officer for HiveTracks.com and the Director of the Center for Analytics Research and Education at Appalachian State University. You can reach him at joseph@hivetracks.com

Walter Haefeker is a professional beekeeper from Upper Bavaria, board member of the German Professional Beekeepers Association, as well as President of the European Professional Beekeepers Association.

Edgar Hassler, Ph.D., is the Associate Director for Technology at the Center for Analytics Research and Education at Appalachian State University. You can reach him at hassleree@appstate.edu