A Behavioral Scientist’s Initial Thoughts on the Tin Can API, Big Data, and Learning Analytics

I attended the DevLearn 2012 Conference this week and learned a lot more about the emerging Tin Can API. What is the Tin Can API? I’m still learning about it myself and am by no means an expert, but basically it is a set of standards and specifications for handling and structuring learning data. In some ways it represents the evolution, expansion, and modernization of SCORM (Sharable Content Object Reference Model), the primary standard used in e-learning for the past decade or so. While SCORM has been great in many ways, it can also be a huge pain to implement and work with, and has a number of key limitations in terms of what it can do or support.

I won’t spend a lot of time here explaining all of the dazzling new features and capabilities of Tin Can or how it addresses the limitations of SCORM. The Tin Can API website has plenty of information, as does Rustici Software, the primary developer of the API. I’ll summarize it with a quote from the Tin Can API website:

The Tin Can API is a brand new learning technology specification that opens up an entire world of experiences (online and offline). This API captures the activities that happen as part of learning experiences. A wide range of systems can now securely communicate with a simple vocabulary that captures this stream of activities. Previous specification were difficult and had limitations — the Tin Can API is simple and flexible. It lifts many of the older restrictions. Mobile learning, simulations, virtual worlds, serious games, real-world activities, experiential learning, social learning, offline learning, and collaborative learning are just some of the things that can now be recognized and communicated well with the Tin Can API.

Sounds cool, huh? And it is. Most people at DevLearn were quite excited about Tin Can, and rightfully so. I expect the use of Tin Can will grow rapidly in the world of corporate training and workforce development.

Tin Can and Higher Education

Although I think it will take a little longer for Tin Can to gain widespread adoption in higher education, where I focus most of my e-learning development time these days, I believe it will have a substantial impact there as well. For colleges and universities, Tin Can represents an important, standardized way of collecting (and moving and sharing) “big data” on learning that can power analytics used to improve not only learning, but also student recruitment, retention, and engagement. By allowing students to incorporate both offline and informal learning activities into their “learning activity stream” (think Facebook timeline for nerds), Tin Can can give faculty and administrators a much more comprehensive look at a student’s learning and engagement. Those data could be used to:

Enhance electronic portfolios by supplementing artifacts of learning with detailed documentation or summaries of all learning activities
Identify activities and “learning paths” related to student success (e.g., does a mandatory internship increase a student’s likelihood of graduating or landing a job? Does the number of Facebook posts per week correlate with GPA?)
Measure more “fuzzy” and comprehensive learning objectives (e.g., want to measure whether a student is “engaged with the subject matter”? Check to see how many relevant books, articles, or websites the student read during their time on campus)
Power rich, compelling stories of how institutional goals have been achieved for accreditation reports
Allow student learning data to be easily transported from one LMS to another, or even from one school to another or to the student’s job
And probably lots more

As an educational technologist, I’m very excited about Tin Can. As a behavioral scientist, I have some reservations about the ways I’m already hearing people talk about Tin Can and the data the system can collect.

Self-Report vs. Automatic Recording

One of the big advantages of e-learning is that it allows for automated recording of user responses. Typically, the range of responses it records is pretty limited, however. Usually you’ll just be recording mouse clicks or finger swipes. If you’re extra fancy, you might track eye movement or other body movements with a motion-sensing device or a smartphone’s gyroscope, accelerometer, proximity sensor, etc. Even with the ubiquity and advances of mobile technology, though, it is difficult to automatically record all of the relevant aspects of different learning experiences a person might have. For example, during a visit to a local museum I might participate in a tour group, watch a reenactment of an important historical event, visit a dozen exhibits, and read about 3 dozen informational signs. These might be great learning experiences that I would like to record.

Enter Tin Can. With its support for virtually any type of learning experience, either online or offline, we now have a system for recording a much wider range of responses and experiences. This is a big step up from what SCORM and most learning management systems currently allow. BUT…

Many of those responses and experiences will be self-reported, not automatically recorded by a machine. At the DevLearn conference, for example, one presenter demoed an “I learned this” button that can be added to a web browser toolbar. With this button, a learner reading a website could click on the button and have the activity recorded to the LMS (or, more specifically, an LRS–learning record store–using Tin Can terminology). But did the student actually read the content of that web page? We don’t know. Did the student learn anything from that web page? We don’t know. Did the student even really visit the web page? We don’t know (his or her little brother might have done it). In short, we are faced with all of the problems of self-report: the report may or may not correspond to what actually happened.
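To make the “I learned this” example concrete, here is a minimal sketch in Python of the kind of actor-verb-object statement such a button might send to an LRS. The learner, the page URL, and the build_statement helper are all hypothetical, and the verb identifier is assumed to be the “experienced” verb from ADL’s published vocabulary. The button demoed at the conference may work quite differently.

```python
import json
from datetime import datetime, timezone


def build_statement(learner_email, learner_name, page_url, page_title):
    """Build a minimal actor-verb-object statement for a page the learner clicked on."""
    return {
        "actor": {
            "objectType": "Agent",
            "name": learner_name,
            "mbox": f"mailto:{learner_email}",
        },
        "verb": {
            # Assumed verb IRI; check the vocabulary your LRS expects.
            "id": "http://adlnet.gov/expapi/verbs/experienced",
            "display": {"en-US": "experienced"},
        },
        "object": {
            "objectType": "Activity",
            "id": page_url,
            "definition": {"name": {"en-US": page_title}},
        },
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical learner and page, for illustration only.
statement = build_statement(
    "pat@example.edu",
    "Pat Example",
    "http://example.org/articles/intro-to-behaviorism",
    "Intro to Behaviorism",
)
print(json.dumps(statement, indent=2))
```

Nothing in this statement distinguishes a careful reading from an idle click (or a click by a little brother), which is exactly the self-report problem.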

You might think that a student or learner would have little reason to “lie” about an informal learning experience like viewing a web page. But as important consequences start to be attached to Tin Can learning activities, motivation will be in place for some learners to “game” the system. Want to be promoted to a management position? Tell your LRS that you’ve been reading lots of management books, attending seminars, and admiring and taking notes about your boss’s management style. Want to be competitive for a scholarship sponsored by a local environmental group? Tell your school’s LMS that you’ve been collecting discarded recyclables from around campus, attending guest lectures on environmental policy, volunteering for Greenpeace, and reading environmental news websites daily.

It is well-known to most behavioral scientists that self-report measures are notoriously unreliable and often inaccurate for reasons like social desirability bias. Direct observation or recording of behavior is much better, as is the collection or examination of response products (how many widgets did you produce today?). Self-report data should not be considered the equal of data collected via automatic recording or direct observation (in terms of measuring the target performance; it can be very interesting to examine the contextual variables that lead to someone making a specific self-report, of course, but that is a different story and analysis). Thus, any self-reported activity data provided by an LRS or LMS (or any other system, really) should be both: a) clearly identified as such and b) interpreted cautiously.
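One way to “clearly identify” self-reported data is to label every statement with how it was captured and to keep the groups apart at analysis time. The sketch below assumes a statement can carry extra metadata in a context extension keyed by an IRI. The CAPTURE_METHOD key, the helper functions, and the simplified string-valued statements are all made up for illustration and are not part of the Tin Can API.

```python
# Hypothetical extension key: an organization would need to define and
# document its own IRI. Nothing here is mandated by the Tin Can API itself.
CAPTURE_METHOD = "http://example.org/xapi/extensions/capture-method"


def make_statement(learner, verb, activity, method=None):
    """Build a simplified statement, optionally labeled with how it was captured."""
    statement = {"actor": learner, "verb": verb, "object": activity}
    if method is not None:
        statement["context"] = {"extensions": {CAPTURE_METHOD: method}}
    return statement


def capture_method(statement):
    """Return the capture label, treating unlabeled statements as 'unknown'."""
    extensions = statement.get("context", {}).get("extensions", {})
    return extensions.get(CAPTURE_METHOD, "unknown")


def split_by_capture_method(statements):
    """Group statements by capture label so they can be analyzed separately."""
    groups = {}
    for statement in statements:
        groups.setdefault(capture_method(statement), []).append(statement)
    return groups


statements = [
    make_statement("pat", "read", "management-book", "self-report"),
    make_statement("pat", "completed", "safety-quiz", "automatic"),
    make_statement("pat", "attended", "seminar", "self-report"),
    make_statement("pat", "watched", "intro-video"),  # no label at all
]

groups = split_by_capture_method(statements)
for method, items in sorted(groups.items()):
    print(f"{method}: {len(items)} statement(s)")
```

Treating unlabeled statements as “unknown” rather than as automatic is a deliberate choice here: data of uncertain origin gets the same cautious interpretation as self-report.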

The flip side of this issue is that the Tin Can API can also allow some online activities (such as viewing a web page or video) to be automatically recorded to the LRS. That, in turn, raises a host of privacy and data ownership issues that will need to be addressed by vendors, organizations, schools, and anyone else using the API. Automatic recording also still applies to only a relatively small subset of learning experiences (i.e., those that are computer-based or can be automatically captured by a digital system).

Correlation ≠ Causation

One of the reasons people get fired up about Tin Can is that it allows the recording of a much wider range of “experiences” than SCORM or traditional LMS models. That smells like “big data,” people. And most people these days think big data smells delicious! Almost like bacon.

I’m no expert on “big data,” but it seems the general idea is that the more data you have, the better able you are to predict things and make better decisions about issues related to those data. That makes a lot of sense. Using big data in this way is generally referred to as “analytics,” which to most people these days both smells and tastes like bacon. It’s good stuff. Google Analytics is a great example in the web development world. It automatically records tons of information about how and when people use your website, and you can use those data to make better decisions about advertising, infrastructure, usability, web design, etc. Big data leads to analytics which leads to nirvana.

In higher education, schools already have lots and lots of data about their students. By implementing the Tin Can API, they can have lots more. The hope is that by having “big data” about all sorts of learning experiences–not just the digital ones easily captured by an LMS–educators can identify valuable or more effective learning experiences for students. There may be particular patterns of college student experiences that lead to greater success. Students who participate in a large number of volunteer activities, for example, might be much more likely to graduate.

A very big problem with this hope, though, is that nearly all of the data and analyses being used will be correlational in nature. Correlational analyses can be very good for prediction, which is obviously very useful in many contexts. They are not, however, always good for figuring out how to change or influence learning or performance (and I have argued elsewhere that being able to both predict and influence human behavior is a core goal of education in general and instructional design in particular). An important principle every psychology student learns is that “correlation does not imply causation.” This is because the relationship between the correlated variables might be spurious or influenced by other (often unseen) variables. For example, you might find a strong correlation between volunteer activities and graduation, but there may not be a causal relationship between these variables. Efforts to encourage students to engage in more volunteer activities, then, may not have any significant impact on graduation rates.
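The volunteering example can be illustrated with a small simulation. In this sketch, an unrecorded variable (labeled “motivation” here purely as an assumption) drives both volunteer hours and graduation, while volunteering itself has no effect on graduation at all. Every number in it is invented; it shows how a spurious correlation can arise, not how real students behave.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, seed, forced_hours=None):
    """Simulate n students; pass forced_hours to model an intervention."""
    rng = random.Random(seed)
    hours, graduated = [], []
    for _ in range(n):
        motivation = rng.random()  # hidden variable, never stored in the LRS
        natural_hours = max(0.0, 10 * motivation + rng.gauss(0, 1))
        hours.append(natural_hours if forced_hours is None else forced_hours)
        # By construction, graduation depends on motivation only, not on hours.
        graduated.append(1 if rng.random() < 0.2 + 0.7 * motivation else 0)
    return hours, graduated


N = 20000

# Observational data: students choose their own volunteer hours.
hours, graduated = simulate(N, seed=1)
observed_r = pearson(hours, graduated)

# Intervention: everyone is assigned the same number of hours.
_, grad_none = simulate(N, seed=2, forced_hours=0)
_, grad_lots = simulate(N, seed=3, forced_hours=10)
rate_none = sum(grad_none) / N
rate_lots = sum(grad_lots) / N

print(f"correlation between hours and graduation: {observed_r:.2f}")
print(f"graduation rate with 0 hours assigned:  {rate_none:.2f}")
print(f"graduation rate with 10 hours assigned: {rate_lots:.2f}")
```

The observational data show a clear positive correlation, yet assigning everyone zero or ten hours leaves the graduation rate essentially unchanged, because hours never entered the graduation rule. Analytics that see only the hours and the outcome have no way of telling these two situations apart.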

Of course, the value of “big data” is that you can have information on more and more potentially relevant variables. That is true, and it may make it less likely that you’ll find only spurious correlations. However, I would argue that there will still be a large number of important variables and experiences that will never make it into the LRS or LMS, so caution must always be exercised when interpreting your big data or analytics. In an ideal world, the correlations identified by such analytics would serve to inform carefully conducted research on the correlated variables to determine whether or not a functional relationship actually exists between them. That research would then inform institution-wide strategies to promote retention, enhance learning, and all of the other wonderful things universities like to do.

Another caveat: if your analytics rely heavily on self-reported data, then even more caution must be exercised. You no longer have, for example, an actual correlation between volunteering and graduation. You have a correlation between people saying they volunteered and graduation, which is a different thing. It can still be important and might even be accurate, but you can’t count on it. I certainly wouldn’t invest precious campus resources into any substantial interventions based solely on correlations between self-report measures and anything else, regardless of how big the data set might be.
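The gap between “volunteered” and “said they volunteered” can also be sketched with made-up numbers. The simulation below assumes that graduation depends on what students actually did, and that 40% of non-volunteers claim to have volunteered anyway. Both assumptions are invented for illustration, and real reporting errors could push the result in either direction.

```python
import random


def graduation_gap(labels, graduated):
    """Graduation rate of labeled volunteers minus that of everyone else."""
    yes = [g for v, g in zip(labels, graduated) if v]
    no = [g for v, g in zip(labels, graduated) if not v]
    return sum(yes) / len(yes) - sum(no) / len(no)


rng = random.Random(42)
actual, reported, graduated = [], [], []
for _ in range(20000):
    volunteered = rng.random() < 0.3
    # Assumption: 40% of non-volunteers report volunteering anyway.
    claims = volunteered or rng.random() < 0.4
    # Assumption: graduation depends on actual behavior, not on the report.
    grad = rng.random() < (0.7 if volunteered else 0.4)
    actual.append(volunteered)
    reported.append(claims)
    graduated.append(grad)

actual_gap = graduation_gap(actual, graduated)
reported_gap = graduation_gap(reported, graduated)

print(f"gap using actual volunteering:   {actual_gap:.2f}")
print(f"gap using reported volunteering: {reported_gap:.2f}")
```

Under these assumptions the self-reported data understate the relationship, because the “volunteer” group is diluted with students who never volunteered. An analyst holding only the reports cannot tell how much dilution has occurred.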

What’s the Goal?

Somewhat lost in the discussions of Tin Can I’ve seen so far is the role of desired outcomes and performance. It’s great to have lots of data on a learner’s diverse array of experiences, but many of them may not have any relation (either functional or spurious) to what an organization or school (or individual) is trying to achieve. We won’t know this, of course, unless we are measuring those outcomes. It is possible that Tin Can will allow us to expand the way we think about and define some of these outcomes, but it will not excuse us from stating them clearly and measuring them directly.

5 Comments

Megan Bowe

November 8, 2012 8:51 pm

Excellent points!

This is a really good post, Eric. You’ve addressed a lot of points that need to be heavily considered. The fact that self-reported data is better at pointing to a person’s interest areas or forces on a person (supervisors, teachers, etc.) is something we all need to be aware of in analyzing these big data sets. Automated interaction tracking also needs to be dealt with carefully; the designer and developer building an interaction need to be very specific about what certain interactions and data mean. Because learning and performance are such personal things, automation of collection and analysis…

Koreen Olbrish

February 15, 2013 4:57 pm

Data is meaningless out of context

Hi Eric, GREAT post and happy to see you’ve delved into one of my biggest issues with the Tin Can API. Ironically, someone pointed out this post to me, having just left ASTD’s TechKnowledge conference where there was continued buzz and interest. As an immersive learning designer and someone who has struggled with how to quantify practice into meaningful data, I am thrilled that there may be an emerging standard that could help capture that data. The problem, as you correctly point out, is that reporting activity neither demonstrates learning nor performance improvement. The hard work…

Roger Brownlie

May 7, 2014 11:33 pm

At last…

This is the first and only critical analysis of TinCan I have found. There’s so much hype but not much criticism in the classical sense. This article introduces so many potential caveats – not least privacy. The ownership of the LRS – who controls the learner’s learning activity records – is not defined by the API but it does create the potential for creating a personal data locker, where learners can track their own experiences and control their own learning data – a good thing. Alternatively, it gives an unprecedented level of oversight to managers into the corporate activities…

gajo

November 11, 2014 8:43 am

I second the last post; there should be more discussion

I too am glad to see deeper discussion. I could be lacking some knowledge, which would lead to some misunderstanding. One big question I’ve had, that I thought was going to get answered by the final version is, why the schema of Subject-Verb-Object, if it is going to be completely boundless to what those values should be? What standard does that really achieve, when there could be endless interpretation of arbitrary sentences? Quite the opposite it seems. How do you reconcile, ‘John tried’ with ‘John attempted’ or ‘John ‘? Isn’t that…

November 11, 2014 8:46 am

The last John sentence was supposed to be ‘John <insert regional synonym>’