A Behavioral Scientist's Initial Thoughts on the Tin Can API, Big Data, and Learning Analytics

I attended the DevLearn 2012 Conference this week and learned a lot more about the emerging Tin Can API. What is the Tin Can API? I’m still learning about it myself and am by no means an expert, but basically it is a set of standards and specifications for handling and structuring learning data. In some ways it represents the evolution, expansion, and modernization of SCORM (Sharable Content Object Reference Model), the primary standard used in e-learning for the past decade or so. While SCORM has been great in many ways, it can also be a huge pain to implement and work with, and has a number of key limitations in terms of what it can do or support.

I won’t spend a lot of time here explaining all of the dazzling new features and capabilities of Tin Can or how it addresses the limitations of SCORM. The Tin Can API website has plenty of information, as does Rustici Software, the primary developer of the API. I’ll summarize it with a quote from the Tin Can API website:

The Tin Can API is a brand new learning technology specification that opens up an entire world of experiences (online and offline). This API captures the activities that happen as part of learning experiences. A wide range of systems can now securely communicate with a simple vocabulary that captures this stream of activities. Previous specifications were difficult and had limitations — the Tin Can API is simple and flexible. It lifts many of the older restrictions. Mobile learning, simulations, virtual worlds, serious games, real-world activities, experiential learning, social learning, offline learning, and collaborative learning are just some of the things that can now be recognized and communicated well with the Tin Can API.

Sounds cool, huh? And it is. Most people at DevLearn were quite excited about Tin Can, and rightfully so. I expect the use of Tin Can will grow rapidly in the world of corporate training and workforce development.
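For readers who haven't seen one, the basic unit Tin Can traffics in is a "statement": a small actor-verb-object JSON document that gets sent to a learning record store. A minimal sketch in Python (the learner, activity ID, and display names here are my own illustrations, not from any particular system; the verb IRI is one of the common ADL-published verbs):

```python
# A minimal Tin Can (xAPI) statement: actor-verb-object, expressed as a
# Python dict that would be serialized to JSON before being sent to an LRS.
import json

statement = {
    "actor": {
        "mbox": "mailto:learner@example.edu",  # hypothetical learner
        "name": "Example Learner",
    },
    "verb": {
        # Verbs are identified by IRI so different tools agree on meaning
        "id": "http://adlnet.gov/expapi/verbs/experienced",
        "display": {"en-US": "experienced"},
    },
    "object": {
        "id": "http://example.edu/museum/exhibit-12",  # hypothetical activity
        "definition": {"name": {"en-US": "Museum exhibit 12"}},
    },
}

print(json.dumps(statement, indent=2))
```

Because verbs and activities are identified by URIs rather than bare English words, different tools can agree on what a statement means.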

Tin Can and Higher Education

Although I think it will take a little longer for Tin Can to gain widespread adoption in higher education, where I focus most of my e-learning development time these days, I believe it will have a substantial impact there, as well. For colleges and universities, Tin Can represents an important, standardized way of collecting (and moving and sharing) “big data” on learning that can power analytics used to improve not only learning, but also student recruitment, retention, and engagement. By allowing students to incorporate both offline and informal learning activities into their “learning activity stream” (think Facebook timeline for nerds), faculty and administrators can gain a much more comprehensive look at a student’s learning and engagement. Those data could be used to:

  • Enhance electronic portfolios by supplementing artifacts of learning with detailed documentation or summaries of all learning activities
  • Identify activities and “learning paths” related to student success (e.g., does a mandatory internship increase a student’s likelihood of graduating or landing a job? Does number of Facebook posts per week correlate with GPA?)
  • Measure more “fuzzy” and comprehensive learning objectives (e.g., want to measure whether a student is “engaged with the subject matter”? Check to see how many relevant books, articles, or websites the student read during their time on campus)
  • Power rich, compelling stories of how institutional goals have been achieved for accreditation reports
  • Allow student learning data to be easily transported from one LMS to another, or even from one school to another or to the student’s job
  • And probably lots more

As an educational technologist, I’m very excited about Tin Can. As a behavioral scientist, I have some reservations about the ways I’m already hearing people talk about Tin Can and the data the system can collect.

Self-Report vs. Automatic Recording

One of the big advantages of e-learning is that it allows for automated recording of user responses. Typically, the range of responses it records is pretty limited, however. Usually you’ll just be recording mouse clicks or finger swipes. If you’re extra fancy, you might track eye movement or other body movements with a motion-sensing device or a smartphone’s gyroscope, accelerometer, proximity sensor, etc. Even with the ubiquity and advances of mobile technology, though, it is difficult to automatically record all of the relevant aspects of different learning experiences a person might have. For example, during a visit to a local museum I might participate in a tour group, watch a reenactment of an important historical event, visit a dozen exhibits, and read about three dozen informational signs. These might be great learning experiences that I would like to record.

Enter Tin Can. With its support for virtually any type of learning experience, either online or offline, we now have a system for recording a much wider range of responses and experiences. This is a big step up from what SCORM and most learning management systems currently allow. BUT

Many of those responses and experiences will be self-reported, not automatically recorded by a machine. At the DevLearn conference, for example, one presenter demoed an “I learned this” button that can be added to a web browser toolbar. With this button, a learner reading a website could click on the button and have the activity recorded to the LMS (or, more specifically, an LRS, or learning record store, to use Tin Can terminology). But did the student actually read the content of that web page? We don’t know. Did the student learn anything from that web page? We don’t know. Did the student even really visit the web page? We don’t know (his or her little brother might have done it). In short, we are faced with all of the problems of self-report: the report may or may not correspond to what actually happened.
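To make the concern concrete, here is a sketch of what such a button might record, assuming a hypothetical LRS endpoint and learner. Notice that the actor the statement describes and the authority asserting it are the same person; the claim is entirely self-asserted:

```python
import json
import urllib.request

def build_self_report(learner_email, page_url):
    """Build an 'I learned this' statement. The actor and the authority
    are the same agent: nobody else vouches for this claim."""
    agent = {"mbox": f"mailto:{learner_email}"}
    return {
        "actor": agent,
        "verb": {
            "id": "http://adlnet.gov/expapi/verbs/experienced",
            "display": {"en-US": "experienced"},
        },
        "object": {"id": page_url},
        "authority": agent,  # self-asserted
    }

def send_to_lrs(statement, lrs_url="https://lrs.example.edu/statements"):
    """POST the statement to an LRS (URL is hypothetical; a real LRS
    would also require authentication headers)."""
    req = urllib.request.Request(
        lrs_url,
        data=json.dumps(statement).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)

stmt = build_self_report("student@example.edu", "http://example.com/article")
```

The LRS happily stores this, but nothing in the statement tells you whether the page was ever actually read.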

You might think that a student or learner would have little reason to “lie” about an informal learning experience like viewing a web page. But as important consequences start to be attached to Tin Can learning activities, motivation will be in place for some learners to “game” the system. Want to be promoted to a management position? Tell your LRS that you’ve been reading lots of management books, attending seminars, and admiring and taking notes about your boss’s management style. Want to be competitive for a scholarship sponsored by a local environmental group? Tell your school’s LMS that you’ve been collecting discarded recyclables from around campus, attending guest lectures on environmental policy, volunteering for Greenpeace, and reading environmental news websites daily.

It is well-known to most behavioral scientists that self-report measures are notoriously unreliable and often inaccurate for reasons like social desirability bias. Direct observation or recording of behavior is much better, as is the collection or examination of response products (how many widgets did you produce today?). Self-report data should not be considered the equal of data collected via automatic recording or direct observation (in terms of measuring the target performance; it can be very interesting to examine the contextual variables that lead to someone making a specific self-report, of course, but that is a different story and analysis). Thus, any self-reported activity data provided by an LRS or LMS (or any other system, really) should be both: a) clearly identified as such and b) interpreted cautiously.
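One concrete way to honor both (a) and (b) is to separate statements by who asserted them. A sketch, treating a statement as self-reported when the asserting authority is the same agent as the actor (the agents and activity IDs below are invented for illustration):

```python
def is_self_reported(statement):
    """A statement is self-asserted when the authority making the claim
    is the same agent as the actor it describes (or no authority is given)."""
    authority = statement.get("authority")
    return authority is None or authority == statement["actor"]

statements = [
    {"actor": {"mbox": "mailto:ann@example.edu"},
     "verb": {"id": "http://adlnet.gov/expapi/verbs/completed"},
     "object": {"id": "http://example.edu/course/101"},
     "authority": {"mbox": "mailto:registrar@example.edu"}},  # trusted source
    {"actor": {"mbox": "mailto:ann@example.edu"},
     "verb": {"id": "http://adlnet.gov/expapi/verbs/experienced"},
     "object": {"id": "http://example.com/some-article"},
     "authority": {"mbox": "mailto:ann@example.edu"}},  # self-asserted
]

trusted = [s for s in statements if not is_self_reported(s)]
self_reported = [s for s in statements if is_self_reported(s)]
```

Reports built on `trusted` and `self_reported` can then be presented and weighted differently, rather than pooling everything into one undifferentiated activity stream.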

Of course, the flip side of this issue is that the Tin Can API can also allow some online activities (such as viewing a web page or video) to be automatically recorded to the LRS. This, of course, also raises a host of privacy and data ownership issues that will need to be addressed by vendors, organizations, schools, and anyone else using the API. It also still applies to only a relatively small subset of learning experiences (i.e., those that are computer-based or can be automatically captured by a digital system).

Correlation ≠ Causation

One of the reasons people get fired up about Tin Can is that it allows the recording of a much wider range of “experiences” than SCORM or traditional LMS models. That smells like “big data,” people. And most people these days think big data smells delicious! Almost like bacon.

I’m no expert on “big data,” but it seems the general idea is that the more data you have, the better able you are to predict things and make better decisions about issues related to those data. That makes a lot of sense. Using big data in this way is generally referred to as “analytics,” which to most people these days both smells and tastes like bacon. It’s good stuff. Google Analytics is a great example in the web development world. It automatically records tons of information about how and when people use your website, and you can use those data to make better decisions about advertising, infrastructure, usability, web design, etc. Big data leads to analytics, which leads to nirvana.

In higher education, schools already have lots and lots of data about their students. By implementing the Tin Can API, they can have lots more. The hope is that by having “big data” about all sorts of learning experiences–not just the digital ones easily captured by an LMS–educators can identify valuable or more effective learning experiences for students. There may be particular patterns of college student experiences that lead to greater success. Students who participate in a large number of volunteer activities, for example, might be much more likely to graduate.

A very big problem with this hope, though, is that nearly all of the data and analyses being used will be correlational in nature. Correlational analyses can be very good for prediction, which is obviously very useful in many contexts. They are not, however, always good for figuring out how to change or influence learning or performance (and I have argued elsewhere that being able to both predict and influence human behavior is a core goal of education in general and instructional design in particular). An important principle every psychology student learns is that “correlation does not imply causation.” This is because the relationship between the correlated variables might be spurious or influenced by other (often unseen) variables. For example, you might find a strong correlation between volunteer activities and graduation, but there may not be a causal relationship between these variables. Efforts to encourage students to engage in more volunteer activities, then, may not have any significant impact on graduation rates.
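The volunteering example can be made concrete with a toy simulation (my own illustration, not data from any study): a hidden "motivation" variable drives both volunteering and graduation, so the two correlate strongly in observational data, yet an intervention that forces everyone to volunteer leaves the graduation rate at its base level.

```python
import random

random.seed(0)  # deterministic for illustration

def simulate(n=10000, force_volunteering=False):
    """Toy model: an unobserved 'motivation' score raises both the chance
    of volunteering and the chance of graduating. Volunteering itself has
    NO causal effect on graduation in this model."""
    rows = []
    for _ in range(n):
        motivation = random.random()
        volunteers = force_volunteering or random.random() < motivation
        graduates = random.random() < motivation  # motivation only
        rows.append((volunteers, graduates))
    return rows

obs = simulate()
vol = [g for v, g in obs if v]
non = [g for v, g in obs if not v]
grad_given_vol = sum(vol) / len(vol)  # observed: roughly 2/3 graduate
grad_given_not = sum(non) / len(non)  # observed: roughly 1/3 graduate

# Intervene: make everyone volunteer. Graduation stays near the base rate (~1/2).
forced = simulate(force_volunteering=True)
grad_forced = sum(g for _, g in forced) / len(forced)
```

In the observational data, volunteers graduate at roughly twice the rate of non-volunteers, yet the intervention changes nothing, which is exactly the trap a purely correlational analytics dashboard would walk into.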

Of course, the value of “big data” is that you can have information on more and more potentially relevant variables. That is true, and it may make it less likely that you’ll find only spurious correlations. However, I would argue that there will still be a large number of important variables and experiences that will never make it into the LRS or LMS, so caution must always be exercised when interpreting your big data or analytics. In an ideal world, the correlations identified by such analytics would serve to inform carefully conducted research on the correlated variables to determine whether or not a functional relationship actually exists between them. That research would then inform institution-wide strategies to promote retention, enhance learning, and all of the other wonderful things universities like to do.

Another caveat: if your analytics rely heavily on self-reported data, then even more caution must be exercised. You no longer have, for example, an actual correlation between volunteering and graduation. You have a correlation between people saying they volunteered and graduation, which is a different thing. It can still be important and might even be accurate, but you can’t count on it. I certainly wouldn’t invest precious campus resources into any substantial interventions based solely on correlations between self-report measures and anything else, regardless of how big the data set might be.

What’s the Goal?

Somewhat lost in the discussions of Tin Can I’ve seen so far is the role of desired outcomes and performance. It’s great to have lots of data on a learner’s diverse array of experiences, but many of them may not have any relation (either functional or spurious) to what an organization or school (or individual) is trying to achieve. We won’t know this, of course, unless we are measuring those outcomes. It is possible that Tin Can will allow us to expand the way we think about and define some of these outcomes, but it will not excuse us from stating them clearly and measuring them directly.


This is a really good post, Eric. You've addressed a lot of points that need to be heavily considered. The fact that self-reported data is better at pointing to a person's interest areas, or to the forces acting on a person (supervisors, teachers, etc.), is something we all need to be aware of in analyzing these big data sets. Automated interaction tracking also needs to be handled carefully: the designer and developer building an interaction need to be very specific about what certain interactions and data mean. Because learning and performance are such personal things, automation of collection and analysis will take a long time to get right. We're all new at this and far from magic bullets :)

I think that the privacy issue with automated tracking can be better addressed by allowing the person who is being tracked to collect their data and share it with whom they choose, rather than the tool or organization collecting ALL of the data about a person by default. That's one paradigm that we're working to shift now: people need to own the data about what they've done, not the schools or places they work. If you have an idea for a higher ed use case, I would love to talk and see where that goes. Thanks again for writing this, very relevant right now.


...interpretation and explanation of the Tin Can API that I have heard so far, outside the spec development community. Love that you are drawing a clear line between causation and correlation.

Maybe true: "I would argue that there will still be a large number of important variables and experiences that will never make it into the LRS or LMS" but I would strongly argue that there will quickly be more and more experiences that will.

Example: One of our partners has built a fully functioning Khan Academy connector to report learning data from any profile (with the approval of the learner using OAuth) to any LRS. He did so in 5 days. That is a testament to how extensible and flexible the Tin Can API truly is.

Thank you for posting, look forward to seeing more thoughts from you on this subject.



Your points about the unreliability of self-reporting are valid, and this is an issue which has been considered in development of the spec. One property of the statement that has not had as much publicity is the 'authority' property. This states who is asserting the statement to be true. This might be the learner themselves, a tutor, a supervisor, an exam board, another student, etc. Reports could then filter based on authority to only include statements from approved, trusted authorities.

Authority is described on pages 21 and 22 of the spec which can be found here: http://www.tincanapi.co.uk/wik...

You will notice that security considerations have been taken into account so nobody can send statements with the authority of my Twitter account @mrandrewdownes unless they know my Twitter password.

Hope that's helpful. Thanks for a great article!

Andrew Downes


I forgot to mention, in response to your section on "what's the goal" - this was a concern of mine joining the specification effort and coming from an academic background. As a result of discussions around this the spec includes facility to define planned learning or learning goals.

This blog post explains in detail:




Hi Eric,

GREAT post, and I'm happy to see you've delved into one of my biggest issues with the Tin Can API. Ironically, someone pointed out this post to me just after I left ASTD's TechKnowledge conference, where there was continued buzz and interest. As an immersive learning designer and someone who has struggled with how to quantify practice into meaningful data, I am thrilled that there may be an emerging standard that could help capture that data. The problem, as you correctly point out, is that reporting activity neither demonstrates learning nor performance improvement.

The hard work is in establishing actual performance metrics and measuring improvement through various learning activities. Simply reporting that you did something doesn't show qualitatively OR quantitatively whether that activity has any impact on what you know, or what you can do better.

I see potential here, but it troubles me that people are too "oooh! shiny!" about one minor piece of a much bigger body of work, namely, correlating activity to performance. There are A LOT of potential problems with the Tin Can API, and so far, I haven't seen any best practices, use cases, or case studies that demonstrate its most effective use. Another problem? Those directly involved in the creation of the Tin Can API jumping into every discussion and squashing the much-needed conversation among the rest of the community. Much like a community manager who tries to force conversations in a certain direction, there has been much talk in the industry about our inability to have deep conversations about the pros and cons of the Tin Can API without the developers and evangelists inserting themselves and trying to guide the conversations. Like pushy sales people, they are becoming a turn-off for people who want to research, investigate, and share their own conclusions. If I want to hear the sales pitch, I'll talk to a sales person. If I want to talk to my peers, I go to social media. Unfortunately, the sales people have hijacked these social conversations and are creating an atmosphere where no one wants to participate.

It reminds me too keenly of my experience with virtual worlds... the developers so in love with what they had built that they stopped listening to the concerns, needs, and objections of consumers and their potential customers. When you build something, you sometimes get too close to it to be able to see the chinks in the armor. When developers start arguing with naysayers, it's a clear sign that it's time for them to let go. Like an artist, you can have intention with your work, but the true meaning is what every person brings to it. It may be time for the developers to step down and let practitioners lead the next phase of the discussion, to help the Tin Can API survive its inevitable fall into the "trough of disillusionment."

Let's hope the conversation continues and that the real problems with the Tin Can API are not ignored. No system is perfect, but ignoring the problems, or trying to squash the conversation about them, does not make them go away. I'm hopeful that we can all learn from each other in making the Tin Can API useful and meaningful.


This is the first and only critical analysis of TinCan I have found. There's so much hype but not much criticism in the classical sense. This article introduces so many potential caveats - not least privacy.

The ownership of the LRS - who controls the learner’s learning activity records - is not defined by the API, but it does create the potential for a personal data locker, where learners can track their own experiences and control their own learning data - a good thing. Alternatively, it gives an unprecedented level of oversight to managers into the corporate activities of employees, way beyond "learning".


I too am glad to see deeper discussion. I could be lacking some knowledge, which would lead to some misunderstanding. One big question I've had, that I thought was going to get answered by the final version is, why the schema of Subject-Verb-Object, if it is going to be completely boundless to what those values should be? What standard does that really achieve, when there could be endless interpretation of arbitrary sentences? Quite the opposite it seems. How do you reconcile, 'John tried' with 'John attempted' or 'John '? Isn't that formation already redundant with standard English? It seems to me this would also cause major issues with data portability. If I were a student or employee and if I were to transfer to a different institution, what would that data mean to the institution I would be transferring to? It seems presumptuous to think that some sentences will quantify the value of an employee based on well-formed sentences. Please, someone inform me if I'm missing something.


The last John sentence was supposed to be 'John <insert regional synonym>'

