<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta content="IE=edge" http-equiv="X-UA-Compatible">
<meta content="width=device-width, initial-scale=1" name="viewport">
<link href="../../old/assets/style/styles.css" rel="stylesheet" type="text/css">
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<script>
  (function() {
    var s, scheme, wf;

    this.WebFontConfig = {
      google: {
        families: ['Amaranth:700,700italic:latin', 'Inconsolata::latin']
      }
    };

    wf = document.createElement('script');

    scheme = 'https:' === document.location.protocol ? 'https' : 'http';

    wf.src = scheme + "://ajax.googleapis.com/ajax/libs/webfont/1/webfont.js";

    wf.type = 'text/javascript';

    wf.async = 'true';

    s = document.getElementsByTagName('script')[0];

    s.parentNode.insertBefore(wf, s);

  }).call(this);
</script>

<title>What I'm doing these days: research data management | eRambler</title>
<link href="https://erambler.co.uk/rss.xml" rel="alternate" type="application/rss+xml">
</head>
<body class="single-post">
<div id="container">
<header class="page-header">
<hgroup>
<div class="h1"><a href="../../">eRambler</a></div>
<div class="lead">Jez Cope's blog on becoming a research technologist</div>
</hgroup>
<nav><ul>
<li><a href="../../">Home</a></li>
<li><a href="../../about/">About</a></li>
<li><a href="../../blogroll/">Blogroll</a></li>
</ul>
</nav>
</header>
<section>
<div class="row">
<p class="archive-warning"><strong><em>Please note:</em></strong> this older content has been <strong>archived</strong> and is no longer fully linked into the site. Please go to the <a href="../../">current home page</a> for up-to-date content.</p>
</div>
<div id="content">
<article class="h-entry">
<div class="row">
<h1 class="post-title p-name">What I'm doing these days: research data management</h1>
</div>
<div class="row">
<div class="post-info">
<div class="post-date dt-published">
<a class="u-url" href="http://erambler.co.uk/blog/research-data-management/">Sunday 15 March 2015</a>
</div>
Tagged with
<ul class="post-tags">
<li class="p-category"><span class="tag">Research data management</span></li>
<li class="p-category"><span class="tag">Work</span></li>
<li class="p-category"><span class="tag">Meta</span></li>
</ul>
</div>
<div class="post-body">
<div class="post-content e-content">
<p>So, it’s been a while since I’ve properly updated this blog, and since I seem to be having another try, I thought it would be useful to give a brief overview of what I’m doing these days, so that some of the other stuff I have in the pipeline makes a bit more sense.</p>

<p>My current work focus is research data management: helping university researchers to look after their data in ways that let them and the community get the most out of it.  Data is the bedrock of most (all?) research: the evidence on which all the arguments and conclusions and new ideas are based.  In the past, this data has been managed well (generally speaking) by and for the researchers collecting and using it, and this situation could have continued indefinitely.</p>

<p>Technology, however, has caused two fundamental changes to this position.  First, we’re able to measure more and more about more and more, creating <a href="http://eprints.soton.ac.uk/257648/">what has been termed a “data deluge”</a>.  It’s now possible for one researcher to generate, in the normal course of their work, far more data than they could possibly analyse themselves in a lifetime.  For example, the development of polymerase chain reaction (PCR) techniques has enabled the fast, cheap sequencing of entire genomes: for some conditions, patients’ genomes are now routinely sequenced for future study.  A typical human genome sequence occupies 8TB (about 1700 DVDs), and after processing and compression, this shrinks to around 100GB (21 DVDs).  This covers approximately 23,000 genes, of which any one researcher may only be interested in a handful.</p>

<p>Second, the combination of the internet and cheap availability of computing power means that it has never been easier to share, combine and process this data on a huge scale.  To continue our example, it’s possible to study genetic variations across hundreds or thousands of individuals to get new insights into how the body works.  The <a href="http://www.genomicsengland.co.uk/the-100000-genomes-project/">100,000 Genomes Project (“100KGP”)</a> is an ambitious endeavour to establish a database of such genomes and, crucially, develop the infrastructure to allow researchers to access and analyse it at scale.</p>

<p>In order to make this work, there are plenty of barriers to overcome.  The practices that kept data in line long enough to publish the next paper are no longer good enough: the organisation and documentation must be made explicit and consistent so that others can make sense of it.  It also needs to be protected better from loss and corruption.  Obviously, this takes more work than just dumping it on a laptop, so most people want some reassurance that this extra work will pay off.</p>

<p>Sharing has risks too.  Identifiable patient data cannot be shared without the patients’ consent; indeed, doing so would be a criminal offence in Europe.  Similar rules apply to sensitive commercial information.  Even if there aren’t legal restrictions, most researchers have a reasonable expectation (albeit developed before the “data deluge”) that they will be able to reap the reputational rewards of their own hard work by publishing papers based on it.</p>

<p>There is therefore a great deal of resistance to these changes.  But there can be benefits too.  For society, there is the possibility of advancing knowledge in directions that would never have been possible even ten years ago.  But there are practical benefits to the individuals too: every PhD supervisor and most PhD students know the frustration of trying to continue a student’s poorly-documented work after they’ve graduated.</p>

<p>For funders the need for change is particularly acute.  Budgets are being squeezed, and with the best will in the world there is less money to go around, so there is pressure to ensure the best possible return on investment.  This means that it’s no longer acceptable, for example, for several labs in the country to be running identical experiments to do different things with the results.  It’s more important than ever to make more data available to and reusable by more people.</p>

<p>So the funders (in the UK, particularly the <a href="http://www.rcuk.ac.uk">government-funded research councils</a>) are introducing requirements on the researchers they fund to move along this path more quickly than they might feel comfortable with.  It therefore seems reasonable to offer these hard-working people some support, and that’s where I come in.</p>

<p>I’m currently spending my time providing training and advice, bringing people together to solve problems and trying to convince a lot of researchers to fix what, in many cases, they didn’t think was broken!  They are subject to conflicting expectations and need help navigating this maze so that they can do what they do best: discover amazing new stuff and change the world.</p>

<p>For the last 6ish months I’ve been doing this at <a href="http://imperial.ac.uk/">Imperial College</a> (my <em>alma mater</em>, no less) and loving it.  It’s a fascinating area for me, and I’m really excited to see where it will lead me next!</p>

<p>If you have time, here’s a (slightly tongue-in-cheek) take on the problem from the perspective of a researcher trying to reuse someone else’s data:</p>

<div class="video-container"><iframe width="487" height="274" src="https://www.youtube.com/embed/N2zK3sAtr-4" frameborder="0" allowfullscreen=""></iframe></div>

</div>
<div id="disqus_thread"></div>
<script>
  /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
  var disqus_shortname = 'erambler'; // required: replace example with your forum shortname
  var disqus_identifier = 'tag:erambler.co.uk,2015-03-15:/blog/research-data-management/';
  var disqus_title = "What I'm doing these days: research data management"; // double-quoted so the apostrophe does not end the string
  var disqus_url = 'http://erambler.co.uk/blog/research-data-management/';
  var disqus_developer = 0;
  if (window.location.hostname == 'localhost')
    disqus_developer = 1;

  /* * * DON'T EDIT BELOW THIS LINE * * */
  (function() {
      var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
      dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
      (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
  })();
</script>
</div>
</div>
</article>
</div>
<div id="sidebar">
<div class="sidebar-box about-me h-card">
<p>Hi, I’m <a href="http://erambler.co.uk" class="p-name u-url">Jez Cope</a> and this is my
blog, where I talk about technology in research and higher
education, including:</p>

<ul>
  <li>Research data management;</li>
  <li>e-Research;</li>
  <li>Learning;</li>
  <li>Teaching;</li>
  <li>Educational technology.</li>
</ul>
</div>
<div class="sidebar-box links">
<h2>Me elsewhere</h2>
<ul>
  <li><a href="https://twitter.com/jezcope" rel="me">Twitter</a></li>
  <li><a href="https://github.com/jezcope" rel="me">github</a></li>
  <li><a href="https://linkedin.com/in/jezcope">LinkedIn</a></li>
  <li><a href="http://diigo.com/user/jezcope">Diigo</a></li>
  <li><a href="https://www.zotero.org/jezcope">Zotero</a></li>
  <li><a href="http://gplus.to/jezcope">Google+</a></li>
</ul>
</div>
</div>
<div class="row">
<footer><a class="license" href="http://creativecommons.org/licenses/by-sa/4.0/" rel="license">
<img alt="Creative Commons License" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" style="border-width:0">
</a>
<span href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type" xmlns:dct="http://purl.org/dc/terms/">
eRambler
</span>
by
<a href="http://erambler.co.uk/" property="cc:attributionName" rel="cc:attributionURL" xmlns:cc="http://creativecommons.org/ns#">
Jez Cope
</a>
is licensed under a
<a href="http://creativecommons.org/licenses/by-sa/4.0/" rel="license">
Creative Commons Attribution-ShareAlike 4.0 International license
</a>
</footer>
</div>
</section>
</div>
<script>
  if (!/^http:\/\/localhost/.test(window.location)) {
    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-10201101-1']);
    _gaq.push(['_trackPageview']);

    (function() {
      var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
      var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
    })();
  }
</script>

</body>
</html>