Initial commit

Jez Cope 2016-08-10 07:58:37 +01:00
commit 0d5fe706f7
14 changed files with 344 additions and 0 deletions

.dir-locals.el Normal file

@@ -0,0 +1,2 @@
((nil . ((create-lockfiles . nil)
)))

.gitignore vendored Normal file

@@ -0,0 +1,2 @@
/public/
/tmp/

.gitmodules vendored Normal file

@@ -0,0 +1,3 @@
[submodule "themes/sidmouth"]
	path = themes/sidmouth
	url = git@github.com:jezcope/theme-sidmouth-hugo

config.yaml Normal file

@@ -0,0 +1,38 @@
baseurl: http://erambler.co.uk/
title: eRambler
copyright: "(c) 2016 Jez Cope"
languageCode: en-GB
metaDataFormat: yaml
ignoreFiles: ["^\\.#"]
theme: sidmouth
disqusShortname: erambler
permalinks:
  post: '/blog/:slug/'
  page: '/:slug/'
taxonomies:
  tag: tags
author:
  name: Jez Cope
params:
  home: home
  brand: a blog about
  topline: 'research communication + higher education + technology + stuff'
  copyright: "Powered by [Hugo](http://gohugo.io)."
  sidebar: left
  author: Jez Cope
  authorlocation: Yorkshire, United Kingdom
  bio: "where does my bio appear?"
  github: jezcope
  twitter: jezcope
  linkedin: jezcope
...

content/page/about.md Normal file

@@ -0,0 +1,19 @@
---
Categories: []
Description: "About me"
Tags: []
date: "2016-07-19T07:33:41+01:00"
menu: "main"
title: "About me"
slug: about
---
I help people in Higher Education communicate and collaborate more effectively using technology. I currently work at the [University of Sheffield][] focusing on [research data management][] policy, practice, training and advocacy.

In my free time, I like to: run; play the accordion; [morris dance][]; climb; cook; read (fiction and non-fiction); write.

[Morris dance]: http://www.fiveriversmorris.org.uk/
[University of Sheffield]: http://www.sheffield.ac.uk/
[research data management]: https://www.sheffield.ac.uk/library/rdm


@@ -0,0 +1,50 @@
---
Description: "description here"
date: 2016-07-19T07:42:33+01:00
menu: "main"
title: RDM Resources
share: false
---
I occasionally get asked for resources to help someone learn more about research data management (RDM) as a discipline (i.e. for those providing RDM support rather than simply wanting to manage their own data). I've therefore collected a few resources together on this page. If you're lucky I might even update it from time to time!

First, a caveat: this is very focussed on UK Higher Education, though much of it will still be relevant for people outside that narrow demographic.

My general recommendation would be to start with the [Digital Curation Centre (DCC)](http://www.dcc.ac.uk/) website and follow links out from there. I also have a slowly growing [list of RDM links on Diigo](https://www.diigo.com/outliner/11flkb/Research-data-management?key=mzirvjs7rt), and there's an RDM section in my [list of blogs and feeds](/blogroll/) too.

## Mailing lists

- [Jiscmail](http://www.jiscmail.ac.uk/) is a popular list server run for the benefit of further and higher education in the UK; the following lists are particularly relevant:
    - RESEARCH-DATAMAN
    - DATA-PUBLICATION
    - DIGITAL-PRESERVATION
    - LIS-RESEARCHSUPPORT
- The [Research Data Alliance](https://rd-alliance.org/groups) have a number of Interest Groups and Working Groups that discuss issues by email

## Events

- [International Digital Curation Conference](http://www.dcc.ac.uk/events/international-digital-curation-conference-idcc) — major annual conference
- [Research Data Management Forum](http://www.dcc.ac.uk/events/research-data-management-forum-rdmf) — roughly every six months; places are limited!
- [RDA Plenary](https://rd-alliance.org/plenary-meetings.html) — also every six months, but only about one in three takes place in Europe

## Books

In no particular order:

- Martin, Victoria. *Demystifying eResearch: A Primer for Librarians*. Libraries Unlimited, 2014.
- Borgman, Christine L. *Big Data, Little Data, No Data: Scholarship in the Networked World*. Cambridge, Massachusetts: The MIT Press, 2015.
- Corti, Louise, Veerle Van den Eynden, and Libby Bishop. *Managing and Sharing Research Data*. Thousand Oaks, CA: SAGE Publications Ltd, 2014.
- Pryor, Graham, ed. *Managing Research Data*. Facet Publishing, 2012.
- Pryor, Graham, Sarah Jones, and Angus Whyte, eds. *Delivering Research Data Management Services: Fundamentals of Good Practice*. Facet Publishing, 2013.
- Ray, Joyce M., ed. *Research Data Management: Practical Strategies for Information Professionals*. West Lafayette, Indiana: Purdue University Press, 2014.

## Reports

- *Ten Recommendations for Libraries to Get Started with Research Data Management*. LIBER, 24 August 2012. <http://libereurope.eu/news/ten-recommendations-for-libraries-to-get-started-with-research-data-management/>
- *Science as an Open Enterprise*. Royal Society, 2 June 2012. <https://royalsociety.org/policy/projects/science-public-enterprise/Report/>
- Auckland, Mary. *Re-Skilling for Research*. RLUK, January 2012. <http://www.rluk.ac.uk/wp-content/uploads/2014/02/RLUK-Re-skilling.pdf>

## Journals

- [International Journal of Digital Curation](http://www.ijdc.net/) (IJDC)
- [Journal of eScience Librarianship](http://escholarship.umassmed.edu/jeslib/) (JeSLib)


@@ -0,0 +1,26 @@
---
title: "#IDCC16 Day 0: business models for research data management"
date: 2016-02-22T18:20:55+01:00
slug: idcc16-day-0
draft: false
tags:
- IDCC16
- Research data management
- Conference
- Service planning
---
I'm at the [International Digital Curation Conference 2016][IDCC16] (#IDCC16) in Amsterdam this week. It's always a good opportunity to pick up some new ideas and catch up with colleagues from around the world, and I always come back full of new possibilities. I'll try and do some more reflective posts after the conference, but I thought I'd do some quick reactions while everything is still fresh.

Monday and Thursday are pre- and post-conference workshop days, and today I attended [*Developing Research Data Management Services*][workshop]. Joy Davidson and Jonathan Rans from the [Digital Curation Centre (DCC)][] introduced us to the [Business Model Canvas][BMC], a template for designing a business model on a single sheet of paper. The model prompts you to think about all of the key facets of a sustainable, profitable business, and can easily be adapted to the task of building a service model within a larger institution. The DCC used it as part of the [Collaboration to Clarify Curation Costs (4C) project][4C], whose output, the [Curation Costs Exchange][CCEx], is also worth a look.

It was a really useful exercise to be able to work through the whole process for an aspect of research data management (my table focused on training & guidance provision), both because of the ideas that came up and also the experience of putting the framework into practice. It seems like a really valuable tool and I look forward to seeing how it might help us with our RDM service development.

Tomorrow the conference proper begins, with a range of keynotes, panel sessions and birds-of-a-feather meetings, so hopefully more then!

[IDCC16]: http://www.dcc.ac.uk/events/idcc16
[workshop]: http://www.dcc.ac.uk/events/idcc16/workshops#Workshop%201
[Digital Curation Centre (DCC)]: http://www.dcc.ac.uk/
[BMC]: http://www.businessmodelgeneration.com/canvas/bmc
[4C]: http://www.curationexchange.org/about#4cproject
[CCEx]: http://www.curationexchange.org/


@@ -0,0 +1,38 @@
---
comments: true
date: 2016-02-23T19:43:57+01:00
draft: false
image: ""
menu: ""
share: true
tags:
- IDCC16
- Research data management
- Conference
- Open data
slug: idcc16-day-1
title: "#IDCC16 Day 1: Open Data"
---
The main conference opened today with an inspiring keynote by Barend Mons, Professor in Biosemantics, Leiden University Medical Center. The talk had plenty of great stuff, but two points stood out for me.

First, Prof Mons described a newly discovered link between Huntington's Disease and a previously unconsidered gene. No-one had previously recognised this link, but when the literature was mined, an indirect link was identified in more than 10% of the roughly 1 million scientific claims analysed. This is knowledge for which we already had more than enough evidence, but **which could never have been discovered without such a wide-ranging computational study**.

Second, he described a number of behaviours which **should be considered "malpractice" in science**:

- Relying on supplementary data in articles for data sharing: the majority of this is trash (paywalled, embedded in bitmap images, missing)
- Using the Journal Impact Factor to evaluate science and ignoring altmetrics
- Not writing data stewardship plans for projects (he prefers this term to "data management plan")
- Obstructing tenure for data experts by assuming that all highly-skilled scientists must have a long publication record

A second plenary talk from Andrew Sallans of the [Center for Open Science](http://cos.io) introduced a number of interesting-looking bits and bobs, including the [Transparency & Openness Promotion (TOP) Guidelines][TOP], which set out a pathway to help funders, publishers and institutions move towards more open science.

[TOP]: https://osf.io/9f6gx/wiki/Guidelines/

The rest of the day was taken up with a panel on open data, a poster session, some demos and a birds-of-a-feather session on sharing sensitive/confidential data. There was a great range of posters, but a few that stood out to me were:

- Lessons learned about ISO 16363 ("Audit and certification of trustworthy digital repositories") certification from the British Library
- Two separate posters (from the Universities of Toronto and Colorado) about disciplinary RDM information & training for liaison librarians
- A template for sharing psychology data developed by a psychologist-turned-information researcher from Carnegie Mellon University

More to follow, but for now it's time for the conference dinner!


@@ -0,0 +1,54 @@
---
title: '#IDCC16 day 2: new ideas'
# description: 'Lots of new ideas from #IDCC16 day 2!'
slug: idcc16-day-2
date: 2016-03-16T07:44:14+01:00
type: post
tags:
- IDCC16
- Conference
- Open data
- Research data management
---
*Well, I did a great job of blogging the conference for a couple of days, but then I was hit by the bug that's been going round and didn't have a lot of energy for anything other than paying attention and making notes during the day! I've now got round to reviewing my notes, so here are a few reflections on day 2.*

Day 2 was the day of many parallel talks! So many great and inspiring ideas to take in! Here are a few of my take-home points.

## Big science and the long tail ##

The first parallel session had examples of practical data management in the real world. Jian Qin & Brian Dobreski (School of Information Studies, Syracuse University) worked on reproducibility with one of the research groups involved with the recent gravitational wave discovery. "Reproducibility" for this work (as with much of physics) mostly equates to computational reproducibility: tracking the provenance of the code and its input and output is key. They also found that in practice the scientists' focus was on making the big discovery, and ensuring reproducibility was seen as secondary. This goes some way to explaining why current workflows and tools don't really capture enough metadata.

Milena Golshan & Ashley Sands (Center for Knowledge Infrastructures, UCLA) investigated the use of Software-as-a-Service (SaaS, such as Google Drive, Dropbox or more specialised tools) as a way of meeting the needs of long-tail science research such as ocean science. This research is characterised by small teams, diverse data, dynamic local development of tools, local practices and difficulty disseminating data. This results in a need for researchers to be generalists, as opposed to "big science" research areas, where they can afford to specialise much more deeply. Such generalists tend to develop their own isolated workflows, which can differ greatly even within a single lab. Long-tail research also often suffers from a lack of dedicated IT support. They found that use of SaaS could help to meet these challenges, but at a high cost to cover the necessary guarantees of security and stability.

## Education & training ##

This session focussed on the professional development of library staff. Eleanor Mattern (University of Pittsburgh) described the immersive training introduced to improve librarians' understanding of the data needs of their subject areas, as part of delivering their [RDM service delivery model][UPitt model]. The participants each conducted a "disciplinary deep dive", shadowing researchers and then reporting back to the group on their discoveries with a presentation and discussion.

Liz Lyon (also University of Pittsburgh, formerly UKOLN/DCC) gave a systematic breakdown of the skills, knowledge and experience required in different data-related roles, obtained from an analysis of job adverts. She identified distinct roles of data analyst, data engineer and data journalist, and as well as each role's distinctive skills, pinpointed common requirements of all three: Python, R, SQL and Excel. This work follows on from an earlier phase which identified an allied set of roles: data archivist, data librarian and data steward.

[UPitt model]: http://d-scholarship.pitt.edu/26738/

## Data sharing and reuse ##

This session gave an overview of several specific workflow tools designed for researchers. Marisa Strong (University of California Curation Center/California Digital Library) presented *[Dash](https://dash.cdlib.org/)*, a highly modular tool for manual data curation and deposit by researchers. It's built on their flexible backend, *Stash*, and though it's currently optimised to deposit in their Merritt data repository it could easily be hooked up to other repositories. It captures DataCite metadata and a few other fields, and is integrated with ORCID to uniquely identify people.

In a different vein, Eleni Castro (Institute for Quantitative Social Science, Harvard University) discussed some of the ways that [Harvard's Dataverse](http://dataverse.org/) repository is streamlining deposit by enabling automation. It provides a number of standardised endpoints such as [OAI-PMH](https://www.openarchives.org/pmh/) for metadata harvest and [SWORD](http://swordapp.org/) for deposit, as well as custom APIs for discovery and deposit. Interesting use cases include:

- An addon for the [Open Science Framework](https://osf.io/) to deposit in Dataverse via SWORD
- An [R package](https://cran.r-project.org/web/packages/dvn/README.html) to enable automatic deposit of simulation and analysis results
- Integration with publisher workflows such as Open Journal Systems
- A growing set of visualisations for deposited data

In the future they're also looking to integrate with [DMPtool](https://dmptool.org/) to capture data management plans and with Archivematica for digital preservation.
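
To give a flavour of what this enables: OAI-PMH is plain HTTP plus XML, so harvesting metadata from a Dataverse instance needs nothing more than a standard HTTP client. Below is a minimal Python sketch; the endpoint URL is my assumption for illustration (each installation publishes its own), and a real harvester would also need to follow OAI-PMH resumption tokens to page through large result sets.

```python
import requests
import xml.etree.ElementTree as ET

# Assumed OAI-PMH endpoint; check your Dataverse instance's documentation.
OAI_ENDPOINT = "https://dataverse.harvard.edu/oai"
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Ask for every record in simple Dublin Core format
resp = requests.get(OAI_ENDPOINT,
                    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"})
resp.raise_for_status()

# Walk the XML and print each record's identifier and title
root = ET.fromstring(resp.content)
for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
    identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
    title = record.findtext(".//dc:title", namespaces=NS)
    print(f"{identifier}: {title}")
```

Deposit via SWORD works much the same way in reverse: an authenticated HTTP POST of a packaged dataset plus Atom metadata, which is what makes integrations like the OSF addon and the R package above possible.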

Andrew Treloar ([Australian National Data Service](http://ands.org.au/)) gave us some reflections on the ANDS "applications programme", a series of 25 small funded projects intended to address the fourth of their strategic transformations, from *single use* to *reusable*. He observed that essentially these projects worked because they were able to throw money at a problem until they found a solution: not very sustainable. Some of them stuck to a [traditional "waterfall" approach to project management](https://en.m.wikipedia.org/wiki/Waterfall_model), resulting in "the right solution 2 years late". Every researcher's needs are "special" and communities are still constrained by old ways of working. The conclusions from this programme were that:

- "Good enough" is fine most of the time
- Adopt/Adapt/Augment is better than Build
- Existing toolkits let you focus on the 10% functionality that's missing
- Successful projects involved research champions who can: 1) articulate their community's requirements; and 2) promote project outcomes

## Summary ##

All in all, it was a really exciting conference, and I've come home with loads of new ideas and plans to develop our services at Sheffield. I noticed a continuation of some of the trends I spotted at last year's IDCC, especially an increasing focus on "second-order" problems: we're no longer spending most of our energy just convincing researchers to take data management seriously, and are able to spend more time helping them to do it *better* and get value out of it. There's also a shift in emphasis (identified by closing speaker Cliff Lynch) from sharing to reuse, and making sure that data is not just available but valuable.


@@ -0,0 +1,33 @@
---
title: "Data is like water, and language is like clothing"
teaser: "Data is like information in more ways than one, and it's like water too"
date: 2016-03-31T17:40:00+01:00
slug: language-is-like-clothing
draft: false
tags:
- Language
- Grammar
- Data
---
I admit it: I'm a grammar nerd. I know the difference between 'who' and 'whom', and I'm proud.

I used to be pretty militant, but these days I'm more relaxed. I still take joy in the mechanics of the language, but I also believe that English is defined by its usage, not by a set of arbitrary rules. I'm just as happy to abuse it as to use it, although I still think it's important to know what rules you're breaking and why.

My approach now boils down to this: **language is like clothing**. You (probably) wouldn't show up to a job interview in your pyjamas[^2], but neither are you going to wear a tuxedo or ballgown to the pub.

Getting commas and semicolons in the right place is like getting your shirt buttons done up right. Getting it wrong doesn't mean you're an idiot. Everyone will know what you meant. It will affect how you're perceived, though, and that will affect how your *message* is perceived.

And there are former rules[^1] that some still enforce but that are nonetheless dropping out of regular usage. There was a time when everyone in an office job wore formal clothing. Then it became acceptable just to have a blouse, or a shirt and tie. Then the tie became optional, and now there are many professions where perfectly well-respected and competent people are expected to show up wearing nothing smarter than jeans and a t-shirt.

[^1]: Like not starting a sentence with a conjunction...

One such rule, IMHO, is that 'data' is a plural and should take pronouns like 'they' and 'these'. The origin of the word 'data' is in the Latin plural of 'datum', and that idea has clung on for a considerable period. But we don't speak Latin and the English language continues to evolve: 'agenda' also began life as a Latin plural, but we don't use the word 'agendum' any more. It's common everyday usage to refer to data with singular pronouns like 'it' and 'this', and it's very rare to see someone referring to a single datum (as opposed to 'data point' or something).

If you want to get technical, I tend to think of data as a mass noun, like 'water' or 'information'. It's uncountable: talking about 'a water' or 'an information' doesn't make much sense, but it takes singular pronouns, as in 'this information'. If you're interested, the Oxford English Dictionary also takes this position, while Chambers leaves the choice of singular or plural noun up to you.

There is absolutely nothing wrong, in my book, with referring to data in the plural, as many people still do. But it's no longer a rule, and for me it's weakened further from guideline to preference.

It's like wearing a bow-tie to work. There's nothing wrong with it, and some people really make it work, but it's increasingly outdated and even a little eccentric.

[^2]: or maybe you'd totally rock it.


@@ -0,0 +1,33 @@
---
title: 'Wiring my web'
slug: wiring-my-web
date: 2016-04-01T17:37:00+01:00
type: post
tags:
- APIs
- Web
- Automation
- IFTTT
---
<!-- [![XKCD: automation](http://imgs.xkcd.com/comics/automation.png){:.main-illustration}](https://xkcd.com/1319/) -->
{{< figure alt="XKCD: automation" src="http://imgs.xkcd.com/comics/automation.png" class="main-illustration fr" link="https://xkcd.com/1319/" >}}

I'm a nut for automating repetitive tasks, so I was dead pleased a few years ago when I discovered that [IFTTT](https://ifttt.com) let me plug different bits of the web together. I now use it for tasks such as:

- Syndicating blog posts to social media
- Creating scheduled/repeating todo items from a Google Calendar
- Making a note to revisit an article I've starred in Feedly

I'd probably only be half-joking if I said that I spend more time automating things than I save not having to do said things manually. Thankfully it's also a great opportunity to learn, and recently I've been thinking about reimplementing some of my IFTTT workflows myself to get to grips with how it all works.

There are some interesting open source projects designed to offer a lot of this functionality, such as [Huginn](https://github.com/cantino/huginn), but I decided to go for a simpler option for two reasons:

1. I want to spend my time learning about the APIs of the services I use and how to wire them together, rather than learning how to use another big framework; and
2. I only have a small Amazon EC2 server to play with, and a heavy Ruby on Rails app like Huginn (plus web server) needs more memory than I have.

Instead I've gone old-school with a little collection of individual scripts to do particular jobs. I'm using the built-in scheduling functionality of systemd, which is already part of a modern Linux operating system, to get them to run periodically. It also means I can vary the language I use to write each one depending on the needs of the job at hand and what I want to learn/feel like at the time. Currently it's all done in Python, but I want to have a go at Lisp sometime, and there are some interesting new languages like Go and Julia that I'd like to get my teeth into as well.
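
To make that concrete, here's a sketch of what one of these single-purpose scripts might look like in Python: check the blog's feed and flag any posts that haven't yet been syndicated. The feed URL, the state file and the feedparser dependency are my assumptions for illustration, not necessarily what the web-plumbing repository actually does.

```python
import json
import pathlib

import feedparser  # third-party feed parser: pip install feedparser

# Both values are assumptions for the sake of the example.
FEED_URL = "http://erambler.co.uk/index.xml"
SEEN_FILE = pathlib.Path.home() / ".web-plumbing-seen.json"

# Load the set of links already handled on previous runs
seen = set(json.loads(SEEN_FILE.read_text())) if SEEN_FILE.exists() else set()

for entry in feedparser.parse(FEED_URL).entries:
    if entry.link not in seen:
        # A real script would call a social network's API here;
        # printing stands in for that side effect.
        print(f"New post to syndicate: {entry.title} <{entry.link}>")
        seen.add(entry.link)

# Remember what we've seen so the next scheduled run skips it
SEEN_FILE.write_text(json.dumps(sorted(seen)))
```

A small systemd service unit wrapping a script like this, plus a matching timer unit (with `OnCalendar=hourly`, say), takes the place of the cron job you might otherwise reach for.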

You can see my code on github as it develops: <https://github.com/jezcope/web-plumbing>. Comments and contributions are welcome (if not expected), and do let me know if you find any of the code useful.

*Image credit: [xkcd #1319, Automation](https://xkcd.com/1319/)*


@@ -0,0 +1,38 @@
---
title: 'Fairphone 2: initial thoughts on the original ethical smartphone'
slug: fairphone-first-thoughts
date: 2016-05-07T16:56:29+01:00
type: post
tags:
- Gadgets
- Fairphone
- Smartphone
- Technology
- Ethics
---
<!-- ![Naked Fairphone](/assets/images/posts/2016-05-07-fairphone.jpg){:.main-illustration} -->
{{< figure alt="Naked Fairphone" src="/assets/images/posts/2016-05-07-fairphone.jpg" class="main-illustration fr" >}}

I've had my eye on the [Fairphone 2](https://www.fairphone.com/) for a while now, and when my current phone, an aging Samsung Galaxy S4, started playing up I decided it was time to take the plunge. A few people have asked for my thoughts on the Fairphone, so here are a few notes.

## Why I bought it

The thing that sparked my interest, and the main reason for buying the phone really, was the ethical stance of the manufacturer. The small Dutch company have gone to great lengths to ensure that both labour and materials are sourced as responsibly as possible. They regularly inspect the factories where the parts are made and assembled to ensure fair treatment of the workers, and they source all the raw materials carefully to minimise the environmental impact and the use of conflict minerals.

Another side to this ethical stance is a focus on longevity of the phone itself. This is not a product with an intentionally limited lifespan. Instead, it's designed to be modular and as repairable as possible, by the owner themselves. Spares are available for all of the parts that commonly fail in phones (including screen and camera), and at the time of writing the [Fairphone 2 is the only phone to receive 10/10 for repairability from iFixit](https://www.ifixit.com/Teardown/Fairphone+2+Teardown/52523). There are plans to allow hardware upgrades, including an expansion port on the back so that NFC or wireless charging could be added with a new case, for example.

## What I like

So far, the killer feature for me is the dual SIM card slots. I have both a personal and a work phone, and the latter was always getting left at home or in the office or running out of charge. Now I have both SIMs in the one phone: I can receive calls on either number, turn them on and off independently and choose which account to use when sending a text or making a call.

The OS is very close to "standard" Android, which is nice, and I really don't miss all the extra bloatware that came with the Galaxy S4. It also has twice the storage of that phone, which is hardly unique but is still nice to have.

Overall, it seems like a solid, reliable phone, though it's not going to outperform anything else at the same price point. It certainly feels nice and snappy for everything I want to use it for. I'm no mobile gamer, but there is that distant promise of upgradability on the horizon if you are.

## What I don't like

I only have two bugbears so far. Once or twice it's locked up and become unresponsive, requiring a "manual reset" (removing and replacing the battery) to get going again. It also lacks NFC, which isn't really a deal breaker, but I was just starting to make occasional use of it on the S4 (mostly experimenting with my [Yubikey NEO](https://www.yubico.com/products/yubikey-hardware/yubikey-neo/)) and it would have been nice to try out Android Pay when it finally arrives in the UK.

## Overall

It's definitely a serious contender if you're looking for a new smartphone and aren't bothered about serious mobile gaming. You do pay a premium for the ethical sourcing and modularity, but I feel that's worth it for me. I'm looking forward to seeing how it works out as a phone.

data/Menu.yaml Normal file

@@ -0,0 +1,7 @@
about:
  Name: "about"
  URL: "/about/"
rdm:
  Name: "rdm resources"
  URL: "/rdm-resources/"

themes/sidmouth Submodule

@@ -0,0 +1 @@
Subproject commit 69bc12dc33f04f23f0ebc82f583ef1fda111f476