ORBIT/Development: Difference between revisions
SimonKnight (talk | contribs) No edit summary |
SimonKnight (talk | contribs) |
Revision as of 12:38, 11 November 2012
This page brings together a number of resources describing the development of the ORBIT wiki as a work in progress. It is not comprehensive, but is intended as an illustrative guide to some of the issues we've faced, particularly with a view that our learning might prove useful to other OER and MediaWiki projects.
Google Docs
We used Google's ability to 'scrape' tables to extract information from our wiki and manipulate it in a spreadsheet...
Table Scraping
As the project progressed, the wiki became more complete and the 'status' levels of resources more complex: some resources needed longer to gain permissions, others were considered strong enough to go up on the wiki but would benefit from some editing if time allowed, and others were considered finalised (in so far as that's ever true on a wiki!). At this later stage we decided to embed as much of the data from Google Docs into the wiki tables as possible. There were a few reasons for this, including:
- To maintain a clear - and public - record of provenance, the reasoning behind metadata assignment, and resource progression
- To make it clearer to anyone navigating the wiki - particularly editors - what stage resources were at, and what would be needed to 'finalise' them
- To allow for an automated check between our Google Docs spreadsheet and the data on the wiki, with a view to automating updates of the spreadsheet. This was done using Google's 'scraper' function.
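The automated check mentioned in the last point might look something like the following. This is only a minimal sketch, not the method we actually used: it assumes both the spreadsheet and the wiki table have already been read into lists of dictionaries, and the function name `find_mismatches` and the sample rows are hypothetical. The field names `resourcenumber` and `final` are borrowed from the wiki queries described on this page.

```python
def find_mismatches(sheet_rows, wiki_rows, key="resourcenumber"):
    """Return {key value: (sheet row, wiki row)} for rows that differ.

    A row that is missing on one side is reported with None in its place.
    """
    sheet = {row[key]: row for row in sheet_rows}
    wiki = {row[key]: row for row in wiki_rows}
    mismatches = {}
    for number in sorted(sheet.keys() | wiki.keys()):
        if sheet.get(number) != wiki.get(number):
            mismatches[number] = (sheet.get(number), wiki.get(number))
    return mismatches


# Hypothetical sample data: resource 13 disagrees, resource 14 is wiki-only.
sheet_rows = [
    {"resourcenumber": "12", "final": "yes"},
    {"resourcenumber": "13", "final": "no"},
]
wiki_rows = [
    {"resourcenumber": "12", "final": "yes"},
    {"resourcenumber": "13", "final": "yes"},
    {"resourcenumber": "14", "final": "no"},
]
mismatches = find_mismatches(sheet_rows, wiki_rows)
```

Rows that agree on both sides are left out of the result, so an empty dictionary would mean the spreadsheet and the wiki are in step.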
On the wiki we set up a number of queries of the following form, specifying the category and the properties from that category to appear in the columns:
{{#ask: [[Category:ToolInfo]]| ?resourcenumber| ?final| format=table | limit=200 }}
Within a Google spreadsheet, a small formula can retrieve these tables, for example:
- =importhtml("http://orbit.educ.cam.ac.uk/wiki/User:Bjoern/resourceoverview","table",3)
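For readers who want to do something similar outside a spreadsheet, the idea behind the formula (pick the nth table on a page and read its rows) can be sketched in Python using only the standard library. This is an illustrative approximation and not how Google implements importhtml: the names `TableCollector` and `import_table` and the sample HTML are hypothetical, the page is assumed to be reasonably well-formed, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of every <table> on a page as lists of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>, in document order
        self._row = None   # cells of the row currently being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_table(html, index):
    """Return table number `index` from `html`, counting from 1 as the formula does."""
    collector = TableCollector()
    collector.feed(html)
    return collector.tables[index - 1]


# Hypothetical page with two tables; the second holds the query results.
sample = """
<table><tr><th>ignored</th></tr></table>
<table>
  <tr><th>resourcenumber</th><th>final</th></tr>
  <tr><td>12</td><td>yes</td></tr>
</table>
"""
rows = import_table(sample, 2)
```

In practice the HTML would be fetched from the wiki page first (for example with `urllib.request`); the sample string stands in for that step so the sketch runs without network access.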
Report Writing
PDF and Resource Pages
While PDFs live up to their name as a Portable Document Format, the provision of only PDF files, by other providers and by us, was deemed problematic. We sought to provide export options - including PDF - alongside more flexible, easily remixable and editable formats.
Within the wiki, we had to decide whether resources should be provided as:
- PDF
- .doc (or similar)
- html (not editable once uploaded, but more flexible formatting than wikitext)
- wikitext
or a combination?
In general, we sought to provide a wikitext version and a .doc version of every activity, with the ability to export pages to PDF provided through, for example, the 'book creator' function. The 'book creator' was used to collate resources for our own coursebook, but could also be used by readers who wished to collect their own resources for a customised book.
However, in order to provide resources in these formats, some openly licensed resources needed to be converted from PDF (an issue Simon Knight discussed in a blog post: http://www.nominettrust.org.uk/knowledge-centre/blogs/creative-commons-open-government-licensing-and-pdfs). While many tools can convert basic PDFs, including the open source LibreOffice suite and Google Docs, larger and more complicated PDFs are harder to convert in a way that preserves formatting and keeps manual post-conversion editing to a minimum. At the time of conversion (summer 2012), the Nitro PDF converter (http://www.pdftoword.com/, free to use online) was found to be the most successful, although the Zamzar conversion suite (http://www.zamzar.com, free to use online) was also very successful. However, even those programs frequently: converted table frames and text boxes to images (making them harder to edit); converted headers, footers, and some images into 'backgrounds' in Word documents; failed to convert bullets and numbered lists or headings properly; and created paragraphs with a line break after each line, rather than maintaining continuous text flow. These are well-known problems with PDF, and PDFs were never intended for conversion to and from the format. They are nonetheless a problem for Creative Commons projects, particularly those which seek to facilitate reuse and remixing.
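Some of this post-conversion editing can be partly automated. As an illustration only (we did not use this script ourselves), the last problem, a line break after every line, might be tidied up along the following lines once the converted text has been saved as plain text. The function name `unwrap_paragraphs` is hypothetical, and the sketch assumes that blank lines mark genuine paragraph boundaries; bulleted lists and headings would need separate handling, since they would otherwise be merged into the surrounding text.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines into paragraphs; blank lines separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(
        " ".join(line.strip() for line in paragraph.splitlines())
        for paragraph in paragraphs
    )


# Hypothetical converter output: two paragraphs, each broken across lines.
converted = "PDFs are portable\nbut hard to edit.\n\nWikitext is easier\nto remix."
cleaned = unwrap_paragraphs(converted)
```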
The issue in this case is how we can release files in such a way that they can be disassembled and reassembled in various formats, mixes, and versions. PDF is not well equipped for this role. There is a related technical issue concerning the tracking of Creative Commons content (e.g., our resources) once it is "out in the wild" - when/if it is appropriated for use on other sites (again, Simon Knight discusses this in a blog post: http://www.nominettrust.org.uk/knowledge-centre/blogs/measuring-impact-tracking-open-content-wild). PDFs - particularly if they have embedded images which link to an original on the author's website - can be used for this purpose, and make content particularly easy to track, in so far as PDFs cannot be disassembled, so:
- They are less likely to be uploaded elsewhere, and more likely to remain as links to the original website; and
- Authors only need to track one document, not multiple sections of a document, some of which may have been versioned for particular purposes (for example, translation into another language).
However, these kinds of reuse (re-uploading elsewhere, and versioning sections for particular purposes) are things we should be seeking to encourage! It is thus important to consider, as an author, why you might want to track content, and how tracking can be done in a way that best serves the primary aim of the resources - in our case, to provide flexible open resources for interactive teaching.