Digital Humanities
Foreword
Notes. From:
- The Historian’s Macroscope
- Book exploring big historical data.
- Digital Humanities
- Stanford Humanities Center.
Blogs
- Historyonics.
- Day of Archaeology.
- The Day of Archaeology is an event where archaeologists write about their activities on a group blog.
- Currently there are over 1000 posts on the blog; a lot to read in one sitting. Rather than closely read each post, we can do a distant reading to get some insights into the corpus. Distant reading refers to efforts to understand texts through quantitative analysis and visualisation.
- Global Perspective on Digital History.
- The Programming Historian
- Lessons, projects, research, blog.
- The Programming Historian offers novice-friendly, peer-reviewed tutorials that help humanists learn a wide range of digital tools, techniques, and workflows to facilitate their research.
- ITHAKA S+R.
- Provides research and strategic guidance to help the academic community navigate economic and technological change.
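
The distant reading mentioned under Day of Archaeology can start with something as simple as counting words across posts. The following is a minimal sketch, not the method used by that project: the three sample posts and the stop-word list are hypothetical stand-ins, and a real corpus would be loaded from files or scraped pages.

```python
import re
from collections import Counter

# Hypothetical stand-ins for blog posts (not real Day of Archaeology text).
posts = [
    "Today we surveyed the trench and recorded pottery finds.",
    "Pottery washing all day, then recording finds in the database.",
    "The survey team mapped the trench before the rain.",
]

# A tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "and", "we", "in", "all", "then", "before", "a", "of"}


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Count lower-cased word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts


freqs = word_frequencies(posts)
# Print the most frequent remaining words as a first overview of the corpus.
for word, count in freqs.most_common(5):
    print(word, count)
```

Frequency tables like this are only a first step; they are usually followed by visualisation or by the corpus tools listed further down.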
Cases
- Big Data + Old History.
- Video; dig into documents without reading them.
- Old Bailey.
- The proceedings of the Old Bailey, 1674-1913. A fully searchable edition of the largest body of texts detailing the lives of non-elite people ever published, containing 197,745 criminal trials held at London’s central criminal court.
- White paper; Data Mining with Criminal Intent.
Courses
Data and Models
Datasets and Projects
- Le Programme de recherche en démographie historique.
- Genealogical data from New France to contemporary Quebec.
- ORBIS.
- The Stanford Geospatial Network Model of the Roman World.
- Could become a boardgame (or an app).
- Pelagios.
- LOTR Project.
- LotrProject is dedicated to bringing J.R.R. Tolkien’s works to life through various creative web projects (genealogy, interactive maps, timelines, and statistics).
- Computational Folkloristics.
- Mapping folktales and linking themes.
- Magazine.
- Article; Big Folklore: A Special Issue on Computational Folkloristics.
- Dataverse.
- Dataverse collects social-scientific, health, and environmental data for the world as a whole, covering the past four or five centuries.
- CLIWOC.
- Climatological Database for the World’s Oceans 1750-1850.
- Wikipedia.
- Royal Netherlands Meteorological Institute.
Networks
- Access Linked Open Data.
- RDF databases, graph databases, and how researchers can access these data through the query language called SPARQL.
- RDF represents information in a series of three-part ‘statements’ that comprise a subject, predicate, and an object.
- Network visualizations.
- Data extraction and network visualization of historical sources.
- UCINET.
- UCINET 6 is a software package for the analysis of social network data.
- Pajek.
- Analysis and visualization of large networks.
- Analyse des réseaux : une introduction à Pajek (Network analysis: an introduction to Pajek).
- Network Workbench.
- A Large-Scale Network Analysis, Modeling and Visualization Toolkit for Biomedical, Social Science and Physics Research.
- Sci2.
- The Science of Science (Sci2) Tool is a modular toolset specifically designed for the study of science.
- It supports the temporal, geospatial, topical, and network analysis and visualization of scholarly datasets at the micro (individual), meso (local), and macro (global) levels.
- NodeXL.
- Network overview, discovery and exploration for Excel.
- Gephi.
- Visualization and exploration software for all kinds of graphs and networks.
- Viewer.
- Exporter.
- Web Export.
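
The three-part RDF statements described under Access Linked Open Data can be pictured as plain tuples. The sketch below is an illustration only: the `ex:` identifiers are hypothetical, and the `match` helper is a toy stand-in for a SPARQL basic graph pattern, not a real SPARQL engine or RDF library.

```python
# Each statement is a (subject, predicate, object) tuple.
# The identifiers are hypothetical examples, not real vocabulary terms.
triples = [
    ("ex:OldBailey", "ex:locatedIn", "ex:London"),
    ("ex:OldBailey", "rdf:type", "ex:Court"),
    ("ex:London", "rdf:type", "ex:City"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern.

    A value of None acts as a wildcard, playing the role that a
    variable such as ?s or ?o plays in a SPARQL query.
    """
    return [
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "Which things have a type, and what is it?"
for s, p, o in match(triples, predicate="rdf:type"):
    print(s, "is a", o)
```

Real triple stores answer the same kind of pattern query, only over millions of statements and with joins between patterns.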
Tools, Data Mining and Analyzing
- Google Ngram.
- Mine Google Books.
- Available online; an API is also available.
- R and Python.
- Packages for text mining, text analysis, NLP (natural language processing).
- Topic Modeling Tool.
- Unix.
- Mining text and qualitative data with Unix.
- Download, trim, dig into dir and subdir, find patterns, count files, lines, words, save into a new file or a subdir, count instances and other stats like concordances, use pipelines, create lists with unix commands, etc.
- OpenRefine.
- Clean data, remove duplicate records, separate multiple values contained in the same field, analyse the distribution of values throughout a data set, group together different representations of the same reality, etc.
- AntConc.
- Corpus analysis.
- Create/download a corpus of texts, conduct a keyword-in-context search, identify patterns surrounding a particular word, use more specific search queries, look at statistically significant differences between corpora, make multi-modal comparisons using corpus linguistic methods.
- Many other tools on Laurence Anthony’s Website:
- AntConc : a freeware corpus analysis toolkit for concordancing and text analysis.
- AntPConc : a freeware parallel corpus analysis toolkit for concordancing and text analysis using UTF-8 encoded text files.
- AntWordProfiler : a freeware tool for profiling the vocabulary level and complexity of texts.
- AntFileConverter : a freeware tool to convert PDF and Word (DOCX) files into plain text for use in corpus tools like AntConc.
- AntMover : a freeware text structure (moves, outline, flow) analysis program.
- AntCLAWSGUI : a front-end interface to the CLAWS tagger developed at Lancaster University.
- EncodeAnt : a freeware tool for detecting and converting character encodings.
- FireAnt : a freeware social media and data analysis toolkit (developed in collaboration with Claire Hardaker of Lancaster University).
- ProtAnt : a freeware prototypical text analysis tool (developed in collaboration with Paul Baker of Lancaster University).
- SarAnt : a freeware batch search and replace tool.
- SegmentAnt : a freeware Japanese and Chinese segmenter (segmentation/tokenizing tool).
- TagAnt : a freeware Part-Of-Speech (POS) tagger built on TreeTagger (developed by Helmut Schmid).
- VariAnt : a freeware spelling variant analysis program.
- Beautiful Soup.
- Python module.
- Text parsing.
- TaPOR.
- Gateway to the tools for sophisticated text analysis and retrieval.
- Paper Machines.
- Visualize thousands of texts.
- Plugin for the Zotero bibliographic management software.
- Voyant.
- Online tool for concordances, wordles, stats, graphics.
- Can be set up on a local server.
- Documentation
- Overview.
- Search, visualize, and review your documents.
- Up to hundreds of thousands of them, in any format.
- MALLET.
- NLP toolkit and machine learning.
- Topic Analysis, keywords, bags of words.
- A topic modeling tool takes a single text (or corpus) and looks for patterns in the use of words; it is an attempt to inject semantic meaning into vocabulary.
- Topic models represent a family of computer programs that extract topics from texts. A topic to the computer is a list of words that occur in statistically meaningful ways. A text can be an email, a blog post, a book chapter, a journal article, a diary entry – that is, any kind of unstructured text.
- Documentation.
- Article; Getting Started with Topic Modeling and MALLET.
- Stanford Topic Modeling Toolbox.
- The Stanford Topic Modeling Toolbox (TMT) brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component:
- Import and manipulate text from cells in Excel and other spreadsheets.
- Train topic models (LDA, Labeled LDA, and PLDA) to create summaries of the text.
- Select parameters (such as the number of topics) via a data-driven process.
- Generate rich Excel-compatible outputs for tracking word usage across topics, time, and other groupings of data.
- Regexr.
- Online tool for building and testing regular expressions (regex).
- Regex can also be run in text and code editors.
- Tutorial on how to process regular expressions in Notepad++, TextWrangler, or other text/code editors (Vim, Emacs, etc.).
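
The keyword-in-context search listed under AntConc and the regular expressions listed under Regexr can be combined in a few lines. This is a minimal sketch under stated assumptions: the `kwic` function, its `width` parameter, and the sample sentence are all hypothetical, and the output format only loosely imitates a concordance view.

```python
import re


def kwic(text, keyword, width=30):
    """Return (left, match, right) context tuples for each whole-word hit.

    `width` is the number of characters of context kept on each side.
    """
    # \b anchors restrict matches to whole words; re.escape keeps any
    # punctuation in the keyword from being read as regex syntax.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


# Hypothetical sample text, written for this example.
sample = "The prisoner was indicted for theft. The jury found the prisoner guilty."

for left, word, right in kwic(sample, "prisoner", width=20):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>20} [{word}] {right}")
```

Aligning the keyword in a column is what makes recurring patterns around a word easy to spot by eye.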
Tools, Mining the Web
- Retrieving Web Archive:
- Mining the Internet Archive Collection.
- internetarchive Python package.
- Wget.
- Automated downloading with Wget.
- Pull data from the web.
- Query.
- Downloading many records using Python.
- How to check if a directory exists and create it if necessary.
- Figshare.
- Web scraping.
- Outwit.
- Find, grab and organize all kinds of data and media from online sources.
- XPath.
- Web scraping, screen scraping, data parsing and other related things.
- About XPath.
- Importio.
- Extract web data the easy way.
- Tabula.
- Extract data from PDF.
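
The step listed above as "how to check if a directory exists and create it if necessary" is a common part of saving many downloaded records. A minimal sketch using only the standard library follows; the `save_record` helper, the folder names, and the file content are hypothetical, and the example writes into a temporary directory so that it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def save_record(directory, filename, content):
    """Write content to directory/filename, creating the directory if needed."""
    target_dir = Path(directory)
    # parents=True creates missing parent folders as well;
    # exist_ok=True makes the call a no-op when the folder already exists.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_text(content, encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one subfolder per year of records.
    path = save_record(Path(tmp) / "records" / "1674", "trial-0001.txt", "example record text")
    print(path.name, path.exists())
```

Calling `mkdir` with `exist_ok=True` avoids a separate existence check, which also avoids a race between checking and creating.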
Tools, Referencing
- Zotero.
- Standalone and add-in to Mozilla Firefox.
- Add-in to text processors (including LaTeX and Markdown editors).
- Dig into journals, books, and primary sources.
- Collect, organize, cite, and share your research sources.
- JSTOR.
- Journals, primary sources, and now books.
Tools, Scanning
- OCR scanner
- Digitization of material documents (from books and letters to maps).
- The MNIST Database.
- The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples (widely used in machine learning to train and test image classifiers for handwriting recognition).
- It is a subset of a larger set available from NIST.
- The digits have been size-normalized and centered in a fixed-size image.
Tools, Visualization
- Sparklines are mini-graphics. Add-in to Excel.
- GIS: PostgreSQL, with its PostGIS extension, is a widely used open-source database for mapping and GIS.
- Google Maps and Google Earth.
- Google My Maps and Google Earth provide an easy way to start creating digital maps. With a Google Account you can create and edit personal maps by clicking on My Places.
- In My Maps you can choose between several different base maps (including the standard satellite, terrain, or standard maps) and add points, lines and polygons. It is also possible to import data from a spreadsheet, if you have columns with geographical information (i.e. longitudes and latitudes or place names). This automates a formerly complex task known as geocoding. Not only is this one of the easiest ways to begin plotting your historical data on a map, but it also has the power of Google’s search engine. As you read about unfamiliar places in historical documents, journal articles or books, you can search for them using Google Maps. It is then possible to mark numerous locations and explore how they relate to each other geographically. Your personal maps are saved by Google (in their cloud), meaning you can access them from any computer with an internet connection. You can keep them private or embed them in your website or blog. Finally, you can export your points, lines, and polygons as KML files and open them in Google Earth or Quantum GIS.
- QGIS
- Omeka.
- Displays items and collections with narratives, as in museums, libraries, and archives.
- Augmented Reality.
- Augmented reality overlays digital content (images, video, text, sound, etc.) onto physical objects or locations. It is typically experienced by looking through the camera lens of an electronic device such as a smartphone, tablet, or optical head-mounted display (e.g. Microsoft HoloLens).
- Although AR is a cutting-edge, complex technology, there are a number of user-friendly platforms that allow people with no previous coding experience to create compelling augmented reality experiences.
- D3.js.
- JavaScript visualization library for the web; runs client-side in the browser.
- D3 Examples.
- A freelancer’s gallery.
- Bokeh.
- Top 7 Free Infographics Tools & Online Makers in 2016.
- SHANTI INTERACTIVE.
- Suite of tools that make it easy to create highly interactive web-based visualizations, videos, and maps. They are freely available from the University of Virginia’s Sciences, Humanities & Arts Network of Technological Initiatives (SHANTI).
- Qmedia provides new ways to use video for instructional and scholarly purposes. The viewer interacts with the whole screen and sees a wide array of web-based resources, giving an immersive experience that adds context.
- SHIVA takes a new approach that makes it easy to add graphical and data-driven visualizations to websites. Elements such as data, charts, network graphs, maps, image montages, and timelines are easily created.
- MapScholar is an online platform for geospatial visualization funded by the NEH. It enables humanities and social science scholars to create digital “atlases” featuring high-resolution images of historic maps.
- VisualEyes is a web-based authoring tool for historic visualization, funded by the NEH, that weaves images, maps, charts, video and data into highly interactive and compelling dynamic visualizations.
- VisualEyes5 is an HTML5 version of the VisualEyes authoring tool.
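
The spreadsheet-to-map workflow described under Google Maps and Google Earth ends with points exported as KML. As a rough sketch of what such a file contains, the code below turns rows with name, latitude, and longitude columns into a small KML document using only the standard library. The `places_to_kml` helper is hypothetical, and the two places use approximate coordinates chosen for illustration.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical rows, as they might come from a spreadsheet with
# name/lat/lon columns. Coordinates are approximate.
places = [
    {"name": "Old Bailey", "lat": 51.5155, "lon": -0.1019},
    {"name": "Quebec City", "lat": 46.8139, "lon": -71.208},
]


def places_to_kml(rows):
    """Build a KML document with one Placemark per row and return it as text."""
    # Register the KML namespace as the default so tags are written unprefixed.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for row in rows:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = row["name"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        coords = f"{row['lon']},{row['lat']}"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")


kml_text = places_to_kml(places)
print(kml_text)
```

The longitude-first order is the detail most often gotten wrong when moving data between spreadsheets, which usually list latitude first, and KML.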