Quant viz


Tony Hirst

Education, Technology



1. Qualitatively Visual. Tony Hirst, Dept of Communication & Systems, The Open University. blog.ouseful.info, @psychemedia

3. STOP Words

4. Stop wordlists to tf-idf… term frequency-inverse document frequency (information retrieval research)

5. wordlists to tf-idf… term frequency-inverse document frequency: Term frequency, Inverse Document frequency (information retrieval research)

6. tf-idf: LARGE when a term is common in a small number of docs; SMALL when a term is common across many docs
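The behaviour described on this slide can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the common weighting tf × log(N / df), which the slides do not specify, and the toy documents and the function name `tf_idf` are made up for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumes tf = count / document length and idf = log(N / df);
    other weightings exist and the slides do not name one.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Toy corpus (illustrative only).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
scores = tf_idf(docs)
```

In the first toy document, "the" appears in every document and so scores zero, while "mat", which appears in only one document, scores higher than "cat", which appears in two.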

10.

11.

12. http://chrisharrison.net/projects/wordspectrum/pdfs/mac-pc-dist0.pdf

13.

14.

15.

16. http://www.prochronism.com/2012/04/mad-men-3-get-off-phone.html http://www.prochronism.com/

17. http://www.prochronism.com/

18.

19. Presentation Graphics vs. Visual Analysis

20. Explanatory visualization: Data visualizations that are used to transmit information or a point of view from the designer to the reader. Explanatory visualizations typically have a specific “story” or information that they are intended to transmit. Exploratory visualization: Data visualizations that are used by the designer for self-informative purposes to discover patterns, trends, or sub-problems in a dataset. Exploratory visualizations typically don’t have an already-known story.

21. http://www.datarevelations.com/the-likert-question-question.html

22. http://www.datarevelations.com/the-likert-question-question.html

23. http://www.datarevelations.com/the-likert-question-question.html

24.

25. http://www.organizationview.com/net-stacked-distribution-a-better-way-to-visualize-likert-

26. Data sketches [ Amanda Cox, New York Times ]

27.

28.

29.

30. Reproducible Research Reproducible Visualisation

31. @mediaczar (Accession Plot)

32.

33.

34.

35. http://tdwi.org/Articles/2010/04/14/Data-Visualization.aspx?Page=3

36. http://eagereyes.org/techniques/spirals

37.

38.

39.

40.

41.

42.

43.

44.

45.

46.

47. Emergent Social Positioning

48. Is followed by A focus B

49. peer Is followed by A focus B peer

50. peer Is followed by A focus B

51. Google+(Python)

52. Co-tags/co-topics

53. Journalists by co-tag

54. Friends’ Likes (Google Refine)

55.

56.

57.

58.

59.

60.

61.
- grab a list of companies that may be associated with “Tesco” by querying the OpenCorporates reconciliation API for tesco
- grab the filings for each of those companies
- trawl through the filings looking for director appointments or terminations
- store a row for each directorial appointment or termination including the company name and the director.
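The shape of this pipeline can be sketched without touching the network. The following is a rough outline only: `search_companies` and `get_filings` are hypothetical caller-supplied functions standing in for the OpenCorporates calls, and the field names (`id`, `name`, `type`, `director`) and the sample records are assumptions made for illustration, not the real API schema or real company data.

```python
import csv
import io

def collect_director_events(query, search_companies, get_filings):
    """Build one row per director appointment or termination.

    search_companies and get_filings are caller-supplied (hypothetical)
    functions; the filing fields used here are assumed, not the real schema.
    """
    rows = []
    for company in search_companies(query):
        for filing in get_filings(company["id"]):
            if filing["type"] in ("appointment", "termination"):
                rows.append({
                    "company": company["name"],
                    "director": filing["director"],
                    "event": filing["type"],
                })
    return rows

# Stand-in data so the sketch runs offline (entirely made up).
def fake_search(query):
    return [{"id": "c1", "name": "Example Stores Ltd"},
            {"id": "c2", "name": "Example Bank Ltd"}]

def fake_filings(company_id):
    filings = {
        "c1": [{"type": "appointment", "director": "A. Director"},
               {"type": "annual return", "director": None}],
        "c2": [{"type": "termination", "director": "A. Director"}],
    }
    return filings.get(company_id, [])

rows = collect_director_events("tesco", fake_search, fake_filings)

# Store the rows as CSV text, one line per appointment or termination.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["company", "director", "event"])
writer.writeheader()
writer.writerows(rows)
```

Passing the two fetch functions in as arguments keeps the row-building logic separate from however the companies and filings are actually retrieved.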

62.

63.

64. Visualising Structure and Visual Signatures

65.

66.

67. http://www.neoformix.com/2008/DirectedSentenceDiagrams.html

68.

69. Data Driven Storytelling

70.

71.

72. Many eyes word tree

73.

74. @psychemedia blog.ouseful.info

Editor's Notes

Let $p_{i,j}$ be the rate at which word $i$ occurs in document $j$, and $\bar{p}_i$ be its average across documents ($\bar{p}_i = \sum_j p_{i,j} / n_{\text{docs}}$). The size of each word is mapped to its maximum deviation ($\max_j (p_{i,j} - \bar{p}_i)$), and its angular position is determined by the document where that maximum occurs.
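As a rough sketch of the calculation in this note: for each word, take its average rate across documents, find the document where the rate exceeds that average by the most, and use that deviation as the word's size and that document as its angular position. The function name `word_layout` and the sample rates are invented for illustration.

```python
def word_layout(rates):
    """Map each word to (size, document index).

    rates maps word -> list of per-document occurrence rates.
    size is the largest deviation above the word's mean rate;
    the document index says where that maximum occurs.
    """
    layout = {}
    for word, per_doc in rates.items():
        mean = sum(per_doc) / len(per_doc)
        deviations = [p - mean for p in per_doc]
        best_doc = max(range(len(per_doc)), key=deviations.__getitem__)
        layout[word] = (deviations[best_doc], best_doc)
    return layout

# Made-up rates for two words across three documents.
rates = {
    "phone": [0.01, 0.03, 0.02],
    "car": [0.05, 0.01, 0.03],
}
layout = word_layout(rates)
```

With these made-up rates, "phone" is placed at the second document and "car" at the first.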
“Using Google's enormous bigram dataset, I produced a series of visualizations that explore word associations. Each visualization pits two primary terms against each other. Then, the use frequency of words that follow these two terms are analyzed. For example, "war memorial" occurs 531,205 times, while "peace memorial" occurs only 25,699. A position for each word is generated by looking at the ratio of the two frequencies. If they are equal, the word is placed in the middle of the scale. However, if there is an imbalance in the uses, the word is drawn towards the more frequently related term. This process is repeated for thousands of other word combinations, creating a spectrum of word associations. Font size is based on an inverse power function (uniquely set for each visualization, so you can't compare across pieces). Vertical positioning is random.
To better achieve an even distribution, I normalized the frequencies of bigrams based on total primary term frequency. So, for example, in the case of war vs. peace, there are 81,839,381 bigrams starting with war and 31,263,375 bigrams starting with peace. If I render the spectrum without normalization, it ends up lopsided toward war (since the usage totals are so much higher). To compensate, I scale down all of war's bigrams so that the overall frequencies are even.”
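The normalization described in this quotation can be illustrated with the figures it gives. This is a sketch under assumptions: the quotation does not say exactly how the ratio is mapped to a position, so the 0-to-1 scale used here (0 = only follows the first term, 1 = only follows the second, 0.5 = balanced) and the function name `spectrum_position` are choices made for the example.

```python
def spectrum_position(count_a, count_b, total_a, total_b):
    """Place a word on a 0..1 scale between primary terms A and B.

    Counts are divided by each primary term's bigram total first, so a
    more heavily used primary term does not pull every word toward it.
    Assumes at least one of the counts is non-zero.
    """
    rate_a = count_a / total_a
    rate_b = count_b / total_b
    return rate_b / (rate_a + rate_b)

# Figures quoted above: "war memorial" vs "peace memorial".
WAR_TOTAL = 81_839_381
PEACE_TOTAL = 31_263_375

normalized = spectrum_position(531_205, 25_699, WAR_TOTAL, PEACE_TOTAL)
# Passing totals of 1 leaves the raw counts unscaled, for comparison.
unnormalized = spectrum_position(531_205, 25_699, 1, 1)
```

Both values fall on the "war" side of the midpoint, but the normalized position sits closer to the middle than the unnormalized one, which is the correction for lopsidedness that the quotation describes.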
“it shows the relative weight of the phrases starting with 'business' that get used in books, in a way so that all the lines add up to 100% for any given year. This gives a good picture of how the use of 'business' changes relative to itself, without the overall trends for the individual words and with stopwords filtered out:”
“Bookworm demonstrates a new way of interacting with the millions of recently digitized library books. The Harvard Cultural Observatory already collaborated with Google Books on the Google ngrams viewer that has data for years. Bookworm doesn't work so closely with Google Books: instead, it uses texts in the public domain, in this case, books from the Open Library and Internet Archive. They have gathered millions of digital texts, and the descriptions of them librarians have made over the last two centuries. Bookworm uses that information to let you search for trends in any corpus you can create out of the library metadata, and to link to the underlying books so you can read them.”
http://www.datarevelations.com/the-likert-question-question.html
http://www.datarevelations.com/the-likert-question-question.html
Also how you position marks on a canvas in relation to each other
See also http://ieg.ifs.tuwien.ac.at/~aigner/teaching/ws06/infovis_ue/papers/spiralgraph_weber01visualizing.pdf http://www-users.cs.umn.edu/~carlis/spiral.pdf
http://eagereyes.org/techniques/spirals
Emergent Social Positioning: origins: 1.5 degree egonet (how followers follow each other, how hashtaggers follow each other)
- projection maps from followers to folk they commonly follow
- projection maps from hashtaggers to folk they commonly follow
- projection maps from friends to folk who commonly follow them
AP Wikileaks Iraq war: http://www.guardian.co.uk/news/datablog/2010/dec/16/wikileaks-iraq-visualisation “Each report is a dot, labeled by its key words. Reports with similar key words have edges drawn between them. The location of the dot has nothing to do with geography. Instead, we ran an algorithm that pulls dots with edges between them closer together. Then we labeled each cluster by the key words that are common to the reports in that cluster, and colored each report/dot by the "incident type," as entered by military personnel. The result is an abstract map of the bloodiest month of the war.”
Issuecrawler http://www.govcom.org/Issuecrawler_instructions.htm#4
“I wanted to see if weeks later, I could identify #ididnotreport as a visible issue in web sphere. More specifically, I wanted to see if I could use network analysis techniques to see if this sombering meme was still a cross-platform issue, prevalent across Twitter, Tumblr, Pinterest and other websites.
One of the key tools to use for defining issues in the web sphere is the Digital Methods Initiative’s “Issue Crawler”, a web network location and visualization software that consists of crawlers, analysis engines and visualisation modules, that crawl specified sites and capture the outlinks from those specified sites.
I used Issue Crawler’s Co-link analysis module to crawl the seed URLs by page from the query term “#ididnotreport” through 3 iterations – this then retained the pages that received at least two links (at a crawl depth of 2) from the seeds. I then used a Cluster Map to plot my issuecrawl result as a spring map.”
“Google Scraper allows you to harvest the top 100 Google search returns for your chosen issue, and then input these into its processing tool, which then outputs a measured result in the form of “issue clouds” that you can use to analyse the prevalence of the perceived issue/key words you have scraped. For this study of Tumblr, I scraped and harvested the top 100 Google search returns for “ididnotreport site:tumblr.com”.”
“I call my diagrams Directed Sentence Drawings because the direction of the line segments are a function of their topic. As before, each sentence is assigned a topic or remains neutral based on the vocabulary it contains. I place a neutral point in the middle of the diagram and four other topic points form a diamond shape around it (see below). For the State of the Union diagrams produced below I used the four topics Government, Domestic, Economy, and Security. The algorithm is as follows:
- start at the neutral point
- find the topic for the sentence and use it to set the color for the line
- draw the line from the current position towards the topic that it is about
- the length of the line is proportional to the length of the sentence
- if the line is continuing in the same direction as the last segment, draw a small circle at the starting point
- if the line is reversing direction, use a small arc to shift it over so it doesn't overlay the previous segment”
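The core of the Directed Sentence Drawing algorithm quoted above, computing where each line segment starts and ends, might look like the sketch below. Several things are assumptions rather than the original author's method: the keyword lists, the compass direction given to each topic, skipping neutral sentences, and measuring sentence length in words. The circle and arc decorations are left out.

```python
import re

# Assumed direction for each topic point of the diamond (illustrative).
DIRECTIONS = {
    "Government": (0, 1),
    "Security": (1, 0),
    "Economy": (0, -1),
    "Domestic": (-1, 0),
}

# Hypothetical vocabularies; the real topic word lists are not given.
KEYWORDS = {
    "Government": {"congress", "government", "federal"},
    "Security": {"troops", "security", "border"},
    "Economy": {"jobs", "economy", "taxes"},
    "Domestic": {"schools", "health", "families"},
}

def sentence_topic(sentence):
    """Pick the topic sharing most words with the sentence, or None."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    best = max(KEYWORDS, key=lambda topic: len(words & KEYWORDS[topic]))
    return best if words & KEYWORDS[best] else None

def directed_path(sentences, scale=1.0):
    """Return (topic, start, end) segments, starting at the neutral point."""
    x, y = 0.0, 0.0
    segments = []
    for sentence in sentences:
        topic = sentence_topic(sentence)
        if topic is None:
            continue  # assumption: neutral sentences draw nothing
        dx, dy = DIRECTIONS[topic]
        length = scale * len(re.findall(r"[a-z']+", sentence.lower()))
        end = (x + dx * length, y + dy * length)
        segments.append((topic, (x, y), end))
        x, y = end
    return segments

segments = directed_path([
    "Congress must pass the budget.",
    "Good evening everyone.",
    "Our troops defend the border.",
])
```

Each returned segment carries its topic, so a plotting layer could use it to choose the line colour as the quoted algorithm describes.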
“Rather than using arcs to connect identical patterns within a document I'm connecting instead segments that contain similar words. Here is the algorithm:
- break the document up into a stream of words
- throw away any 'stop words' (a, at, of, the ...)
- divide the remaining stream of more interesting words into 50 equal segments based on linear position
- calculate a similarity metric between each pair of segments based on the amount of overlapping words
- draw a diagram where the document segments are connected by arcs with the transparency determined by the similarity between the segments. Use a threshold so that weakly connected arcs don't get drawn at all.
- show the top two words for each arc drawn at both segment endpoints”
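A compact sketch of the segment-similarity steps quoted above follows. It stops short of drawing and of labelling arcs with their top two words. The quotation does not name the similarity metric, so Jaccard overlap (shared words divided by all distinct words in the pair) is assumed here, and the stop-word list and default threshold are placeholders.

```python
import re
from itertools import combinations

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "at", "of", "the", "and", "in", "to", "is"}

def segment_arcs(text, n_segments=50, threshold=0.1):
    """Return (i, j, similarity) for segment pairs above the threshold.

    Similarity is Jaccard overlap of the two segments' word sets,
    which is an assumed choice of metric.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOP_WORDS]
    n = min(n_segments, len(words))
    if n == 0:
        return []
    # Split the word stream into n nearly equal segments by position.
    segments = [set(words[i * len(words) // n:(i + 1) * len(words) // n])
                for i in range(n)]
    arcs = []
    for i, j in combinations(range(n), 2):
        shared = segments[i] & segments[j]
        similarity = len(shared) / len(segments[i] | segments[j])
        if similarity >= threshold:
            arcs.append((i, j, similarity))
    return arcs

text = ("Cats chase mice. Dogs chase cats. "
        "Stars shine at night. Planets orbit the stars.")
arcs = segment_arcs(text, n_segments=4, threshold=0.3)
```

The similarity value returned with each arc is what a renderer would map to transparency.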
http://www.improving-visualisation.org/visuals/tag=qualitative
Unlike most Many Eyes visualizations, the word tree starts with a blank slate instead of a full visualization of the data. You must choose a search term to display a word tree. After a word or phrase is typed, the computer finds all the occurrences of that term, along with the phrases that appear after it. For instance, the word "word" occurs a number of times in the previous paragraphs.
You will notice that in the words following "word" there are many repeated phrases. For instance, "tree" follows "word" five times, and "or phrase" follows three times. To create a word tree, the computer merges all the matching phrases.
You can manipulate the tree in several ways. To zoom into a particular branch, click on a word in the tree. If you control-click on a word, the diagram will use that new word as the main search term. And if you wish to see the context occurring before rather than after a phrase, select End. As you navigate the word tree, you can use the Back and Forward buttons just as you would in a browser to quickly step through your history of views.
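The merging step described in this note can be approximated with a nested dictionary. This is an illustrative sketch, not the Many Eyes implementation: it assumes a single-word search term, a fixed look-ahead depth, and it ignores sentence boundaries.

```python
import re

def word_tree(text, term, depth=3):
    """Merge the phrases that follow `term` into a nested dict.

    Each node maps a following word to {"count": n, "next": subtree}.
    Assumes `term` is a single lowercase word.
    """
    words = re.findall(r"[\w']+", text.lower())
    root = {}
    for i, word in enumerate(words):
        if word != term:
            continue
        node = root
        # Walk the next few words, sharing branches for repeated phrases.
        for following in words[i + 1:i + 1 + depth]:
            entry = node.setdefault(following, {"count": 0, "next": {}})
            entry["count"] += 1
            node = entry["next"]
    return root

sample = "The word tree shows a word or phrase. A word tree merges phrases."
tree = word_tree(sample, "word")
```

In the sample text "tree" follows "word" twice, so those two occurrences share one branch that then splits into "shows" and "merges".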
Many Eyes phrase net
“Phrase net analyzes a text by looking for pairs of words that fit particular patterns. You can specify this pattern by using asterisks as wildcard characters. For instance, the pattern "* and *" will match phrases like "play and sing" or "vexation and regret." Punctuation matters, so it will not match "left, and then". You can choose from some useful defaults or you can type your own patterns in the field below the list.
After you specify a pattern, the program creates a network diagram of the words it finds as matches. Two words are connected if they occur in the same phrase. The size of a word is proportional to the number of times it occurs in a match; the thickness of an arrow between words tells you how many times those two words occur in the same phrase. The color of a word indicates whether it is more likely to be found in the first or second slot of a pattern. The darker the word, the more often it appears in the first position.
Defining patterns
Matching different patterns gives different views of the text. Each text is unique, so it is worth experimenting. For instance, looking for the pattern "* and *" will often highlight key related concepts. In contrast, the pattern "* 's *" will often result in a diagram of the main people and the things they possess. The simplest pattern is "* *" which links words if they come in immediate succession; this often provides a surprisingly clear view, especially for short documents. Sometimes there is a special pattern that will provide information on a particular document. For example, applying "* begat *" to the King James Bible yields a rough family tree.
There are three ways to specify a pattern. The easiest is to choose one of the defaults from the list on the left. A second way is to type a pattern with two asterisks for the "slots" of the pattern. Note that you need exactly two asterisks for the pattern to work.
Finally, there's an advanced programmers-only option, which is to use a "regular expression" with two capturing groups. For an introduction to regular expressions, read this tutorial (java.sun.com/docs/books/tutorial/essential/regex/).”
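The two-asterisk patterns described above translate naturally into a regular expression with two capturing groups. The sketch below uses Python's standard `re` module and is not the Many Eyes code; the function name `phrase_net` is invented, and a lookahead is used so that overlapping matches (needed for the "* *" pattern) are all counted.

```python
import re
from collections import Counter

def phrase_net(text, pattern="* and *"):
    """Count (first word, second word) pairs matching a two-slot pattern.

    The pattern must contain exactly two asterisks; the text around
    them is matched literally, so punctuation matters.
    """
    parts = pattern.split("*")
    if len(parts) != 3:
        raise ValueError("pattern needs exactly two asterisks")
    before, middle, after = (re.escape(part) for part in parts)
    # Lookahead lets matches overlap, e.g. "a b c" gives a-b and b-c.
    regex = re.compile(
        r"(?=" + before + r"\b(\w+)" + middle + r"(\w+)\b" + after + r")",
        re.IGNORECASE,
    )
    return Counter((m.group(1).lower(), m.group(2).lower())
                   for m in regex.finditer(text))

text = "play and sing, vexation and regret, left, and then play and sing"
net = phrase_net(text)
```

Each counted pair corresponds to an edge in the network diagram, with the count standing in for arrow thickness; summing counts per word would give the word sizes.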
