Scoda project companygraph


Tony Hirst




1. Visualising Companies Connected by Common Projects

2. https://finances.worldbank.org/Procurement/Nigeria-Major-Contracts-Energy-and-Mining-Sector/fay4-hpn4

3. How might we look at this data as a network?

4. What can we connect together?

5. Companies to Projects?

6. Companies to Contracts?

7. Download the data as CSV Open it as a new OpenRefine project

8. We want numbers… openrefine.org

9. value.replace('$','').toNumber()

10.

11. Source, Target, Weight

12. So what can we represent in visual terms?

13. Companies and Projects [Diagram: network of company nodes (A Ltd, B Ltd, C Ltd, D Ltd) and project nodes (XY01, KT98, PZ53, PZ02)]

14. gephi.org

15.

16. Use Layouts to Reveal Structure (node types: Companies, Projects)

17. Turn labels on. Zoom label size. Select what to use as labels.

18. Edge-Thickness by Value [Diagram: company nodes (A Ltd, B Ltd, C Ltd, D Ltd) and project nodes (XY01, KT98, PZ53, PZ02)]

19. Size by degree [Diagram: company nodes (A Ltd, B Ltd, C Ltd, D Ltd) and project nodes (XY01, KT98, PZ53, PZ02)]

20. Set node size. Network statistics. Prevent labels overlapping. Label size proportional to node size.

21. Companies on multiple projects

22. Finding things with common partners

23. Co-Companies (by project): edge weight = no. of shared projects. [Diagram: the company and project network (A Ltd, B Ltd, C Ltd, D Ltd; XY01, KT98, PZ53, PZ02) alongside the derived company-only network (A Ltd, B Ltd, C Ltd, D Ltd)]

24. (Remember to turn off any filters so we have the whole network in view…)

25. Join “blue” company nodes connected by common red project nodes. Select the edit tool, click on a project node, note the Node Color Multimode.

26. Co-projects (by company): edge weight = no. of common companies. [Diagram: the company and project network (A Ltd, B Ltd, C Ltd, D Ltd; XY01, KT98, PZ53, PZ02) alongside the derived project-only network (XY01, KT98, PZ53, PZ02)]

27. Go back to the original, complete network in workspace 1 and duplicate it again. This time find the projects that are connected by common companies.

28. So what have you learned?
- How to export selected columns from OpenRefine
- How to import CSV data into Gephi
- How to visualise a simple network in Gephi
- How to map a bipartite network to show relations between entities connected by a common element
- …

29. SchoolOfData.org @SchoolOfData

Editor's Notes

This tutorial describes how to use network analysis tools to visually explore the links between companies working on the same contract.
The example dataset we will use comes from the World Bank. Each row represents a contract. Inspecting the column names tells us what data we have available about each contract. Looking at the data, we can see how we could order the companies based on the value of the total contract amount; or we might order the contracts by time; or we might look to see which contracts were awarded in a particular project, or to a particular company in the event of the same company being awarded more than one contract.
We might also wish to look for patterns in the data that show us how the things described in one row might connect to things described in other rows. For example, can we organise the data somehow to see which companies are associated with which projects? Could a network-style visualisation help us do this?
But if we were to draw a network, what sort of thing should we connect to what? And how would we know what to connect to each other? One way is to look at the data… at which point we might notice that some of the entries within a column take on the same value. This means that we can “connect” the data that appears in different rows using these common elements…
So what columns have usefully repeating elements? The projects column certainly has repeating elements, so we should be able to draw diagrams that show all the companies that connect to each project. And if a company is associated with more than one project, it should in a certain sense be seen to join those projects together…
A few of the contract numbers repeat, so it might be interesting to explore the extent to which companies connect to contracts. If two different companies are associated with the same contracts, that might be interesting.
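The search for columns with usefully repeating values can be sketched in Python. This is a minimal sketch rather than part of the original workflow: the function name `repeated_values` and the sample rows are hypothetical, and the column names `Project ID` and `Supplier` are assumed to match the columns exported from OpenRefine.

```python
from collections import Counter


def repeated_values(rows, column):
    """Return {value: count} for values that occur more than once in a column."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample rows, one per contract.
rows = [
    {"Project ID": "XY01", "Supplier": "A Ltd"},
    {"Project ID": "XY01", "Supplier": "B Ltd"},
    {"Project ID": "KT98", "Supplier": "A Ltd"},
    {"Project ID": "PZ53", "Supplier": "C Ltd"},
]

print(repeated_values(rows, "Project ID"))
print(repeated_values(rows, "Supplier"))
```

Any value that appears in more than one row is a candidate node that can tie those rows together.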
Let’s get some data so we can start to explore the network…
We just need to do a little bit of tidying of the data before we make use of it. The major problem is that the Total Contract Amount column does not contain numbers, as such… In particular, we need to get rid of the dollar sign. Let’s create a new column into which we can put the cleaned values.
This little bit of code says: take the value of each cell in the original column and replace the $ symbol with nothing (that is, an empty string). In other words, delete the dollar sign… Put this value in the corresponding cell of the new column, and make the cell a number type.
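For readers who prefer to clean the column outside OpenRefine, the same step might look like this in Python. It is only a sketch: `clean_amount` is a hypothetical helper, and stripping thousands separators is an extra assumption that the tutorial does not mention.

```python
def clean_amount(value: str) -> float:
    """Delete the dollar sign and return the remaining text as a number.

    Removing commas is an assumption for amounts written like "$1,000";
    the original expression only removes the "$" symbol.
    """
    return float(value.replace("$", "").replace(",", "").strip())


print(clean_amount("$1234.50"))
```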
Now we can export the data using the Custom Tabular Exporter, which allows us to select just those columns we want to export. (This can be very handy when a table has a large number of columns that we are not interested in!) I have rearranged the cells in the Custom Tabular Exporter simply by clicking on them and dragging them around. We just want three columns for now: Project ID, Supplier, and our new Amount column. Now that you know how to export the data just a few columns at a time, once you are comfortable with the process of visualising the data, you should be able to take other slices through the data (such as companies related to contracts) and visualise them yourself. You might also like to try using a similar method on a data set of your own…
There’s a final bit of tidying to do before we can use this data in Gephi, the application we’ll be using to visualise the network. In particular, Gephi expects the data to be presented to it with particular column names. Open the exported CSV data in a text editor and rename the columns: Source,Target,Weight (no spaces?) Note: you could have also renamed the columns in OpenRefine before exporting them…
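The column selection and renaming can also be scripted. The sketch below assumes the exported CSV has columns named `Project ID`, `Supplier` and `Amount`, as described above; the function name `to_gephi_edges` and the sample data are hypothetical.

```python
import csv
import io


def to_gephi_edges(csv_text: str) -> str:
    """Keep three columns and write them under the headers Source,Target,Weight."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for row in reader:
        # Project -> company, weighted by the cleaned contract amount.
        writer.writerow([row["Project ID"], row["Supplier"], row["Amount"]])
    return out.getvalue()


# Hypothetical export with one extra column that we drop.
sample = "Project ID,Supplier,Amount,Region\nXY01,A Ltd,100,AFRICA\n"
print(to_gephi_edges(sample))
```

With a real file, the text would be read from and written back to disk rather than held in strings.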
Network diagrams allow us to show relationships between different things. Networks are referred to in mathematical terms as graph structures, or graphs. You may be more familiar with thinking of things like line charts and bar charts as graphs, but when it comes to networks, we use the term graph to describe the mathematical structure that defines the network. The circles, or nodes, represent “things” in the network, in this case, particular companies or projects. The lines, or edges, represent relationships between the things in the network. In this example, the edges represent contracts that associate a particular company with one or more projects (or conversely, associate a project with one or more companies). Where nodes are placed in the diagram can be used to convey information about the structure of the network. Many different algorithms exist to lay out (that is, place, or position) the nodes at specific points in the diagram. Typically, we try to place nodes that are heavily interconnected by edges close to each other. Nodes that are grouped closely together on the page might then be assumed to be associated in some way because of the increasing number of links that connect them to each other. Note that we may use colour to represent that a node is a member of a particular group. In this case, we use colour to depict whether a node represents a company or a project.
Launch Gephi and from the File menu select New Project. Click on the Data Laboratory tab, and then Import Spreadsheet. Load in the file (with amended column names) as an Edges Table. The default settings should be fine…
Click on the Overview tab. You should see the network that connects Companies to Project IDs displayed there… But what does it mean? And can we tidy it up a little?!
I used the Yifan Hu layout to generate this view over the network. Yifan Hu is a good all-round layout engine that works particularly well when the data is hierarchically structured. Another good general purpose layout algorithm is ForceAtlas2.
Whilst we might get a feeling for the structure and shape of the dataset as a whole from the overall visualisation, we often want to inspect one or more of the nodes in detail. The quickest way of doing this is to look at the labels… You may also have noticed that some lines are thicker than others. In this case, the line thicknesses are proportional to the contract value, which we set in the weight column. If a company is associated with more than a single contract on a particular project, the edge weight will be proportional to the overall (total) sum of values of all the contracts relating that company to that project.
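The summing of contract values for a repeated company and project pair can be sketched as follows. This illustrates the arithmetic only; it makes no claim about how Gephi merges parallel edges internally, and the function name `aggregate_edges` and the sample rows are hypothetical.

```python
from collections import defaultdict


def aggregate_edges(rows):
    """Sum the weights of all rows that share the same (source, target) pair."""
    totals = defaultdict(float)
    for source, target, weight in rows:
        totals[(source, target)] += weight
    return dict(totals)


# Hypothetical rows: A Ltd holds two contracts on project XY01.
rows = [
    ("XY01", "A Ltd", 100.0),
    ("XY01", "A Ltd", 50.0),
    ("KT98", "A Ltd", 25.0),
]
print(aggregate_edges(rows))
```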
As well as using space (or position) and colour to represent structural elements of the network, we can also use edge weight (that is, the thickness, or width, of the lines connecting nodes to each other) to represent some feature of the network. In this case, we might use edge weight to represent the value of the contract that connects a company with a project, or the number of contracts that a company has on a particular project. When placing nodes, we might also use edge weight to contribute to the determination of how closely two connected nodes should be placed to each other. If you think of the edge thickness in terms of the size, thickness or strength of a mechanical spring, you might perhaps start to imagine how nodes connected by thick springs will be pulled closer to each other than nodes connected by much weaker springs.
As well as edge thickness, we might also make use of node size to highlight some feature of the network. In this example, we use node size to represent the degree of each node, that is, the number of edges connected to it. Sometimes, we might want to highlight nodes that have small numbers of connections, for example to identify projects with very few companies contracted to them. In this case, we might make nodes with only a single incoming edge very large, and nodes with a large number of edges much smaller. The node size thus represents how well connected a node is. In this case, the size of the project nodes indicates how many companies are associated with each project, and the size of the company nodes depicts how many project contracts the company is engaged with. Note that we can combine edge weight and node size, for example, by setting node size proportional to the summed weights of edges that are connected to the node. Hopefully, you are already starting to see how a network diagram can provide a range of powerful visual representations for helping us explore the structure of a network and identify key elements of it.
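Both quantities mentioned here, the degree of a node and the summed weight of its edges, are simple to compute from an edge list. The sketch below assumes one weighted edge per company and project pair; `node_degrees` and the sample edges are hypothetical names and values.

```python
from collections import Counter


def node_degrees(edges):
    """Return (degree, weighted_degree) counters for every node.

    `edges` maps (source, target) pairs to a numeric weight.
    """
    degree = Counter()
    weighted = Counter()
    for (source, target), weight in edges.items():
        for node in (source, target):
            degree[node] += 1          # number of edges touching the node
            weighted[node] += weight   # summed weight of those edges
    return degree, weighted


edges = {("XY01", "A Ltd"): 150.0, ("XY01", "B Ltd"): 40.0, ("KT98", "A Ltd"): 25.0}
degree, weighted = node_degrees(edges)
print(degree["A Ltd"], weighted["A Ltd"])
```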
We can size the nodes according to statistical values calculated over the network. In this case, we might want to highlight nodes according to the total value of contracts flowing into them (for companies) or out of them (for projects). The average weighted degree statistic calculates the corresponding value (the weighted degree) for each node in the network. The spline operator in the Ranking tab, where we set the node size, allows us to tweak the relationship between the value used to size the node and the node size. The default is a simple linear proportional map. However, we may find that the range of values we want to map are “clumped” together (for example, one very large value and a range of smaller values clumped together at the other end of the overall range). In such a case, we might want to tweak the mapping to provide a little more salience when it comes to distinguishing between the values that are otherwise clumped together. As well as making node size proportional to some quantity, we can also set the label size to be proportional to the node size.
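The effect of reshaping the value-to-size mapping can be illustrated with a small function. This is not Gephi's spline editor: as a stand-in, the sketch uses a power curve, and the names `node_size`, `size_min`, `size_max` and `exponent` are all assumptions made for illustration.

```python
def node_size(value, vmin, vmax, size_min=10.0, size_max=50.0, exponent=1.0):
    """Map a value in [vmin, vmax] to a node size in [size_min, size_max].

    exponent=1.0 gives the linear proportional map; an exponent below 1
    spreads out values that are clumped near the bottom of the range.
    """
    if vmax == vmin:
        return size_min
    t = (value - vmin) / (vmax - vmin)
    return size_min + (size_max - size_min) * t ** exponent


print(node_size(25, 0, 100))                # linear map
print(node_size(25, 0, 100, exponent=0.5))  # small values made more visible
```

With the linear map, a value a quarter of the way up the range gets a quarter of the available size range; with the square-root curve it gets half, which makes small nodes easier to tell apart.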
There are several other tools available to us that allow us to explore other properties of the network. For example, there is a wide selection of filters that allow us to select particular filtered views of the network. In this case, we use the degree range filter to show only nodes that have degree of two or more. This filters out nodes that have degree 1, for example, companies that are only associated with a single project. The result is a view over the network that shows which companies are associated with two or more projects, and which projects they are. The node sizes are indicative of the total overall value of contracts associated with each particular node. So for example, we see that Siemens AG is associated with contracts from projects P072018 and P090104. The large node size suggests that the sum total of contracts Siemens AG has received via these projects is quite significant. In addition, the line from P072018 to Siemens AG suggests that the total value of contracts (or maybe just a single contract) Siemens AG has received from that project is quite large.
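A degree range filter of this kind can be approximated in a few lines. The sketch below computes degrees on the full edge list and keeps only edges whose two endpoints both meet the threshold; Gephi's own filter may differ in detail, and `degree_filter` and the sample edges are hypothetical.

```python
from collections import Counter


def degree_filter(edges, min_degree=2):
    """Keep edges whose endpoints both have at least `min_degree` edges."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(s, t) for s, t in edges if s in keep and t in keep]


edges = [("XY01", "A Ltd"), ("XY01", "B Ltd"), ("KT98", "A Ltd"), ("PZ53", "C Ltd")]
print(degree_filter(edges))
```

In this sample only XY01 and A Ltd have two connections, so the single edge between them is all that survives the filter.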
So far, our network diagram has shown us how companies relate to projects, and conversely, how projects relate to companies. But sometimes we may want to know rather more directly the extent to which two things are connected by virtue of having a common partner, for example, which companies worked on the same projects together, or which projects are linked by virtue of having used the same companies. When the data is represented as a graph, we can manipulate the graph in order to generate derived graphs that can capture these sorts of relationship directly.
When we have a dataset represented in the form of a network, we can start to analyse it by looking at additional network properties. For example, for the projects and companies graph, we might process the graph so as to remove project nodes and replace the edges with edges that connect companies that worked on one or more projects together. We might even use edge weight to depict how many projects there were in common between two companies.
From the workspace menu, duplicate the original network (remember to turn off all the filters! We want the whole network). You will automatically be moved to a new workspace containing a copy of the original network. (Navigate between workspaces from the workspace selector at the bottom right-hand corner of the whole application window.) In the Multimode Networks Projection panel, click on Graph Coloring to try to split the network into complementary types of node (companies and projects). Hopefully, the tool will return with the report Bipartite: true. That is, two complementary sets of nodes have been found (nodes in the first group are only ever connected to nodes in the second group). Click on Load attributes and select the Node Color Multimode option.
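The check for two complementary sets of nodes is a standard two-colouring of the graph. The sketch below shows one common way to do it with a breadth-first search; it is not the plugin's implementation, and `two_colour` and the sample edges are hypothetical.

```python
from collections import defaultdict, deque


def two_colour(edges):
    """Return {node: 0 or 1} if the graph is bipartite, otherwise None."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    colour = {}
    for start in list(adjacency):
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in colour:
                    # Neighbours always take the opposite colour.
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    return None  # two connected nodes share a colour
    return colour


edges = [("XY01", "A Ltd"), ("XY01", "B Ltd"), ("KT98", "A Ltd")]
print(two_colour(edges))
```

Which colour ends up on the projects and which on the companies depends on where the search starts, which is why the next step checks the colour of a project node by hand.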
To check what the multimode tool has called nodes of each type, click on the edit button in the palette toolbar, and click on a project node. An edit panel will appear; make a note of what colour the project type node has been labelled. We can now use the multimode network projection tool to process the network by joining together company nodes that are connected by a common project, and deleting the project nodes. That is, we want to connect blue company nodes to blue company nodes if they are connected by edges that pass through a common red project node. Once we have made the mapping, we can delete the inner red project nodes. Running the projection results in several distinct clusters of companies that are connected to each other by virtue of being associated with the same project, as well as some companies that bridge different clusters by virtue of being associated with companies from different projects.
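The projection itself can be sketched directly: group companies by project, then link every pair of companies that share a project, counting how many projects they share. This is an illustration of the idea rather than the plugin's code; `project_companies` and the sample edges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_companies(edges):
    """Project (project, company) edges onto a company-to-company network.

    Returns {(company_a, company_b): number_of_shared_projects}, with each
    pair stored in alphabetical order.
    """
    companies_by_project = defaultdict(set)
    for project, company in edges:
        companies_by_project[project].add(company)

    shared = defaultdict(int)
    for companies in companies_by_project.values():
        for a, b in combinations(sorted(companies), 2):
            shared[(a, b)] += 1
    return dict(shared)


edges = [
    ("XY01", "A Ltd"), ("XY01", "B Ltd"),
    ("KT98", "A Ltd"), ("KT98", "B Ltd"), ("KT98", "C Ltd"),
]
print(project_companies(edges))
```

Swapping the two elements of each input pair gives the complementary projection, in which projects are linked by the companies they have in common.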
Conversely, we might remove the company nodes, and identify a new set of edges that connect projects that shared one or more common contracted companies. Again, edge thickness might be used to show how tightly connected two projects were by virtue of increasing numbers of common contracted companies.
By projecting the original network onto the network that shows links between projects that arise from common companies, we get a much clearer picture about how many projects there are, as well as possible linkages between them.
Here are some of the things you have hopefully learned… Feel free to add anything else you might have learned to the list…
For more information, and a wide range of further tutorials on all matters data related, visit the School Of Data at SchoolOfData.org, or on Twitter via @SchoolOfData.
