Cláudia Viviane Viegas, Roseli Búrigo, José Leomar Todesco, Fernando Alvaro Ostuni Gauthier, Paulo Maurício Selig
Federal University of Santa Catarina, Engineering and Knowledge Management Post-graduation Program, Florianópolis (SC), BRAZIL
Feevale University, Novo Hamburgo (RS), BRAZIL
Santa Catarina Extreme South University, Criciúma (SC), BRAZIL
Knowledge Discovery in Environmental Impact Report’s summary texts: an exploratory analysis of four case studies
1. Knowledge Discovery in Environmental Impact Report’s summary texts: an exploratory analysis of four case studies. Cláudia Viviane Viegas (1,2), Roseli Búrigo (1,3), José Leomar Todesco (1), Fernando Alvaro Ostuni Gauthier (1), Paulo Maurício Selig (1). Affiliations: (1) Federal University of Santa Catarina (UFSC), Engineering and Knowledge Management Post-graduation Program, Florianópolis (SC), BRAZIL; (2) Feevale University, Novo Hamburgo (RS), BRAZIL; (3) Santa Catarina Extreme South University (Unesc), Criciúma (SC), BRAZIL
2. What? This paper analyses four summary texts from Environmental Impact Reports (EIRs) prepared for hydroelectric facilities built in Brazil between 1997 and 2005. The brochures analysed are those of the Jirau (1), Ipueiras (2), Paulistas (3), and Barra Grande (4) dams.
3. How? Knowledge Discovery in Texts (KDT) techniques, namely stopword removal and stemming, are employed. EIRs summarise the outcomes of Environmental Impact Assessments (EIAs), which are mandatory by law in order to identify and measure the effects of projects that cause high levels of environmental change, and to formulate mitigation measures. A thesaurus is built from the Reference Term (RT), a document provided by governmental environmental institutions to guide the construction of EIAs-EIRs. A contextual approach is employed in order to cover the largest possible number of words and expressions that bear similarities to each other. The thesaurus' words and expressions are classified into 22 groups according to the similarity of their meanings. They are then compared with the words and expressions of the EIR summaries, after these summaries have undergone data preprocessing.
4. Why? Brazilian EIRs are often criticised as incomplete and superficial, but this criticism lacks objective support. Major findings: comparing the thesaurus with the words and expressions extracted from the summaries allows us to conclude that the EIRs emphasise placement issues, impacts, environmental alternatives, and mitigation/compensation procedures. Expressions such as technological resources, financial resources, social and economic context, economic alternatives, impact size, impact relevance, environmental effects, and harm prevention, all listed in the thesaurus, are not mentioned in the summaries.
5. EIAs-EIRs guidelines - approach to the problem. In Brazil, EIAs-EIRs are required by law, which gives rise to a generic Reference Term (RT) as a guideline for this kind of study. Zilberman (1995) highlights five generic steps of an EIA-EIR, and we consider the first three the most relevant to building the thesaurus:
- Step I: identification - Information about the project site, the technological and financial resources to control the project's environmental effects, the socioeconomic context, the objectives of land use and occupation policies, legislation, and the size of and alternatives for these impacts.
- Step II: environmental diagnostic - Evaluation of each impact identified in the previous step. The physical, biological (or biotic), and socioeconomic environments are evaluated.
- Step III: impacts' prognosis - The environmental effects of the business are identified and analysed, as well as the technological and economic possibilities for prevention and control, mitigation and repair. An alternative is chosen as the basis of the EIA-EIR.
6. Theoretical framework - IR and KDT. To better understand the content of EIR summaries, Information Retrieval (IR) studies can be worthwhile. IR is “(...) an activity which involves aspects of information description (indexation, pattern building) and it encompasses specification for searching, including any technique, system or machine employed to do or support such tasks” (WIVES, 2002). IR is the process or method by which a potential information user can convert an information need into an actual list of citations to stored documents that contain information useful to that user (SARACEVIC, 1995). Indexing is the first step of IR. It refers to the selection of the relevant words in a document, and can be done through controlled vocabulary techniques. Its aim is to build access points to a document, which is possible through the use of key words and the identification of expressions (WIVES, 2002).
7. Relationship between EIRs and KDT. The creation of a thesaurus containing key words and expressions from the stages of an EIA-EIR, following Zilberman's (1995) guidelines, is a first step in establishing a relationship between EIRs and KDT. It is a necessary precursor to the subsequent process of identifying relevant information, called matching. Matching identifies similarities between the information relevant to a user query and the information stored in the system. (Diagram labels: EIAs-EIRs major guidelines; semantic treatment; thesaurus.)
8. KDT techniques - stopwords and stemming. Semantic analysis was employed to process the EIR summaries, using techniques such as stopword removal and stemming. Stopwords are irrelevant words, and include prepositions, conjunctions, pronouns and others with no meaning in a specific context. They include “words with no relevant semantic content in their context and irrelevant words in the text analysis” (LOPES, 2004). Morphological normalisation, called stemming, treats a word's stem (radical) as the relevant part, without taking its inflectional endings into account. “With this technique, the user does not need to worry about the orthographic form of a word written in a text. So, an idea, regardless of being written as a noun, adjective or verb, is identified by the same (and single) radical” (WIVES, 2002).
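The two preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' tooling: the stopword list, the suffix list, and the function names `tokenize`, `remove_stopwords`, `stem` and `preprocess` are all assumptions made for the example, and the sample uses English words even though the analysed summaries are Brazilian documents.

```python
import re

# Illustrative stopword list (assumption): a real run would use a full list
# for the language of the summaries.
STOPWORDS = {"the", "of", "and", "in", "to", "a", "for", "is", "are", "on", "by", "with"}

# Illustrative suffixes (assumption), ordered so longer endings are tried first.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "ed", "al", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that carry no meaning for the analysis."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token, min_stem=4):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stopwords, then reduce each remaining token to its stem."""
    return [stem(t) for t in remove_stopwords(tokenize(text))]
```

With these rules, `preprocess("The impacts and the impact")` yields the same stem twice, which is the behaviour the Wives (2002) quotation describes: different written forms of one idea collapse to a single radical.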
10. Matching and weighting. After the summary texts have been analysed with tools such as stopword removal and stemming, the matching technique is applied, taking into account the words and expressions of the summaries and those of the thesaurus. This means considering the relevance of each key word and expression, which is given by the relative frequency of the indexed words: the number of times they appear compared with the number of words in the document. This is a weighting process. In order to understand the meaning of the weights, a clustering technique is employed. Rather than starting from a research hypothesis, a proactive approach is used to acquire information, designing an exploratory study, which “(...) is useful to detect potential problems and opportunities” (Loh et al., 2000).
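The weighting described above, relative frequency of matched terms against document length, can be sketched like this. The function names `relative_frequencies` and `match_score` are hypothetical, and the sketch assumes single-word terms that have already been preprocessed; it does not cover the clustering step.

```python
from collections import Counter


def relative_frequencies(tokens, thesaurus_terms):
    """Return {term: occurrences / total tokens} for thesaurus terms found in the document."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: counts[term] / total for term in thesaurus_terms if counts[term] > 0}


def match_score(tokens, thesaurus_terms):
    """Share of the document's tokens that match any thesaurus term."""
    return sum(relative_frequencies(tokens, thesaurus_terms).values())
```

For example, with the tokens `["impact", "area", "impact", "dam"]` and the terms `{"impact", "area", "law"}`, the weights are 0.5 for `impact` and 0.25 for `area`, and the overall match score is 0.75.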
11. Thesaurus’ building (I). Following Zilberman's (1995) guide to elaborating EIAs-EIRs as the RT, we listed the following steps with their respective sets of key words and expressions:
- Step I: A - placement, place(s), locational alternative(s), area, area(s) of influence, influenced area(s), affected area, region, region of influence, where; B - technologic resources, technology; C - financial resources, financing; D - socioeconomic context, socio economic aspect(s), socioeconomic(s), socioeconomy; E - soil use policy, soil use; F - legislation, law(s), resolution(s), legal aspects.
- Step II: G - environmental diagnostic; H - environmental impact(s), environmental change(s); I - physical media; J - biological media, biotic media; K - physical-biotic media, physical and biotic media; L - socioeconomic media, socioeconomic aspect(s); M - impacts' dimension; N - impacts' relevance.
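One way to hold such a thesaurus in code is a mapping from group letter to its terms, with a lookup that reports which groups occur in a text. The sketch below is an assumption about representation, not the authors' implementation: it includes only a few of the groups listed above, expands the "(s)" variants by hand, and the function name `groups_found` is hypothetical.

```python
import re

# A few groups from the list above, with the "(s)" variants expanded by hand.
THESAURUS = {
    "A": ["placement", "place", "places", "area", "affected area", "region"],
    "B": ["technologic resources", "technology"],
    "C": ["financial resources", "financing"],
    "F": ["legislation", "law", "laws", "resolution", "resolutions", "legal aspects"],
    "H": ["environmental impact", "environmental impacts",
          "environmental change", "environmental changes"],
}


def groups_found(text, thesaurus=THESAURUS):
    """Return the sorted group letters with at least one term occurring in `text`."""
    # Normalise to lowercase words separated by single spaces so that
    # multi-word expressions can be matched as phrases.
    normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
    found = set()
    for group, terms in thesaurus.items():
        for term in terms:
            # Word boundaries keep "law" from matching inside "flawed".
            if re.search(r"\b" + re.escape(term) + r"\b", normalized):
                found.add(group)
                break
    return sorted(found)
```

Matching on whole phrases rather than single tokens matters here because many thesaurus entries, such as "financial resources" or "environmental impact", are expressions rather than words.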
13. Semantic treatment results (I). Semantic classification of the EIR summary texts, weighted and compared with the thesaurus terms.
14. Semantic treatment results (II). Matching and weighting analysis for each facility's summary.
15. Findings and discussion (I). Most common words and expressions: the most common key words and expressions identified belong to groups A, H, Q, and T. They represent all the steps described by Zilberman (1995): A (I), H (II), and Q and T (III). The most important summary items are placement, impact, alternatives, plans, environmental projects or programs, and mitigation measures. Relevance: considering the total number of key words and expressions in each summary and matching them against the thesaurus' list, we find that the Barra Grande EIR has the best match, as it contains the highest relative proportion of key words and expressions (11.2%) compared with the thesaurus' words and expressions. The EIR summaries of the Paulistas (11%), Ipueiras (9.7%) and Jirau (8.4%) facilities perform less well.
16. Findings and discussion (II). Number of words and expressions selected in each summary compared with the thesaurus' sets of words and expressions: in this analysis, we find that the Ipueiras summary is the most representative, with 11 words, or 50% of the whole thesaurus. Barra Grande (45.4%), Paulistas (27.2%), and Jirau (18.1%) all match fewer words in the thesaurus. We can therefore conclude that the Jirau summary has the poorest overall match with the thesaurus in terms of both the number and the relevance of words.
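The coverage percentages above can be reproduced with simple arithmetic, reading "11 words, or 50% of the whole thesaurus" as 11 of the 22 thesaurus groups. The function name `group_coverage` is hypothetical. Only the Ipueiras count is stated in the text; the counts of 10, 6 and 4 groups mentioned after the code are back-calculated from the reported percentages and are assumptions.

```python
TOTAL_GROUPS = 22  # the thesaurus has 22 groups, per the text


def group_coverage(matched_groups, total_groups=TOTAL_GROUPS):
    """Percentage of thesaurus groups represented in a summary."""
    return 100 * matched_groups / total_groups


ipueiras = group_coverage(11)  # 11 of 22 groups, as stated in the text
```

Under that reading, 10, 6 and 4 matched groups would give roughly 45.45%, 27.27% and 18.18%, which agree with the figures reported for Barra Grande, Paulistas and Jirau once the second decimal is dropped.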
17. Conclusions. The most important items in the summaries, compared with the thesaurus, are placement, impact, alternatives, plans, environmental projects or programs, and mitigation measures. Considering both the frequency of the thesaurus' words and expressions in each summary and the sets of words and expressions (we listed 22 groups), the EIR summaries that fit the thesaurus best are those of Barra Grande and Ipueiras, and the Jirau summary fits it least. This conclusion, although based on summaries with few words (between 119 and 354), indicates the issues on which further studies of the semantic analysis of EIR texts should focus. The analysed summaries do not address technological and economic issues, for example, or subjects such as the dimension and relevance of environmental impacts. We recommend analysing more documents in order to confirm or refute these results, which we consider preliminary.
18. References (I)
Campos, P.M.P. (org.). 1986. Usinas hidrelétricas de Santo Antônio e Jirau - RIMA. Furnas e Odebrecht, Rio de Janeiro (RJ), 82 p.
Castro, T.L.C. (org.). 1997. UHE Barra Grande - Relatório de Impacto ao Meio Ambiente. Sumário. Engevix, São Paulo (SP), 59 p.
Ferneda, E. 2003. Recuperação de Informação: Análise sobre a Contribuição da Ciência da Computação para a Ciência da Informação. Escola de Comunicação e Artes da Universidade de São Paulo/USP (Tese de Doutorado).
Jensen, P.D. (org.). 2005. Usina Hidrelétrica Ipueiras - Relatório de Impacto Ambiental - RIMA. Rede Ipueiras Empresas de Energia Elétrica e Themag Engenharia, São Paulo (SP), 97 p.
19. References (II)
Loh, S.; Wives, L.K.; Oliveira, J.P.M. 2000. Descoberta proativa de conhecimento em coleções textuais: iniciando sem hipóteses. In: IV Oficina de Inteligência Artificial, Pelotas (RS), p. 143-154. Available at <http://www.inf.ufrgs.br/~palazzo/OAI/00%20OIA.pdf>. Accessed on April 20, 2006.
Lopes, M.C.S. 2004. Mineração de Dados Textuais Utilizando Técnicas de Clustering para o Idioma Português. Universidade Federal do Rio de Janeiro/UFRJ (Tese), 191 p.
Montano, C.F.B.; Pithan, R.O. (org.). 2005. Relatório de Impacto Ambiental - RIMA - AHE Paulistas, Rio São Marcos (GO/MG). Biodinâmica Engenharia, Rio de Janeiro (RJ), 54 p.
Moreira, R. 2002. Para que o EIA-RIMA Quase Vinte Anos Depois? In: Verdum, R. e Medeiros, R. M. (org.). RIMA - Relatório de Impacto Ambiental. Ed. UFRGS (4ª edição): Porto Alegre, p. 11-21.
20. References (III)
Rohde, G. M. 2002. Estudos de Impacto Ambiental: A Situação Brasileira em 2000. In: Verdum, R. e Medeiros, R. M. (org.). RIMA - Relatório de Impacto Ambiental. Ed. UFRGS (4ª edição): Porto Alegre, p. 41-65.
Saracevic, T. 1995. Evaluation of Evaluation in Information Retrieval. In: Conference on Research and Development in Information Retrieval. 18th Annual International SIGIR, Seattle, USA. (Proceedings). ACM Press, p. 137-146.
Wives, L.K. 2002. Tecnologias de Descoberta de Conhecimento em Textos Aplicadas à Inteligência Competitiva. Programa de Pós-graduação em Computação (Exame de Qualificação), 116 p. Porto Alegre (RS), Universidade Federal do Rio Grande do Sul (UFRGS).
Zilberman, Isaac. 1995. Conceitos e Metodologias para Estudos de Impacto Ambiental. Ed. Ulbra: Canoas (RS).
21. Authors’ contact. Cláudia V. Viegas: claudiav@egc.ufsc.br; Roseli Búrigo: rbc@unesc.net; J. Leomar Todesco: tite@stela.ufsc.br; Fernando O. Gauthier: gauthier@inf.ufsc.br; Paulo M. Selig: selig@egc.ufsc.br. Acknowledgement: we thank Dr. Alan J. Bond, senior lecturer, School of Environmental Sciences, University of East Anglia (UEA), Norwich, UK, for his advice.