Bo Long's Post
Our team will present our latest graph learning work, "VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections," at ICLR 2024 in Vienna, Austria. The paper introduces a novel approach for mini-batch graph transformers, enabling them to efficiently process billion-scale industry graph datasets. This advancement opens new possibilities for research into large graph foundation models and has potential applications in fields like social network analysis and graph-based recommendation systems. We eagerly anticipate engaging discussions at ICLR 2024. Additionally, we invite you to visit us at the AI at Meta booth to explore more exciting AI advancements from Meta. @Dongqi Fu, Zhigang Hua, Yan Xie, Jin Fang, Si Zhang, Kaan Sancak, Hao Wu, Andrey Malevich, Jingrui He
More Relevant Posts
-
AI Research Director @J.P. Morgan | Inventor (50+ patents) | Scientist (100+ papers - 30 H-index) | Engineer (15+ AI systems) | Speaker
Great step towards #trustworthy #LLMs, which are crucial for LLM adoption at scale, especially when critical decisions must be made (#Finance domains…). Plenty of direct applications, e.g., LLM input attribution for #hallucinations detection, #FactChecker!! Even broader: think of LLM outputs now directly attributed to LLM inputs. Great work, Sanjay Kariyappa and team. LLM inputs and outputs nicely connected with #XAI. #TrustworthyAI
Excited to announce that our paper, "Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions," has been accepted to #ICML2024! In this work, we introduce a novel explainable AI (XAI) framework called "Progressive Inference". Our approach leverages intermediate predictions from decoder-only Transformer models to generate high-quality SHAP-like input attributions for sequence classification tasks. Our method significantly outperforms prior XAI techniques and is a step towards improving the trustworthiness of large language models. For more details, check out our arXiv preprint: https://lnkd.in/gC3z3t2J Joint work with Freddy Lecue, Saumitra Mishra, PhD, Chris Pond, Daniele Magazzeni and Manuela Veloso. #AI #MachineLearning #XAI #ICML2024
Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions
arxiv.org
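The core idea lends itself to a short sketch. Because of causal masking, a decoder-only classifier's hidden state at position t depends only on tokens 0..t, so a classification head applied at every position yields one prediction per prefix, and the change between consecutive prefix predictions can serve as a per-token attribution. Below is a minimal sketch of that basic difference scheme, not the paper's exact procedure; the function name and tensor shapes are illustrative assumptions.

```python
import torch

def prefix_attributions(per_position_logits: torch.Tensor,
                        target_class: int) -> torch.Tensor:
    """Attribute each token by the change it induces in the prediction.

    per_position_logits: [seq_len, num_classes] classification logits
    computed at every position of a decoder-only model (causal masking
    means position t only sees the prefix x_0..x_t).
    """
    probs = per_position_logits.softmax(dim=-1)[:, target_class]  # P(y | prefix)
    # Token 0 is credited with the 1-token-prefix prediction; token t
    # with the change when x_t is appended to the prefix.
    prev = torch.cat([torch.zeros(1), probs[:-1]])
    return probs - prev

# Toy usage with random logits for an 8-token, 3-class example.
logits = torch.randn(8, 3)
attr = prefix_attributions(logits, target_class=1)
# Attributions telescope: their sum equals the final prediction.
print(attr, attr.sum(), logits.softmax(dim=-1)[-1, 1])
```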
-
Great library for tensor decomposition on large data in Python!
Our tensor decomposition based machine learning toolbox, Tensor Extraction of Latent Features (T-ELF), has been released (https://lnkd.in/gQJNEdEE)! T-ELF is one of the machine learning software packages developed as part of the R&D 100 winning SmartTensors AI project at Los Alamos National Laboratory (LANL).

T-ELF presents an array of customizable software solutions crafted for the analysis of large datasets. Acting as a comprehensive toolbox, T-ELF includes tools for data pre-processing, extraction of latent features, and structuring results to facilitate informed decision-making. Leveraging high-performance computing and cutting-edge GPU architectures, our toolbox is optimized for analyzing large datasets from a diverse set of problems.

Central to T-ELF's core capabilities are non-negative matrix and tensor factorization solutions for discovering multi-faceted hidden details in data, featuring automated model determination that estimates the number of latent factors (the rank). This pivotal functionality ensures precise data modeling and the extraction of concealed patterns. Additionally, our software suite incorporates cutting-edge modules for both pre-processing and post-processing of data, tailored for diverse tasks including text mining and Natural Language Processing, along with robust tools for matrix and tensor analysis and construction.

T-ELF's adaptability spans a multitude of disciplines, positioning it as an AI and data analytics solution. Its applications extend across fields such as Large-scale Text Mining, High Performance Computing, Computer Security, Applied Mathematics, Dynamic Networks and Ranking, Biology, Material Science, Medicine, Chemistry, Data Compression, Climate Studies, Relational Databases, Data Privacy, Economics, and Agriculture. T-ELF: https://lnkd.in/gQJNEdEE SmartTensors AI: https://lnkd.in/g_PDDVER #machinelearning #gpu #hpc #highperformancecomputing #dataanalysis #artificialintelligence
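T-ELF's own API is documented in its repository; as a minimal stand-in for its core technique, here is plain non-negative matrix factorization with scikit-learn, plus a crude rank scan via reconstruction error. This is only a sketch of the underlying method: T-ELF's automated model determination is considerably more principled than this error-flattening heuristic.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Synthetic nonnegative data with a hidden rank-3 structure.
X = rng.random((100, 3)) @ rng.random((3, 50))

# Crude stand-in for automated rank selection: scan candidate ranks
# and watch where the relative reconstruction error flattens out.
for k in range(1, 6):
    model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(X)   # latent features, shape (100, k)
    H = model.components_        # feature loadings, shape (k, 50)
    err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
    print(f"rank={k}  relative error={err:.4f}")
```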
-
🚀 Key Takeaways from "An Image is Worth 1/2 Tokens After Layer 2" 🚀 The study reveals inefficiencies in attention computation over visual tokens in Large Vision-Language Models (LVLMs): beyond the early layers, visual tokens receive very little attention. FastV, a versatile plug-and-play method, exploits this to significantly reduce computational cost without sacrificing performance across image and video tasks. Because its pruning ratio is tunable, FastV lets you trade computational efficiency against performance, making it well suited to compressing models for deployment on edge devices and for commercial use. Read more about this approach to AI, machine learning, and computer vision in the full article here: https://lnkd.in/gJ-DQ46M #AI #MachineLearning #ComputerVision #Efficiency #Innovation
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
arxiv.org
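The pruning step FastV describes, ranking visual tokens by the attention they receive in an early layer and dropping the lowest-ranked ones, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation; the tensor layout, the `keep_ratio` parameter, and the function name are my own assumptions.

```python
import torch

def prune_visual_tokens(hidden: torch.Tensor,
                        attn: torch.Tensor,
                        vis_start: int, vis_end: int,
                        keep_ratio: float = 0.5) -> torch.Tensor:
    """Drop the least-attended visual tokens after an early layer.

    hidden: [seq, dim] hidden states entering the next layer.
    attn:   [heads, seq, seq] attention weights from the current layer.
    Visual tokens occupy positions vis_start..vis_end (exclusive).
    """
    # Average attention each visual token *receives*, over heads and queries.
    received = attn.mean(dim=0).mean(dim=0)[vis_start:vis_end]
    n_keep = max(1, int(keep_ratio * (vis_end - vis_start)))
    top = received.topk(n_keep).indices.sort().values + vis_start
    keep = torch.cat([torch.arange(vis_start),           # text before image
                      top,                               # surviving visual tokens
                      torch.arange(vis_end, hidden.size(0))])  # text after image
    return hidden[keep]

# Toy usage: 4 text tokens, 16 visual tokens, 4 more text tokens.
h = torch.randn(24, 8)
a = torch.softmax(torch.randn(2, 24, 24), dim=-1)
print(prune_visual_tokens(h, a, vis_start=4, vis_end=20).shape)  # 16 tokens kept
```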
-
Over the past few years, I’ve worked with a variety of startups on advancing their computer vision capabilities. With most, their team made great progress. But with a few of them, I wish I’d been able to help them more. Some of the challenges they had included:
- A limited amount of labeled data
- Weakly labeled data (labels applied to large images without detailed annotations)
- Multispectral or multiplex images that are unsuited to typical transfer learning approaches
We ended up relying on handcrafted features and classical machine learning because deep learning models overfit no matter what we tried. With 20/20 hindsight and recent advances in self-supervised learning, I now see a potential solution: a 𝐝𝐨𝐦𝐚𝐢𝐧-𝐬𝐩𝐞𝐜𝐢𝐟𝐢𝐜 𝐟𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧 𝐦𝐨𝐝𝐞𝐥. In many cases, the dataset may not be large enough to train a traditional foundation model with hundreds of millions of parameters (or more). But a smaller model trained with a self-supervised objective (like reconstructing masked-out regions) could work great. Such a model could form the foundation from which many other task-specific models are fine-tuned, which also makes it quicker to experiment with new tasks. In January, I'll be launching a new assessment for computer vision teams to help them get a clear perspective on whether they can benefit from a foundation model. Through this assessment, I will answer the following questions:
- Would you benefit from a foundation model trained on your proprietary images?
- Do you have the capability to train one yourselves?
- What are some of the factors you’d need to consider when training one?
𝐋𝐞𝐚𝐫𝐧 𝐦𝐨𝐫𝐞 𝐚𝐧𝐝 𝐚𝐩𝐩𝐥𝐲 𝐧𝐨𝐰 𝐭𝐨 𝐛𝐞 𝐟𝐢𝐫𝐬𝐭 𝐢𝐧 𝐥𝐢𝐧𝐞 𝐟𝐨𝐫 𝐉𝐚𝐧𝐮𝐚𝐫𝐲: https://lnkd.in/g3MkE4Y4 #Pathology #RemoteSensing #EarthObservation #MedicalImaging #MachineLearning #DeepLearning #ComputerVision
Custom Vision Model Assessment
pixelscientia.com
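To make the self-supervised objective mentioned above concrete, here is a toy masked-reconstruction loss in PyTorch. Everything in it (module names, sizes, the zero-masking strategy) is an illustrative assumption, not any particular system's design; production masked autoencoders typically drop masked patches from the encoder entirely rather than zeroing them, which is what makes the encoder cheap to train.

```python
import torch
import torch.nn as nn

class TinyMaskedAutoencoder(nn.Module):
    """Illustrative masked-patch reconstruction objective (MAE-style).
    Everything here is a toy stand-in, not a production model."""
    def __init__(self, patch_dim: int = 48, width: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_dim, width), nn.GELU(),
                                     nn.Linear(width, width))
        self.decoder = nn.Linear(width, patch_dim)

    def forward(self, patches: torch.Tensor, mask_ratio: float = 0.75):
        # patches: [batch, n_patches, patch_dim] flattened image patches.
        b, n, _ = patches.shape
        mask = torch.rand(b, n, device=patches.device) < mask_ratio
        inp = patches.masked_fill(mask.unsqueeze(-1), 0.0)  # hide masked patches
        recon = self.decoder(self.encoder(inp))
        # Compute the loss only on the patches the model never saw.
        return ((recon - patches) ** 2)[mask].mean()

model = TinyMaskedAutoencoder()
loss = model(torch.randn(4, 16, 48))  # 4 images, 16 patches each
loss.backward()
```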
-
Enhanced Stochastic Optimization Algorithms with Power for Large-Scale Machine Learning
Improved Powered Stochastic Optimization Algorithms for Large-Scale Machine Learning. Zhuang Yang; 24(241):1−29, 2023.
Abstract: Stochastic optimization, particularly stochastic gradient descent (SGD), has become the most commonly used method for solving machine learning problems. In order to enhance the performance of the traditional SGD algorithm, which has a slow convergence rate and poor generalization, several strategies have been developed, such as control variates, adaptive learning rates, and momentum techniques. Most of these strategies focus on controlling the updating direction (e.g., gradient descent or gradient ascent) or manipulating the learning rate. In this study, we propose a novel type of improved powered stochastic gradient descent algorithm that uses the Powerball function to determine the updating direction. We also address the issue of the learning rate in powered stochastic optimization (PSO) by introducing an adaptive mechanism based on a Barzilai-Borwein (BB)-like scheme, not only for the proposed algorithm but also for classical PSO algorithms. The theoretical properties of these algorithms for non-convex optimization problems are analyzed. Empirical tests using various benchmark datasets demonstrate the efficiency and robustness of our proposed algorithms. https://lnkd.in/dHrsPDMt
Enhanced Stochastic Optimization Algorithms with Power for Large-Scale Machine Learning
https://instadatahelpainews.com
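To make the Powerball idea concrete: the Powerball function applies sign(g)·|g|^γ elementwise to the gradient before the update, with γ = 1 recovering ordinary gradient descent. Here is a minimal, deterministic toy sketch; the paper's algorithms are stochastic and pair this direction with a BB-type adaptive step size, both of which are omitted here for brevity.

```python
import numpy as np

def powerball(g: np.ndarray, gamma: float) -> np.ndarray:
    """Powerball function: sign(g) * |g|**gamma, applied elementwise.
    gamma in (0, 1) damps large gradient entries; gamma = 1 is plain GD."""
    return np.sign(g) * np.abs(g) ** gamma

# Powered gradient descent on a toy least-squares problem.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
x, lr, gamma = np.zeros(10), 0.01, 0.6

for _ in range(500):
    grad = A.T @ (A @ x - b) / len(b)   # gradient of (1/2n)||Ax - b||^2
    x -= lr * powerball(grad, gamma)    # powered update direction

print("residual:", np.linalg.norm(A @ x - b))
```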
-
🚀 What is Support Vector Classification (SVC): Unveiling the Power of Support Vectors for Classification 📊 🔍 Support Vector Classification (SVC) is a robust and versatile machine learning algorithm used for classification tasks. It's based on the concept of Support Vector Machines (SVM) and is widely employed in fields such as finance, healthcare, marketing, and more. 🎯 Objective: The primary goal of SVC is to classify data points into different classes by finding the optimal hyperplane that maximizes the margin between classes while minimizing misclassifications. 📈 Margin Maximization: SVC focuses on maximizing the margin, which is the distance between the hyperplane and the nearest data points (support vectors) of each class. This leads to better generalization and robustness against noise. 📊 Kernel Trick: SVC can efficiently handle non-linear decision boundaries by using kernel functions, such as linear, polynomial, Gaussian (RBF), or sigmoid kernels. This allows SVC to learn complex patterns and achieve high accuracy. 🔑 Key Features: SVC is effective in high-dimensional spaces, robust to overfitting, and capable of handling both linearly separable and non-linearly separable data. 📚 Training: SVC learns the optimal hyperplane by maximizing the margin and minimizing the classification error. Various optimization algorithms, such as gradient descent or quadratic programming, can be used for training. 🔍 Evaluation: Performance metrics such as accuracy, precision, recall, F1-score, and area under the ROC curve (AUC-ROC) are commonly used to evaluate SVC models. These metrics provide insights into the model's ability to correctly classify instances. 💡 Applications: SVC finds applications in diverse domains, including image classification, text categorization, sentiment analysis, medical diagnosis, and spam detection. 🚀 Conclusion: Support Vector Classification (SVC) is a powerful algorithm for classification tasks, offering robustness, flexibility, and high accuracy. Understanding its principles and applications can empower data scientists and analysts to tackle complex real-world classification problems effectively. A minimal worked example follows below. #machinelearning #deeplearning #ai #math #mathematics #statistics #artificialintelligence #classification #regression
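Here is a minimal scikit-learn example tying the pieces above together: an RBF kernel for a non-linear boundary, the C parameter trading margin width against misclassification, and standard evaluation metrics. The synthetic dataset is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# RBF kernel handles non-linear boundaries; C trades margin width
# against misclassification; scaling matters for kernel methods.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_tr, y_tr)

# Precision, recall, and F1 per class, as discussed above.
print(classification_report(y_te, clf.predict(X_te)))
```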
-
🌟 𝐃𝐞𝐞𝐩 𝐃𝐢𝐯𝐞 𝐢𝐧𝐭𝐨 𝐂𝐨𝐧𝐯𝐨𝐥𝐮𝐭𝐢𝐨𝐧𝐚𝐥 𝐍𝐞𝐮𝐫𝐚𝐥 𝐍𝐞𝐭𝐰𝐨𝐫𝐤𝐬 (𝐂𝐍𝐍𝐬) 🚀 Building on our previous understanding of CNN basics, let's explore the inner workings. Often referred to as ConvNets, these networks use convolutional layers as their core building block. A ConvNet is a sequence of layers, and every layer transforms one volume into another through a differentiable function. In CNNs, the process involves input data, a filter, and a feature map. Assuming a color image, the input has three dimensions (height, width, and depth for RGB). A feature detector (kernel/filter) moves across the image's receptive fields, conducting a convolution. The detector, typically a 3x3 matrix, applies a dot product to input pixels, creating a feature map. This process, repeated as the filter shifts with a stride, produces the final output known as a feature map or activation map. Weights remain constant within the feature detector as it moves across the image, a property known as parameter sharing. Some parameters, like the weight values, adjust during training through backpropagation and gradient descent. However, there are three hyperparameters affecting the volume size of the output that must be set before training begins:
> 𝐍𝐮𝐦𝐛𝐞𝐫 𝐨𝐟 𝐅𝐢𝐥𝐭𝐞𝐫𝐬: Affects the depth of the output; each filter produces a distinct feature map.
> 𝐒𝐭𝐫𝐢𝐝𝐞: Determines the distance the kernel moves over the input matrix, influencing output size.
> 𝐙𝐞𝐫𝐨-𝐩𝐚𝐝𝐝𝐢𝐧𝐠: Used when filters do not fit the input image. The three types of padding are: 1) 𝐕𝐚𝐥𝐢𝐝 𝐏𝐚𝐝𝐝𝐢𝐧𝐠: No padding; the last convolution is dropped if dimensions don't align. 2) 𝐒𝐚𝐦𝐞 𝐏𝐚𝐝𝐝𝐢𝐧𝐠: Ensures the output layer size matches the input layer. 3) 𝐅𝐮𝐥𝐥 𝐏𝐚𝐝𝐝𝐢𝐧𝐠: Increases output size by adding zeros to the boundary of the input.
Each plays a crucial role in defining the depth and size of the output; a quick numeric check is shown below. Hope this helps with learning the CNN architecture. Let's defer further topics within CNNs to future posts. 💡 #neuralnetworks #convolutionalneuralnetworks
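The hyperparameters above determine the output size via O = ⌊(W − K + 2P) / S⌋ + 1 for input width W, kernel size K, padding P, and stride S. A quick PyTorch check (note PyTorch has no "full" padding mode; padding=2 for a 3x3 kernel reproduces it):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one RGB image, 32x32

# 8 filters of size 3x3: output depth equals the number of filters.
print(nn.Conv2d(3, 8, kernel_size=3)(x).shape)             # valid:  30x30
print(nn.Conv2d(3, 8, kernel_size=3, padding=1)(x).shape)  # same:   32x32
print(nn.Conv2d(3, 8, kernel_size=3, stride=2)(x).shape)   # stride: 15x15
print(nn.Conv2d(3, 8, kernel_size=3, padding=2)(x).shape)  # full:   34x34

# Each size follows O = floor((W - K + 2P) / S) + 1.
```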
-
Matrix Extension for Pathological Radar Clutter Machine Learning
Abstract: This paper deals with radar clutter statistical learning based on spatial Doppler fluctuation. In articles [1]–[4], data is clustered cell by cell. In this article, we generalize the previous model to extract information not only from each cell independently, but also from the cells' spatial correlation. We first introduce the radar data, then the model and efficient tools to estimate the model parameters. The model parameters will be shown to be Hermitian Positive Definite Block-Toeplitz matrices. Next, we endow the manifold of Hermitian Positive Definite Block-Toeplitz matrices with a Riemannian metric coming from information geometry. Finally, we adapt a supervised classification algorithm (the k-Nearest Neighbors) and an unsupervised classification algorithm (Agglomerative Hierarchical Clustering) to this Riemannian manifold.
Index Terms—Radar clutter, multidimensional signals, spatiotemporal correlation, machine learning, information geometry, Riemannian manifold, Block-Toeplitz matrices, Siegel disk.
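The classification step, adapting k-NN to a Riemannian manifold of HPD matrices, follows a general recipe: replace Euclidean distance with a manifold distance. Here is a hedged sketch using the standard affine-invariant metric d(A, B) = ||log(A^(-1/2) B A^(-1/2))||_F on plain HPD matrices, not the paper's Block-Toeplitz/Siegel-disk metric; all function names and the toy data are my own.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

def airm_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Affine-invariant Riemannian distance between HPD matrices:
    || log(A^(-1/2) B A^(-1/2)) ||_F."""
    A_inv_sqrt = fractional_matrix_power(A, -0.5)
    return np.linalg.norm(logm(A_inv_sqrt @ B @ A_inv_sqrt), "fro")

def knn_predict(train, labels, query, k=3):
    """Majority label among the query's k nearest neighbors on the manifold."""
    d = [airm_distance(query, T) for T in train]
    votes = [labels[i] for i in np.argsort(d)[:k]]
    return max(set(votes), key=votes.count)

# Toy usage: random HPD matrices drawn from two differently scaled classes.
rng = np.random.default_rng(0)
def rand_hpd(scale):
    G = rng.standard_normal((4, 4)) * scale
    return G @ G.T + np.eye(4)

train = [rand_hpd(0.5) for _ in range(10)] + [rand_hpd(2.0) for _ in range(10)]
labels = [0] * 10 + [1] * 10
print(knn_predict(train, labels, rand_hpd(2.0), k=3))  # expected: 1
```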